How I Added Tagging Using PHP and MySQL

On FONTSMACK, I just enabled the ability to view fonts by tag. The tags, I suppose, are more like keywords as there is currently no user intervention to tag fonts with their own keywords. In any case, my implementation is fairly straightforward and I hope this will help others looking at implementing something similar.

The Database

The database for FONTSMACK is straightforward. All fonts are stored in a fonts table. One of those fields in that table stores the tags in a varchar field. It's a space-delimited list of words. "sans-serif futuristic computer" is a prime example. When displaying it with the font details, I just pull it out of the database and display it on the page as-is.

Outputting the list of tags

Thanks to PHP and its array functions, outputting a unique list of keywords is quite straightforward:

$types = array();
$rs = $db->select("select tags from fonts");
if($db->numRows()>0)
{
  while($ftypes = mysql_fetch_array($rs))
  {
    $types = array_merge($types, explode(" ", $ftypes[0]));
  }

  $types = array_unique_compact($types);

  for($i=0;$i<count($types);$i++)
  {
    echo '<a href="/fonts?t='.$types[$i].'">'. $types[$i].'</a>';
    if($i<count($types)-1) echo ",\n";
  }
}

The while loops through all rows in the table and explodes the tags into an array which gets appended onto a master types array. Once that is done, I strip out all duplicate values and compact the array using a custom PHP function called array_unique_compact.

function array_unique_compact($a){
  $tmparr = array_unique($a);
  $i=0;
  foreach ($tmparr as $v) {
    $newarr[$i] = $v;
    $i++;
  }
  return $newarr;
}

With my compacted and unique array, I loop through each tag in the array and output it to the page.

Searching

To pull a list of items out of the database with a specific tag is now pretty simple. Your WHERE clause in your SQL should like something like this, where $tag is the keyword that we are looking for:

..."tags LIKE '% $tag %' OR tags LIKE '$tag %' OR tags LIKE '% $tag' OR tags = '$tag'"...

In SQL, % is a wildcard character. What the first LIKE statement says is to find the keyword with a space on both sides anywhere in the string. The second says to find the keyword at the beginning of the field. Similarly, the third LIKE statement tries to find the keyword at the end of the string. Finally, we check if the tag is exacly equal to what is in the database.

Notes

What I glossed over in this example is data validation but I can't stress it enough: validate your data. It's easy to get lazy and not check the values for things like quotes that shouldn't be there (Yup, I can be lazy, too).

UPDATE: I've added a post on weighting the tag list.

Published July 11, 2005 · Updated April 30, 2006

Conversation

30 Comments · RSS feed
Korbo said on July 12, 2005

I use about the same idea but to make a cleaner SQL call I decided to add spaces (or, in my case, semi-colons) at the start and end of the tag string. Thus, the SQL is "LIKE %;$tag;%".
I therefore only need to strip the first and last element of the array when exploding... In my opinion php is better suited than SQL for this small overhead, maybe I'm wrong.
Hope this helps.

Richard@Home said on July 12, 2005

You can simplify the where clause a little:

WHERE concat(' ', tags, ' ') LIKE ' %$tag% '

A minor word of warning: any like clause that starts with a wildcard (LIKE '%...') cannot use indexes.

So if you have 10,000 rows in your table, the database will have to compare ALL 10,000 rows.

Would it be better for your tagging schema to use a many-to-many schema for the tags?

You might want to read this pretty thorough look at various tagging schema.

Jeremy said on July 12, 2005

Great job! Maybe tell us the advantages of this? I'm working with Ruby on Rails now, but tagging seems to be my next project... after my webcalendar and other mini projects...

Jonathan Snook said on July 12, 2005

Richard@Home: I had read that article when it originally came out so I didn't see the update regarding indexing issues using LIKE. Since I'm only dealing with fonts, it would be a long time until I hit 10,000 records.

All things considered, I think I'll use Korbo's suggestion (something I had thought of but disregarded as unnecessary) to minimize the number of LIKE's that are being done. Being able to use spaces is certainly an advantage because I can use the PHP function trim to remove the whitespace on either side before exploding it into an array.

Jeremy: I would normally recommend using a properly normalized database as it is more flexible in the long term. I fully intended this to be a straightforward solution and didn't need any major scalability. Simplicity was key and being able to store the tags in a single field was definitely easier.

?ime said on July 20, 2005

Can you describe the process just a little more please?

Also, it would be cool to see how've you done the weighted tags list?

Jeff Wheeler said on July 20, 2005

I'm not sure about the pros and cons of each, but I prefer to use two tables.

1. The fonts table.
2. The tags table

Inside 2, I have 3 columns. First is the record id, then the font id, then the tag. I create a separate row for each tag.

Marcello said on July 27, 2005

This would add complexity, but you can make your queries faster by using three tables. Plus, it would be easier to manage and get better search results.

1) fonts
2) tags
3) a font-tag transition table.

tags table has only two columns id and name

font-tag table has two columns
font_id
tag_id

Jeff Wheeler said on July 27, 2005

That's basically what I said, but more complex.

Jonathan Snook said on July 28, 2005

Marcello: while performance would be faster in doing a search, I'm not sure it results in better search results (unless MySQL would drop matched results). Also, considering that the site will unlikely hit more than a few thousand records, performance seems less of a factor.

Down the road, if I need to go that route, it won't be too hard to create a script that would normalize the databse for me.

?ime said on July 31, 2005

Hello Jonathan,

I managed to do almost everything, writing and reading from database, outputting tags, weighted list, but I can't do simpliest thing: *pull a list of items with a specific tag*.

Can you describe that? I did a separate file "tag.php", so, what php code I must enter in it?

Jonathan Snook said on August 02, 2005

?ime: Start with the basic SQL to pull a list of items from the database:

SELECT fieldname FROM tablename

then tack on the LIKE statements from the article as part of the WHERE clause.

SELECT fieldname FROM tablename WHERE tags LIKE '% $tag %' OR tags LIKE '$tag %' OR tags LIKE '% $tag' OR tags = '$tag'

jake olsen said on August 18, 2005

hey, thanks for the cool code. It got me by for the time being until I put all of my tags into their own table.

I had a problem with null members in the array, and at the same time decided to limit the array members to more than 2 characters, here's the small but of code I've struggled over!

...

$tags = array_merge($tags, explode(" ", $ftags[0]));
}

//remove empties

foreach($tags as $key => $value) {
if($value == "" || strlen($value) < 3) {
unset($tags[$key]);
}
}

ghani said on September 19, 2005

Thanks! This helped in a pinch. Also, I love your design and the innovative comment placement!

Amy said on October 28, 2005

Does anyone bother filtering/ translating/condensing tags? For example, Hyperactive and Hyperactivity might be condensed into one tag. Or, if someone searched on "Run" the results wouldn't include "Running" (when the where clause includes spaces around the tag).

Jonathan Snook said on October 28, 2005

Amy: Data quality is certainly a valid concern and the answer is usually, "No". Most systems that let users do the tagging don't consolidate into known groupings. To avoid these kinds of things, sites like del.icio.us try to offer suggestions so as to steer the user into providing relevant data under the same tags as other sites.

Rohit Kumar said on October 29, 2005

Very good

Asim said on November 18, 2005

good site...can n e 1 send me post a comment code?

nader said on December 04, 2005

i've used your tags-script for my brands selectable by industry.

cool, thanks again! :-)

http://www.brandinfection.com/brands/

how do you like it ?

best regards, nader

asifr said on March 21, 2006

hey Jonathan,
nice work on the tagging script, i've gotten almost everything to work properly except pulling a list of specific tags

i have a seperate file "display-tags.php" and my database has a table with columns id, title, tags; and tags are comma seperated

this is a link to the display-tags.php source: http://asifr.com/display-tags.txt

hope you can help out, thanks

said on April 07, 2006

hey, thanks for the cool code. It got me by for the time being until I put all of my tags into their own table.

I had a problem with null members in the array, and at the same time decided to limit the array members to more than 2 characters, here's the small but of code I've struggled over!

...

$tags = array_merge($tags, explode(" ", $ftags[0]));
}

//remove empties

foreach($tags as $key => $value) {
if($value == "" || strlen($value) unset($tags[$key]);
}
}

Mirza Mujahid Ali Baig said on September 12, 2006

pl. send me the solutions for using wildcard char in php
best regards
mirza

BXT said on October 23, 2006

It's similar to what I did. But I used ", " to seperate Tags. Thus I'm able to use words containig spaces as tags.

Bihi said on December 22, 2006

Really cool tutorial. Thx very much. Will use it in the next version of my site!

Danny Dawson said on January 04, 2007

It sounds like you're working with a small dataset where the difference would be negligible, but for the sake of others working with larger projects: in your array_unique_compact() function, would it be more efficient to use array_values() instead of re-keying the entire array with a foreach statement?

Jonathan Snook said on January 04, 2007

Danny: Yes, actually, array_values() would have been a much better option, even for my small example, since they're doing exactly the same thing (re-keying the array).

s. said on March 02, 2007

Hello,
Will this code work efficiently with large databases? I have a community based web site and I would like to create a tagging system. The problem is that my database will most likely have more than 10,000 records over a short period of time, so I'm concerned if the above code will be able to handle large databases.
Hopefully someone will answer, I know it's an old post.

Jonathan Snook said on March 03, 2007

@s.: it might not be practical but it does depend entirely on how you plan to use the data. It worked for me on this project as I was dealing with a smaller dataset. If you need to do some aggregation, it'd likely be best to store it in its own table.

Mark Garbers said on April 04, 2007

Thanks for the tutorial... I've been wanting to implement tagging on my site for ages, but have been unable to decide on the best method. Your method is one that I hadn't thought of yet and seems to be the best so far... thanks!

Obchodné spoločnosti said on April 10, 2007

Very interesting solution... and do you know relation tables?

Zenon said on May 31, 2007

ah non c bon

Sorry, comments are closed for this post. If you have any further questions or comments, feel free to send them to me directly.