Suppose I have a "tags" table with two columns: tagid and contentid. Each row represents a tag assigned to a piece of content. I want a query that will give me the contentid of every piece of content which is tagged with tagids 334, 338, and 342.
The "easy" way to do this would be (pseudocode):
select contentid from tags where tagid = 334 and contentid in ( select contentid from tags where tagid = 338 and contentid in ( select contentid from tags where tagid = 342 ) )
However, my gut tells me that there's a better, faster, more extensible way to do this. For example, what if I needed to find the intersection of 12 tags? This could quickly get horrendous. Any ideas?
EDIT: Turns out that this is also covered in this excellent blog post.
WHERE tagID in (334, 338, 342)
GROUP BY contentID
HAVING COUNT(DISTINCT tagID) = 3
WHERE tagID in (...) --taglist
GROUP BY contentID
HAVING COUNT(DISTINCT tagID) = ... --tagcount
Here's a solution that has worked much faster than the for me on a very large database of objects and tags. This is an example for a three-tag intersection. It just chains many joins on the object-tag table (
objtags) to indicate the same object and stipulates the tag IDs in the
FROM objtags t0
INNER JOIN objtags t1 ON t1.objid=t0.objid
INNER JOIN objtags t2 ON t2.objid=t1.objid
I have no idea why this runs faster. It was inspired by the search code in the MusicBrainz server. Doing this in Postgres, I usually get a ~8-10x speedup over the
HAVING COUNT(...) solution.
The only alternative way i can think of is:
select a.contentid from tags a
inner join tags b on a.contentid = b.contentid and b.tagid=334
inner join tags c on a.contentid = c.contentid and c.tagid=342
I don't know if this is better but it might be more maintainable
select contentid from tags where tagid = 334
select contentid from tags where tagid = 338
select contentid from tags where tagid = 342
You'd have to build it dynamically which wouldn't be as bad as your original solution.
What type of SQL? MS SQL Server, Oracle, MySQL?
In SQL Server doesn't this equate to:
select contentid from tags where tagid IN (334,338,342)