Database Joins

As somewhat of a follow-up to my previous post, Introduction to Developing a Relational Database, I’ll touch on the exciting subject of joining tables together. This should help tie in some of the “relational” aspects of the relational database.

Let’s start off with a quick example using the blog I used in the previous article. In this example, you need to display a list of five of the most recent comments on the home page. Each comment will need to link to the post on which the comment was made.

First we need to grab the most recent list of comments:

SELECT postid, title
FROM comments ORDER BY createdate DESC LIMIT 5

MS SQL Server and others use the TOP keyword to pull out a limited recordset.

SELECT TOP 5 postid, title 
FROM comments ORDER BY createdate DESC

The comment title was grabbed for display on the page but we also use the postid to link to that particular post. Some blogs link to id, my site being one of them, but what if we needed the post title, too? You could do two queries. The first pulls out the postid on the comment and then a call is made for each record, using the postid to pull out the post.

SELECT title FROM posts WHERE postid = $postid 

That’s a lot of back and forth. You would end up making 6 calls to the database just to display the recent comments on the home page. A better way to do this is to join the content together into one call:

SELECT postid, posts.title, comments.title 
FROM comments INNER JOIN posts ON posts.postid = comments.postid LIMIT 5

... or alternatively ...

SELECT postid, posts.title, comments.title 
FROM comments, posts WHERE posts.postid = comments.postid LIMIT 5

You’ve now taken 6 calls to the database and turned it into 1. Both of these statements do exactly the same thing but I actually prefer to use the INNER JOIN syntax. As the complexity of your database increases so does the complexity of your calls. It wouldn’t be unusual to start to see three, four or more tables become joined together. When that happens, the INNER JOIN syntax just makes things clearer.

I look at an SQL query as pulling from one table with each additional JOIN merely tacking on additional information to that one call.

INNER JOIN

So what exactly is an INNER JOIN? An INNER JOIN will pull rows only if there are results from both tables that match the results. If there is a many-to-one relationship between the two tables then you will get duplicated data from the “one” table. In the case of our blog, one blog post will have many comments.

SELECT postid, posts.title, commentid comments.title 
FROM posts INNER JOIN comments ON posts.postid = comments.postid

This query would return a result set similar to this:

postid posts.title commentid comments.title
1 This is my title 7 This article is SO good.
1 This is my title 9 I know what you mean!

If there is no match then the row is not used. For most queries, this is exactly what we want and is by far the most popular type of join.

What other types of joins are there?

OUTER JOIN

There are three different types of outer joins: the LEFT, RIGHT and FULL. A LEFT OUTER JOIN will pull all rows from the left side of a join regardless of whether there is a match on the right side of the join. Anything missing on the right side will be replaced by NULL.

SELECT postid, posts.title, commentid comments.title 
FROM posts LEFT OUTER JOIN comments ON posts.postid = comments.postid

...would produce a result set like this...

postid posts.title commentid comments.title
1 This is my title 7 This article is SO good.
1 This is my title 9 I know what you mean!
2 I love blogging NULL NULL

A RIGHT OUTER JOIN is simply the opposite of a LEFT OUTER JOIN. Anything missing on the left side is null. As a result, I almost never use a RIGHT OUTER JOIN but leave it to a LEFT to do the trick.

A FULL OUTER JOIN returns everything from both tables regardless.

postid posts.title commentid comments.title
1 This is my title 7 This article is SO good.
1 This is my title 9 I know what you mean!
2 I love blogging NULL NULL
NULL NULL 10 I’m a lost comment!

As you can see from the example table, if properly developed and maintained, lost comments wouldn’t happen. In fact, I’ve never had the need for a FULL OUTER JOIN.

CROSS JOIN

The CROSS JOIN is a crazy join and, like the FULL, I have never had to use it. The CROSS JOIN returns a row for each combination of rows between the two tables. If you had 5 rows in each table, you’d end up with 25 rows in the result set.

Summary

When joining tables, there really are only two types of joins to consider: the INNER JOIN and the LEFT OUTER JOIN. Use the INNER JOIN when you need only the matched records between the two tables and use the LEFT OUTER JOIN when you need all the rows from one table whether or not rows in the other table exist or not.

Published August 21, 2005 · Updated September 14, 2006
Categorized as SQL
Short URL: https://snook.ca/s/399

Conversation

23 Comments · RSS feed
Henrik said on August 22, 2005

I'm currently having a look at sql like this. Two tables are easy. Now, I got these tables: "entries, writers, comments, categories". In entries the foreign keys are "writerID" and "CategoryID" the primary key is of course "EntryID".
In writers I have one interesting column: "writerID". Writers.writerID <-> Entries.writerID is as one is to many.

Alright. Third table is the comments table:
Comments table:
CommentID
[...]
EntryID (many to the one in Entries table)

For each post (i.e. for each call to the database) I want to fetch the name of the author/writer from Writers table and count the number of comments in Comments, giving me the columns in the end:
Writer (a string), Category (a string) Comments (in int), EntryID (an int), Entry (as PM/BLOB/TEXT = string), Date (simple: from the Entries table).

I also only want the top five records ordered by date.

So something like:
"SELECT TOP 5 Entries.EntryID, Entries.CategoryID, Entries.Entry, Comments.CommentID , Writers.WriterID, Writers.Name
FROM Entries, Writers, Comments LEFT OUTER JOIN Comments
ON Entries.EntryID = Comments.EntryID AND Entries.WriterID = Writers.WriterID
ORDER BY Date DESC"
but it doesn't really work :s
outer since it ought to display regardless if any comments have been made...

fabien. said on August 22, 2005

Nice follow-up Jonathan. Getting the INNER/OUTER JOIN syntax use clear in a simple way with proper examples.

Jonathan Snook said on August 22, 2005

Henrik, your code is a perfect example of why I prefer to use the INNER JOIN syntax. Each join should only link in one table at a time. It's also good to test your query each time you add something new to it.

SELECT TOP 5 Entries.EntryID, Entries.CategoryID, Entries.Entry, Comments.CommentID , Writers.WriterID, Writers.Name
FROM Entries
INNER JOIN Writers ON Entries.WriterID = Writers.WriterID
LEFT OUTER JOIN Comments ON Entries.EntryID = Comments.EntryID
ORDER BY Date DESC

My only thought about this query is that if you have five comments for the first post, the query will only return the one post. The TOP keyword limits the entire recordset and not just this particular table.

In any case, I hope this helped!

Henrik said on August 22, 2005

Thanks for your quick response Jonathan :). It seems like I've dropped my brand new mobile phone and some theif have picked it up and thrown away my SIM card, leaving me without my mobile (and quite large repayments on it). Damn...

Anyhow ^^. I'm using Access (since I can't afford MS SQL and I don't like to set up relational databases in MySQL, I find it hard...). I modified the sql u gave me a little bit (since it didn't work):

SELECT Entries.Title, Entries.EntryID,  Entries.Abstract, Entries.Date, Writers.Name AS writer,Comments.Entry AS Comment
FROM Writers
INNER JOIN (Entries LEFT OUTER JOIN Comments ON Comments.EntryID = Entries.EntryID) ON Entries.WriterID = Writers.WriterID
WHERE Entries.Featured = false
ORDER BY Entries.Date DESC;

That works alright. I'm getting the name of the author, and I also get the abstract (the lure, so to speak) for each post that's not featured (then it's on the very front page).

The problem is that it selects all comments made on a specific post and puts each comment on a new row in the resulting table along with the abstract, entryid, date and name.
I'm trying to select the number of comments made in a specific post, so instead of
Comments.Entry AS Comment
I'd like to have something along these lines:
DISTINCT COUNT(Comments.Entry) AS Comments...
You don't happen to have a great solution in store for me :)?

Ryan said on August 22, 2005

Henrik: you probably want to use nested LEFT JOINS. Take a look at my post here: Nested LEFT JOINs to link three or more tables. Hope it helps!

Ryan said on August 22, 2005

Oh, looks like you beat me to it Henrik :-)

Henrik said on August 22, 2005

*lol*
appriciate the help, really - so thanks Ryan! really close too!
If you have lots and loads of more time, I still have a little problem, described in the post above :)

Henrik said on August 22, 2005

Oh, yes. I found these schematic pictures to be really really helpful, despite my deep and honest loathing towards set theory in school ^^.

Henrik said on August 22, 2005

Aahh, the joy:

SELECT Entries.Title, Entries.EntryID, Entries.Abstract, Entries.Date, Writers.Name AS writer, COUNT(Comments.Entry) AS NbrComments
FROM Writers INNER JOIN (Entries LEFT JOIN Comments ON Comments.EntryID=Entries.EntryID) ON Entries.WriterID=Writers.WriterID
WHERE Entries.Featured=False
GROUP BY Entries.Title, Entries.EntryID, Entries.Abstract, Entries.Date, Writers.Name
ORDER BY date;
Jonathan Snook said on August 22, 2005

okay, you know what... I should have read your comment more closely. Nesting still wouldn't be the approach I would take (I've never has to nest a JOIN).

Allow me to revise my original SQL:

SELECT TOP 5 Entries.Title, Entries.EntryID, Entries.Abstract, Entries.Date, Writers.Name AS writer COUNT(Comments.CommentID) AS NumComments
FROM Entries
INNER JOIN Writers ON Entries.WriterID = Writers.WriterID
LEFT OUTER JOIN Comments ON Entries.EntryID = Comments.EntryID
GROUP BY Entries.Title, Entries.EntryID, Entries.Abstract, Entries.Date, writer
ORDER BY Date DESC

The COUNT is an aggregate function but in order for it to work you have to use GROUP BY to group the non-aggregate fields used in the result set. (aggregates may be my next topic!)

PAUL NGANGA said on August 29, 2005

what is a primary key and how does in help in joining tables

Jonathan Snook said on August 29, 2005

A primary key is a unique identifier. Most often, it's an auto-incrementing integer but could also be anything you want, as long as it's unique. Because it is unique, you store that value in another table to establish that "link". That field in the other table is called the foreign key.

Nicolas Chachereau said on August 29, 2005

Since you are using the same fieldname for the primary key and the foreign key, is there any particular reason for not using:

INNER JOIN comments USING (postid)

instead of

INNER JOIN comments ON posts.postid=comments.postid

?

Jonathan Snook said on August 29, 2005

I believe USING is a MySQL-specific feature. A quick search through the MS SQL docs indicates it's not available. Thanks for the info, Nicolas... it's good to know!

Gilberto Creque said on August 30, 2005

You are the man. That made the most sense out of everything I've ever read on JOIN in SQL. Thank you.

Eddyyanto said on December 06, 2006

Nice articles!
But it seemed you're missing commas (, ) in the code example of "INNER JOIN" and "OUTER JOIN"

Thomas Heizung said on January 07, 2007

when i generally want to start with SQL what book can you recommend to me. maybe books which are well known and possibly available in german.

Jonathan Snook said on January 07, 2007

Thomas: unfortunately I couldn't even begin to point you in the direction of any German books, and admittedly, my SQL knowledge is self-taught. My recommendation would be to figure out which database server you plan to be using on a regular basis and find a book on that. Something like MySQL, MS SQL Server, Oracle, or PostgreSQL.

sony said on March 03, 2007

hello
i am doing a project called report engine where based on queries it generates reports.........but then i want to implement joins in it but unable to do it...could u plz help me.....but tlling me ho to do it................iam doing my project in Mysql,java (eclipse 3.0),jsp(dreamweaver)i am not able to understand how to integrate will all of them..............plz help me...
thank u

Bikash Borah said on March 31, 2007

Can any body tell how to join tables placed in different database for ex :- table_A in database db1 and table_B in database db2. I would like to join table_A and table_B.

Jonathan Snook said on April 01, 2007

Bikash: normally you can join tables from different databases using dot notation. For example, "SELECT * FROM db1.table1 INNER JOIN db2.table2"

srinivas said on July 13, 2007

hello
i am doing a project called report engine where based on queries it generates reports.........but then i want to implement joins in it but unable to do it...could u plz help me.....but tlling me ho to do it................iam doing my project in Mysql,java (eclipse 3.0),jsp(dreamweaver)i am not able to understand how to integrate will all of them..............plz help me...
thank u

andreas said on September 14, 2007

In other news: This website has a really good design!

Sorry, comments are closed for this post. If you have any further questions or comments, feel free to send them to me directly.

Want to learn about scaling CSS for large projects?

I'm available for full and half-day workshops on scalable CSS architecture. I can provide on-site training for your team. Interested?
Get in touch.