Database Joins
As somewhat of a follow-up to my previous post, Introduction to Developing a Relational Database, I’ll touch on the exciting subject of joining tables together. This should help tie in some of the “relational” aspects of the relational database.
Let’s start off with a quick example using the blog I used in the previous article. In this example, you need to display a list of five of the most recent comments on the home page. Each comment will need to link to the post on which the comment was made.
First we need to grab the most recent list of comments:
SELECT postid, title
FROM comments ORDER BY createdate DESC LIMIT 5
MS SQL Server and others use the TOP keyword to pull out a limited recordset.
SELECT TOP 5 postid, title
FROM comments ORDER BY createdate DESC
The comment title was grabbed for display on the page but we also use the postid to link to that particular post. Some blogs link to id, my site being one of them, but what if we needed the post title, too? You could do two queries. The first pulls out the postid on the comment and then a call is made for each record, using the postid to pull out the post.
SELECT title FROM posts WHERE postid = $postid
That’s a lot of back and forth. You would end up making 6 calls to the database just to display the recent comments on the home page. A better way to do this is to join the content together into one call:
SELECT postid, posts.title, comments.title
FROM comments INNER JOIN posts ON posts.postid = comments.postid LIMIT 5
... or alternatively ...
SELECT postid, posts.title, comments.title
FROM comments, posts WHERE posts.postid = comments.postid LIMIT 5
You’ve now taken 6 calls to the database and turned it into 1. Both of these statements do exactly the same thing but I actually prefer to use the INNER JOIN syntax. As the complexity of your database increases so does the complexity of your calls. It wouldn’t be unusual to start to see three, four or more tables become joined together. When that happens, the INNER JOIN syntax just makes things clearer.
I look at an SQL query as pulling from one table with each additional JOIN merely tacking on additional information to that one call.
INNER JOIN
So what exactly is an INNER JOIN? An INNER JOIN will pull rows only if there are results from both tables that match the results. If there is a many-to-one relationship between the two tables then you will get duplicated data from the “one” table. In the case of our blog, one blog post will have many comments.
SELECT postid, posts.title, commentid comments.title
FROM posts INNER JOIN comments ON posts.postid = comments.postid
This query would return a result set similar to this:
postid | posts.title | commentid | comments.title |
---|---|---|---|
1 | This is my title | 7 | This article is SO good. |
1 | This is my title | 9 | I know what you mean! |
If there is no match then the row is not used. For most queries, this is exactly what we want and is by far the most popular type of join.
What other types of joins are there?
OUTER JOIN
There are three different types of outer joins: the LEFT, RIGHT and FULL. A LEFT OUTER JOIN will pull all rows from the left side of a join regardless of whether there is a match on the right side of the join. Anything missing on the right side will be replaced by NULL.
SELECT postid, posts.title, commentid comments.title
FROM posts LEFT OUTER JOIN comments ON posts.postid = comments.postid
...would produce a result set like this...
postid | posts.title | commentid | comments.title |
---|---|---|---|
1 | This is my title | 7 | This article is SO good. |
1 | This is my title | 9 | I know what you mean! |
2 | I love blogging | NULL | NULL |
A RIGHT OUTER JOIN is simply the opposite of a LEFT OUTER JOIN. Anything missing on the left side is null. As a result, I almost never use a RIGHT OUTER JOIN but leave it to a LEFT to do the trick.
A FULL OUTER JOIN returns everything from both tables regardless.
postid | posts.title | commentid | comments.title |
---|---|---|---|
1 | This is my title | 7 | This article is SO good. |
1 | This is my title | 9 | I know what you mean! |
2 | I love blogging | NULL | NULL |
NULL | NULL | 10 | I’m a lost comment! |
As you can see from the example table, if properly developed and maintained, lost comments wouldn’t happen. In fact, I’ve never had the need for a FULL OUTER JOIN.
CROSS JOIN
The CROSS JOIN is a crazy join and, like the FULL, I have never had to use it. The CROSS JOIN returns a row for each combination of rows between the two tables. If you had 5 rows in each table, you’d end up with 25 rows in the result set.
Summary
When joining tables, there really are only two types of joins to consider: the INNER JOIN and the LEFT OUTER JOIN. Use the INNER JOIN when you need only the matched records between the two tables and use the LEFT OUTER JOIN when you need all the rows from one table whether or not rows in the other table exist or not.
Conversation
I'm currently having a look at sql like this. Two tables are easy. Now, I got these tables: "entries, writers, comments, categories". In entries the foreign keys are "writerID" and "CategoryID" the primary key is of course "EntryID".
In writers I have one interesting column: "writerID". Writers.writerID <-> Entries.writerID is as one is to many.
Alright. Third table is the comments table:
Comments table:
CommentID
[...]
EntryID (many to the one in Entries table)
For each post (i.e. for each call to the database) I want to fetch the name of the author/writer from Writers table and count the number of comments in Comments, giving me the columns in the end:
Writer (a string), Category (a string) Comments (in int), EntryID (an int), Entry (as PM/BLOB/TEXT = string), Date (simple: from the Entries table).
I also only want the top five records ordered by date.
So something like:
"SELECT TOP 5 Entries.EntryID, Entries.CategoryID, Entries.Entry, Comments.CommentID , Writers.WriterID, Writers.Name
FROM Entries, Writers, Comments LEFT OUTER JOIN Comments
ON Entries.EntryID = Comments.EntryID AND Entries.WriterID = Writers.WriterID
ORDER BY Date DESC"
but it doesn't really work :s
outer since it ought to display regardless if any comments have been made...
Nice follow-up Jonathan. Getting the INNER/OUTER JOIN syntax use clear in a simple way with proper examples.
Henrik, your code is a perfect example of why I prefer to use the INNER JOIN syntax. Each join should only link in one table at a time. It's also good to test your query each time you add something new to it.
My only thought about this query is that if you have five comments for the first post, the query will only return the one post. The TOP keyword limits the entire recordset and not just this particular table.
In any case, I hope this helped!
Thanks for your quick response Jonathan :). It seems like I've dropped my brand new mobile phone and some theif have picked it up and thrown away my SIM card, leaving me without my mobile (and quite large repayments on it). Damn...
Anyhow ^^. I'm using Access (since I can't afford MS SQL and I don't like to set up relational databases in MySQL, I find it hard...). I modified the sql u gave me a little bit (since it didn't work):
That works alright. I'm getting the name of the author, and I also get the abstract (the lure, so to speak) for each post that's not featured (then it's on the very front page).
The problem is that it selects all comments made on a specific post and puts each comment on a new row in the resulting table along with the abstract, entryid, date and name.
I'm trying to select the number of comments made in a specific post, so instead of
Comments.Entry AS Comment
I'd like to have something along these lines:
DISTINCT COUNT(Comments.Entry) AS Comments
...You don't happen to have a great solution in store for me :)?
Henrik: you probably want to use nested LEFT JOINS. Take a look at my post here: Nested LEFT JOINs to link three or more tables. Hope it helps!
Oh, looks like you beat me to it Henrik :-)
*lol*
appriciate the help, really - so thanks Ryan! really close too!
If you have lots and loads of more time, I still have a little problem, described in the post above :)
Oh, yes. I found these schematic pictures to be really really helpful, despite my deep and honest loathing towards set theory in school ^^.
Aahh, the joy:
okay, you know what... I should have read your comment more closely. Nesting still wouldn't be the approach I would take (I've never has to nest a JOIN).
Allow me to revise my original SQL:
The COUNT is an aggregate function but in order for it to work you have to use GROUP BY to group the non-aggregate fields used in the result set. (aggregates may be my next topic!)
what is a primary key and how does in help in joining tables
A primary key is a unique identifier. Most often, it's an auto-incrementing integer but could also be anything you want, as long as it's unique. Because it is unique, you store that value in another table to establish that "link". That field in the other table is called the foreign key.
Since you are using the same fieldname for the primary key and the foreign key, is there any particular reason for not using:
instead of
?
I believe
USING
is a MySQL-specific feature. A quick search through the MS SQL docs indicates it's not available. Thanks for the info, Nicolas... it's good to know!You are the man. That made the most sense out of everything I've ever read on JOIN in SQL. Thank you.
Nice articles!
But it seemed you're missing commas (, ) in the code example of "INNER JOIN" and "OUTER JOIN"
when i generally want to start with SQL what book can you recommend to me. maybe books which are well known and possibly available in german.
Thomas: unfortunately I couldn't even begin to point you in the direction of any German books, and admittedly, my SQL knowledge is self-taught. My recommendation would be to figure out which database server you plan to be using on a regular basis and find a book on that. Something like MySQL, MS SQL Server, Oracle, or PostgreSQL.
hello
i am doing a project called report engine where based on queries it generates reports.........but then i want to implement joins in it but unable to do it...could u plz help me.....but tlling me ho to do it................iam doing my project in Mysql,java (eclipse 3.0),jsp(dreamweaver)i am not able to understand how to integrate will all of them..............plz help me...
thank u
Can any body tell how to join tables placed in different database for ex :- table_A in database db1 and table_B in database db2. I would like to join table_A and table_B.
Bikash: normally you can join tables from different databases using dot notation. For example, "SELECT * FROM db1.table1 INNER JOIN db2.table2"
hello
i am doing a project called report engine where based on queries it generates reports.........but then i want to implement joins in it but unable to do it...could u plz help me.....but tlling me ho to do it................iam doing my project in Mysql,java (eclipse 3.0),jsp(dreamweaver)i am not able to understand how to integrate will all of them..............plz help me...
thank u
In other news: This website has a really good design!