Introduction to Developing a Relational Database

Creating database tables and fields can be a fairly straightforward process and even intuitive with a few key steps.

Technical database developers tend to use terms like entities (of which come acronyms like ERD which is an Entity Relationship Diagram). I prefer using tables and fields. A table is your entity or your object and a field is a property of that object.

When first thinking of what the tables and fields should be, it?s best to think of everything as objects. For example, you have a site where users can log in. The user would be your object. The fields (aka properties) of your user could include username, password, and e-mail address.

Let?s take a blog, as another example. A blog has articles and comments. Both an article and a comment would each be an object. ARTICLES and COMMENTS would be our tables. An article might have an author, a post date, along with the article content itself. Likewise, a comment might have an author, the author?s e-mail address, along with the comment itself. These will be the fields of our tables.

What?s a Primary or Foreign Key?

A primary key is a field that stores a unique value for each field in that table. A primary key is usually a number that automatically increments for each new record entered into the table to ensure that the field is unique. A foreign key is the primary key value stored in a secondary table and defines a relationship between the two tables. A primary key should also be indexed. Most databases automatically create an index on a field marked as the primary key. An index is a lookup system that the database server uses to find results faster.

Our user table, for example, would have a field called id. This is the primary key. If that user were to post a new article on the site, to identify that user as having written that article, the id of that user is stored in a record in the article table. This links that particular article to that particular user. The userid field in the article table is the foreign key.

Table and Field Naming Convention

Those who have come from an Oracle database development background tend to name their tables singularly. The table that stores all the users would be the USER table. My personal preference is to actually name the tables plural. In this case, it would be USERS.

I also like to include a table prefix that relates to the project. This is often a good idea as it helps resolve any naming conflicts when tables for two different applications have to reside in the same database.

When it comes to field naming, I?ve noticed that most keep it simple. Field name for our USERS table would be id, username, password, and email. In the past, I have actually leaned towards using part of the table name as part of the field name. I?d have usrid, usrname, usrpass, and usremail. This tends to come in quite handy when joining a number of tables together as I find it more obvious and can avoid having to include table names when referencing my field names. I also like that my primary and foreign key fields are the same name.

For example, let?s say we have a blog. Within the database for this blog we have a posts table and a comments table. Each table has an author field. If for some reason we needed to join these two fields together, we?d see something like this:

SELECT
posts.author, comments.author FROM posts 
    INNER JOIN comments ON posts.id = comments.postid

Be using the naming convention I describe above, you could instead do the following:

SELECT
pstauthor, cmtauthor FROM posts 
    INNER JOIN comments ON posts.pstid = comments.pstid

I have steered back to the norm these days as most frameworks like Ruby on Rails tend to require the simpler naming convention and it just so happens to be what my co-workers use (things work out better when everyone on the team uses the same style). I?ll leave it up to you to decide which is easier to use. It is important to develop a process and stick with it for the course of a project.

Normalisation. What is it?

Normalising is the process of removing any redundancies that may exist in a table. Let?s go back to our user. Now we need to store additional e-mail addresses for each user (I can?t be the only one with five e-mail addresses). While it may seem tempting to simply add on fields called email1, email2, and email3, you can probably already see how difficult this would be to scale. You might think to yourself, what user would ever have more than three e-mail addresses? But by moving e-mail addresses into a table on its own, a user can have as many e-mail address associated with his or her account as the database will hold.

We create an email table which contains our primary key, the actual e-mail address and then the foreign key for the user id to link an e-mail address to a specific user.

EMAILS
  id
  email
  userid

For an in-depth description of each of the forms of normalisation, I suggest you read the Wikipedia entry.

Sometimes, it may make sense not to normalise, or rather, to denormalise some aspect of your database. Take a one-time survey as an example. If the survey has 10 questions then the field names could be question1, question2 ? question10. Each record would store a user?s selected answers. Creating one insert statement to store all answers is easier to manage than having to do an insert for each answer. In this case, 10 different insert statements would have to be executed).

If you have any tips that you'd like to share, feel free to add them to the comments.

Published August 02, 2005 · Updated September 17, 2005
Categorized as SQL
Short URL: https://snook.ca/s/394

Conversation

15 Comments · RSS feed
JohnO said on August 03, 2005

Well.. you wouldn't have to do 10 insert statements:

INSERT INTO (email, userid) VALUES (email, id), (email2, id), (email3, id)....

and so on.

fabien. said on August 03, 2005

I am using the same kind of conventions has you, it seems.
Including a - plural - table prefix to the table name related to the project, separating tables from different applications in a same database. BLOG_users
Plus, I put this prefix uppercase separated by an underscore (for better & quicker visibility). Same as you for field naming, I keep a prefix made with the table name, uppercase too. USR_name

Jonathan Snook said on August 03, 2005

JohnO: not all database servers support that multi-insert format (I kept the article database generic) -- MySQL may be the only one (if anyone is aware of Oracle or Postgresql or SQLite support, I'd be glad to hear it)

Adam Ham said on August 03, 2005

To the best of my knowledge, Oracle 10g does not support multiple inserts with the syntax JohnO has used above.

Here is the Syntax for an Oracle 10g Insert Statement.

There is way to insert multiple values with one select statement in Oracle 10g; although it seems more like a hack. Use an insert select statement as shown below.

INSERT INTO users.table (field1, field2)
(SELECT 'test value', '1' FROM dual
UNION SELECT 'test value', '2' FROM dual
UNION SELECT 'test value', '3' FROM dual);

Jani Tarvainen said on August 05, 2005

About the naming conventions... Aliases help keep complex queries more readable, regardless of the table names.

"SELECT author AS postauthor FROM posts"

DD said on August 05, 2005

Great article John! I've even posted a link to my blog to this article because it is a great introduction to databases.

Might there be a part 2 to look forward to?

Wesley Walser said on August 05, 2005

I have a feeling that your putting up this article is in some way related to your final question on this article.

Jonathan Snook said on August 05, 2005

Wesley, you astute reader, you! Yes, it partially came out of the observation that introductory topics seem to generate more traffic. But more often than not my topics come from talking to friends or co-workers or poring over comments on previous posts. Issues that they run into when jumping into something become inspiration for stuff to write about (hey, it's a difficult task to come up with interesting stuff to write about all the time). And as DD has requested, a part two just might be in order. :)

Moises Kirsch said on August 08, 2005

On the naming conventions.... I use something similar:

1. All table names are plural
2. I create a 2 o 3 letter prefix for each table based on the name wich then is added to the column names using an underscore.
3. When using a foreign key you also add the name of the original table (that way you know where that key is coming from).
4. Camel Base names (to make it easy to read).

So if I have a table called 'Users' the columns names would be usr_id, usr_fullName, usr_email.

Then the Posts table coul have values like pst_id, pst_title, pst_body, pst_usrId_writer

This way you know exactly what each values means and where it comes from.

Iota said on August 08, 2005

You can also do joins solely in the WHERE clause, which I find more fluid. Using the blog example above, you might have a user table:

USERS
 id
 username

And a posts table:

POSTS
    id
    userid
    subject
    content

To join the two we could do the following:

SELECT u.username, p.subject, p.content
FROM users u, posts p
WHERE u.id = p.userid;

Which would leave us with a resultset showing username, subject and content for each post in the blog.
You may also notice how you can create aliases for table names to save typing the full table name over and over again in the select section of the query.

Geoff said on August 08, 2005

Jonathan, you write:
"A blog has articles and comments. Both an article and a comment would each be an object. AUTHORS and COMMENTS would be our tables."

Don't you mean ARTICLES and COMMENTS would be our tables?

Sorry if you find posts correcting your articles annoying, feel free to delete this.

Kevin Hale said on August 09, 2005

This is going to be a great article for a lot of people. Thanks for sharing.

Jonathan Snook said on August 09, 2005

Iota: I'll actually be going into some depth as to why I use the INNER JOIN syntax in a followup post. :)

Geoff: I've fixed the mistake. I love posts pointing out any errors I've made. Every little bit helps!

matthijs said on August 20, 2005

Indeed, interesting article Jonathan. I'm looking forward to the next one too ;)

Jude said on December 25, 2005

Re: Iota

The join using a list of tables (and no JOIN) is an implicit CROSS JOIN.

Snook calls these "Crazy Joins" in the next article. Could become very inefficient, I guess.

Sorry, comments are closed for this post. If you have any further questions or comments, feel free to send them to me directly.