Introduction to Developing a Relational Database
Creating database tables and fields can be a fairly straightforward process and even intuitive with a few key steps.
Technical database developers tend to use terms like entities (of which come acronyms like ERD which is an Entity Relationship Diagram). I prefer using tables and fields. A table is your entity or your object and a field is a property of that object.
When first thinking of what the tables and fields should be, it?s best to think of everything as objects. For example, you have a site where users can log in. The user would be your object. The fields (aka properties) of your user could include username, password, and e-mail address.
Let?s take a blog, as another example. A
blog has articles and comments. Both an article and a comment would each be an object. ARTICLES
and COMMENTS
would be our tables. An article might have an author, a post date, along with the article content itself. Likewise, a comment might have an author, the author?s e-mail address, along with the comment itself. These will be the fields of our tables.
What?s a Primary or Foreign Key?
A primary key is a field that stores a unique value for each field in that table. A primary key is usually a number that automatically increments for each new record entered into the table to ensure that the field is unique. A foreign key is the primary key value stored in a secondary table and defines a relationship between the two tables. A primary key should also be indexed. Most databases automatically create an index on a field marked as the primary key. An index is a lookup system that the database server uses to find results faster.
Our user table, for example, would have a field
called id
. This is the primary key. If that user were to post a new article
on the site, to identify that user as having written that article, the id of
that user is stored in a record in the article table. This links that particular
article to that particular user. The userid
field in the
article table is the foreign key.
Table and Field Naming Convention
Those who have come from an Oracle database
development background tend to name their tables singularly. The table that
stores all the users would be the USER
table. My personal
preference is to actually name the tables plural. In this case, it would be USERS
.
I also like to include a table prefix that relates to the project. This is often a good idea as it helps resolve any naming conflicts when tables for two different applications have to reside in the same database.
When it comes to field naming, I?ve noticed
that most keep it simple. Field name for our USERS
table would be id
, username
, password
,
and email
. In the past, I have actually leaned towards using part of the
table name as part of the field name. I?d have usrid
, usrname
, usrpass
, and
usremail
. This tends to come in quite handy when joining a number of tables
together as I find it more obvious and can avoid having to include table names
when referencing my field names. I also like that my primary and foreign key
fields are the same name.
For example, let?s say we have a blog. Within
the database for this blog we have a posts
table and a comments
table. Each table has an author
field. If for some reason we needed to join these two fields
together, we?d see something like this:
SELECT
posts.author, comments.author FROM posts
INNER JOIN comments ON posts.id = comments.postid
Be using the naming convention I describe above, you could instead do the following:
SELECT
pstauthor, cmtauthor FROM posts
INNER JOIN comments ON posts.pstid = comments.pstid
I have steered back to the norm these days as most frameworks like Ruby on Rails tend to require the simpler naming convention and it just so happens to be what my co-workers use (things work out better when everyone on the team uses the same style). I?ll leave it up to you to decide which is easier to use. It is important to develop a process and stick with it for the course of a project.
Normalisation. What is it?
Normalising is the process of removing any
redundancies that may exist in a table. Let?s go back to our user. Now we need
to store additional e-mail addresses for each user (I can?t be the only one
with five e-mail addresses). While it may seem tempting to simply add on fields
called email1
, email2
, and email3
, you can probably already see how difficult this would be to scale.
You might think to yourself, what user would ever have more than three e-mail
addresses? But by moving e-mail addresses into a table on its own, a user can
have as many e-mail address associated with his or her account as the database
will hold.
We create an email table which contains our primary key, the actual e-mail address and then the foreign key for the user id to link an e-mail address to a specific user.
EMAILS
id
email
userid
For an in-depth description of each of the forms of normalisation, I suggest you read the Wikipedia entry.
Sometimes, it may make sense not to
normalise, or rather, to denormalise some aspect of your database. Take a
one-time survey as an example. If the survey has 10 questions then the field
names could be question1
, question2
? question10
. Each record would store a user?s selected answers. Creating one
insert statement to store all answers is easier to manage than having to do an
insert for each answer. In this case, 10 different insert statements would have
to be executed).
If you have any tips that you'd like to share, feel free to add them to the comments.
Conversation
Well.. you wouldn't have to do 10 insert statements:
INSERT INTO (email, userid) VALUES (email, id), (email2, id), (email3, id)....
and so on.
I am using the same kind of conventions has you, it seems.
Including a - plural - table prefix to the table name related to the project, separating tables from different applications in a same database. BLOG_users
Plus, I put this prefix uppercase separated by an underscore (for better & quicker visibility). Same as you for field naming, I keep a prefix made with the table name, uppercase too. USR_name
JohnO: not all database servers support that multi-insert format (I kept the article database generic) -- MySQL may be the only one (if anyone is aware of Oracle or Postgresql or SQLite support, I'd be glad to hear it)
To the best of my knowledge, Oracle 10g does not support multiple inserts with the syntax JohnO has used above.
Here is the Syntax for an Oracle 10g Insert Statement.
There is way to insert multiple values with one select statement in Oracle 10g; although it seems more like a hack. Use an insert select statement as shown below.
INSERT INTO users.table (field1, field2)
(SELECT 'test value', '1' FROM dual
UNION SELECT 'test value', '2' FROM dual
UNION SELECT 'test value', '3' FROM dual);
About the naming conventions... Aliases help keep complex queries more readable, regardless of the table names.
"SELECT author AS postauthor FROM posts"
Great article John! I've even posted a link to my blog to this article because it is a great introduction to databases.
Might there be a part 2 to look forward to?
I have a feeling that your putting up this article is in some way related to your final question on this article.
Wesley, you astute reader, you! Yes, it partially came out of the observation that introductory topics seem to generate more traffic. But more often than not my topics come from talking to friends or co-workers or poring over comments on previous posts. Issues that they run into when jumping into something become inspiration for stuff to write about (hey, it's a difficult task to come up with interesting stuff to write about all the time). And as DD has requested, a part two just might be in order. :)
On the naming conventions.... I use something similar:
1. All table names are plural
2. I create a 2 o 3 letter prefix for each table based on the name wich then is added to the column names using an underscore.
3. When using a foreign key you also add the name of the original table (that way you know where that key is coming from).
4. Camel Base names (to make it easy to read).
So if I have a table called 'Users' the columns names would be usr_id, usr_fullName, usr_email.
Then the Posts table coul have values like pst_id, pst_title, pst_body, pst_usrId_writer
This way you know exactly what each values means and where it comes from.
You can also do joins solely in the WHERE clause, which I find more fluid. Using the blog example above, you might have a user table:
And a posts table:
To join the two we could do the following:
Which would leave us with a resultset showing username, subject and content for each post in the blog.
You may also notice how you can create aliases for table names to save typing the full table name over and over again in the select section of the query.
Jonathan, you write:
"A blog has articles and comments. Both an article and a comment would each be an object. AUTHORS and COMMENTS would be our tables."
Don't you mean ARTICLES and COMMENTS would be our tables?
Sorry if you find posts correcting your articles annoying, feel free to delete this.
This is going to be a great article for a lot of people. Thanks for sharing.
Iota: I'll actually be going into some depth as to why I use the INNER JOIN syntax in a followup post. :)
Geoff: I've fixed the mistake. I love posts pointing out any errors I've made. Every little bit helps!
Indeed, interesting article Jonathan. I'm looking forward to the next one too ;)
Re: Iota
The join using a list of tables (and no JOIN) is an implicit CROSS JOIN.
Snook calls these "Crazy Joins" in the next article. Could become very inefficient, I guess.