Next: One-to-Many Joins
Up: Joining Tables
Previous: Additional Join Possibilities
Choosing a Join Key
The join key is the value used to link rows between tables. For example, in
Figure , 648 is the customer key,
appearing in the customer table to uniquely identify the row, and in
the salesorder table to refer to that specific customer row.
Some people might question whether an identification number is needed. Should
the customer name be used as a join key? Using it as the join key is not a good
idea for several reasons:
- Numbers are less likely to be entered incorrectly.
- Two customers with the same name would be impossible to distinguish in a join.
- If the customer name changes, all references to that name would have to change.
- Numeric joins are more efficient than joins of long character strings.
- Numbers require less storage space than character strings.
In the statename table, the two-letter state code is probably a good
join key for the following reasons:
- Two-letter codes are easy for users to remember and enter.
- State codes are always unique.
- State codes do not change.
- Joins of short two-letter codes are not significantly slower than integer joins.
- Two-letter codes do not require significantly more storage space than integers.
Essentially, two choices for join keys exist: identification numbers and short
character codes. If an item is referenced repeatedly, it is best to use a short
character code as a join key. You can display this key to users and allow them
to refer to customers and employees using codes. Users prefer to identify items
by short, fixed-length character codes containing numbers and letters. For example,
customers might be identified by six-character codes (FLE001),
employees by their initials (BAW), and parts by
five-character codes (E7245). Codes are easy to use and remember.
In many cases, users can choose the codes, as long as they are unique.
It is possible to allow users to enter short character codes and still use identification
numbers as join keys. Adding a code column to the table accomplishes
this goal. For the customer table, a new column called code can
be added to hold the customer code. When the user enters a customer code, the
query can find the customer_id assigned to the customer code, then
use that customer_id in joins with other tables. Figure
shows a query using a customer code to find all order numbers for that customer.
SELECT order_id
FROM customer, salesorder
WHERE customer.code = 'FLE001' AND
customer.customer_id = salesorder.customer_id;
In some cases, identification numbers work well and codes are unnecessary,
as in the following cases:
- Items with short lifespans, such as order numbers
- Items without appropriate codes, such as payroll batch numbers
- Items used internally and not referenced by users
Defining codes for such values would be useless. It is better to allow the database
to assign a unique number to each item. Chapter
discusses database support for assigning unique identifiers.
No universal rule dictates when you should choose codes or identification numbers.
U.S. states are clearly better keyed on codes, because only 50
exist. The resulting codes are short, unique, and well known by most users.
At the other extreme, order numbers are best used without codes because too
many of them are possible and codes would be of little use.
Next: One-to-Many Joins
Up: Joining Tables
Previous: Additional Join Possibilities
Bruce Momjian
2001-05-09