Systems Engineering and RDBMS

Surrogate Keys vs Natural Keys for Primary Key?

Posted by decipherinfosys on February 1, 2007

This topic probably is one of those that you cannot get any two database developers/DBAs to agree upon.  Everyone has their own opinion about this and it is also one of the most discussed topics over the web when it comes to data modeling.  Rather than taking any side :-), we are just listing out our experiences when it comes to chosing between a surrogate key vs the natural keys for the tables.

Surrogate Key:

Surrogate keys are keys that have no “business” meaning and are solely used to identify a record in the table.  Such keys are either database generated (example: Identity in SQL Server, Sequence in Oracle, Sequence/Identity in DB2 UDB etc.) or system generated values (like generated via a table in the schema).

Natural Key: 

Keys are natural if the attribute it represents is used for identification independently of the database schema.  What this basically means is that the keys are natural if people use them example: Invoice-Numbers, Tax-Ids, SSN etc.

Design considerations for choosing the Primary Key:

Primary Key should meet the following requirements:

  1. It should be not null, Unique and should apply to all rows.
  2. It should be minimal (i.e. less number of columns in the PK: ideally it should be 1, if using composite keys, then make sure that those are surrogates and using integer family data-types).
  3. It should be stable over a period of time (should not change i.e. update to the PK columns should not happen).

Keeping these in mind, here are the pros and cons of Surrogate vs. Natural keys:

Surrogate Key

I prefer surrogate keys to be DB controlled rather than being controlled via a next-up table in the schema since that is a more scalable approach.

Pros:

  1. Business Logic is not in the keys.
  2. Small 4-byte key (the surrogate key will most likely be an integer and SQL Server for example requires only 4 bytes to store it, if a bigint, then 8 bytes).
  3. Joins are very fast.
  4. No locking contentions because of unique constraint (this refers to the waits that get developed when two sessions are trying to insert the same unique business key) as the surrogates get generated by the DB and are cached – very scalable.

Cons:

  1. An additional index is needed.  In SQL Server, the PK constraint will always creates a unique index, in Oracle, if an index already exists, PK creation will use that index for uniqueness enforcement (not a con in Oracle).
  2. Cannot be used as a search key.
  3. If it is database controlled, for products that support multiple databases, different implementations are needed, example: identity in SS2k, before triggers and sequences in Oracle, identity/sequence in DB2 UDB.
  4. Always requires a join when browsing the child table(s).

Natural Key

 Pros:

  1. No additional Index.
  2. Can be used as a search key.

 Cons:

  1. If not chosen wisely (business meaning in the key(s)), then over a period of time additions may be required to the PK and modifications to the PK can occur.
  2. If using strings, joins are a bit slower as compared to the int data-type joins, storage is more as well.  Since storage is more, less data-values get stored per index page.  Also, reading strings is a two step process in some RDBMS: one to get the actual length of the string and second to actually perform the read operation to get the value.
  3. Locking contentions can arise if using application driven generation mechanism for the key.
  4. Can’t enter a record until value is known since the value has some meaning.

 Choosing Surrogate vs. Natural Keys:

There is no rule of thumb in this case. It has to be evaluated table by table:

  1. If we can identify an appropriate natural key that meets the three criteria for it to be a PK column, we should use it.  Look-up tables and configuration tables are typically ok.
  2. Data-Type for the PK: the smaller the better, choose an integer or a short-character data type.   It also ensures that the joins will be faster.  This becomes even more important if you are going to make the PK as a clustered index since non-clustered indexes are built off the clustered index.  RDBMS processes integer data values faster than the character data values because it converts characters to ASCII equivalent values before processing, which is an extra step.
About these ads

11 Responses to “Surrogate Keys vs Natural Keys for Primary Key?”

  1. [...] As you know, the Identity property is typically used by database designers when they want to use a surrogate key for a table. This allows them to rely on the database engine to automatically increment the value [...]

  2. [...] this can once again lead to blocking situations related to un-committed keys. The creation of the surrogate keys should be left to the DB – for identity property columns in SQL Server and DB2 LUW as well as the [...]

  3. [...] decipherinfosys on April 1, 2008 We have blogged before about how to go about choosing between a natural key vs a surrogate key when doing data modeling work.  One of the recent questions that was raised was about using a GUID [...]

  4. [...] situation was like this: There were two tables: COMPANY and COMPANY_SECURITY. Both had a surrogate key defined as an auto-incremental ID value. COMPANY table had an alternate key defined on CUSIP – in [...]

  5. [...] For Surrogate key Vs. Natural Key Refer [...]

  6. [...] For Surrogate key Vs. Natural Key Refer [...]

  7. [...] Recordemos que una Clave primaria (PK) es un conjunto de campos que identifica de forma única un registro de una tabla. Puede ser un solo campo o varios. El debate se produce cuando tenemos que escoger qué campos formarán la clave primaria de la tabla. Podemos utilizar una clave natural o una subrogada. [...]

  8. [...] http://decipherinfosys.wordpress.com/2007/02/01/surrogate-keys-vs-natural-keys-for-primary-key/ Be the first to rate this postCurrently 0/5 Stars.12345 Tags: Categories: Actions: E-mail | Kick it! | Permalink | Comments (0) | Comment RSS [...]

  9. [...] Surrogate Keys vs Natural Keys for Primary Key? Share this:EmailPrintFacebookShareDiggRedditStumbleUpon [...]

  10. [...] of natural keys. There are quite a few objective comparison of the 2 approaches, for example by Decipher Information Systems or in Wikipedia. However, for software business, the author still feels that natural key is the way [...]

  11. […] Surrogate Keys vs Natural Keys for Primary Key? […]

Sorry, the comment form is closed at this time.

 
Follow

Get every new post delivered to your Inbox.

Join 78 other followers

%d bloggers like this: