Tuesday 14 October 2014

Are you using your Auto Numbers/Identities Effectively?

So I am starting my blog with something simple. It seems to be a very popular trend to add an auto number/identity column to most tables these days. My personal preference is too look for something in the data that makes the data unique and use that as a key.

There has been and always will be a huge debate if using auto-numbers /identities are the "best" way to model a database. I am not going to argue this point, for me having a natural key or creating an artificial/surrogate key is just one of many design techniques I have in my tool box.

There seems to be one fundamental mistake that a lot database designs have when using auto-numbers as a primary key. Essentially most databases I have seen have been configured to use only half the available range of auto-numbers/identities values.

You might see the following code snippets a lot.

MySQL Example:
CREATE TABLE IF NOT EXISTS `Transactions` 
(
    `TransactionID` INT NOT NULL AUTO_INCREMENT
     /*
         Other Column Definitions 
     */ 
     PRIMARY KEY (`TransactionID`)  
)
ENGINE = InnoDB; 
And on MSSQL(SQL Server) you probably have encountered the following:
CREATE TABLE [dbo].[Transactions]
(
    [TransactionID] [int] IDENTITY(1,1) NOT NULL,
    CONSTRAINT [PK_Transactions] PRIMARY KEY CLUSTERED 
(
    [TransactionID] ASC
    /*
       Other Column Definitions
    */
)
WITH 
(
        PAD_INDEX               = OFF,
        STATISTICS_NORECOMPUTE  = OFF, 
        IGNORE_DUP_KEY          = OFF, 
        ALLOW_ROW_LOCKS         = ON, 
        ALLOW_PAGE_LOCKS        = ON
) ON [PRIMARY]
Pretty standard design pattern here nothing new right? Well one problem both these techniques have is that they are cutting the number of entries allowed in the table by half.

Let deal with the MySQL example first. We are declaring the TransactionID as an integer data type. If you read your MySQL data types definitions you would realise that MySQL has both signed and unsigned integer data type. So by declaring your data type as integer you are saying signed. However since you don't specify the starting range MySQL will start at the integer value 1.

By default MySQL uses a signed number when declaring an int, this means the proper value for this data type is -2147483648 to 2147483648. Starting at 1 means you are starting to count halfway. If you are designing a database to be scalable why on earth would you throw away half the capacity? Essentially you are limiting the scale of your database by declaring your data type incorrectly.

The same applies to MSSQL(SQL Server). I have seen hundreds of examples where Identities start at 1 or 100 or some other arbitrary number. I know some of you use this identity/auto-number as a transaction number, reference code, order number etc. This number is typically used by a customer to identify a transaction for example. So yes it makes sense in the above mentioned example to use some arbitrary positive number as a starting point.

However in some cases these auto-numbers are used in places where people will never know their value. In such cases don't just add a auto-number with its defaults but think about it's usage.

If this auto number is never going to be used by some external party to identify the row i.e. a client quoting a reference to a help desk representative you should not have to worry about its value being negative or positive. So yes I am suggesting to start your auto-numbers/auto-increments/identities at a negative number where appropriate.

Now there is one little caveat on MySQL, it seems impossible to set a negative auto-number.For example when executing the following query:
CREATE  `foo` 
(
   `fooID` INT NOT NULL AUTO_INCREMENT ,
   PRIMARY KEY (`fooID`) 
) ENGINE=InnoBD AUTO_INCREMENT=-1000
You will see the error message:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-1000'
A simple solution to this problem is just to declare your auto-numbers/identities as a UNSIGNED INT. This means MySQL will not loose the negative range of numbers as the number will be unsigned.

The bottom line is that you need to think about the usage of the column and choose the most appropriate data type and value possible for your current and hopefully future needs. Making a simple mistake like declaring a default value or auto-numbers incorrectly can have some rather nasty consequences later on.

Act in haste and repent at leisure: Code too soon and debug forever

No comments:

Post a Comment