SQL Server Identity Quirks


If you’ve worked with SQL Server for any length of time odds are that you’ve used the IDENTITY property to create a surrogate key for a table. These auto-incrementing primary keys can be very handy, but they have a couple of quirks that have bitten me enough in the past (and recently) that I thought they warranted a blog post.

Disclaimer: None of what follows is revelatory; anyone with an even cursory knowledge SQL Server will already know everything I’m writing here. That said, I’m writing this post anyway in the hope that this knowledge might reach one or two folks that wouldn’t otherwise have been exposed to it.

Quirk #1: A Rolled Back Transaction Still Consumes An Identity Value

A column that makes use of the IDENTITY property uses two bits of metadata: a seed and an increment. The seed is the value from which the identity sequence will start. The increment is the value that is added to the identity as new rows are inserted. Most IDENTITY columns start their lives with a seed value of 1 and an increment value of 1. The first row inserted into a table with an IDENTITY column configured in this way will get the value ‘1’ in the IDENTITY column. The second row inserted will get the value ‘2’. The third row will get ‘3’, and so on. In this way you might expect that a table with an IDENTITY column that never has data updated or deleted will have a perfect sequence of identity values from 1 to N where N is the total number of rows in the table. This is technically possible, but tends to fall apart when transactions come into play.

Consider a simple database table named ‘CustomerQueue’ with the following columns:

  • Id– int IDENTITY(1,1) an identity column with a seed and increment of ‘1’
  • FirstName – nvarchar(50)
  • LastName – nvarchar(50)
  • …and so on, you get the idea

Now consider the following set of SQL statements:

BEGIN TRAN
INSERT CustomerQueue (FirstName, LastName) VALUES ('John', 'Knibb')
COMMIT
GO
 
BEGIN TRAN
INSERT CustomerQueue (FirstName, LastName) VALUES ('Susan', 'Tibbs')
COMMIT
GO
 
BEGIN TRAN
INSERT CustomerQueue (FirstName, LastName) VALUES ('Jesse', 'Taber')
ROLLBACK
GO
 
BEGIN TRAN
INSERT CustomerQueue (FirstName, LastName) VALUES ('Steven', 'Johnson')
COMMIT
GO

What would you expect the contents of the CustomerQueue table to be after these four SQL statements are executed? You might think it would look something like this:

Id FirstName LastName
1 John Knibb
2 Susan Tibbs
3 Steven Johnson

Note how the row for ‘Jesse Taber’ is missing because that transaction was rolled back instead of committed.

That set of data is pretty close, but in reality you’d end up with data that looks like this:

Id FirstName LastName
1 John Knibb
2 Susan Tibbs
4 Steven Johnson

Note how the Id value of 3 is missing. That value would have been assigned to the ‘Jesse Taber’ row had that transaction been committed, but it wasn’t and now it’s simply missing from the sequence. You might think that this is odd behavior (I certainly did the first time I observed it), but if you think about it a bit you’ll realize that it must work this way or else it wouldn’t work very well at all.

Imagine a busy deli counter at a grocery store with one of those “please take a number” ticket dispensers that is designed to ensure that customers are served in the order in which they arrived at the counter. An IDENTITY column is a bit like the roll of tickets in the dispenser: it contains a sequence of non-repeating numbers where each number is 1 higher than previous one. When customers pull a ticket out of the dispenser they have expressed intent to request some service at the deli counter. This is just like an INSERT statement that has been made in a yet-to-be-committed database transaction: it just represents an intent to insert some data into that row.

When ‘Jesse Taber’ first walked up to the deli counter and took a ticket out of the dispenser he had every intention of ordering something. After waiting for a few minutes, however, his cell phone beeped reminding him of an appointment he completely forgotten about that was supposed to start in 10 minutes. He had to hurry out of the store before being able to place an order. When he left he effectively rolled back his intention to order and the ticket that he took with him left a gap in the deli counter queue sequence. Much like you wouldn’t expect the ticket dispenser to hunt down the person(s) who took the ticket after Jesse and re-issue him or her Jesse’s number, you shouldn’t expect a SQL Server IDENTITY column to re-issue the identity value that was doled out to a transaction that ultimately got rolled back. It might be technically possible to build a system that could re-issue numbers on the fly to eliminate gaps in the sequence, but the value that would be derived from it probably wouldn’t be worth the effort to build it.

If it absolutely drives you mad to see gaps in a sequence like this I’d suggest that you take the same attitude toward it that the workers behind the counter take when they call a number for a person who had to duck out of the queue: just move on. Much the deli counter ordering system is tolerant of gaps in the queue sequence your system should be tolerant of gaps in any IDENTITY sequence.

Quirk #2: DBCC CHECKIDENT Might Reseed Unless You Tell It Not To

While it’s best to take a “set it and forget it” attitude toward IDENTITY columns you may, from time to time, feel the need to much with the identity sequence directly. Luckily for you SQL Server comes with a nice IDENTITY-related shotgun with which you can shoot yourself in the foot: the DBCC CHECKIDENT command. This command lets you both view and change the current seed value for an IDENTITY column. In its simplest form this command takes a single parameter representing the name of the table for which you’d like to check the current identity value:

DBCC CHECKIDENT ("dbo.CustomerQueue")
Judging solely by the name of command (CHECKIDENT) and the parameter that’s been provided you might think that this command would only check the current identity value for the table and not modify it. After all, there is an additional ‘RESEED’ argument that can be specified if you wish to force the current seed value of the identity to change. In most cases, just calling CHECKIDENT without the RESEED argument will not modify the current seed value. If, however, the current identity value for the table is less than the maximum value in the column, then the CHECKIDENT command will automatically reseed the identity with the maximum value from the column. I’ve always thought this behavior was odd, and I’ve never been able to come up with a good explanation for why this command was designed this way. Granted, having the seed of the IDENTITY column be lower than the current maximum value in the column carries a lot of potential risks, but I think I’d still rather see the default behavior not modify anything unless the RESEED argument is specified.

The moral of the story: if you really just want to check the current value of the identity in a table without modifying it you should use the NORESEED argument:

DBCC CHECKIDENT ("dbo.CustomerQueue", NORESEED)

Leave a comment

Your email address will not be published. Required fields are marked *