
Boosting INSERT Speed by Generating Scalable Keys

Throughout history, similar ideas tend to surface at about the same time. Last week, at SQLBits 9, I did some “on stage” tuning of the Paul Randal INSERT challenge.

It turns out that at almost the same time, a lab run was being done that demonstrated, on a real-world workload, a technique similar to the one I ended up using. You can find it at this excellent blog: Rick’s SQL Server Blog.

Now, to remind you of the Paul Randal challenge: it consists of doing as many INSERT statements as possible into a table of this format (the test does 160M inserts in total):

CREATE TABLE MyBigTable (
    c1 UNIQUEIDENTIFIER ROWGUIDCOL DEFAULT NEWID()
    ,c2 DATETIME DEFAULT GETDATE()
    ,c3 CHAR(111) DEFAULT 'a'
    ,c4 INT DEFAULT 1
    ,c5 INT DEFAULT 2
    ,c6 BIGINT DEFAULT 42);
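
Since every column has a default, each insert presumably boils down to a statement like the one below (my assumption about how the test harness drives the load; Paul’s exact loop may differ):

-- Each worker thread inserts default rows as fast as it can
-- (assumed harness statement, not Paul's verbatim code):
INSERT MyBigTable DEFAULT VALUES;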

Last week, with this test fully tuned, I was able to achieve 750K rows/sec (runtime: 213 seconds) on a SuperMicro 48-core AMD machine with 4 Fusion-io cards. I used 48 data files for best throughput; more on that in a future blog.

Now, I suspect Paul used NEWID() for key generation to create as much I/O as possible with page splits. But if the goal is purely to optimize throughput, why not ask: is NEWID() really the best way to generate a key?

Traditional SQL Server knowledge dictates that either IDENTITY(1,1) or NEWSEQUENTIALID() is the better key choice: they lay out the keys sequentially, at the end of the index. But one should not easily trust such recommendations without validating them, so I ran the test on the 48-core box Fusion-io lent me.
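
For reference, the sequential-key variant of the table might look something like this (a minimal sketch; the IDENTITY clustered key and the table name are my assumptions, not necessarily the exact setup I ran):

-- Hypothetical sequential-key variant: the clustered index forces every
-- insert to land on the last page of the B-tree.
CREATE TABLE MyBigTableSeq (
    c1 BIGINT IDENTITY(1,1) PRIMARY KEY CLUSTERED
    ,c2 DATETIME DEFAULT GETDATE()
    ,c3 CHAR(111) DEFAULT 'a'
    ,c4 INT DEFAULT 1
    ,c5 INT DEFAULT 2
    ,c6 BIGINT DEFAULT 42);

Here are the results: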

 

[Chart: INSERT throughput of sequential keys (IDENTITY / NEWSEQUENTIALID) vs. random keys (NEWID)]

This might shock you: the “traditional, sequential” key layout is more than 100 times SLOWER than completely random inserts!

The CPUs are NOT at 100% during this run (they were when using NEWID()). The problem shows itself when we dig into sys.dm_os_wait_stats. (By the way, below is my new “this is how I think about it” diagram. Let me know if you like this way of illustrating a troubleshooting thought process, and I will keep using the format.)

[Diagram: “this is how I think about it” troubleshooting map, tracing the slow run through sys.dm_os_wait_stats to PAGELATCH waits on the last page of the index]

We are waiting for a PAGELATCH on the last pages of the index, and this wait dominates the run. The price of page splits is minor (when you have a fast I/O system) compared to the price of coordinating memory access to the last page!
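
A quick way to see this is to look at the aggregated wait statistics (a minimal sketch of the query I would start with; the DMV and its columns are standard):

-- Latch waits accumulated since the stats were last cleared; a hot
-- last page shows up as massive PAGELATCH_EX/PAGELATCH_SH wait time.
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE N'PAGELATCH%'
ORDER BY wait_time_ms DESC;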

As Rick writes in his blog, there are ways to work around this problem by being smart about your key generation. Basically, you want a key that is NOT sequential. Rick flips the bits of a Denali SEQUENCE value around – a very neat idea that uses a new SQL Server Denali feature.
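
To make the idea concrete, here is a minimal sketch of bit-reversing a SEQUENCE value (my own illustration, not Rick’s actual code; the object names are made up):

-- Reverse the low 32 bits of a sequence value so that consecutive
-- numbers land far apart in the index instead of on the same page.
CREATE SEQUENCE dbo.InsertKeySeq AS BIGINT START WITH 1 CACHE 10000;
GO
CREATE FUNCTION dbo.BitReverse32 (@v BIGINT)
RETURNS BIGINT
AS
BEGIN
    DECLARE @r BIGINT = 0, @i INT = 0;
    WHILE @i < 32
    BEGIN
        SET @r = @r * 2 + @v % 2;  -- shift result left, append low bit of @v
        SET @v = @v / 2;           -- shift @v right one bit
        SET @i += 1;
    END;
    RETURN @r;
END;
GO
-- Usage: fetch the next sequence value, then scramble it.
DECLARE @seq BIGINT = NEXT VALUE FOR dbo.InsertKeySeq;
SELECT dbo.BitReverse32(@seq) AS InsertKey;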

For my test at SQLBits (which ran on 2008R2), I used a variant of the same idea: I took the @@SPID of the user’s session and added an increasing value (with increments controlled by the app server) to each insert. On reflection, using the SPID is not a good idea; you should probably generate the key something like this:

INSERT Key = [Unique App Server ID] * 10^9 + [Unique Number Generated at App Server]
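
In T-SQL terms, the scheme looks something like this (a hedged sketch; the server ID, counter value, and the 10^9 stride are illustrative assumptions):

-- Each app server is configured with its own ID and maintains its own
-- counter, so keys from different servers fall into disjoint, widely
-- separated ranges of the index instead of all chasing the last page.
DECLARE @AppServerId  BIGINT = 7;       -- unique per app server (assumed config)
DECLARE @LocalCounter BIGINT = 123456;  -- incremented by the app server per insert
DECLARE @InsertKey    BIGINT = @AppServerId * 1000000000 + @LocalCounter;
SELECT @InsertKey AS InsertKey;         -- 7000123456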

I also tried the hash trick I have described here, to compare the techniques.

Below are the results of my test of Paul’s INSERT challenge with different key strategies (I have left out the sequential ones, since we have already seen how poorly they perform):

[Chart: INSERT throughput for the non-sequential key strategies, with SPID + affinity the fastest at 1.1M inserts/sec]

The last column is a trick I will describe in more detail in a later blog. Suffice it to say that I reached 1.1M inserts/sec (runtime: 150 sec) using a key generation strategy very similar to Rick’s.

Edits:

  • The unit in the introduction was originally wrong. It should of course be 750K, not 750M, as can be seen later in the post.
  1. October 6, 2011 at 10:19

    Hi Thomas,

    I really like the new “this is how I think about it” diagram, as it gives, at least to me, some insights into how you could go about analysing some long-running statements.

    Keep up the good work, as I really enjoy reading your blog, and find it very inspiring.

    Best Regards
    Kenneth M. Nielsen

    • Thomas Kejser
      October 6, 2011 at 12:35

      Thanks Kenneth, I will try to use this more in future troubleshooting blogs.

  2. October 6, 2011 at 19:27

    Great blog, Thomas, and I like the diagram.
    Did you try inserting into a heap?

  3. October 6, 2011 at 21:14

    I take my question back after looking at the table structure 🙂

  4. Russell Fields
    October 25, 2011 at 14:22

    Thomas, could you clarify, please? Your first bar chart shows newguid() as the fastest, which you report at 750M rows/sec. However, the second bar chart shows SPID + Affinity as faster than newguid() at 1.1M inserts/sec. (From this I am guessing the first number should be 750K rows/sec, but perhaps I am missing something.)

    • Thomas Kejser
      October 25, 2011 at 18:18

      Sorry Russell, you are correct: the GUID is 750K, while the SPID + affinity is indeed 1.1 million (and faster than the GUID). I will correct the units.

      Thanks for finding this

  5. Kunal
    June 3, 2012 at 01:04

    Very informative blog! It’d be interesting to see what happens if one lets it run for a few hours at that rate of 1M+ rows/sec…

    • Thomas Kejser
      June 3, 2012 at 09:12

      I only ran the test for at most one hour. Intuitively, I don’t see why running it longer should change anything – what worries you about it, Kunal?

  6. Kunal
    June 4, 2012 at 05:14

    “Write Cliff” on flash cards

    • Thomas Kejser
      June 4, 2012 at 10:27

      That’s a reasonable concern and one I should have addressed in the blog.

      I have not seen the write cliff happen on the Fusion cards, even under very long and sustained load (including a synthetic high-load harness). It really depends on the driver/controller for the NAND – the enterprise-grade stuff does not run into the issue.

  7. Ed
    September 14, 2012 at 08:44

    Hi Thomas. Interesting article, thanks. Would you know how read performance is correlated with each approach?

    • Thomas Kejser
      September 16, 2012 at 10:29

      Hi Ed

      Good question. It would depend on the type of read you are doing. For unordered scans and singleton lookups, perf is pretty much the same. For range scans, there is a cost, which will depend on the I/O system etc.

  8. December 17, 2012 at 08:41

    Thomas,

    I’m intrigued as to what the new sequence feature in SQL 2012 would give you with the cache option used. I believe that Rick mentioned this on his blog.

    Regards,

    Chris

    • Thomas Kejser
      December 17, 2012 at 09:23

      A bit-reversed, cached sequence would be a very good solution to the problem. Rick’s code is great and has been proven to work at very high scale.

      Oracle has a feature called “reverse key indexes”, which simply stores the bits in reverse order to achieve the same effect. The query optimizer then flips the bits of the value you search for before it digs into the index.

