
Secret Scale Sauce : Heavy Singleton INSERT spinlocks

Paul Randal recently ran a very interesting test on a simple INSERT workload. His results were promising, and they provide both solid conclusions and suggestions for further research.

You should read Paul’s blog post first; without it, this post will not make much sense.

I decided to bring Paul’s test onto a DL980 box with 8 x 320 GB FusionIO cards that I have access to. After a few email exchanges with Paul, I got his test harness running: essentially 64 concurrent SQLCMD sessions, each executing this batch:



DECLARE @counter INT;
DECLARE @start DATETIME;
DECLARE @end DATETIME;

SELECT @counter = 0;
SELECT @start = GETDATE ();

WHILE (@counter < 2500000)
BEGIN
  IF @counter % 1000 = 0 BEGIN TRAN;

  INSERT INTO MyBigTable DEFAULT VALUES; -- target table name assumed

  SELECT @counter = @counter + 1;
  IF @counter % 1000 = 999 COMMIT TRAN;
END;

SELECT @end = GETDATE ();

INSERT INTO Results VALUES (CONVERT (INTEGER, DATEDIFF (second, @start, @end)));
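The harness needs a target table and a Results table for the timings. The definitions are not shown in the excerpt, so the following is an assumed minimal setup – the table and column names (apart from Results, which the timing INSERT references) are guesses:

```sql
-- Assumed minimal schema for the harness above.
-- MyBigTable: singleton-insert target; every column has a default
-- so that INSERT ... DEFAULT VALUES works.
CREATE TABLE MyBigTable (
    c1 BIGINT IDENTITY PRIMARY KEY,       -- clustered key; layout assumed
    c2 CHAR (100) NOT NULL DEFAULT 'a'    -- padding column; width assumed
);

-- Results: one INTEGER per client run, holding the elapsed seconds.
CREATE TABLE Results (duration_sec INTEGER NOT NULL);
```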

Unfortunately, my current DL980 has the FusionIO drives set up for maximum capacity and low write speeds (and there are files on the drives I cannot move) – so these results are not yet comparable to Paul’s. Take these early runs with a grain of salt and do not draw any conclusions from them.


  • I can drive the CPU to 100% on my 64-core box.
  • Throughput is not steady; the CPUs fluctuate wildly.
  • I am running on 8 data files.
  • The largest wait is WRITELOG (about 70%); the rest are PAGELATCH.
  • There are big differences between thread completion times: the threads on the first two NUMA nodes complete much faster than the others. Runtimes fluctuate between 150 and 600 seconds (but are still nearly four times faster than Paul’s run).

The WRITELOG wait is not surprising; Paul’s run was done on write-optimized drives, and I have sacrificed the write optimization for additional space (I hope to get some runs in on write-optimized drives later this weekend). But it might be surprising that I hit 100% CPU while still being dominated by this wait. Whenever CPU is high relative to throughput, it is generally a good idea to dig into spinlocks:
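The wait breakdown above comes from sys.dm_os_wait_stats. A minimal query to reproduce it might look like this (a sketch – the list of benign waits to filter out is incomplete and would be longer on a real system):

```sql
-- Top waits by total wait time; WRITELOG and PAGELATCH_* dominate
-- on this workload.
SELECT TOP (10)
    wait_type,
    wait_time_ms,
    waiting_tasks_count,
    100.0 * wait_time_ms / SUM (wait_time_ms) OVER () AS pct_of_total
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP',
                        'BROKER_TASK_STOP')  -- benign waits; list abbreviated
ORDER BY wait_time_ms DESC;
```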

SELECT * FROM sys.dm_os_spinlock_stats
WHERE backoffs > 0

Aha! This gives us some insights:


We are spinning on log cache structures and burning CPU on them – again, not surprising, and some of this will go away as we boost the transaction log speed. We are trying to squeeze far more through the log than it will allow, and there is no one in the runnable queue behind us.
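When comparing spins between runs, it helps to order by spin count and to reset the counters before each test. DBCC SQLPERF accepts the DMV name for clearing, so a sketch of my usual routine is:

```sql
-- Snapshot spinlock stats, worst offenders first.
SELECT name, collisions, spins, spins_per_collision, backoffs
FROM sys.dm_os_spinlock_stats
WHERE backoffs > 0
ORDER BY spins DESC;

-- Reset the counters so the next run starts from zero.
DBCC SQLPERF ('sys.dm_os_spinlock_stats', CLEAR);
```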

But what are those other spins? OPT_IDX_STATS and SOS_OBJECT_STORE?

Well, OPT_IDX_STATS is the optimizer trying to update the DMVs sys.dm_db_index_usage_stats and sys.dm_db_missing_index_stats. You can read about this in KB 2003031. While the KB is about databases with many thousands of rows in those DMVs, you can see why this problem appears even for a single table under heavy insert activity: a quick calculation (64 clients x 2.5M rows = 160M rows in roughly 600 seconds) shows we are sustaining at least 270,000 inserts/sec on this run, with peaks up to four times that (nice, by the way). It is not strange that there is contention on that data structure.
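You can watch the structure being hammered directly: every insert bumps the update counters for the target table in the index usage DMV. A quick look (standard DMV columns) during the run shows the counter climbing:

```sql
-- Update counters for objects in the current database; on this
-- workload, user_updates on the single hot table climbs by
-- hundreds of thousands per second.
SELECT database_id, object_id, index_id, user_updates, last_user_update
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID ();
```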

The knowledge base article hints at a solution: trace flag 2330. This has to be set at startup; it cannot be enabled with DBCC TRACEON. Running with this trace flag got rid of the wait, and it looks like I got a minor throughput increase (hard to measure amid the big fluctuations).
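For reference: the flag goes in the startup parameters (-T2330, e.g. via SQL Server Configuration Manager), and after the restart you can confirm it took effect with the standard trace flag status check:

```sql
-- Verify that TF2330 is active globally after the restart.
DBCC TRACESTATUS (2330, -1);
```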

My next step is to diagnose the SOS_OBJECT_STORE, more to follow…

(By the way, I experimented with naming the columns in the INSERT statement instead of using DEFAULT VALUES. This turned out to be a very bad idea: I got massive amounts of SQL_MGR spins – you can read about why I got that here.)
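For concreteness, the two statement shapes being compared are sketched below (table and column names assumed, matching nothing in particular; the explanation of why the first form spins on SQL_MGR is in the linked post):

```sql
-- Variant that triggered heavy SQL_MGR spins: explicit column list.
INSERT INTO MyBigTable (c2) VALUES ('a');

-- Variant used in the test harness: default values only.
INSERT INTO MyBigTable DEFAULT VALUES;
```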

Category: Grade of the Steel

  1. Mark
    February 11, 2013 at 09:55

    There is a slight buglet in your loop control (which isn’t present in Paul’s).
    Your code will use an explicit transaction to insert 999 rows, then an auto-commit transaction to insert one, and then another explicit transaction for the next 999 rows. Like this:

    @counter = 0      BEGIN TRAN, insert
    @counter = 1..997 insert
    @counter = 998    insert, COMMIT TRAN (999 rows committed)
    @counter = 999    insert (auto-commit, 1 row)
    @counter = 1000   BEGIN TRAN, insert
    ...

    I’m pretty sure this will double your log IO/sec.

    • Thomas Kejser
      March 2, 2013 at 14:09

      You are right, Mark – there is an off-by-one error here. Thanks for pointing it out. This actually means I am committing too often, which means I should be able to run faster.

      I will rerun the test once I get a chance to play with a big box again 🙂 I will leave the bug in the blog for now, since the results are only valid under that code.

