Spinlocks | Thomas Kejser's Database Blog

Synchronisation in .NET– Part 4: Partitioned Data Structures

January 5, 2014 5 comments

In this final instalment of the synchronisation series, we will look at fully scalable solutions to the problem first stated in Part 1 – adding monitoring that is scalable and minimally intrusive.

Thus far, we have seen how there is an upper limit on how fast you can access cache lines shared between multiple cores. We have tried different synchronisation primitives to get the best possible scale.

Throughput this series, Henk van der Valk has generously lent me his 4 socket machine and been my trusted lab manager and reviewer. Without his help, this blog series would not have been possible.

And now, as is tradition, we are going to show you how to make this thing scale.

Synchronisation in .NET– Part 3: Spin Locks and Interlocks/Atomics

January 4, 2014 2 comments

In the previous instalments (Part 1 and Part 2) of this series, we have drawn some conclusions about both .NET itself and CPU architectures. Here is what we know so far:

When there is contention on a single cache line, the lock() method scales very poorly and you get negative scale the moment you leave a single CPU core.
The scale takes a further dip once you leave a single CPU socket
Even when we remove the lock() and do thread unsafe operations, scalability is still poor
Going from a class to a padded struct gives a scale boost, though not enough to get linear scale
The maximum theoretical scale we can get with the current technique is around 90K operations/ms.

In this blog entry, I will explore other synchronisation primitives to make the implementation safe again, namely the spinlock and Interlocks. As a reminder, we are still running the test on a 4 socket machine with 8 cores on each socket with hyper threading enabled (for a total of 16 logical cores on each socket).

My First Course is now Available

June 25, 2012 Leave a comment

I am proud to announce that my first course: Tuning – Diagnosing and Fixing Hard Problems is now available. This is the first in what I expect to be a series of 5 courses that I am developing to share my knowledge with the field (for a price this time).

You can find details of the course on my Courses Page.

Thread Synchronization in SQL Server

November 9, 2011 4 comments

Any code optimized for highly concurrent workloads must worry about thread synchronization. SQL Server is no exception, because in a database system, synchronization is one of the core functionalities you rely on the engine to provide (the noSQL crowd may ponder that a bit). In this blog post, I will describe the synchronization primitives, like locks, latches and spinlocks, used by SQL Server to coordinate access to memory regions between threads.

Latch and Spinlock Papers Published on Microsoft

July 1, 2011 Leave a comment

I am happy to announce that my team mates, Ewan Fairweather and Mike Ruthruff have published two excellent whitepapers on latch and spinlock diagnosis. You can find them here:

Pay special attention to the spinlock scripts, you will find an interesting trace flag in there that sheds more light on my intentionally vague dodging of the call stack subject in my blog entry here. I am sorry that I did not provide more details in the blog, but I did not want to give the plot away Smile

I am very proud to have Ewan take over the OLTP tuning on the SQLCAT team here in EMEA. Buy this guy a drink next time you meet him at a conference and have a chat with him about scalability.

Diagnosing and fixing SOS_OBJECT_STORE spins for Singleton INSERTS

May 30, 2011 2 comments

Following up on my previous post, my next target for “optimization”, while I am waiting for an even faster I/O system, is the SOS_OBJECT_STORE spin.

Recall that I am having high waits for WRITELOG, but still see 100% CPU, which indicates that spins may be our sinner. The big spin on the system is still LOGCACHE_ACCESS – but until we get hardware to work on that – we might as well stay greedy and learn a bit about SQL Server in the process. We just got rid of the OPT_IDX_STATS spin by running TF2330.

Unfortunately, the documentation available on our next spin: SOS_OBJECT_STORE is rather sparse. It is one of the SQLOS internal data structure used many places inside SQL Server. But there are ways, even for the public (which is why I can share it here), to get more information about what is going on. You can capture the call stacks of SQL Server when it does this spin and use publicly available symbols to lookup the function names inside the code.

One way to do this is to run an Xperf trace of the sqlservr.exe, another is with WinDbg. Anything that can collect and bucketize call stacks can help you. I will not get into more details here, but follow the links in this paragraph to learn more. I also have an article on SQLCAT that should help you get started on setting public symbol paths.

Suffice to say that I got hold of the sqlservr.pdb file (the publicly available symbols) and had a look at the call stacks that leads to SOS_OBJECT_STORE spins:

SpinlockBase::Sleep
LockManager::GetLockBlocks
lck_lockInternal
GetLock
PageContext::AcquireLock
IndexDataSetSession::AcquireLocksForInsert
IndexDataSetSession::InsertSmallRecord
IndexDataSetSession::InsertRowInternal
DatasetSession::InsertRow
RowsetNewSS::InsertRow

Aha! So this is related to the lock manager acquiring a lock on a page. Now, you may then ask: how can we influence this, surely we cannot control the way locks are acquired.

Well, as a matter of fact, we DO have a knob that gives us a tiny bit of influence. How about building the index like this:

CREATE CLUSTERED INDEX MyBigTable_cl
ON dbo.MyBigTable (c1)
WITH (ALLOW_PAGE_LOCKS = OFF)

That should get rid of one level of the lock hierarchy (ROW/PAGE/TABLE), restricting us to either table level locks or row locks. Since we are playing OLTP system here – who needs page locks anyway? Total leftover from old times Smile … (I am only half kidding here)

Interestingly, this was just what was needed, the SOS_OBJECT_STORE spin is now gone:

But throughput has not changed at all. This is not surprising, given the much larger amount of spins on LOGCACHE_ACCES. But we learned something new: Disabling PAGE level locks can save some CPU cycles by eliminating some of the code paths – we can speculate that this might lead to increased throughput once other bottlenecks are gone.

At this time, before I am moving to a faster transaction log, these are my waits:

Notice the high SOS_SCHEDULER_YIELD up there right after the WRITELOG? I have a feeling those spins are not yet done tormenting us….

Secret Scale Sauce : Heavy Singleton INSERT spinlocks

May 28, 2011 6 comments

Paul Randal recently ran a very interesting test on a simple INSERT workload. His results were promising and provide both good conclusions and more suggestions for research.

You should read Paul’s blog post first, because if you have not, this blog will not make much sense.

Thomas Kejser's Database Blog

Archive

Synchronisation in .NET– Part 4: Partitioned Data Structures

Synchronisation in .NET– Part 3: Spin Locks and Interlocks/Atomics

My First Course is now Available

Thread Synchronization in SQL Server

Latch and Spinlock Papers Published on Microsoft

Diagnosing and fixing SOS_OBJECT_STORE spins for Singleton INSERTS

Secret Scale Sauce : Heavy Singleton INSERT spinlocks

Categories