Programming | Thomas Kejser's Database Blog

Synchronisation in .NET– Part 4: Partitioned Data Structures

January 5, 2014 5 comments

In this final instalment of the synchronisation series, we will look at fully scalable solutions to the problem first stated in Part 1 – adding monitoring that is scalable and minimally intrusive.

Thus far, we have seen how there is an upper limit on how fast you can access cache lines shared between multiple cores. We have tried different synchronisation primitives to get the best possible scale.

Throughout this series, Henk van der Valk has generously lent me his 4-socket machine and been my trusted lab manager and reviewer. Without his help, this blog series would not have been possible.

And now, as is tradition, we are going to show you how to make this thing scale.
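To give a flavour of what "partitioned" can mean here, the following is a minimal sketch and not necessarily the implementation from the full post. It assumes 64-byte cache lines, pads each slot to 128 bytes because the start of a .NET array is not guaranteed to be cache-line aligned, and picks a partition from the managed thread ID. The names `PaddedSlot` and `PartitionedCounter` are hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// One counter slot padded to 128 bytes so that neighbouring slots
// are unlikely to share a cache line (64-byte lines assumed).
[StructLayout(LayoutKind.Explicit, Size = 128)]
public struct PaddedSlot
{
    [FieldOffset(0)] public long Value;
}

// Hypothetical partitioned counter: each writer touches only the slot
// its thread maps to, and readers pay the cost of summing all slots.
public sealed class PartitionedCounter
{
    private readonly PaddedSlot[] _slots;

    public PartitionedCounter(int partitions)
    {
        if (partitions <= 0) throw new ArgumentOutOfRangeException(nameof(partitions));
        _slots = new PaddedSlot[partitions];
    }

    public void Increment()
    {
        // Managed thread IDs are positive, so the remainder is a valid index.
        int i = Environment.CurrentManagedThreadId % _slots.Length;
        // Two threads can map to the same slot, so the update stays atomic.
        Interlocked.Increment(ref _slots[i].Value);
    }

    public long Sum()
    {
        long total = 0;
        for (int i = 0; i < _slots.Length; i++)
            total += Interlocked.Read(ref _slots[i].Value);
        return total;
    }
}
```

The design trades cheap, mostly uncontended writes for a more expensive read, which suits monitoring counters that are updated far more often than they are reported.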

Synchronisation in .NET– Part 3: Spin Locks and Interlocks/Atomics

January 4, 2014 2 comments

In the previous instalments (Part 1 and Part 2) of this series, we have drawn some conclusions about both .NET itself and CPU architectures. Here is what we know so far:

When there is contention on a single cache line, the lock() statement scales very poorly and you get negative scale the moment you leave a single CPU core.
The scale takes a further dip once you leave a single CPU socket.
Even when we remove the lock() and do thread-unsafe operations, scalability is still poor.
Going from a class to a padded struct gives a scale boost, though not enough to get linear scale.
The maximum theoretical scale we can get with the current technique is around 90K operations/ms.

In this blog entry, I will explore other synchronisation primitives to make the implementation safe again, namely the spinlock and Interlocked operations (atomics). As a reminder, we are still running the test on a 4-socket machine with 8 cores on each socket and hyper-threading enabled (for a total of 16 logical cores on each socket).
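For readers unfamiliar with the two primitives, here is a minimal sketch of each protecting a single counter. It uses the standard `System.Threading.SpinLock` and `System.Threading.Interlocked` types; the class names `SpinLockCounter` and `InterlockedCounter` are hypothetical and are not taken from the full post.

```csharp
using System.Threading;

// Counter guarded by a SpinLock: waiters busy-wait instead of blocking.
public sealed class SpinLockCounter
{
    // SpinLock is a struct: keep it in a non-readonly field so that
    // Enter/Exit operate on the field itself and not on a copy.
    // The argument disables thread owner tracking.
    private SpinLock _lock = new SpinLock(false);
    private long _value;

    public void Increment()
    {
        bool taken = false;
        try
        {
            _lock.Enter(ref taken);
            _value++;
        }
        finally
        {
            // Only release the lock if Enter actually acquired it.
            if (taken) _lock.Exit();
        }
    }

    public long Value
    {
        get { return Interlocked.Read(ref _value); }
    }
}

// Counter updated with a single atomic instruction, no lock object at all.
public sealed class InterlockedCounter
{
    private long _value;

    public void Increment()
    {
        Interlocked.Increment(ref _value);
    }

    public long Value
    {
        get { return Interlocked.Read(ref _value); }
    }
}
```

Both variants are thread-safe, but every increment still writes to the same cache line, so neither removes the contention described in the list above.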

Synchronisation in .NET– Part 2: Unsafe Data Structures and Padding

December 27, 2013 9 comments

In the previous blog post we saw how the lock() statement in .NET scales very poorly when there is contention on a data structure. It was clear that a performance logging framework that relies on an array with a lock on each member to store data will not scale.
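As a rough illustration of the kind of structure being discussed, an array with a lock on each member might look like the sketch below. This is an assumption about its general shape, not the code from the original post, and the name `LockedCounters` is hypothetical.

```csharp
using System;

// Hypothetical logging store: one lock object per array slot.
public sealed class LockedCounters
{
    private readonly object[] _locks;
    private readonly long[] _counts;

    public LockedCounters(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        _locks = new object[size];
        _counts = new long[size];
        for (int i = 0; i < size; i++)
            _locks[i] = new object();
    }

    public void Add(int index, long amount)
    {
        // Threads updating the same slot serialise on this lock.
        lock (_locks[index])
        {
            _counts[index] += amount;
        }
    }

    public long Read(int index)
    {
        lock (_locks[index])
        {
            return _counts[index];
        }
    }
}
```

The structure is correct under concurrency, but when many threads hit the same slot they all queue on one lock, which is the contention the post refers to.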

Today, we will try to quantify just how much performance we should expect to get from the data structure if we somehow solve locking. We will also see how the underlying hardware primitives bubble up through the .NET framework and break the pretty object oriented abstraction you might be used to.

Because we have already proven that ConcurrentDictionary adds too much overhead, we will focus on arrays as the backing store for the data structure in all future implementations.

Synchronisation in .NET– Part 1: lock(), Dictionaries and Arrays

December 25, 2013 10 comments

As part of our tuning efforts at Livedrive, I ran into a deceptively simple problem that beautifully illustrates some of the scale principles I have been teaching to the SQL Server community for years.

In this series of blog entries, I will use what appears to be a simple .NET class to explore how modern CPU architectures handle high speed synchronisation. In the first part of the series, I set the stage and explore the .NET lock() statement as a way of coordinating access to data.
