
Reading Material: Abstractions, Virtualisation and Cloud

[Image: Two sockets]

When speaking at conferences, I often get asked questions about virtualisation and how fast databases will run on it (and even whether they are “supported” on virtualised systems). This is a complex question to answer, because it requires a very deep understanding of CPU caches, memory and I/O systems to fully describe the tradeoffs.

Let us first look at the political reasons for virtualising: operations teams, for very good reasons, often try to push developers towards virtualised systems – cloud is just the latest step in this ongoing trend. They try to provide an abstraction between application code and the nasty, physical logistics of data centers – making their job easier. The methods operations teams employ take many forms: VLAN, SAN, private clouds and VMware/Hyper-V, to name a few examples. Virtualising increases their flexibility – and drives down their cost per machine, which looks great on the balance sheet. However, this flexibility comes at a very high cost. It has been said that:

 

All non-trivial abstractions, to some degree, are leaky

Joel Spolsky

In the case of virtualisation, the abstraction provided is very non-trivial indeed, and the leaking is sometimes equally extreme. Traditionally, the issue with virtualisation has been the slowdown of I/O or network – though this has gotten a lot better with hardware support for virtual hosts (SAN still haunts us, though). Over-provisioned memory is another good example of virtualisation wreaking havoc with performance. All of these seem surmountable though, and this is what drives cloud forward.

However, lately it is becoming increasingly clear that scheduling, NUMA and L2/L3 cache misses are potentially an even larger problem – and one that will surface once you take I/O out of the bottleneck club.
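
To make this concrete, here is a minimal sketch (my own illustration, not taken from the research below) of how cache-coherence traffic alone can throttle a workload. Two threads increment “their own” counters, but without padding both counters land on the same 64-byte cache line, which then ping-pongs between cores even though no data is logically shared:

    /* Build: cc -O2 -pthread demo.c          (packed: false sharing)
     *        cc -O2 -pthread -DPADDED demo.c (one cache line per counter) */
    #include <pthread.h>
    #include <stdio.h>

    #define ITERS 100000000UL

    #ifdef PADDED
    struct counter { volatile unsigned long v; char pad[56]; }; /* 64 bytes */
    #else
    struct counter { volatile unsigned long v; };               /* 8 bytes  */
    #endif

    static struct counter counters[2];

    static void *worker(void *arg)
    {
        size_t idx = (size_t)arg;
        for (unsigned long i = 0; i < ITERS; i++)
            counters[idx].v++;   /* all traffic goes to one logical counter */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[2];
        for (size_t i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (size_t i = 0; i < 2; i++)
            pthread_join(t[i], NULL);
        printf("%lu %lu\n", counters[0].v, counters[1].v);
        return 0;
    }

On most multi-core machines the padded build is noticeably faster than the packed one. Under virtualisation, where the hypervisor decides which cores and sockets your vCPUs land on, you have far less control over effects like this.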

As we grow our data centers to massive cloud scale and pay for compute power by the hour, every machine counts and will figure on the balance sheet. It should also be clear that an important optimisation will be to focus on the performance of individual scale nodes – to make the best use of the expensive power.

This morning, I ran into some fascinating research in this area by Barret Rhoden, Kevin Klues, David Zhu and Eric Brewer, who take this to another level:

“Improving Per-Node Efficiency in the Datacenter with New OS Abstractions” (pdf)

To whet your appetite, here is a quote from the abstract (my highlight).

“We believe datacenters can benefit from more focus on per-node efficiency, performance, and predictability, versus the more common focus so far on scalability to a large number of nodes. Improving per-node efficiency decreases costs and fault recovery because fewer nodes are required for the same amount of work. We believe that the use of complex, general-purpose operating systems is a key contributing factor to these inefficiencies.”

A highly recommended read and a good primer on some of the things that concern me a lot these days.

Kejser’s Law

I think it is time for me to state my own law (or trivial insight, if you will) of computing. Though I stand on the shoulders of giants, I will steal a bit of the fame. I think it is appropriate that I state one of the things I aim to show people at conferences:

 

“Any shared resource in a non-trivial scale workload will eventually bottleneck”
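
As a minimal sketch of the law in action (my example; the thread count and iteration count are arbitrary), compare a single shared atomic counter against per-thread counters that are only combined at the end:

    /* Build: cc -O2 -pthread kejser_law.c */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <time.h>

    #define THREADS 8
    #define ITERS   10000000UL

    static atomic_ulong shared_total;             /* the one shared resource */
    static unsigned long local_totals[THREADS];   /* one resource per thread */

    static void *shared_worker(void *arg)
    {
        (void)arg;
        for (unsigned long i = 0; i < ITERS; i++)
            atomic_fetch_add(&shared_total, 1);   /* every add contends on one line */
        return NULL;
    }

    static void *local_worker(void *arg)
    {
        size_t id = (size_t)arg;
        unsigned long n = 0;
        for (unsigned long i = 0; i < ITERS; i++)
            n++;                 /* private; no coherence traffic (the compiler
                                  * may even fold this loop, sharpening the contrast) */
        local_totals[id] = n;    /* combine once, at the end */
        return NULL;
    }

    static double timed_run(void *(*fn)(void *))
    {
        pthread_t t[THREADS];
        struct timespec a, b;
        clock_gettime(CLOCK_MONOTONIC, &a);
        for (size_t i = 0; i < THREADS; i++)
            pthread_create(&t[i], NULL, fn, (void *)i);
        for (size_t i = 0; i < THREADS; i++)
            pthread_join(t[i], NULL);
        clock_gettime(CLOCK_MONOTONIC, &b);
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(void)
    {
        printf("shared counter : %.2fs\n", timed_run(shared_worker));
        printf("local counters : %.2fs\n", timed_run(local_worker));
        return 0;
    }

The shared variant typically gets slower as you add threads, because every increment serialises on one cache line; the private variant scales with core count. The same shape shows up with shared locks, shared disks and shared networks.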

  1. May 2, 2012 at 13:21

    Hi Thomas,

    Thanks for another very good blog post.

    On the subject of leaky abstractions: to get the maximum value out of your underlying hardware, particularly with things such as the SQL 2012 optimizer batch mode, it helps if you can turn hyper-threading off, so that L2 cache lines are not split between threads. I would presume that most cloud vendors have hyper-threading turned on. Another very interesting point concerns cloud vendors that provision their services as PaaS as opposed to IaaS. Specifically, if the services you have bought jump from one server to another, your end users may get the perception that your software / service / platform has slowed down, due to the server processors across the vendor’s data centres being of varying specs.

    Having exclusive use of shared L2 caches in multi-tenancy environments is yet another point, along with context switching and, at a guess, read-aheads performed by the storage subsystem.

    Regards,

    Chris

    • Thomas Kejser
      May 2, 2012 at 15:13

      Hi Chris

      With regards to hyper-threading: I don’t think turning it off should be blanket guidance. I very much think the jury is still out on that. Hyper-threads share a common L2 cache, and if you can co-locate work that needs the same cache lines there, it can provide a large benefit. Of course, this requires the operating system, the database and, more importantly, the code you run on it to be HT aware – this is a HARD problem.
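
      As a rough, Linux-only sketch of what HT-aware placement looks like in code (my illustration – the sibling CPU numbers are an assumption; check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list for the real layout on your box):

        /* Build: cc -O2 -pthread ht_pin.c   (Linux + glibc only) */
        #define _GNU_SOURCE
        #include <pthread.h>
        #include <sched.h>
        #include <stdio.h>

        /* Restrict the calling thread to one logical CPU. */
        static void pin_to(int cpu)
        {
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(cpu, &set);
            pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        }

        static void *worker(void *arg)
        {
            pin_to((int)(size_t)arg);
            /* ...cache-friendly co-operative work would go here... */
            printf("running on cpu %d\n", sched_getcpu());
            return NULL;
        }

        int main(void)
        {
            /* ASSUMPTION: logical cpus 0 and 1 are hyper-thread siblings
             * on one physical core; verify via the sysfs path above. */
            pthread_t a, b;
            pthread_create(&a, NULL, worker, (void *)0);
            pthread_create(&b, NULL, worker, (void *)1);
            pthread_join(a, NULL);
            pthread_join(b, NULL);
            return 0;
        }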

      Read-ahead is indeed another issue. By sharing, you can quickly turn a nice sequential workload into a fully random one. But perhaps this will matter less in these NAND days. One could even argue that read-ahead at the I/O level can lead to performance degradation in some scenarios (since you may be fetching things you don’t need). There is an interesting, structurally similar concern about memory prefetching to think about too when sharing.
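
      On the read-ahead point, there is at least a POSIX-level escape hatch. A quick sketch (my illustration; the file path is just a placeholder): an application that knows its access pattern is random can say so, rather than letting the kernel read ahead on shared storage and fetch data nobody will use:

        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void)
        {
            /* Placeholder path; stands in for any randomly-accessed file. */
            int fd = open("/data/random_access.db", O_RDONLY);
            if (fd < 0) { perror("open"); return 1; }

            /* Declare a random access pattern for the whole file; on Linux
             * this disables kernel read-ahead for this descriptor. Note that
             * posix_fadvise returns an error number rather than setting errno. */
            int err = posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
            if (err != 0)
                fprintf(stderr, "posix_fadvise: error %d\n", err);

            /* ...pread() calls at arbitrary offsets would follow here... */
            close(fd);
            return 0;
        }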

      • May 2, 2012 at 15:24

        Ah! I’ve posted twice because I thought my initial comment had been lost. Regarding the splitting of the L2 cache, my understanding was that because the optimizer’s batch mode uses this to cache data being passed between execution plan tasks, it was a good idea not to split it. I’m not saying this should be blanket advice to be used across the board.

        There certainly seems to be a lot of issues around shared tenancy cloud infrastructures that a lot of people may not be aware of up front.

  2. May 2, 2012 at 14:10

    Hi Thomas,

    This is another very interesting blog post. The issue of abstraction comes to the fore with things such as exclusive use of the L2 cache in shared-tenancy environments, whether hyper-threading is turned on, context switching, and your services jumping from one server to another – the point being that processor specs are unlikely to be the same across all the vendor’s data centres.

    Regards,

    Chris

    • Thomas Kejser
      May 2, 2012 at 15:17

      Chris, it is an interesting angle to think about what it means that data centre vendors have different machine specs. It begs the question: should the cloud fabric expose the underlying architecture (so your code can react accordingly), or hide the CPU architecture from you? I don’t think “abstraction at all costs” is always a good idea, and this raises some interesting debates about proper PaaS and SaaS strategies on top of cloud infrastructure.

      The big question becomes: WHICH abstraction will you host and how leaky will you make it (because it WILL leak).

  3. May 2, 2012 at 15:49

    As chip makers hit the limits of the Moore’s law curve, they may come up with more exotic architectures to squeeze the last few drops of juice out of it, in which case the disparity between different processors in the data centre may become more pronounced. In the here and now, AMD and Intel have completely contrasting approaches to threading with their module-based and hyper-threading-based designs.

    • Thomas Kejser
      May 2, 2012 at 19:13

      That is indeed a worry (or opportunity, depending on how you see it). It is a very open problem with a lot of variables, something I look forward to following closely in the future.

