Posts Tagged ‘Tuning’

Bottleneck Diagnosis on SQL Server – New Scripts

April 11, 2013 21 comments

Finally, I have found some time with my good colleagues at Fusion-io to work on some of my old SQL Scripts.

Our first script queries the servers for wait stats – a very common task for nearly all SQL Server DBAs. This is the first thing I do when I meet a new SQL Server installation and I have generally found that I apply the same filters over and over again. Because of this, I worked with Sumeet Bansal to standardise our approach to this.

You can watch Sumeet introduce the script in this YouTube video: A TV star is born!

We have already used this script at several customers in a before/after Fusion-io situation. As you may know, the tuning game changes a lot when you remove the I/O bottleneck from the server.

Based on our experiences so far, I wanted to share some more exotic waits and latches we have been seeing lately.

Read more…

VM-ware Shared folders are really Slow

December 12, 2012 1 comment

I am currently waiting for some code to compile and found a bit of time to type up a quick blog.

As described in a previous post, I am have set up my laptop to host Windows in a virtual machine guest with Mac OX (Mountain Lion) as the host.

Today, I needed to compile some code and thinking I was being clever, I put the code on the host OS (OSX) and shared the folder via VM-ware to the Windows guest OS.

Compiling from inside VM-ware

I ran my msbuild process from the shared folder, and it took FOREVER to compile. The obvious choice here is of course to blame virtualisation itself – after all, I only have 2 cores allocated to the virtual.

But not so fast! Our tuning knowledge comes in quite handy here. Have a look at the CPU pattern while I am compiling:


Just like with SQL Server, I start my tuning at a very high level (in this case, task manager) and dig in from there.

The first question we ask ourselves as tuners is: Does what I see make sense?

In this case, it obviously doesn’t. MSBUILD is set up to build highly parallelised, it should be using my cores and there are no obvious I/O bottleneck in the system. Having 50% of two cores busy (and with high kernel times) looks a lot like a single threaded bottleneck to me. The build was taking over 15 minutes, which was much longer than expected.

Diagnosing the Problem – our friend Xperf

Normally, I use xperf to troubleshoot servers. But it sure comes in handy for misbehaving client machines too.

Task manager only shows that the time is spend in the process d.exe – which is part of the build process. Is the compiler bad or must we look elsewhere? Sure would be surprising if the compiler used all the kernel time wouldn’t it?

Here is the quick and dirty CPU “zoom in” xperf command to get the details we need:

  • xperf –on latency –stackwalk profile
  • …wait a bit
  • xperf –d <myfile>.etl

This captures a sample of the stack and CPU usage of each process and kernel module. From here, it is quite clear what is going on – let me walk you through the analysis.

First, open the trace with xperfview. I recommend staying with the Win7 version of xperfview, as the Win8 interface is… well… a Win8 interface.

Pick the CPU Sampling per CPU, right click and choose Summary Table:


From here, pick the columns: Module, CPU and % Weight which allows you to summarise by module. On my box, it looks like this:


Aha!… Most of the CPU burn goes in vmci.sys (just ignore intelppm.sys). This isn’t a part of Windows. Its relatively easy to trace this file back to VM-ware.

So, who calls into this kernel module? Adding the stack column after the module, we can see that too:


Eureka: It is file system access that is causing the slowdown. See the call stack? Starts from GetFileAttributesW and ends up inside vmci.sys.

Fixing the problem

Now, before you go ahead and conclude that VMware adds a horrible overhead to I/O, lets just try to move the source files into the guest OS itself. Recall that my machine is using VM-ware shared folders to access the source code. It might simply be the sharing framework that is acting strange…

The results of using the guest OS’s file system is staggering. Running the build process now looks like this at the CPU level:


And the total build time is down from over 15 minutes to less than 3 minutes.

Thank you xperf…








At this point, we are reduced to guessing what is going on

Configuring Kernel Debugging with WinDbg and a NULL modem

December 5, 2012 6 comments

imageLately, I have been digging deep into Windows to get really low level with the the I/O path SQL Server takes (yep, there is an even deeper layer to understand fully).

Once you start playing around with the Windows Kernel, you will at some point need kernel level debugging set up. Traditionally, this is something I have used a Windows machine for, and even there, it can be painful to get working. As you may know, I have switched to Mac as my client machine and my Windows utilities (including WinDbg) now run in VMWare Fusion – it looked like I was heading into an interop nightmare…

Windows 8 allows kernel debugging directly over the network card, but how does one configure kernel debugging with a Windows 2008R2 target from a Mac with VM Ware?

I found a very cheap solution today. You need:

  • A USB to Serial (RS-232) converter
  • A NULL modem (I would recommend getting a long one, so you don’t have to sit next to the server)
  • A  target server that you want to debug
  • A serial port on the target server
  • WinDbg from the Windows SDK in a virtual machine on the Mac
  • Symbols set up as per my previous post

A USB converter and a NULL modem is a dirt cheap way to get the required hardware for kernel debugging. I sourced my cables from Maplins (Thanks @SQLServerMonkey) in the UK – for a total of around 20 GBP.

Step 1: Prepare the Client/Debugger

A Macbook air, like most other lightweight laptops, does not have a serial port. So, we have to use a USB/Serial converter. It is possible to debug directly over USB, but good luck with that from a Mac, I didn’t have the courage.

Make sure VMWare routes the USB device to the Windows Guest OS and not the Mac. It will look something like this in VMWare Fusion 5:


Now, install WinDbg and set up your symbol paths if they are not set up already.

Check your Device Manager in the client to see which COM port the USD device created. As you can see below, my laptop mounted the USB/Serial converter as COM3 



After installing the device, I had to restart my virtual machine before VMWare would let me mount it – but what can you expect from a 10 GBP component? Your mileage may vary depending on the serial/USB driver you have.

Step 2: Connect Client and Server

Using the NULL modem, connect the USB/Serial converter to the server’s Serial port.

Make sure the server has the serial port enabled in the BIOS (my Dell box had it disabled, had to re-enable).

Step 3: Configure Server/Debugee for kernel debugging

Log into the server, start a command line as administrator

First, copy the default startup options into a new boot option:

  • BCDEDIT /copy {current} /d DebugMode

This will create a new entry in the boot list when the server starts. Make a note of the GUID returned or copy it to the clipboard (using the ever so annoying copy/paste function in the Windows command prompt)

Next, configure the parameters for serial cable debugging:

  • BCDEDIT /set <GUID> debugport <port #>
  • BCDEDIT /set <GUID> debugtype serial
  • BCDEDIT /set <GUID> baudrate 115200

Replace the <GUID> the guid returned previously. Set <port #> to the COM port the NULL modem is connected to in your server (NOT the COM port of the client). For example, if the server has the NULL modem in COM2, set <port #> to 2.

Finally, enable debugging for on the newly created boot option:

  • BCDEDIT /debug <GUID> ON

Validate that your configuration works by running:

  • BCDEDIT /v.

You should get an output somewhat like this (this is also how you find the GUID if you didn’t note it down before):


If you want to make sure debugging is always turned on (great for a sandbox machines where you explore stuff in the kernel) you can use BCDEDIT /default <GUID> to make debugging the default startup option.

Step 3: Start Debugging

You are now ready to start debugging the windows kernel on the server. Here is how:

On the client, start WinDbg and choose File—>Kernel Debug (or CTRL+K) and set up the com port you got in step 1:

. image

Press OK, and reboot the server. If you didn’t select the debug configuration as the default boot option, make sure you pick it when the server starts.

If you have done things right, you will get something like this in WinDbg (below, I broke execution with CTRL+C)


One thing to note when you are debugging the kernel: Not all your typical WinDbg commands work as they normally do (for some good reasons), but that is outside of scope for this blog entry.

Time to dig in even deeper… my Macbook Air to a Windows box :-)…Happy hacking everyone.

The Effect of CPU Caches and Memory Access Patterns

June 14, 2012 15 comments

In this blog, I will provide you with some “Grade of the Steel” background information that will help you understand CPU caches and their effects better.

As we move towards a future where power consumption in the server room begins to make a big dent in the balance sheet – it becomes important for programmers to fully understand hardware and make the best possible use of it, instead of floating around in the leaky abstractions of the virtual systems. Even when we take the old enemy of I/O out of the equation, there are other bottlenecks we need to worry about: DRAM access time being one of the most important.

Read more…

Setting Yourself up for Debugging

March 14, 2012 5 comments

Lately, I have been doing a fair amount of debugging, and helped other people set up their debuggers.

It can be very painful to make this work. To assist up and coming debuggers, I put my notes together in this blog. Here is how to get things going.

Read more…

Latch and Spinlock Papers Published on Microsoft

July 1, 2011 Leave a comment

I am happy to announce that my team mates, Ewan Fairweather and Mike Ruthruff have published two excellent whitepapers on latch and spinlock diagnosis. You can find them here:

Pay special attention to the spinlock scripts, you will find an interesting trace flag in there that sheds more light on my intentionally vague dodging of the call stack subject in my blog entry here. I am sorry that I did not provide more details in the blog, but I did not want to give the plot away Smile

I am very proud to have Ewan take over the OLTP tuning on the SQLCAT  team here in EMEA. Buy this guy a drink next time you meet him at a conference and have a chat with him about scalability.

Factors to consider when performance tuning cubes and MDX

August 7, 2006 1 comment
Before we move on the the wonders of subselects I would like to give you a few pointers on tuning MDX queries and cubes in general. During my time as a performance tuner on Analysis Services – these are the most important factors you should consider in a 2005 environment:

Read more…