Archive for the ‘Utilities’ Category

VM-ware Shared folders are really Slow

December 12, 2012 1 comment

I am currently waiting for some code to compile and found a bit of time to type up a quick blog.

As described in a previous post, I am have set up my laptop to host Windows in a virtual machine guest with Mac OX (Mountain Lion) as the host.

Today, I needed to compile some code and thinking I was being clever, I put the code on the host OS (OSX) and shared the folder via VM-ware to the Windows guest OS.

Compiling from inside VM-ware

I ran my msbuild process from the shared folder, and it took FOREVER to compile. The obvious choice here is of course to blame virtualisation itself – after all, I only have 2 cores allocated to the virtual.

But not so fast! Our tuning knowledge comes in quite handy here. Have a look at the CPU pattern while I am compiling:


Just like with SQL Server, I start my tuning at a very high level (in this case, task manager) and dig in from there.

The first question we ask ourselves as tuners is: Does what I see make sense?

In this case, it obviously doesn’t. MSBUILD is set up to build highly parallelised, it should be using my cores and there are no obvious I/O bottleneck in the system. Having 50% of two cores busy (and with high kernel times) looks a lot like a single threaded bottleneck to me. The build was taking over 15 minutes, which was much longer than expected.

Diagnosing the Problem – our friend Xperf

Normally, I use xperf to troubleshoot servers. But it sure comes in handy for misbehaving client machines too.

Task manager only shows that the time is spend in the process d.exe – which is part of the build process. Is the compiler bad or must we look elsewhere? Sure would be surprising if the compiler used all the kernel time wouldn’t it?

Here is the quick and dirty CPU “zoom in” xperf command to get the details we need:

  • xperf –on latency –stackwalk profile
  • …wait a bit
  • xperf –d <myfile>.etl

This captures a sample of the stack and CPU usage of each process and kernel module. From here, it is quite clear what is going on – let me walk you through the analysis.

First, open the trace with xperfview. I recommend staying with the Win7 version of xperfview, as the Win8 interface is… well… a Win8 interface.

Pick the CPU Sampling per CPU, right click and choose Summary Table:


From here, pick the columns: Module, CPU and % Weight which allows you to summarise by module. On my box, it looks like this:


Aha!… Most of the CPU burn goes in vmci.sys (just ignore intelppm.sys). This isn’t a part of Windows. Its relatively easy to trace this file back to VM-ware.

So, who calls into this kernel module? Adding the stack column after the module, we can see that too:


Eureka: It is file system access that is causing the slowdown. See the call stack? Starts from GetFileAttributesW and ends up inside vmci.sys.

Fixing the problem

Now, before you go ahead and conclude that VMware adds a horrible overhead to I/O, lets just try to move the source files into the guest OS itself. Recall that my machine is using VM-ware shared folders to access the source code. It might simply be the sharing framework that is acting strange…

The results of using the guest OS’s file system is staggering. Running the build process now looks like this at the CPU level:


And the total build time is down from over 15 minutes to less than 3 minutes.

Thank you xperf…








At this point, we are reduced to guessing what is going on

Configuring Kernel Debugging with WinDbg and a NULL modem

December 5, 2012 6 comments

imageLately, I have been digging deep into Windows to get really low level with the the I/O path SQL Server takes (yep, there is an even deeper layer to understand fully).

Once you start playing around with the Windows Kernel, you will at some point need kernel level debugging set up. Traditionally, this is something I have used a Windows machine for, and even there, it can be painful to get working. As you may know, I have switched to Mac as my client machine and my Windows utilities (including WinDbg) now run in VMWare Fusion – it looked like I was heading into an interop nightmare…

Windows 8 allows kernel debugging directly over the network card, but how does one configure kernel debugging with a Windows 2008R2 target from a Mac with VM Ware?

I found a very cheap solution today. You need:

  • A USB to Serial (RS-232) converter
  • A NULL modem (I would recommend getting a long one, so you don’t have to sit next to the server)
  • A  target server that you want to debug
  • A serial port on the target server
  • WinDbg from the Windows SDK in a virtual machine on the Mac
  • Symbols set up as per my previous post

A USB converter and a NULL modem is a dirt cheap way to get the required hardware for kernel debugging. I sourced my cables from Maplins (Thanks @SQLServerMonkey) in the UK – for a total of around 20 GBP.

Step 1: Prepare the Client/Debugger

A Macbook air, like most other lightweight laptops, does not have a serial port. So, we have to use a USB/Serial converter. It is possible to debug directly over USB, but good luck with that from a Mac, I didn’t have the courage.

Make sure VMWare routes the USB device to the Windows Guest OS and not the Mac. It will look something like this in VMWare Fusion 5:


Now, install WinDbg and set up your symbol paths if they are not set up already.

Check your Device Manager in the client to see which COM port the USD device created. As you can see below, my laptop mounted the USB/Serial converter as COM3 



After installing the device, I had to restart my virtual machine before VMWare would let me mount it – but what can you expect from a 10 GBP component? Your mileage may vary depending on the serial/USB driver you have.

Step 2: Connect Client and Server

Using the NULL modem, connect the USB/Serial converter to the server’s Serial port.

Make sure the server has the serial port enabled in the BIOS (my Dell box had it disabled, had to re-enable).

Step 3: Configure Server/Debugee for kernel debugging

Log into the server, start a command line as administrator

First, copy the default startup options into a new boot option:

  • BCDEDIT /copy {current} /d DebugMode

This will create a new entry in the boot list when the server starts. Make a note of the GUID returned or copy it to the clipboard (using the ever so annoying copy/paste function in the Windows command prompt)

Next, configure the parameters for serial cable debugging:

  • BCDEDIT /set <GUID> debugport <port #>
  • BCDEDIT /set <GUID> debugtype serial
  • BCDEDIT /set <GUID> baudrate 115200

Replace the <GUID> the guid returned previously. Set <port #> to the COM port the NULL modem is connected to in your server (NOT the COM port of the client). For example, if the server has the NULL modem in COM2, set <port #> to 2.

Finally, enable debugging for on the newly created boot option:

  • BCDEDIT /debug <GUID> ON

Validate that your configuration works by running:

  • BCDEDIT /v.

You should get an output somewhat like this (this is also how you find the GUID if you didn’t note it down before):


If you want to make sure debugging is always turned on (great for a sandbox machines where you explore stuff in the kernel) you can use BCDEDIT /default <GUID> to make debugging the default startup option.

Step 3: Start Debugging

You are now ready to start debugging the windows kernel on the server. Here is how:

On the client, start WinDbg and choose File—>Kernel Debug (or CTRL+K) and set up the com port you got in step 1:

. image

Press OK, and reboot the server. If you didn’t select the debug configuration as the default boot option, make sure you pick it when the server starts.

If you have done things right, you will get something like this in WinDbg (below, I broke execution with CTRL+C)


One thing to note when you are debugging the kernel: Not all your typical WinDbg commands work as they normally do (for some good reasons), but that is outside of scope for this blog entry.

Time to dig in even deeper… my Macbook Air to a Windows box :-)…Happy hacking everyone.

What is the Best Sort Order for a Column Store?

July 27, 2012 6 comments

As hinted at in my post about how column stores work, the compression you achieve will depend on how many repetitions (or “runs”) exist in the data for the sort order that is used when the index is built. In this post, I will provide you more background on the effect of sort order on compression and give you some heuristics you can use to save space and increase speed of column stores. This is a theory that I am still only developing, but the early results are promising.

Read more…

How to Properly Wipe and Return a Laptop

June 19, 2012 3 comments

imageThis week, I had the chance to experience something that is a rare occurrence: returning a laptop to my employer. Microsoft has been good to me, and I wanted to make sure I properly returned the laptop to them and saved both them and me from hassle.

In this blog, I will describe a process for properly wiping a laptop of data and making it ready for a new owner. You may want to use this advice if you plan to sell an old machine to a friend too.

Read more…

Setting Yourself up for Debugging

March 14, 2012 5 comments

Lately, I have been doing a fair amount of debugging, and helped other people set up their debuggers.

It can be very painful to make this work. To assist up and coming debuggers, I put my notes together in this blog. Here is how to get things going.

Read more…

Running Many Batch Statements in Parallel

January 8, 2012 6 comments

imageWhen designing highly scalable architectures for modern machines, you will often need to do some form of manual parallelism control. Managing this is not always easy, but in this blog I will give you one piece of my toolbox to help you.

Let us walk through an example together, a tiny case study. This is a problem which many of you will be familiar with.

Read more…

Implementing MurmurHash and CRC for SQLCLR

December 7, 2011 2 comments

As we saw in my previous post, the build in hash functions of SQL Server were either expensive with good distribution, or cheap, but with poor distribution. As a breath of fresh air, let us look at a useful magic quadrant:



Read more…