Home > SQL Server, Utilities > Running Many Batch Statements in Parallel

Running Many Batch Statements in Parallel

imageWhen designing highly scalable architectures for modern machines, you will often need to do some form of manual parallelism control. Managing this is not always easy, but in this blog I will give you one piece of my toolbox to help you.

Let us walk through an example together, a tiny case study. This is a problem which many of you will be familiar with.

Let us say you have 16 files that you want to load into the same table in your database in an automated manner. The naïve approach will do something like this:

BULK INSERT MyTarget FROM ‘C:\temp\MyFile1’ WITH
  (FIELDTERMINATOR = ‘;’, ROWTERMINATOR = ‘\n’)

BULK INSERT MyTarget FROM ‘C:\temp\MyFile2’ WITH
  (FIELDTERMINATOR = ‘;’, ROWTERMINATOR = ‘\n’)

… etc…

BULK INSERT MyTarget FROM ‘C:\temp\MyFile16’ WITH
  (FIELDTERMINATOR = ‘;’, ROWTERMINATOR = ‘\n’)

Now here is the problem with this approach: it executes one statement at a time. Sequential execution is BAD, you need to stop thinking about the world like that if you want to scale on a modern architecture.

Lets assume we have enough hardware resources (in this case, it would take a blade server and a decent I/O system). What we really want is to run every one of these statements in parallel. Unfortunately, SQL server does not have a command to start up new connection from inside T-SQL… what to do?

Getting to the Command Line

Because you cannot execute more than one command on a single connection at a time, we will need multiple connections to SQL Server and this mean we have to go back to the command line. Let us start by creating a little batch file Worker.Cmd with this content:

REM Worker.Cmd File

CALL SQLCMD –S.\MyServer –q”BULK INSERT MyTarget FROM ‘C:\Temp\MyFile%1’ …EXIT

This allows us to invoke a bulk load for the first file by executing: Worker.Cmd 1

Unfortunately, we still cannot start multiple connections without manually firing up a lot of command prompts. The coders in the audience may at this point reach for their favorites programming language to write a little utility that can spawn multiple copies of an executable.

However, there is a problem with such a home made executable: you cannot generally rely on a server having the necessary runtime libraries. Typical comments might be:

 “No, we don’t have .NET 4.0 here, this is not yet certified by our infrastructure department. Could you recompile it for 1.1 please?”.

“Power Shell is much too fancy for us, what is wrong with running this on Windows 2000?”

Perhaps this customer is just skeptical about letting you run your executable on a server. This may sound silly, but I have seen this happen too many times to make assumptions.

Start to the Rescue

There is a very nice little utility for the good old command prompt that allows you to fire up new processes: START.EXE. This comes with all versions of Windows and it takes any command line executable as input, fires it up in a new thread and returns control back to the caller.

Using start.exe, we can write batch script that fire up multiple copies of the same executable. It looks like this:

REM SpawnMany.cmd

REM Author: Thomas Kejser
REM Purpose: Spawns many copies of the same executable. Useful for running many things in parallel

@ECHO OFF
ECHO Spawning %2 copies of %1

FOR /L %%i IN (1, 1, %2) DO (
    ECHO Spawning thread %%i
    START "Worker%%i" /Min %1 %%i %2
)

Each new process is started in a minimized window and we pass the thread number and the total number of threads to it. Using this little batch script, we can now do this:

  SpawnMany.exe Worker.Exe 16

This starts 16 workers, each with their own thread number assigned. Very useful for running stuff in parallel in a quick and dirty way. For example, I use this to run the TPC-H data generator dbgen.exe highly parallelized.

Notice that I added the EXIT command at the end of the worker.cmd batch. This makes sure that the window closes itself when done executing.

Summary

In this blog, I have shown you how to write a little batch script to fire up multiple threads, from the command line. each doing their own work in parallel. The script is “zero dependency” which makes it ideal for server use and for hacking together quick and dirty parallelism for test scenarios.

I mentioned that SQL server does not have a way to start up new connections from T-SQL. This is not strictly true. Sorry for leading you astray, but I wanted you to see how to do this from the command line first (and go through the pain Devil ). There is a way to hack SQL Server and implement a stored procedure I like to call sp_executesql_async. This will be the subject of a future blog, but since I am heading into a lab for a few weeks, you just have to wait for it.

More about Warehousing at: DW and Big Data

Advertisements
  1. lucazav
    February 2, 2012 at 14:54

    Really a useful post! I’ll implement this technique in a BI project I’m working to.
    Thank you for sharing that Thomas!

  2. Martin Hansen
    February 25, 2013 at 14:21

    So what is the hack to implement sp_executesql_async?

    The immediate possibilities that I see are:
    – Dynamical create an SQL agent job pr. task that needs to be executed
    – Use the service broker – create a job queue and have a number of processes monitoring that queue for tasks that are to be executed

    Or do you have something else in mind?

    /Martin

  3. Martin Hansen
    February 25, 2013 at 15:36

    Just thought of one more option: Use CLR to spawn the process used to execute the async call

    /Martin

    • Thomas Kejser
      March 2, 2013 at 14:04

      Hi Martin

      Sorry to reply late, I have been travelling.

      The trick is to user service broker endpoint that points back to localhost and set up subscriptions to the queue. I hope to publish the code soon, once I get time to write up a blog entry about it. In the meantime, if you want a copy: Just send me an email.

      • Martin Hansen
        March 4, 2013 at 22:08

        It is fine with me that you just give your basic idea – I’ll most likely learn a lot more by wrting my solution from scratch 🙂

        /Martin

      • Thomas Kejser
        March 5, 2013 at 11:29

        Let me know if you want the source as you play with it. Would be interesting to compare solutions

        (on a mobile device. Spelling mistakes and brevity may happen)

        On 4 Mar 2013, at 22:08, “Thomas Kejser’s Database Blog”

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s