Home > SQL Server > Doing FIRST and LAST aggregates in SQL Server 2005

Doing FIRST and LAST aggregates in SQL Server 2005

Users of Microsoft Access may be familiar with the aggregation functions FIRST and LAST. Basically, what you want from these aggregates is to scan the tables in a sorted order. The first or last value encountered in each group is send to the output – much like the existing SQL Server MIN/MAX function.

Let me illustrate with some pseudo code:

 

SELECT Col

  , FIRST(Value) AS FirstV

  , LAST(Value) AS LastV

  , SUM(Value) AS SumV

FROM Table

GROUP BY Col

On a table with these rows:

Col Value
A 3
A 2
A 6
B 5
B 1

…You want this output:

Col FirstV LastV SumV
A 3 6 11
B 5 1 12

Notice how there in an implicit assumption of a row ordering. You probably want some sort of identity column to order your rows. Implicitly – this is a very ISAM way of looking at the world. Now, how do we do this inside SQL Server?

 

Lets make some test data based on AdventureWorksDW:

SELECT

s.*, IDENTITY(int, 1,1) AS ID

INTO

#Agg

FROM

dbo.FactInternetSales s

CROSS

JOIN (SELECT TOP 30 * FROM sys.objects) scaleUpFactor

 

CREATE

UNIQUE INDEX IX_FirstLast ON #Agg (ProductKey, ID, OrderQuantity)

 

Well… My colleage Jesper Rasmussion came up with a brilliant answer (try this on the testdata from above):

 

SELECT

I.ProductKey

  , F.OrderQuantity AS FirstQuantity

  , L.OrderQuantity AS LastQuantity

  , I.SumQuantity AS SumQuantity

FROM

  (SELECT ProductKey

   , min(ID) AS FirstID

   , max(ID) AS LastID

   , sum(OrderQuantity) AS SumQuantity

   FROM #Agg

   GROUP BY ProductKey) I

INNER

JOIN #Agg F ON F.ID = I.FirstID AND F.ProductKey = I.ProductKey

INNER

JOIN #Agg L ON L.ID = I.LastID AND L.ProductKey = I.ProductKey

 

This query has some very interesting properties. If you check the executions statistics you will see that SQL Server only does very few I/O requests in the index created (not more than 10-20% more than it takes for a full scan – probably around log(size(#agg) ). Since all rows HAVE to be visited to answer the requiest – this means that the query is very efficient.

 

Now, if only we could make a first and last aggregate that does just one table scan… 🙂

 

Advertisements
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s