Statistics | Thomas Kejser's Database Blog

What is the Best Sort Order for a Column Store?

July 27, 2012 6 comments

As hinted at in my post about how column stores work, the compression you achieve will depend on how many repetitions (or “runs”) exist in the data for the sort order that is used when the index is built. In this post, I will provide you more background on the effect of sort order on compression and give you some heuristics you can use to save space and increase speed of column stores. This is a theory that I am still only developing, but the early results are promising.

Why “Date BETWEEN FromDate AND ToDate” is a dangerous join criteria

June 29, 2012 10 comments

I have been meaning to write this blog post for some time and the discussion about Data Vault finally prompted me to do it.

Sometimes, you find yourself in situations where you have to join a table that has a structure like this:

CREATE TABLE TemporalTracking (
SomeKey INT
, FromDate DATETIME
, ToDate DATEIME
, <more columns>
)

The join criteria is expressed:

FROM <OtherTable> OT
INNER JOIN TemporalTracking T
ON OT.SomeTimeColumn BETWEEN T.FromDate AND T.ToDate
AND OT.SomeKey = T.SomeKey

Or more commonly, this variant with a semi open interval:

FROM <OtherTable> OT
INNER JOIN TemporalTracking T
ON OT.SomeTimeColumn >= T.FromDate
AND OT.SomeTimeColumn < T.ToDate
AND OT.SomeKey = T.SomeKey

Data models that promote these types of joins are very dangerous to relational optimizers and you have to step carefully when executing queries with many of these joins. Let us have a look at why this is so.

When Statistics are not Enough – Search Patterns

February 21, 2012 4 comments

Co-author: Lasse Nedergaard

Yesterday, Lasse ran into an issues with a query pattern in the large database that he is responsible for. Based on our conversation, we wrote up this blog and created a repro.

The troublesome query we were debugging executed like this:

Find a list of keys values to search for
Insert these keys in a temp table – lets call this the SearchFor table
Join the temp table to a large table (lets call it BigTable) and retrieve the full row from the large table

Why not use a correlated sub query in step 2? In this case, the customer in question had multiple code paths (including one accepting XML queries) that all needed to pass thousands of key to a final search procedure. They wanted a generic way to pass these key filters to to the final access of BigTable.

The Ascending Column Problem in Fact Tables –Part two: Stat Job!

July 7, 2011 3 comments

In my last post I described a common problem with statistics in data warehouses. I also directed you to a trace flag that partially solves the problem. There is an issue with this trace flag: at the time of writing this blog, it does not work on a partitioned tables. Considering that large tables tend to be partitioned, this could be an issue.

The Ascending Key Problem in Fact Tables– Part one: Pain!

July 1, 2011 4 comments

Time for another deep dive into SQL Server. In the last couple of days, I have been having coffee with Joe Chang in the afternoon, who some of you may know. He is visiting Sweden a lot by the way, so shoot him an email if you are interested in having him present. Joe and I ended up talking about an interesting problem in data warehousing: the ascending key problem. I think it is best illustrated by an example.

Thomas Kejser's Database Blog

Archive

What is the Best Sort Order for a Column Store?

Why “Date BETWEEN FromDate AND ToDate” is a dangerous join criteria

When Statistics are not Enough – Search Patterns

The Ascending Column Problem in Fact Tables –Part two: Stat Job!

The Ascending Key Problem in Fact Tables– Part one: Pain!

Categories