Why “Date BETWEEN FromDate AND ToDate” is a dangerous join criteria
I have been meaning to write this blog post for some time and the discussion about Data Vault finally prompted me to do it.
Sometimes, you find yourself in situations where you have to join a table that has a structure like this:
CREATE TABLE TemporalTracking (
SomeKey INT
, FromDate DATETIME
, ToDate DATEIME
, <more columns>
)
The join criteria is expressed:
FROM <OtherTable> OT
INNER JOIN TemporalTracking T
ON OT.SomeTimeColumn BETWEEN T.FromDate AND T.ToDate
AND OT.SomeKey = T.SomeKey
Or more commonly, this variant with a semi open interval:
FROM <OtherTable> OT
INNER JOIN TemporalTracking T
ON OT.SomeTimeColumn >= T.FromDate
AND OT.SomeTimeColumn < T.ToDate
AND OT.SomeKey = T.SomeKey
Data models that promote these types of joins are very dangerous to relational optimizers and you have to step carefully when executing queries with many of these joins. Let us have a look at why this is so.
What Structure does your Data Have?
I am currently thinking about measuring compression rates for different column store indexing strategies. In order for me to get some realistic data, I am looking for people who can share a very small anonymised data sample with me.
When Statistics are not Enough – Search Patterns
Co-author: Lasse Nedergaard
Yesterday, Lasse ran into an issues with a query pattern in the large database that he is responsible for. Based on our conversation, we wrote up this blog and created a repro.
The troublesome query we were debugging executed like this:
- Find a list of keys values to search for
- Insert these keys in a temp table – lets call this the SearchFor table
- Join the temp table to a large table (lets call it BigTable) and retrieve the full row from the large table
Why not use a correlated sub query in step 2? In this case, the customer in question had multiple code paths (including one accepting XML queries) that all needed to pass thousands of key to a final search procedure. They wanted a generic way to pass these key filters to to the final access of BigTable.
The Ascending Column Problem in Fact Tables –Part two: Stat Job!
In my last post I described a common problem with statistics in data warehouses. I also directed you to a trace flag that partially solves the problem. There is an issue with this trace flag: at the time of writing this blog, it does not work on a partitioned tables. Considering that large tables tend to be partitioned, this could be an issue.