Archive

Posts Tagged ‘Dimensional Models’

Modeling Dimensions with History Tracked, Generic Attributes

July 6, 2012 5 comments

Sometimes, you need to model a database in such a way that you can dynamically extend the model without altering any tables. Perhaps the attributes change faster than you can add new columns, or the data you store has a “ragged” structure that does not lend itself well to a traditional table layout.

The typical approach taken in these cases is to “pivot” the required “flexible columns” of a table into rows instead and dynamically reconstruct the schema at runtime. When you model like this, adding new attributes to the model is simply a matter of inserting rows into the database.
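As a minimal sketch of the idea, assuming a customer dimension (all table and column names here are illustrative, not from the post), the pivoted attributes might be stored like this:

-- Generic attribute storage: one row per entity/attribute pair
-- (names are illustrative only)
CREATE TABLE CustomerAttribute (
  CustomerKey INT NOT NULL
, AttributeName VARCHAR(100) NOT NULL
, AttributeValue VARCHAR(255) NULL
, PRIMARY KEY (CustomerKey, AttributeName)
)

-- Adding a brand new attribute is just an INSERT, no ALTER TABLE required
INSERT INTO CustomerAttribute (CustomerKey, AttributeName, AttributeValue)
VALUES (42, 'EyeColour', 'Blue')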

As I am sure you are aware, there are issues with this approach. Turning columns into rows can cause interesting issues for database engines, and the flexibility gained must be balanced carefully against the performance lost. In this blog post, I will walk you through an example of how to handle this generic case in a star schema.

Read more…

Why “Date BETWEEN FromDate AND ToDate” is a dangerous join criteria

June 29, 2012 10 comments

I have been meaning to write this blog post for some time, and the discussion about Data Vault finally prompted me to do it.

Sometimes, you find yourself in situations where you have to join a table that has a structure like this:

CREATE TABLE TemporalTracking (
  SomeKey INT
, FromDate DATETIME
, ToDate DATETIME
, <more columns>
)

The join criterion is expressed like this:

FROM <OtherTable> OT
INNER JOIN TemporalTracking T
  ON OT.SomeTimeColumn BETWEEN T.FromDate AND T.ToDate
  AND OT.SomeKey = T.SomeKey

Or more commonly, this variant with a semi-open interval:

FROM <OtherTable> OT
INNER JOIN TemporalTracking T
  ON OT.SomeTimeColumn >= T.FromDate
  AND OT.SomeTimeColumn < T.ToDate
  AND OT.SomeKey = T.SomeKey
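
As an aside, here is a small illustration (the sample data is my own) of why the semi-open form is generally preferred: with a closed interval, a timestamp that falls exactly on the boundary between two consecutive rows matches both of them.

-- Illustrative sample only: two adjacent intervals for the same key,
-- where ToDate of the first row equals FromDate of the second
DECLARE @T TABLE (SomeKey INT, FromDate DATETIME, ToDate DATETIME)
INSERT INTO @T VALUES (1, '20120101', '20120201')
                    , (1, '20120201', '20120301')

-- BETWEEN is closed at both ends, so the boundary timestamp matches 2 rows,
-- silently duplicating the joined data
SELECT * FROM @T WHERE '20120201' BETWEEN FromDate AND ToDate

-- The semi-open form matches exactly 1 row
SELECT * FROM @T WHERE '20120201' >= FromDate AND '20120201' < ToDate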

Data models that promote these types of joins are very dangerous for relational optimizers, and you have to step carefully when executing queries that contain many of them. Let us have a look at why this is so.

Read more…

The Data Vault vs. Kimball – Round 2

June 26, 2012 Leave a comment

Here we go again: the discussion about the claimed benefits of the Data Vault. Thomas Christensen has written some great blog posts with his take on the Vault method, and Dan Linstedt has been commenting.

The discussions are a good read and worth tracking:

Apologies in advance for my spelling errors in the comments; they were written while travelling.

As with most complex subjects, it is often hard to get people to state with clarity what EXACTLY their claim is and what their supporting arguments are. It seems that the Vault is no exception; I hope the discussions lead somewhere this time.

For your reference, here are some of the posts I have previously written on how to approach the “integration” problem the Vault claims to solve:

Notice that in the comments on Thomas Christensen’s blog, Sanjay makes some very interesting claims about normalization creating more load and query parallelism. I personally look forward to hearing the argument for that.

And here is the big picture architecture I am arguing for:

Why Surrogate Keys are not Good Keys

October 22, 2011 33 comments

History tracking in warehouses is a controversial discipline. In this post, I will begin to unravel some of the apparent complexities by taking the history tracking problem apart, piece by piece.
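
To set the stage, here is a hedged sketch of the classic type 2 structure that this kind of history tracking produces (table and column names are mine, not from the post): every change to a tracked attribute closes the current row and inserts a new one with a fresh surrogate key.

-- A minimal type 2 history tracking sketch (names are illustrative only)
CREATE TABLE DimCustomer (
  CustomerKey INT IDENTITY(1,1) PRIMARY KEY  -- surrogate key, new per version
, CustomerSourceKey VARCHAR(50) NOT NULL     -- the durable key from the source
, City VARCHAR(100) NOT NULL                 -- a history tracked attribute
, FromDate DATETIME NOT NULL                 -- when this version became valid
, ToDate DATETIME NOT NULL                   -- when it stopped being valid
)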

Read more…

Transforming Source Keys to Real Keys – Part 2: Using Maps To Fix Key Problems

August 15, 2011 11 comments

In part 1 of this post, I introduced the idea of map tables. These tables serve as an abstraction between the source systems and the entities in the data warehouse. In this post, I will describe how you can use the maps to correct for source key errors.

Using the taxonomy of key pathologies described earlier, I will walk you through some examples of map usage.
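
As a rough sketch of the idea (names and types are my assumptions, not necessarily those used in part 1): the map resolves every source system key to a single warehouse entity, so a source key error can be corrected by repointing a map row.

-- A minimal map table sketch (names are illustrative only)
CREATE TABLE Map_Customer (
  SourceSystem VARCHAR(50) NOT NULL   -- which source system the key comes from
, SourceKey VARCHAR(100) NOT NULL     -- the key as the source system knows it
, EntityKey INT NOT NULL              -- the warehouse-wide key for the entity
, PRIMARY KEY (SourceSystem, SourceKey)
)

-- Example: two source keys turned out to describe the same customer;
-- the duplicate is fixed by repointing its map row at the surviving entity
UPDATE Map_Customer
SET EntityKey = 1042
WHERE SourceSystem = 'CRM' AND SourceKey = 'CUST-0099'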

Read more…

Analysis Services Performance Guide in Final Draft

August 14, 2011 Leave a comment

To the Analysis Services crowd out there: Last week, I finished the last section of the Analysis Services 2008R2 Performance Guide. Furthermore, I reviewed my co-author Denny Lee’s sections. Denny is currently polishing off a few minor details, and we expect to send the draft to our editor early in the coming week. I would like to take the opportunity to give you some background on what happens to a whitepaper before you can read it on MSDN.

Read more…

Microsoft Announces Plans to Introduce Hadoop Interoperability

August 12, 2011 2 comments

For those of you who have not yet seen it, Microsoft recently announced that they will be looking at Hadoop connectivity for the database stack:

Some of you may have wondered why I have not yet mentioned the BigData movement as part of my DW articles. In my defense, I will say that this is a big trend, one I had to give a lot of thought in order to position correctly. Before I can talk more about how it fits into the full DW/BI architecture, I have to argue a bit more for my warehousing approach.

I can reveal that I see traditional data warehousing (especially dimensional modeling) and BigData complementing each other in a way that solves some of the common complaints of warehouse builders across the world. I hope you will find that my thoughts on BigData fit nicely into the picture I will be painting of the warehousing world going forward in this blog.