Posts Tagged ‘Keys’

Clustered Indexes vs. Heaps

January 12, 2014 21 comments

At Stack Overflow the other day, I once again found myself trying to debunk a lot of the “revealed wisdom” in the SQL Server community. You can find the post here: Indexing a PK GUID in SQL Server 2012 to read the discussion. However, this post is not about GUID or sequential keys, which I have written about elsewhere, it is about cluster indexes and the love affair that SQL Server DBAs seem to have with them.

Read more…

Why You Need to Stop Worrying about UPDATE Statements

April 27, 2012 4 comments

There seems to be a myth perpetuated out there in the database community that UPDATE statements are somehow “bad” and should be avoided in data warehouses.

Let us have a look at the facts for a moment and weigh up if this myth has any merit.

Read more…

Exploring Hash Functions in SQL Server

November 6, 2011 16 comments

Hash distributing rows is a wonderful trick that I often apply. It forms one of the foundations for most scale-out architectures. It is therefore natural to ask which hash functions are most efficient, so we may chose intelligently between them.

In this blog post, I will benchmark the build in function in SQL Server. I will focus on answering two questions:

  • How fast is the hash function?
  • How well does the hash function spread data over a 32-bit integer space

Read more…

Why Surrogate Keys are not Good Keys

October 22, 2011 33 comments

History tracking in warehouses is a controversial discipline. I this post, I will begin to unravel some of the apparent complexities by taking apart the history tracking problem, piece by piece.

Read more…

Physically Placing the Maps in the architecture

August 19, 2011 2 comments

Before we leave the maps behind, I need to live up to my promise of describing the storage characteristics of tables visited during the journey through the warehouse architecture. This must include the physical location of maps. Since believe form must follow function in a DW, let us just recall their function:

From a functional perspective, I have shown you how map tables can be used to both track and correct source system keys. Maps are not visible to the end user, but they are a necessary part of the data’s journey from the source to the final data model. Maps also provide the abstraction of, or interface to, master data sources. In the absence of those sources – the maps can even serve as a makeshift master data repository.

Read more…

Transforming Source Keys to Real Keys – Part 2: Using Maps To Fix Key Problems

August 15, 2011 11 comments

KeysIn part 1 of this post, I introduced the idea of map tables. These tables serve as an abstraction between the source systems and the entities in the data warehouse. In this post, I will describe how you can use the maps to correct for source key errors.

Using the taxonomy of key pathologies described earlier, I will walk you through some examples of map usage.

Read more…

Transforming Source Keys to Real Keys – Part 1: Introducing Map tables

August 4, 2011 17 comments

I have ranted enough about the key quality, or lack thereof, in source systems. It is time to look at how we will practically go about transforming the dirty source to the final DW model. What I shall propose is a de-coupling of the source entities from the data warehouse entities. As will be shown, this leads to an interesting ETL pattern that can either address, or properly encapsulate, a variety of source problems.

Read more…