BI | Thomas Kejser's Database Blog

The Curse of Self-Service

August 18, 2013 17 comments

These days, we seem to be high on data and data-related trends. My opinion on Big Data should be well known to my readers: it is something that has to be carefully managed, and it is largely a fad for all but a select few companies.

With data being the new black, similar trends grab the attention of modern managers. One of these is Self Service. It seems like such a logical consequence of our advanced data visualisation: democratise the data.

It’s worth noting that the notion of humans making better decisions when well served with information is rather old. Thomas Jefferson said: “whenever the people are well-informed, they can be trusted with their own government”. But what exactly does it mean to be well-informed? Another great statesman, Churchill, said: “The best argument against democracy is a five-minute conversation with the average voter”.

In this blog entry, I will argue that it does not follow that humans will make better decisions if we just give them access to more data. In fact, allowing people to self-service their data can be outright harmful.

How do Column Stores Work?

July 4, 2012 9 comments

In this blog entry, I will provide you with some basic information about column stores. Nothing I am writing here is vendor-specific IP; it is merely taken from papers published over the years. One of the best papers that serves as an introduction is by Stonebraker:

Stonebraker et al., “C-Store: A Column-oriented DBMS”, Proceedings of the 31st VLDB Conference, 2005

The idea is older than that though, with the first papers published in the 1970s.

Shamelessly standing on the shoulders of giants, I will walk you through a simple example which illustrates one of the key principles of column stores: Run Length Encoding (RLE).
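To give a feel for what RLE does before going into details, here is a minimal Python sketch. It is not taken from the C-Store paper or from any vendor implementation; the function names `rle_encode` and `rle_decode` and the sample `country` column are my own illustrative assumptions.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]


# Hypothetical column, sorted so that equal values sit next to each other.
country = ["DK", "DK", "DK", "SE", "SE", "UK", "UK", "UK", "UK"]

runs = rle_encode(country)
print(runs)  # [('DK', 3), ('SE', 2), ('UK', 4)]

# Decoding the runs gives back the original column, so nothing is lost.
assert rle_decode(runs) == country
```

Nine values are stored as three pairs. The saving depends on equal values being adjacent: the same column in random order would produce many runs of length one, and the encoded form could end up larger than the input.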

The Information Staircase

July 1, 2012 1 comment

With the Big Data wave rolling over us these days, it seems everyone is trying to wrap their heads around how these new components fit into the overall information architecture of the enterprise.

Not only that, there are also organisational challenges in staffing the systems drinking the big data stream. We are hearing about new job roles such as “Data Scientist” being coined (the banks have had them for a long time; they call them Quants) and old names being brought back, like “Data Steward”.

While thinking of these issues, I have tried to put together a visual representation of the different architecture layers and the roles interacting with them:

The Analysis Services 2008R2 Performance Guide is Online

October 10, 2011 2 comments

I am happy to announce that companion volumes for performance tuners and operations people are now again, for the first time since Analysis Services 2000, a reality.

As a developer, you can learn about building, tuning and troubleshooting Analysis Services 2005, 2008 and 2008R2 cubes (yes, the guides cover all three editions) in the Performance Guide.

If you are the DBA or operations team, you can read about running such cubes in production in the Operations Guide.

And finally, if you are a consultant, expert developer or cube DBA, you can learn how to build the meanest and largest cubes from the 5 day Analysis Services Maestro Course.

These three artifacts, which I am proud to have contributed to (and, for the guides, to have led the effort on), are a big milestone in my Analysis Services career. They represent a large amount of knowledge transfer from Microsoft to the field. The publication of the companion volumes also marks my transition into some new and exciting projects, at least for the near future. I will be busy digging into more “grade of the steel” work, among it ROLAP UDM testing, and I hope to blog about it over at SQLCAT.com very soon.

Thanks to everyone for the incredible feedback during the writing of these guides. The Analysis Services community is very vibrant, these guides are the result of our collaboration in the field, and I am happy to give something back.

The Big Picture – EDW/DW architecture

August 30, 2011 36 comments

Now that the cat is out of the bag on the Kimball forum, I figured it would be a good idea to present the full architecture that I will be arguing for. I was hoping to build up to it slowly, by establishing each premise on its own before moving on to the conclusion.

But perhaps it is better to start from the conclusion and then work my way down to the premises and show each one in turn.

That Analysis Services 2008R2 Operations Guide is online

June 1, 2011 10 comments

It is my pleasure to announce that the Operations Guide for SQL Server Analysis Services 2008R2 (and also 2005 and 2008) is now available on MSDN. It was written by Denny Lee, John Sirmon (our new SSAS CAT member) and yours truly.

The guide describes how to configure, test and operate Analysis Services installations in a production environment. It is more than 100 pages of good information, with contributions from a long list of MVPs, SSAS specialists and the product group.

Here it is: The Analysis Services 2008R2 Operations Guide

It was a pleasure working with you all to get this out there.

Analysis Services Operations Guide in Draft Review

May 4, 2011

The SQL Server Analysis Services Operations Guide is currently in draft review, under NDA, inside Microsoft and with some partners. This means that we will soon be able to publish this long-awaited document. I will let you know on this blog when it is out.

Intermezzo–Data Modeling

May 3, 2011 5 comments

From the very nice WordPress Dashboard, I can see that I now have over 100 regular readers. Comments are flowing in too.

Thanks to everyone who is listening. It is my hope that this blog can be a great place for debates about data modeling and high scale performance tuning. I was surprised that it was hard to find concrete guidance about data modeling for the warehouse on the web. Perhaps I am missing something out there?

UPDATE (6th July 2012): This post is kept here for historical reasons. For the latest updates, please see my DW and Big Data page.

Analysis Services Maestro Training–Round 2

May 3, 2011 1 comment

It is official: SQLCAT will be doing another round of the Maestro Training. This is Tech Level 500 content to prepare partners, Microsoft staff and key customers for building the largest and most scalable Analysis Services cubes.

The course is 5 days long (extended from 3 days in the old course). There is an exam that you have to pass, and very limited seating available – only a few will get in. I will let Vidas Matelis give you an opinion from someone who was there in round 1.

You can nominate yourself here:

The Madrid course will be taught by yours truly.

Defining the Good Data Model

April 18, 2011 6 comments

Designing data models is fun – at least if you are a geek like me. But as much as I like the academic thrill of building something that is complex – I am aware that it is often humans that eventually must see and maintain my beautiful (data) model. How can we design something that humans can understand?

Humans are buggy! In general, they don’t deal well with complexity. You can blame modern education, you can scream and shout, or lament the fact that the IT industry is riddled with incompetence; you may even throw Kimball or Inmon books at the wall in anger. But the empirical tests all show the same thing: the wetware is the final test of the model.
