## Time complexity of processing in Analysis Services

When I build enterprise scale solution it is crucial for me to know what sort of scalability can be expected from a server product. Is it linear? It is exponential? Should I scale-out or scale-up? Are there hard limits to what the architecture can do?

To understand these limitations and provide valid statements about scalability – you have to know the time complexity of the operations performed by your implementation.

A major challenge for large scale Analysis Services installations is to reduce the processing time for a cube. If we are understand to what constitutes a cube processing we must first look at how processing is done. Processing proceeds in three phases

**Read phase**– Rows are read from the data source and into Analysis Services.**Aggregation phase**– Based on the rows read – aggregations are build.**Index build phase**– The aggregations are indexed to provide fast query access

I need to get a deeper understanding of the index phase before embarrassing myself completely (as opposed to partially) in my blog.

For now – lets look at the first two phases in more detail:

## Read Phase

The purpose of the read phase is – as the name implies – to read the rows from the relational data source. The rows read form what I call the "**base aggregations**" – I.e. the leaf level (or key level if you like) data of the cube. It is these base aggregations that are used to answer queries at leaf level. The base aggregations also form the "base" of higher level aggregates in the next phase.

Each row from the relational source is read once. Assuming there are n rows in the source. We find that our number of IO reads must be proportional with:

IOREAD(Read Phase) = O(n)

Each row takes about the same amount of CPU time to read. Hence, our CPU usage is:

CPU(Read Phase) = O(n)

How about the writes, once could assume **O(n)** – but this would not be completely true:

As you may be aware – a relational fact table can contain "duplicate" data at the leaf level. Let me provide you with a few examples:

*Example 1: I go into a supermarket today and buy a bottle of coke. Being a caffeine junkie – I later visit the same store – buying yet another bottle of coke. This will generate two rows in the Point Of Sales system – one of each bottle. Depending on your warehouse design – you may see both of these rows in your relational source – hence observing a "duplicate" row. The interesting data from the cube perspective (assuming no drillthrough) is one row with 2 bottles of coke – not two rows with 1 each*

*Example 2: You have a relational fact table with 5 dimension columns. However, for you analysis services cube you only want to use 4 of these dimensions. When you read the fact table you will almost certainly read duplicate leaf level data. Why? Compare the two statements:*

*(1) SELECT DISTINCT D1, D2, D3, D4 , D5 FROM <FactTable>*

*(2) SELECT DISTINCT D1, D2, D3, D4 FROM <FactTable>*

*The row count of (2) is most likely smaller than (1) – why else would you even bother with D5 in the first place?*

Because the two examples happens a lot in the OLAP world, Analysis Services implements a "merge buffer" . While reading rows from the relational source- this buffer is used to merge such duplicate rows into single rows – thus reducing the storage space needed for the leaf level aggregates.

The merge buffer will not eliminate all duplicates. It has a limited size and will thus only on a subset of the fact table. In the ideal world – where the fact rows enter Analysis Services in sorted order and the merge buffer is large enough, we will eliminate all duplicates.

This brings us back to the number of IO operations needed to write base aggregation. Let **n _{d} **be the distinct count of the dimension columns we use in the cube. Assuming an optimal reading of the fact table we have:

IOWRITE(Read Phase) = O(n_{d})

The reduction of the writes compared with the read can be quite significant for later phases of processing. Essentially we are seeing a kind of "pre aggregation" here. I like to think of **n _{d}** as the size of the "non empty cube space".

Also, notice that there is an upper limit to **n _{d}** that is unaffected by you fact table size. Assuming your cube dimension,

**D**, … ,

_{1}**D**have the following leaf level sizes:

_{n}**|D**, … ,

_{1}|**|D**The upper limit for

_{n}|.**n**is:

_{d}

n<=_{d}|D, … ,*_{1}| *|D_{n}|

For cubes with a small number of dimensions this upper limit can work to your advantage – as we shall see later.

## Aggregation Phase

Once analysis services has stored the data read it must build higher level aggregates.

If you have ever tried managing you own aggregate tables in a relational database you will know that there are two ways to build an aggregate:

- Directly from the leaf level (in a relational database this would be the fact table)
- Derive the aggregate from another aggregate.

This is where things get a little complicated – but I will give it a shot. Let’s start with the first case – leaf level aggregation building:

Every attribute in you cube is a potential candidate for an aggregation. You control which ones you want to aggregate by a combination of:

- Setting the attribute properties (see my previous post)
- Running the storage design wizard. This will create some "statistically correct choices"
- Using a custom aggregation tool to manually specify which aggregates you want

For the moment – lets keep attribute relationships out of the equation. Let’s just assume you have a cube with aggregates designed, one way or the other, for a subset, **A _{Agg}**, of your attributes.

During the aggregation phase analysis services has to read your leaf level rows (remember, there were n_{d} of these). Reading these rows takes:

IOREAD(Read Leaf level data) = O(n_{d})

From my conversations with Eric Jacobsen I gather that the aggregation algorithm in AS 2005 is an external disk sort (as compared to the other common alternative – a hash/bucket sort). What does this tell us? Well, for those you that don’t already own it – get a copy of Donald E. Knuths masterpiece: "The Art of Computer Programming". From this work we can lookup the complexity of different sort algoritms and how to implement them. I fair assumption is that the external sort uses quicksort on its memory buffer the time complexiy: O(n lg(n)). Each attribute that is aggregated must be sorted – so we have:

CPU(Aggregation Phase) = O( A_{Agg}* n_{d}* lg(n_{d}))

**Aggregation phase – summary so far**

**Time complexity:** Not surprisingly – CPU usage in the aggregation phase is proportional to the number of attributes you are aggregating. Notice however, that it is *not* proportional to the amount of rows in your relational database. Its proportional to the DISTINCT count of dimension values (assuming your merging in the read phase is optimal).

**Possible optimization (untested):** This is pure speculation – but there may be some benefit of sorting your fact table using a clustered index to optimize the "pre aggregation" in the read phase.

**Cubes with a small number of dimension:** If you cube has a very small number of dimension with relatively few manbers you may be able to leverage the fact that **n _{d}** has an upper limit. This is pure theory – but it seems to be that some cubes could always be fully aggregated in reasonable time – irrespective of the number of source rows in the underlying relational table

It’s time to post this first part. Next part: Attribute relationships and how they are used to build higher level aggregates.