Quantifying the Cost of Compression
Last week, at SQL Saturday Exeter, I ran a new Grade of Steel experiment: quantifying just how expensive PAGE compression is.
The presentation is a 45-minute show, but for those of you who were not there, here is a summary of the highlights of the first part.
My tests were all run on the TPC-H LINEITEM table at scale factor 10. That is about 6.5GB of data.
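For readers who want to reproduce the setup, PAGE compression is enabled with a standard rebuild. The DATA_COMPRESSION syntax and the sys.partitions catalog view are standard SQL Server; the exact build steps I used are not shown in this post, so treat this as a sketch:

```sql
-- Rebuild LINEITEM with PAGE compression
ALTER TABLE LINEITEM
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Verify the compression setting of each partition
SELECT p.partition_number, p.data_compression_desc
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('LINEITEM');
```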
Test: Table Scan of Compressed Data
My initial test is this statement:
SELECT MAX(L_SHIPDATE)
, MAX(L_DISCOUNT)
, MAX(L_EXTENDEDPRICE)
, MAX(L_SUPPKEY)
, MAX(L_QUANTITY)
, MAX(L_RETURNFLAG)
, MAX(L_PARTKEY)
, MAX(L_LINESTATUS)
, MAX(L_TAX)
, MAX(L_COMMITDATE)
, MAX(L_RECEIPTDATE)
FROM LINEITEM
Because the statement returns only one row, the measurement is not drowned out by client transfer time. Instead, the raw cost of extracting the columns from the compressed format can be quantified.
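The I/O case and the DRAM case can be separated by clearing the buffer pool before the cold run. DBCC DROPCLEANBUFFERS, CHECKPOINT, and SET STATISTICS TIME are standard commands; the harness below is one possible way to do it, not necessarily the one I used:

```sql
-- Cold run: flush dirty pages, then drop clean buffers so the
-- scan must read from storage
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SET STATISTICS TIME ON;
SELECT MAX(L_SHIPDATE) FROM LINEITEM;  -- scan hits disk

-- Warm run: the pages now sit in the buffer pool, so the same
-- scan is served entirely from DRAM
SELECT MAX(L_SHIPDATE) FROM LINEITEM;
SET STATISTICS TIME OFF;
```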
The result on my home 12-core box was quite shocking:
Even when doing I/O, it still takes quite a bit longer to scan the compressed format. And when scanning from DRAM, the cost is a whopping 2x.
A quick xperf run shows where the time goes when scanning from memory:
Indeed, the additional CPU cost explains the effect. The code path is simply longer with compression.
Test: Singleton Row Fetch
By sampling some rows from LINEITEM, it is possible to measure the cost of fetching pages in an index seek. This is the test:
SELECT MAX(L_SHIPDATE)
, MAX(L_DISCOUNT)
, MAX(L_EXTENDEDPRICE)
, MAX(L_SUPPKEY)
, MAX(L_QUANTITY)
, MAX(L_RETURNFLAG)
, MAX(L_PARTKEY)
, MAX(L_LINESTATUS)
, MAX(L_TAX)
, MAX(L_COMMITDATE)
, MAX(L_RECEIPTDATE)
FROM LI_SAMPLES S
INNER LOOP JOIN LINEITEM L ON S.L_ORDERKEY = L.L_ORDERKEY
OPTION (MAXDOP 1)
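The LI_SAMPLES driver table is a small set of sampled order keys; its definition is not shown above. One way to build such a table is with TABLESAMPLE (the 0.1 percent rate is an arbitrary choice for illustration):

```sql
-- Hypothetical construction of the LI_SAMPLES driver table:
-- grab a small random subset of order keys to seek on
SELECT L_ORDERKEY
INTO LI_SAMPLES
FROM LINEITEM TABLESAMPLE (0.1 PERCENT);
```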
This gives us the plan:
This plan has the desired characteristic that the runtime is dominated by the seeks into the LINEITEM table.
The numbers again speak for themselves:
And again, the xperf trace shows that this runtime difference is fully explained by the longer code paths.
Test: Singleton UPDATE
Using the now familiar pattern, we can run a test that updates the rows instead of selecting them. By updating a column that is an INT and NOT NULL, we can make sure the update happens in place. This means we pay the price to decompress, change, and recompress each row – which should be more expensive than reading it. And indeed it is:
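The exact statement is not reproduced here, but an in-place update of an INT NOT NULL column, following the same sample-driven pattern as the seek test, would look something like this (L_SUPPKEY as the target column is my assumption):

```sql
-- Sketch: in-place update of an INT NOT NULL column.
-- Writing the value back unchanged still exercises the
-- decompress/modify/recompress cycle on the compressed page.
UPDATE L
SET L.L_SUPPKEY = L.L_SUPPKEY
FROM LI_SAMPLES S
INNER LOOP JOIN LINEITEM L ON S.L_ORDERKEY = L.L_ORDERKEY
OPTION (MAXDOP 1);
```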
Summary
Drawing on a few of the tests from my presentation, I have shown that PAGE compression carries a very heavy CPU cost for each page access. Of course, not every workload is dominated by accessing data in pages – some are more compute heavy on the returned data. However, in these days when I/O bottlenecks can easily be removed, it is worth considering whether the extra CPU cycles spent to save space are worth it.
It turns out that it is also possible to show another expected result: that locks are held longer when updating compressed pages (thereby limiting scalability if the workload contains heavily contended pages). But that is the subject of a future blog entry.
What is the Best Sort Order for a Column Store?
As hinted at in my post about how column stores work, the compression you achieve depends on how many repetitions (or “runs”) exist in the data for the sort order used when the index is built. In this post, I will provide more background on the effect of sort order on compression and give you some heuristics you can use to save space and increase the speed of column stores. This is a theory I am still developing, but the early results are promising.
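A concrete way to control the sort order a column store build sees is to first build a clustered B-tree on the candidate sort column, then build the column store on top of it. The syntax below is standard SQL Server; the column choice is illustrative, and I assume the table starts as a heap:

```sql
-- Physically order the rows by the candidate sort column,
-- creating long runs of equal values (illustrative column choice)
CREATE CLUSTERED INDEX CIX_LINEITEM_SHIPDATE
ON LINEITEM (L_SHIPDATE);

-- Build the column store on top; segment compression now
-- benefits from the runs in L_SHIPDATE
CREATE NONCLUSTERED COLUMNSTORE INDEX CSI_LINEITEM
ON LINEITEM (L_SHIPDATE, L_PARTKEY, L_SUPPKEY, L_QUANTITY);
```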
What Structure does your Data Have?
I am currently thinking about measuring compression rates for different column store indexing strategies. To get some realistic data, I am looking for people who can share a very small anonymised data sample with me.