How do Column Stores Work?
In this blog, I will provide you with some basic information about column stores. Nothing I am writing here is vendor-specific IP; it is all taken from papers published over the years. One of the best introductory papers is by Stonebraker:
- Stonebraker et al., Proceedings of the 31st VLDB Conference, 2005: “C-Store: A Column-oriented DBMS”
- The idea is older than that though, with the first papers published in the 1970s.
Shamelessly standing on the shoulders of giants, I will walk you through a simple example which illustrates one of the key principles of column stores: Run Length Encoding (RLE).
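To give a feel for what RLE does before diving in, here is a minimal sketch in Python. It is not taken from the C-Store paper or from any vendor's engine; the function names and the sample country codes are made up for illustration. The idea is simply that a run of identical consecutive values in a column is stored once, together with the length of the run.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# Hypothetical column, already sorted so that equal values sit next to each other.
column = ["DK", "DK", "DK", "SE", "SE", "UK", "UK", "UK", "UK"]
runs = rle_encode(column)
print(runs)  # [('DK', 3), ('SE', 2), ('UK', 4)]
```

Nine values shrink to three pairs here. The encoding only pays off when equal values are adjacent, which is why sort order matters so much for this technique.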
Implementing Message Queues in Relational Databases
At the last SQL Bits X I held the FusionIO fireside chat during the launch party. During this presentation, I demonstrated how it is possible to build a table structure inside a relational engine that will act as a message queue and deliver nearly 100K messages/second.
Why You Need to Stop Worrying about UPDATE Statements
There seems to be a myth perpetuated out there in the database community that UPDATE statements are somehow “bad” and should be avoided in data warehouses.
Let us have a look at the facts for a moment and weigh up if this myth has any merit.
Why Surrogate Keys are not Good Keys
History tracking in warehouses is a controversial discipline. In this post, I will begin to unravel some of the apparent complexities by taking apart the history tracking problem, piece by piece.
Good keys, what are they like?
A central value-add of data warehouses is their ability to restore the sanity that comes from using good keys. Taking a model-agnostic view of keys, they refer to “something” that is uniquely identifiable. Depending on what you are modeling, those “somethings” have different names, for example: entities, rows, tuples, cells, members, documents, attributes, object instances, and relations between any of these. Because this article is about relational databases and because I don’t want to move up to the “logical” level with you (as I promised before), I will use the term “row” for the “something” that a key refers to, and the term “column” for the data structure that implements the key.
Moving on, the definition of a “good key” is a column that has the following properties:
- It is forced to be unique
- It is small
- It is an integer
- Once assigned to a row, it never changes
- Even if deleted, it will never be re-used to refer to a new row
- It is a single column
- It is stupid
- It is not intended to be remembered by users
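Several of the properties above can be seen in a small sketch. The example below uses Python's built-in `sqlite3` module, chosen only because it runs anywhere; the `customer` table, the `customer_key` column and the names are hypothetical. SQLite's `INTEGER PRIMARY KEY AUTOINCREMENT` gives a single integer column that is forced to be unique and whose values are not re-used after a delete. The "never changes" property is not enforced by this sketch; that would still be up to the loading code.

```python
import sqlite3


def demo_key_allocation():
    """Insert two rows, delete the last one, insert a third; return the keys handed out."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        " customer_key INTEGER PRIMARY KEY AUTOINCREMENT,"  # single, unique integer column
        " name TEXT NOT NULL)"
    )
    keys = []
    for name in ("Alice", "Bob"):
        cur = conn.execute("INSERT INTO customer (name) VALUES (?)", (name,))
        keys.append(cur.lastrowid)

    # Delete the row with the highest key, then insert a new row.
    conn.execute("DELETE FROM customer WHERE customer_key = ?", (keys[-1],))
    cur = conn.execute("INSERT INTO customer (name) VALUES (?)", ("Carol",))
    keys.append(cur.lastrowid)

    conn.close()
    return keys


print(demo_key_allocation())  # [1, 2, 3]
```

The new row gets key 3 rather than recycling key 2, even though the row that held key 2 is gone. The keys carry no meaning of their own, which is the sense in which a good key is "stupid".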