INSERT performance in column stores

| | Comments (4)
The most common question I am asked about column stores is, "Isn't INSERT performance poor?" The rationale for this question stems from the fact that in a column store a new tuple must be 1) split into its component column values, and 2) each such value must then be written to a different place (file). This would seemingly result in writing a large number of different disk blocks for every insertion. Furthermore, if the physical representation of the column is sorted and compressed, preserving this will only add to the overhead of an INSERT. While there is some truth to this line of reasoning, the problem can be overcome with the proper implementation.


Overcoming the INSERT performance penalty


One approach to significantly mitigating the performance problem is to batch INSERTs and to perform the sorts, compression, and disk writes in large groups. By doing this, existing data is kept on disk in its sorted and compressed form and new tuples are batched in a separate memory space or cache. This cache is maintained as columns in insertion order. Periodically, an asynchronous process runs and writes a batch of tuples, merging them into the disk-based storage system. In this way, the performance cost of sorting and compression is shared and amortized across many tuples. The expected number of disk writes per INSERT will also decrease as the batch size grows. Further, if the insertion-order cache is stored in main memory, this structure can be quickly scanned.

A fair question to ask is whether this approach would mean that the answers to queries would be stale. The answer is no ... if the query evaluator looks in both places (the disk and the cache). To do so would require the query optimizer to generate two plans since the data is structured differently in each location, but the extra query planning work is worth the trouble because of the boost in INSERT performance.

It should also be pointed out that column stores can partition tables across a collection of shared-nothing nodes in a cluster. If INSERTs are randomly distributed on the partitioning key, then the load introduced by high INSERT rates is distributed evenly across the cluster. If the INSERT rate grows, more nodes can be added to cope with the increase.


A note about ACID

Of course, in order to support ACID transactions in this setting, there must be a safe way to allow committed data to reside in main memory. This can be accomplished by keeping redundant copies in multiple distributed main memories. In general, one can achieve k-safety, where k is the number of nodes that can fail without losing any work, by keeping data copies on k+1 different machines. All INSERTs will be sent to all k+1 relevant sites and stored in their main memory caches. Once all these copies are installed, the tuple is stable (subject to the k-safety constraints).


INSERT Performance Benchmarks

By implementing all these strategies, it is possible to have a column store with INSERT performance that is at least competitive in performance with that of the major row stores. In fact, in many cases, benchmarks have shown that load performance for a column store is typically better than that of a row store.

 

Categories

4 Comments

Xirium said:

Column stores on shared-nothing architectures are especially good for OLAP. For the dimension tables, you've got one indexed column and a PK, so its a single column anyhow. For fact tables with multiple attributes, each attribute is stored separately. When you want to perform a query on one attribute, the data is packed more tightly. When you want to perform a query on multiple attributes while INSERTs occur, shared-nothing column stores give maximum performance.

So, any pointers to those benchmark results? I would be really interested.

It appears to me that there are a lot of interesting ways to store data and optimize access to it, but there appears to be that hard, physical boundary to overall performance. You can choose your tradeoffs for different scenarios such that your particular scenario runs faster, but there is something of a hard border.

Column stores appear to be a particularly nice optimization strategy for table-scan causing business intelligence like queries, but there certainly are tradeoffs to them.

Frank Ch. Eigler said:
Of course, in order to support ACID transactions in this setting, there must be a safe way to allow committed data to reside in main memory. This can be accomplished by keeping redundant copies in multiple distributed main memories.

Does this mean that column databases that want to support ACID and yet provide decent performance (via caching as described) therefore all have to be clustered?

Curt Monash said:

Hiya!

As I noted in http://www.dbms2.com/2008/02/08/load-speeds-and-related-issues-in-columnar-dbms/ , a big question is the performance implications of this strategy, in areas like ELT and operational BI (to pick two overused buzzphrases).

It's obviously the right strategy, but is it good ENOUGH to make you fully competitive with the row stores?

Best,

CAM

Leave a comment

About this Post

This page contains a single post by Stan Zdonik published on February 6, 2008 9:50 AM.

MapReduce II was the previous entry in this blog.

Responding to Monash's recent post on diversity of database systems is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.