Supporting Column Store Performance Claims
We commonly encounter questions related to column store performance from those considering moving away from their current DBMS solution. In this entry, I want to share my thoughts on this topic.
Issue No. 1: Addressing the Performance Claim
There is a well-known adage: "If it's not broken, don't fix it." Any client who is satisfied with his current data warehouse solution would be ill-advised to change it. However, Vertica sees enormous pain in the data warehouse market, due to combinations of the following factors:
- Increasing query complexity. The size of data warehouses is going up faster than disks are getting cheaper. An increasing number of people are being trained and equipped to analyze information, and they want to correlate more and more data. Since query complexity goes up more than linearly with warehouse size, warehouse problems are getting harder over time, not easier. Many warehouse DBAs can predict with some precision when they will "hit the wall" with their current solution. The result of hitting the wall is an expensive guided tour through the enterprise wallet for more hardware, different software, or both.
- The desire for real-time warehouses. Most warehouses are loaded periodically and are, on average, out of date by half the length of the load period. But many enterprises want more timely business intelligence. The obvious solution is to "trickle load" data in parallel with user queries. However, this is impossible in many current warehouse products.
- The desire for timely answers. In many current products, an ad-hoc query requires one to go out to lunch before the answer is returned. Sometimes response time is even worse than this. The result of delayed answers is lost human productivity and a move to "batch thinking" rather than "interactive thinking."
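As a rough check on the "out of date by half the load period" figure in the real-time bullet above, here is a minimal Python sketch. It assumes that queries arrive evenly across a load cycle and that each load is instantaneous; the function name and its parameters are illustrative only and do not come from any product.

```python
def average_staleness(load_period_hours: float, samples: int = 10_000) -> float:
    """Average time since the last load, assuming queries are spread
    evenly across one load cycle (an assumption made for illustration)."""
    step = load_period_hours / samples
    # Sample the age of the data at the midpoint of each small time slice.
    total = sum((i + 0.5) * step for i in range(samples))
    return total / samples


if __name__ == "__main__":
    # A nightly (24-hour) load and a weekly (168-hour) load.
    for period in (24.0, 168.0):
        print(f"load every {period:g} h: average staleness "
              f"{average_staleness(period):.1f} h")
```

Under these assumptions the average always comes out to half the load period, so the only way to shrink it is to load more often, which is what trickle loading does.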
If the user is in serious pain with his current warehouse solution, then the obvious answer is "find a better one."
In summary, performance is either black or white. Either it is good enough or it isn't. And if performance is important, column databases have demonstrated orders-of-magnitude better performance (50x in round numbers) than row stores in customer benchmarks and TPC-H benchmarks. Industry experts, such as Gartner, have validated these results (see the Vertica-Gartner podcast on this topic).
We see column databases out-perform row stores by large margins in customer benchmark settings on a frequent basis. Here are some results a customer measured very recently:
And remember: it doesn't have to be an either-or decision. As Don Feinberg of Gartner suggests in his podcast, using a column database in conjunction with an enterprise data warehouse (EDW) can give users better analytic performance and can also offload certain analyses from the EDW, improving its performance without costly upgrades or redesigns.
Issue No. 2: Of Connectivity and Automatic Design Tools
My second point concerns the perceived connectivity advantages of row stores. Vertica and other column-oriented databases use ODBC/JDBC interfaces. As such, they connect to all of the third-party tools that row stores use. Hence, "connectivity" is a wash between row stores and column stores. Both kinds of products connect to most, if not all, of the popular tools.
Lastly, there is a perception that column databases introduce additional complexity for DBAs. This is untrue. Vertica includes an automatic physical database designer that helps a DBA set all of the performance options in Vertica. Hence, there is no "complexity" factor; manual optimization by a human is a thing of the past. DB2 has a similar tool. The real question is, "How good is the automatic tool from any given vendor?" We are confident in Vertica's ability to automatically generate a good physical design; it would be interesting to conduct a comparative "out-of-the-box" performance benchmark that measures the effectiveness of each vendor's automatic tool.
Categories: Database architecture

1 Comment

Do you guys believe the figures and facts provided are correct? It seems bookish knowledge is at work here instead of practical experience.