One size fits all: A concept whose time has come and gone

The current major relational DBMSs (DB2, SQL Server, Oracle) all share the following characteristics:

  1. They are direct descendants of System R and Ingres and were architected more than 25 years ago

  2. They advocate "one size fits all"; i.e., a single engine that solves all DBMS needs

In the 1970s when System R and Ingres were architected, the DBMS market had the following characteristics:

  1. There was only a single DBMS application area - business data processing (OLTP)

  2. Hardware characteristics were very different from today

Since the 1970s, new DBMS application areas have emerged whose requirements are very different from those of OLTP. These include data warehouses, scientific and intelligence databases, and text and semi-structured data.

In addition, hardware has become radically cheaper. This has shifted the major cost of DBMS applications from iron to people. It has also led to applications requiring previously unthinkable features, such as high availability and disaster recovery.

In short, the world of 2007 is radically different from the world of the late 1970s. However, none of the major vendors has performed a complete redesign to deal with this changed landscape. As such, their engines should be considered legacy technology: more than a quarter of a century old and "long in the tooth".

In every major application area I can think of, it is possible to build a SQL DBMS engine with vertical market-specific internals that outperforms the "one size fits all" engines by a factor of 50 or so. Vertica is an example of this claim in the data warehouse market. It achieves blindingly fast warehouse performance by:

  • Column orientation - i.e. rotate your thinking 90 degrees

  • Aggressive data compression

  • Query executor runs against compressed data
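The three techniques above can be illustrated in a few lines. This is a minimal Python sketch under simplifying assumptions, not Vertica's implementation: a hypothetical five-row table sorted by region, run-length encoding (RLE) standing in for "aggressive compression" (real column stores combine several encodings), and a COUNT predicate answered directly from the encoded runs without expanding them.

```python
from itertools import groupby

# Hypothetical toy table (region, year, amount), sorted by region.
rows = [
    ("EU", 2007, 10),
    ("EU", 2007, 12),
    ("EU", 2006, 7),
    ("US", 2007, 5),
    ("US", 2007, 9),
]


def to_columns(rows, names):
    """Rotate 90 degrees: store one list per column instead of one tuple per row."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values):
    """Run-length encode a column into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def count_where_equal(runs, target):
    """COUNT(*) WHERE column = target, evaluated on the runs without decompressing them."""
    return sum(length for value, length in runs if value == target)


columns = to_columns(rows, ["region", "year", "amount"])
region_runs = rle_encode(columns["region"])
print(region_runs)                           # [('EU', 3), ('US', 2)]
print(count_where_equal(region_runs, "EU"))  # 3
```

Because the region column is sorted, five values collapse into two runs, and the query reads only that column; the year and amount columns are never touched. On a large, sorted, low-cardinality column the same idea can cut both I/O and CPU work, which is the intuition behind the speedups the post describes.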

In addition, it provides built-in features appropriate to the needs of 2007 customers. These include:

  • Linear scalability over a shared-nothing hardware grid

  • Automatic high availability

  • Automatic use of materialized views

  • "No knobs": minimal DBA requirements
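Linear scalability over a shared-nothing grid generally comes from partitioning rows across nodes so that each node scans and aggregates only its own slice. Below is a minimal Python sketch, assuming a hypothetical (customer, amount) table, CRC32 of the key as the partitioning hash, and a SUM ... GROUP BY query; none of these choices is taken from Vertica's actual design.

```python
from collections import Counter
from zlib import crc32


def node_for(key, num_nodes):
    """Pick the owning node by hashing the partitioning key (CRC32 is an illustrative choice)."""
    return crc32(key.encode("utf-8")) % num_nodes


def partition(rows, num_nodes):
    """Distribute (key, amount) rows so each node holds only its own slice."""
    nodes = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        nodes[node_for(key, num_nodes)].append((key, amount))
    return nodes


def local_sum(node_rows):
    """Runs on one node, using only that node's rows: no shared memory or disk."""
    totals = Counter()
    for key, amount in node_rows:
        totals[key] += amount
    return totals


def global_sum(nodes):
    """A coordinator merges the small per-node partial aggregates."""
    merged = Counter()
    for partial in map(local_sum, nodes):
        merged.update(partial)  # Counter.update adds values rather than replacing them
    return merged


# Hypothetical sample data: (customer, amount).
rows = [("acme", 10), ("globex", 5), ("acme", 7), ("initech", 3)]
nodes = partition(rows, num_nodes=3)
print(sorted(global_sum(nodes).items()))  # [('acme', 17), ('globex', 5), ('initech', 3)]
```

Because rows with the same key always land on the same node, each group is finished locally and the merge step never has to combine partial sums for one key; adding nodes adds scan capacity roughly in proportion. Real systems must also handle skew, replication for high availability, and rebalancing, none of which this sketch attempts.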

As such, Vertica can typically be set up and loaded with data in one day; the major vendors require weeks. Hence, the "out of box" experience is much friendlier. Also, Vertica beats all row stores on the planet, typically by a factor of 50. This is true for software-only row stores as well as for row stores with specialized hardware (e.g., Netezza, Teradata, DATAllegro). The only engines that come closer are other column stores, which Vertica typically beats by around a factor of 10.

Hence, my prediction is that column stores will take over the warehouse market over time, completely displacing row stores. Since many warehouse users are in considerable pain (they can't load within the available load window, can't support ad-hoc queries, and can't get better performance without a "fork-lift" upgrade), I expect this transition to column stores to occur fairly quickly as customers search for better ways to improve performance.

In the longer term, I expect a transition of the same sort to occur in other markets where there is great user pain and the possibility of radical performance improvement from a specialized software architecture.

 


7 TrackBacks

Listed below are links to blogs that reference this entry: One size fits all: A concept whose time has come and gone.


» columnstores non relational? from comp.databases.theory

I couldn't read the article (site was down) which is supposed to claim that RDBMSs "should be considered...

» The Power of Compression from Full Table Scan

I got a look at some more Vertica benchmarks yesterday, and I continue to be impressed. The results were consistently surprising - in a good way - on both scaled-up and scaled-down datasets. This is something of a holy grail for analytics application...

» Database Blog from Anant Jhingran's Musings

My friends and colleagues, Mike Stonebraker (well, my advisor at Berkeley) and Don Haderle (my predecessor at IBM) have started a new collective blog, The Database Column, along with several other database folks. I applaud their intention. Their challen...

» First the mainframe died, now someone thinks DB2 is on its way out from Getting the Most out of DB2 for z/OS and System z

Neither of the above is even remotely true. However, people seemed bound and determined to preach the end of all things that work well (or that I work on… LOL). This all has to do with an article in Computerworld about...

The opening keynote for this year's VLDB was a great presentation by Amazon CTO Werner Vogels, describing...

» Our father, who shall write a blog from An Ingres Blog

Picked up from reading Andy Astor’s (EnterpriseDB’s CEO) blog. It would appear that Michael Stonebraker, one of the founders of INteractive Graphics REtrieval System project at UCB has started a column, on a topic close to his heart, column...

» Business Intelligence Utopia - Enabler 5: Extensible Data Models from Business Intelligence – A Practitioner’s View

Enabler 5 in my list for Business Intelligence Utopia are the ubiquitous, hard-working “Data Models”. Data Model is the heart of any software system and at a fundamental level provides placeholders for data elements to reside. Business Intelligence sys...

10 Comments

ken farmer said:

A couple of thoughts:

First - this is the same claim that object databases made. And that effort died due to some limitations (sequential scans) but also because the entrenched relational players added just enough OO functionality. Same thing with XML databases. Why wouldn't that happen here as well?

Secondly - a column-oriented approach should beat the pants off a very poorly designed warehouse. But with a standard star-schema your central large table has little besides a few dozen integer key columns. And with relational database compression, very fast row-oriented loading, automatic query rewrite against summary tables, and hash and range partitioning - performance in the row-world is great. So, is the main benefit of column-orientation that while range partitioning no longer works well, compression rates are improved? And is that a worthwhile trade-off?

I also expressed similar thoughts on RDBMSs a week or so ago (http://www.rgoarchitects.com/nblog/2007/08/21/TheRDBMSIsDead.aspx).
However, it is nice to see someone with your experience who thinks the same.

anonymous said:

I have to question - when you open this blog initially touting Vertica, with a panel of contributors that mostly appear to be associated with Vertica, and sporting a copyright at the bottom of the page by Vertica - is "The Database Column" intended to be an impartial discussion on technology? Or is this simply a marketing exercise?

Admin said:

In response to "anonymous," who provided feedback on 9/8: one of the initial posts, Welcome to the Database Column, notes that the contributors are both associated with Vertica and experts in the database field. The Vertica connection is not intended to be hidden or downplayed, as you can see from the various links and comments in the posts and throughout the site.

As for intent, the blog is intended to be an idea forum and conversation place for database innovation and technology. As with any research paper, blog posting, media article, conference speech, or discussion about technology, the impartiality of the author may or may not affect the value of the message he or she delivers. You can judge the contributors' credentials for yourself and then read their posts to make up your own mind about the value of any or all posts on this blog.

anonymous said:

Being a supplier of a column store database ourselves, we completely agree with your performance claims against "classic" database engines, although I'm not sure we would claim column-wise is the best in all circumstances (I note that even some Vertica white papers suggest that row-wise is better in some cases).
However, the claim "The only engines that come closer are other column stores, which Vertica typically beats by around a factor of 10." implicitly suggests that other CW engines are only 5x quicker than row-wise engines, which is much lower than other vendors' claims.
I'm sure you will have run internal tests against some of the others, but for truly impartial claims, perhaps it is time for the CW community to create a TPC-H style benchmark that is not so tilted towards "classic" database engines and lets each vendor submit performance claims on a level playing field.
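The 5x figure follows from dividing the two factors quoted in the post. Writing T_row, T_col and T_V for a query's running time on a row store, on another column store, and on Vertica, and assuming both "typical" factors apply to the same workload (an assumption, since the post gives them separately):

```latex
\frac{T_{\text{row}}}{T_{\text{col}}}
  = \frac{T_{\text{row}} / T_{V}}{T_{\text{col}} / T_{V}}
  \approx \frac{50}{10} = 5
```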

Flybean said:

There is no single method that can resolve every problem.
Row-wise is better for OLTP, while column-wise is better for OLAP.
It seems Stonebraker has the same idea.

If "this blog is not intended to be a corporate mouthpiece simply repackaging marketing material", as indicated in the About page, then each author should disclose their connection with Vertica in every single article in which the company is mentioned. For added believability, I'd also suggest that the photos at the top of the blog not be the very same ones on the Vertica website.

Jeoff Wilks said:

WOW, just rotate your thinking 90 degrees and everything gets faster!

An inline explanation would have been nice. For that I referred to http://en.wikipedia.org/wiki/Column-oriented_DBMS

My conclusion is that your run-of-the-mill RDBMS can easily provide a new table type that uses column-oriented storage, without varying anything else (same SQL, same drivers, same integrations).

That leaves Vertica and its column-oriented friends in a dicey position: as soon as money starts flowing towards columns instead of rows, the big boys wake up and add a column-oriented storage option. Then everyone sighs and upgrades to Oracle 12p or whatever they end up calling it.

Chris Jennings said:

"My conclusion is that your run-of-the-mill RDBMS can easily provide a new table type that uses column-oriented storage, without varying anything else (same SQL, same drivers, same integrations"

Do you think the Vertica business plan took this into account?

I certainly agree with part of your comment,

"In the longer term, I expect a transition of the same sort to occur in other markets where there is great user pain and the possibility of radical performance improvement from a specialized software architecture."

However, I do not believe the "radical performance improvements" will come from a specialized software architecture.

With the advances in FPGAs, it is now possible to embed a column store database engine directly in hardware. This innovation will revolutionize what is possible with regard to database performance and make software-only, column-oriented database deployments irrelevant.

MySQL is lucky to have C2 Appliance working on just such a technology, which will be released some time next year.

Check out www.c2app.com


About this Post

This page contains a single post by Michael Stonebraker published on September 4, 2007 10:58 AM.
