50X
In recent years Mike Stonebraker has been advocating that the current commercial databases have become one-size-fits-all tools that are so general and heavyweight that they do not excel in any particular area. Mike has written several papers comparing general-purpose databases with specialized storage engines in the areas of data warehousing, text search, stream processing, and scientific processing. In these papers he demonstrates that these specialized engines can run orders of magnitude faster than the commercial databases, using commodity hardware. I am not sure all the comparisons are fair, but he makes a compelling case. Of course, Mike is a principal in Vertica, which directly competes with what he calls "The Elephants". Given that I have been close in the past to another example of how commercial goals can cloud academic judgment, I am rather cautious in evaluating these papers.
This week at VLDB Mike gave an invited talk on this topic in the Industrial Research track. His talk centered on the claim that, while he had previously shown that specialized approaches could run circles around the Elephants, he could now also demonstrate that OLTP, which is the bread-and-butter of the database industry, could be greatly outperformed by new architectures. In the paper and in his talk he put out a challenge to the research community, and to graduate students in particular, to take a particularly interesting application area and build specialized solutions that provide at least a 50X improvement over the current products.
I like this challenge, given that a 50X improvement is likely to make a real impact, whereas a 2-4X gain can in general be easily compensated for by the next generation of hardware. But something bugs me about the challenge, and also about some of the demonstrations in the papers: 50X is still focused on scaling up, just as many of the current database systems do, instead of scaling out, which is what the world really needs. The evidence in the paper is indeed about single-box performance. This continuing N=1 thinking will never yield systems that can break through the current scalability limitations of enterprise software, regardless of whether they run 50 times faster or not.
If I could rewrite the challenge, I would ask for "demonstrating performance at scale": show that one can achieve rock-solid performance and reliability guarantees simply by incrementally adding components to the system, without any limitations. And every scaling axis needs to be satisfied: request rates, request complexity, data set size, etc.
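To make the scale-out idea concrete, here is a minimal sketch using consistent hashing, a standard building block for partitioning a key space across machines. This is purely my illustration, not anything from Mike's papers or the challenge; the class and node names are invented for the example. The point is that capacity grows by simply inserting another node into the ring, and only a small fraction of the keys move when you do:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Illustrative consistent-hash ring: adding a node only remaps a
    small fraction of the key space, so capacity grows incrementally."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes          # virtual points per physical node
        self._ring = []               # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Scaling out: just insert the new node's points into the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}:{i}"), node))

    def node_for(self, key):
        # A key is owned by the first node point clockwise from its hash.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]

if __name__ == "__main__":
    keys = [f"request-{i}" for i in range(10_000)]
    ring = ConsistentHashRing(["node-0", "node-1", "node-2"])
    before = {k: ring.node_for(k) for k in keys}

    ring.add_node("node-3")       # incrementally add capacity
    after = {k: ring.node_for(k) for k in keys}

    moved = sum(1 for k in keys if before[k] != after[k])
    print(f"keys remapped after adding a node: {moved / len(keys):.1%}")
    # Roughly a quarter of the keys move to the new node; the rest stay
    # put. Aggregate capacity grows with each node added, and there is
    # no single box whose speed caps the whole system.
```

A 50X-faster single box just raises the ceiling; an architecture like this removes it, which is what "performance at scale" should mean.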
Focusing only on 50X just gives you faster Elephants, not the revolutionary new breeds of animals that can serve us better.