Traditionally, records in a database were stored as such: the data in a row was stored together for easy and fast retrieval. Not everybody agreed that this "N-ary Storage Model" (NSM) was the best approach for all workloads, but it stayed dominant until hardware constraints, especially on caches, forced the community to revisit some of the alternatives. Combined with the rise of data warehouse workloads, where there is often significant redundancy in the values stored in columns, this meant that database models based on column-oriented storage took off. The first practical modern implementation is probably C-Store by Stonebraker et al. in 2005. There is a great tutorial by Harizopoulos, Abadi and Boncz from VLDB 2009 that takes you through the history, trade-offs and the state of the art. Many modern high-performance data warehouses, such as Amazon Redshift, are based on column stores.

But the groundwork for column-oriented databases was laid in 1985, when George Copeland and Setrag Khoshafian questioned the NSM in their seminal paper on a "Decomposition Storage Model" (DSM). From the abstract:

There seems to be a general consensus among the database community that the n-ary approach is better. This conclusion is usually based on a consideration of only one or two dimensions of a database system. The purpose of this report is not to claim that decomposition is better. Instead, we claim that the consensus opinion is not well founded and that neither is clearly better until a closer analysis is made along the many dimensions of a database system. The purpose of this report is to move further in both scope and depth toward such an analysis. We examine such dimensions as simplicity, generality, storage requirements, update performance and retrieval performance.

A Decomposition Storage Model, George P. Copeland and Setrag N. Khoshafian, in the Proceedings of the 1985 SIGMOD International Conference on Management of Data
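
To make the contrast between the two layouts concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the sample table, the function names `to_columns` and `run_length_encode`, and the use of the row position as the surrogate are all made up for this example. It decomposes row tuples into one list of (surrogate, value) pairs per attribute, then run-length encodes one column to show why redundant column values compress well.

```python
from itertools import groupby

# Hypothetical sample table stored row-wise (NSM): (id, region, status).
rows = [
    (1, "EU", "shipped"),
    (2, "EU", "shipped"),
    (3, "EU", "pending"),
    (4, "US", "pending"),
]


def to_columns(records, names):
    """Decompose row tuples into one list of (surrogate, value) pairs per attribute.

    The surrogate here is simply the row's position, an assumption made for
    this sketch.
    """
    return {
        name: [(surrogate, record[i]) for surrogate, record in enumerate(records)]
        for i, name in enumerate(names)
    }


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows, ["id", "region", "status"])

# Reading one attribute touches only that column's data.
region_values = [value for _, value in columns["region"]]

# Adjacent repeated values in a column collapse into short runs.
region_runs = run_length_encode(region_values)
```

Scanning `region` in the decomposed layout never touches `id` or `status`, whereas the row-wise list has to walk every full tuple. The price is that rebuilding a complete record means joining the per-attribute lists on the surrogate.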
