I am in the middle of my South America tour, currently in beautiful but very cold Santiago, Chile.
This week the AWS team launched Amazon Glacier, a cold-storage archive service with a very low price point of $0.01 per GB per month. That makes this a good week to read up on some of the historical work on the cost trade-offs of storing and accessing data.
For this purpose I have picked three papers: two by Jim Gray, the brilliant IBM / Tandem / Microsoft researcher who received the Turing Award for his contributions to database and transaction processing, and a follow-up by his co-author Goetz Graefe. The papers are from 1987, 1997, and 2007.
The first paper is a true classic: in “The 5 Minute Rule”, Jim and Gianfranco Putzolu explore the cost trade-off between keeping pages in memory and performing an IO every time the page is accessed. They use the actual dollar cost of memory and disk drives to reach their conclusions, and the result gave the paper its name: a page that is referenced roughly every five minutes, or more often, should be kept in memory.
The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time, Jim Gray and Gianfranco Putzolu, Proceedings of the ACM SIGMOD Conference, pp. 395–398, 1987
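The argument in the first paper boils down to a single comparison: the cost of keeping a page in memory versus the cost of the disk capacity needed to fetch it over and over. A minimal Python sketch of that comparison follows; the function names and the dollar figures are hypothetical inputs for illustration, not numbers quoted from the paper.

```python
# Break-even reasoning behind the five-minute rule.
# All prices are hypothetical, chosen only to illustrate the arithmetic.

def break_even_seconds(cost_per_access_per_second: float,
                       cost_to_cache_page: float) -> float:
    """Reference interval (seconds) at which caching a page pays for itself.

    A page accessed once every `interval` seconds consumes 1/interval
    accesses per second of disk capacity, worth
    cost_per_access_per_second / interval dollars. Caching wins while that
    amount exceeds the memory cost of holding the page.
    """
    return cost_per_access_per_second / cost_to_cache_page


def should_cache(interval_s: float, cost_per_access_per_second: float,
                 cost_to_cache_page: float) -> bool:
    # Keep the page in memory if it is re-referenced at least this often.
    return interval_s <= break_even_seconds(cost_per_access_per_second,
                                            cost_to_cache_page)


# Hypothetical inputs: $2,000 of hardware (disk plus its share of controller
# and CPU) per random access per second, and $5 for the RAM to hold one 1 KB page.
ACCESS_COST = 2000.0
PAGE_COST = 5.0

print(break_even_seconds(ACCESS_COST, PAGE_COST))  # 400.0 (seconds)
print(should_cache(60, ACCESS_COST, PAGE_COST))    # page touched every minute
print(should_cache(3600, ACCESS_COST, PAGE_COST))  # page touched hourly
```

With these inputs the break-even point lands at a few hundred seconds, which is where the "five minutes" shorthand comes from: pages touched more often than that are cheaper to keep in RAM than to reread from disk.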
In 1997 Jim revisited his calculations with Goetz Graefe; their paper details the impact of ten years of progress in hardware and pricing. Amazingly, the five-minute rule still held.
The Five-Minute Rule Ten Years Later, and Other Computer Storage Rules of Thumb, Jim Gray and Goetz Graefe, ACM SIGMOD Record 26(4): 63–68, 1997
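The 1997 paper factors the break-even interval into a technology ratio (pages per MB of RAM over random accesses per second per disk) and an economic ratio (price per disk drive over price per MB of DRAM). A minimal Python sketch of that formula, using hypothetical hardware figures rather than the paper's own table, shows how page size moves the result:

```python
# Sketch of the 1997 break-even formula:
#   interval = (pages per MB of RAM / random accesses per second per disk)
#              * (price per disk drive / price per MB of DRAM)
# All hardware figures below are hypothetical, chosen only for illustration.

def break_even_interval(page_size_kb: float, accesses_per_sec_per_disk: float,
                        price_per_disk: float, price_per_mb_ram: float) -> float:
    """Seconds between references at which caching a page in RAM breaks even."""
    pages_per_mb = 1024 / page_size_kb
    technology_ratio = pages_per_mb / accesses_per_sec_per_disk
    economic_ratio = price_per_disk / price_per_mb_ram
    return technology_ratio * economic_ratio


# Hypothetical inputs: 64 random accesses/s per disk, $2,000 per disk,
# $15 per MB of DRAM.
for page_kb in (1, 8, 64):
    seconds = break_even_interval(page_kb, 64, 2000, 15)
    print(f"{page_kb:>2} KB pages: {seconds:7.0f} s ({seconds / 60:.1f} min)")
```

With these inputs, 8 KB pages break even at roughly 267 seconds, a little under five minutes. Smaller pages push the interval out because each megabyte of RAM holds more of them, while larger pages pull it in.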
In 2007 Goetz revisited the results once more, this time adding flash-based SSD storage to the mix, which, not surprisingly, changed the rules.
The Five-Minute Rule 20 Years Later: and How Flash Memory Changes the Rules, Goetz Graefe, ACM Queue 6(4): 40–52, 2008
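Graefe's paper treats flash as a tier between RAM and disk. One way to sketch that idea is to apply the same break-even formula to each pair of adjacent tiers: RAM caching flash, and flash caching disk. The Python below does exactly that; the `Device` class, `place_page` helper, and every price, capacity, and access rate are hypothetical, chosen only to show how a middle tier splits the range of reference intervals, and are not Graefe's model or numbers.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """A backing device; all figures used below are hypothetical."""
    name: str
    accesses_per_sec: float  # random page reads per second one device sustains
    price: float             # dollars per device
    capacity_mb: float       # usable capacity in MB

    @property
    def price_per_mb(self) -> float:
        return self.price / self.capacity_mb


def break_even_interval(page_kb: float, backing: Device,
                        cache_price_per_mb: float) -> float:
    """Seconds between references at which caching a page in the faster tier pays off."""
    pages_per_mb = 1024 / page_kb
    return (pages_per_mb / backing.accesses_per_sec) * (backing.price / cache_price_per_mb)


def place_page(interval_s: float, ram_flash_s: float, flash_disk_s: float) -> str:
    """Pick a tier for a page re-referenced every `interval_s` seconds."""
    if interval_s <= ram_flash_s:
        return "RAM"
    if interval_s <= flash_disk_s:
        return "flash"
    return "disk"


# Hypothetical hardware, chosen only to illustrate the three-tier split.
PAGE_KB = 4
RAM_PRICE_PER_MB = 0.05
flash = Device("flash SSD", accesses_per_sec=5000, price=500, capacity_mb=32_000)
disk = Device("disk", accesses_per_sec=200, price=100, capacity_mb=500_000)

ram_vs_flash = break_even_interval(PAGE_KB, flash, RAM_PRICE_PER_MB)
flash_vs_disk = break_even_interval(PAGE_KB, disk, flash.price_per_mb)

print(f"RAM vs flash break-even:  {ram_vs_flash:.0f} s")
print(f"flash vs disk break-even: {flash_vs_disk:.0f} s")
for interval in (60, 3600, 86400):
    tier = place_page(interval, ram_vs_flash, flash_vs_disk)
    print(f"page touched every {interval} s -> {tier}")
```

Under these assumptions the hottest pages stay in RAM, lukewarm pages move to flash, and only rarely touched pages are left on disk; the exact boundaries depend entirely on the prices and access rates you plug in.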