Expanding the Cloud - Amazon S3 Reduced Redundancy Storage

All Things Distributed Now Go Build! Articles @werner

Expanding the Cloud - Amazon S3 Reduced Redundancy Storage

May 18, 2010 • 781 words

Today a new storage option for Amazon S3 has been launched: Amazon S3 Reduced Redundancy Storage (RRS). This new storage option enables customers to reduce their costs by storing non-critical, reproducible data at lower levels of redundancy. This has been an option that customers have been asking us about for some time so we are really pleased to be able to offer this alternative storage option now.

Durability in Amazon S3

Amazon Simple Storage Service (S3) was launched in 2006 as "Storage for the Internet" with the promise to make web-scale computing easier for developers. Four years later it stores over 100 billion objects and routinely performs well over 120,000 storage operations per second. Its use cases span the whole storage spectrum; from enterprise database backup to media distribution, from consumer file storage to genomics datasets, from image serving for websites to telemetry storage for NASA.

Core to its success has been its simplicity: no matter how many objects you want to store, how small or big those objects are, or what object access patterns you have to deal with, you can rely on Amazon S3 to take away the headaches of dealing with the hard issues typically associated with big storage systems. All the complexities of scalability, reliability, durability, performance and cost-effectiveness are hidden behind a very simple programming interface.

Under the covers Amazon S3 is a marvel of distributed systems technologies. It is the ultimate incrementally scalable system; simply by adding resources it can handle scaling needs in storage and performance dimensions. It also needs to handle every possible failure of storage devices, of servers, networks and operating systems, all while continuing to serve hundreds of thousands of customers.

The goal for operational excellence in Amazon S3 (and for all other AWS services) is that it should be "indistinguishable from perfect". While individual components may be failing all the time, as is normal in any large scale system, the customers of the service should be completely shielded from this. For example to ensure availability of data, the data is replicated over multiple locations such that failure modes are independent of each other. The locations are chosen with great care to achieve this independence and if one or more of those locations becomes unreachable, S3 can continue to serve its customers. Some of the biggest innovations inside Amazon S3 have been how to use software techniques to mask many of the issues that would easily have paralyzed every other storage system.

The same goes for durability; core to the design of S3 is that we go to great lengths to never, ever lose a single bit. We use several techniques to ensure the durability of the data our customers trust us with, and some of those (e.g. replication across multiple devices and facilities) overlap with those we use for providing high-availability. One of the things that S3 is really good at is deciding what action to take when failure happens, how to re-replicate and re-distribute such that we can continue to provide the availability and durability the customers of the service have come to expect. These techniques allow us to design our service for 99.999999999% durability.

Relaxing Durability

/h3 There are many innovative techniques we deploy to provide this durability and a number of them are related to the redundant storage of data. As such, the cost of providing such a high durability is an important component of the storage pricing of S3. While this high durability is exactly what most customers want, there are some usage scenarios where customers have asked us if we could relax the durability in exchange for a reduction in cost. In these scenarios, the customers are able to reproduce the object if it would ever be lost, either because they are storing another authoritative copy or because they can reconstruct the object from other sources.

We can now offer these customers the option to use Amazon S3 Reduced Redundancy Storage (RRS), which provides 99.99% durability at significantly lower cost. This durability is still much better than that of a typical storage system as we still use some forms of replication and other techniques to maintain a level of redundancy. Amazon S3 is designed to sustain the concurrent loss of data in two facilities, while the RRS storage option is designed to sustain the loss of data in a single facility. Because RRS is redundant across facilities, it is highly available and backed by the Amazon S3 Service Level Agreement.

More details on the new option and its pricing advantages can be found on the Amazon S3 product page. Other insights in the use of this option can be read on the AWS developer blog.