December 2008 Archives
I wrote a first version of this posting on consistency models about a year ago, but I was never happy with it as it was written in haste and the topic is important enough to receive a more thorough treatment. ACM Queue asked me to revise it for use in their magazine and I took the opportunity to improve the article. This is that new version.
Eventually Consistent - Building reliable distributed systems at a worldwide scale demands trade-offs between consistency and availability.
At the foundation of Amazon's cloud computing are infrastructure services such as Amazon's S3 (Simple Storage Service), SimpleDB, and EC2 (Elastic Compute Cloud) that provide the resources for constructing Internet-scale computing platforms and a great variety of applications. The requirements placed on these infrastructure services are very strict; they need to score high marks in the areas of security, scalability, availability, performance, and cost effectiveness, and they need to meet these requirements while serving millions of customers around the globe, continuously.
Under the covers these services are massive distributed systems that operate on a worldwide scale. This scale creates additional challenges, because when a system processes trillions and trillions of requests, events that normally have a low probability of occurrence are now guaranteed to happen and need to be accounted for up front in the design and architecture of the system. Given the worldwide scope of these systems, we use replication techniques ubiquitously to guarantee consistent performance and high availability. Although replication brings us closer to our goals, it cannot achieve them in a perfectly transparent manner; under a number of conditions the customers of these services will be confronted with the consequences of using replication techniques inside the services.
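The claim that low-probability events become certainties at this scale follows from simple probability. As an illustration, assume each request independently triggers some rare fault with probability p. Both the independence assumption and the numbers below are hypothetical and were chosen only to show the arithmetic; they are not Amazon measurements.

```latex
\begin{aligned}
P(\text{at least one fault in } N \text{ requests}) &= 1 - (1 - p)^{N} \approx 1 - e^{-pN}, \\
\mathbb{E}[\text{number of faults}] &= pN, \\
p = 10^{-9},\; N = 10^{12} \;\Rightarrow\; pN = 10^{3}, \quad
P(\text{at least one fault}) &\approx 1 - e^{-1000} \approx 1 .
\end{aligned}
```

Even a one-in-a-billion event is therefore expected about a thousand times over a trillion requests, which is why such events must be handled by design rather than treated as exceptional.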
One of the ways in which this manifests itself is in the type of data consistency that is provided, particularly when the underlying distributed system provides an eventual consistency model for data replication. When designing these large-scale systems at Amazon, we use a set of guiding principles and abstractions related to large-scale data replication and focus on the trade-offs between high availability and data consistency. In this article I present some of the relevant background that has informed our approach to delivering reliable distributed systems that need to operate on a global scale. An earlier version of this text appeared as a posting on the All Things Distributed weblog in December 2007 and was greatly improved with the help of its readers.
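To make "eventual consistency" concrete, here is a minimal sketch of asynchronous replication. It is not Amazon's implementation. The class names, the single global version counter (real systems typically use timestamps or vector clocks), and the last-writer-wins conflict rule are all illustrative assumptions. A write is accepted by one replica and queued for the others, so a read from another replica can return stale data until propagation runs. After propagation, all replicas converge on the same value.

```python
class Replica:
    """One copy of a key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        # Assumed last-writer-wins rule: keep only the highest version seen,
        # so out-of-order delivery cannot roll a replica back.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentStore:
    """Hypothetical store: writes hit one replica and propagate lazily."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # queued (replica index, key, version, value)
        self.clock = 0     # simplification: one global version counter

    def write(self, key, value, at=0):
        # Apply locally, then queue the update for every other replica.
        self.clock += 1
        self.replicas[at].apply(key, self.clock, value)
        for i in range(len(self.replicas)):
            if i != at:
                self.pending.append((i, key, self.clock, value))

    def read(self, key, at):
        # Reads are served by a single replica and may be stale.
        return self.replicas[at].read(key)

    def propagate(self):
        # Deliver all queued updates; afterwards every replica agrees.
        for i, key, version, value in self.pending:
            self.replicas[i].apply(key, version, value)
        self.pending.clear()


if __name__ == "__main__":
    store = EventuallyConsistentStore(3)
    store.write("cart", "book", at=0)
    print(store.read("cart", at=0))  # book
    print(store.read("cart", at=2))  # None: the update has not arrived yet
    store.write("cart", "lamp", at=1)
    store.propagate()
    print([store.read("cart", at=i) for i in range(3)])  # ['lamp', 'lamp', 'lamp']
```

The window between `write` and `propagate` is the inconsistency window. Shrinking it, or choosing a different conflict-resolution rule, is exactly the kind of trade-off between availability and consistency that the article goes on to discuss.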
A question I get asked frequently is how working in industry is different from working in academia. My answer from the beginning has been that the main difference is teamwork. While in academia there are collaborations among faculty and there are student teams working together, the work is still rather individual, as is the reward structure. In industry you cannot get anything done without teamwork. Products do not get built by individuals but by teams; definition, implementation, delivery, and operation are all collaborative processes that have many people from many different disciplines working together.
As such, InformationWeek's Chief of the Year award cannot be my award. It is an award for all the Amazonians who in the past years have developed technologies and processes that are so innovative that they have defined a whole business landscape: first in ecommerce, and now, with Amazon Web Services, they are defining Cloud Computing through the delivery of Infrastructure as a Service. Compared to the immense effort that was needed to make all of this happen, my involvement has been small.
A relentless focus on innovation by all Amazonians has made this possible: from new hardware development to the definition of new business models, from building ultra-reliable storage services to a massively scalable compute cloud, from pervasive monitoring and performance control to revolutionary, efficient software architectures. All of this operates at a scale, and with a reliability, performance, and cost-effectiveness, that is unparalleled in today's technology world. These advances build on 13 years of experience with building the world's most customer-centric ecommerce operation, and as such the success of AWS is absolutely not the work of a single individual but the success of all Amazonians.
But this is only the beginning. We are intent on building the world's most customer-centric cloud computing operation and, as we have done with ecommerce, we will not accept the old norms of what must be done. We will always focus on what our customers need and work backwards from there. We will continue to innovate and roll out services and features that address the real needs of our customers.
It is still only Day One...
Starting today, the Amazon Elastic Compute Cloud (EC2) supports launching instances in multiple geographically distinct regions. The new EU Region enables users to launch instances in Europe.
This addresses the requests from many of our European customers and from companies that want to run instances closer to their European customers. Over the past year I have visited many of our European customers, and they frequently remarked, "if only we had EC2 in Europe." We heard their requests loud and clear and have worked very hard to roll out the European Region. This is a very important milestone on the road to local access to all our services.
These are three of the main drivers behind our customers' requests:
- Lower latency from EC2 instances to their clients. The European Region can be accessed with low latency from all major European network hubs.
- Low-latency access to data stored in the Amazon Simple Storage Service (S3). A large number of customers have stored data in the European Region of Amazon S3. With the new European Region, this data can now be accessed from within EC2 with low latency and at no data transfer cost.
- Regulatory requirements may demand that data be stored in the EU and/or that processing take place within the EU. With the European Regions of Amazon S3 and Amazon EC2, developers can now address those requirements.
The new European Region will also contain two Availability Zones, so that developers can build applications that tolerate a variety of failure scenarios. One can even develop failover scenarios that span multiple continents. Amazon Elastic Block Store will also be available to customers who launch instances in the European Region.
With the European Regions of Amazon EC2, S3 and SQS, combined with Amazon CloudFront, developers now have a full set of services that can help them address the European market.
I am very excited about the launch of Amazon EC2 in Europe, and I am looking forward to working with our European partners and customers as they roll out their applications and services in the EU Region.
More details are available on the Amazon EC2 detail page, the AWS blog, and at RightScale.