Persistent Storage for Amazon EC2

April 13, 2008

I would like to introduce the newest feature of Amazon EC2: persistent local storage. This has been very high on the request list of EC2 customers, and I believe that, combined with the Availability Zones and Elastic IP Address features released earlier this month, it makes EC2 the ideal environment for building highly scalable and reliable applications.

Significant innovation has gone into this feature: instead of restricting developers to a particular (distributed) file system, we once again decided to look at what the most fundamental building block is and how we could offer it in the most scalable and reliable manner.

Persistent storage for Amazon EC2 will be offered in the form of storage volumes which you can mount into your EC2 instance as a raw block storage device. It basically looks like an unformatted hard disk. Once you have mounted the volume for the first time, you can format it with any file system you want, or, if you have advanced applications such as high-end database engines, you can use it directly as a raw device.

Developers can create any number of volumes they want, ranging in size from 1 GB to 1 TB. Each volume is created within a specified Availability Zone and is accessible by your EC2 instances running in that Availability Zone. As is to be expected with a volume abstraction, only one instance can have a given volume mounted at any time. Volumes can be detached and reattached to other instances if necessary, for failure handling or application migration.
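To make the volume rules above concrete, here is a minimal sketch of them as a toy in-memory model. It is not the EC2 API: the `Volume` and `Instance` classes, their fields, and the zone names are hypothetical, and treating 1 TB as 1024 GB is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

MIN_SIZE_GB = 1
MAX_SIZE_GB = 1024  # assumption: 1 TB counted as 1024 GB


@dataclass
class Instance:
    """Hypothetical stand-in for an EC2 instance."""
    instance_id: str
    zone: str


@dataclass
class Volume:
    """Hypothetical stand-in for a storage volume."""
    size_gb: int
    zone: str
    attached_to: Optional[str] = None  # instance id, or None when detached

    def __post_init__(self) -> None:
        # Volumes range from 1 GB to 1 TB.
        if not MIN_SIZE_GB <= self.size_gb <= MAX_SIZE_GB:
            raise ValueError(f"size must be {MIN_SIZE_GB}..{MAX_SIZE_GB} GB")

    def attach(self, instance: Instance) -> None:
        # A volume is only accessible from instances in its own zone.
        if instance.zone != self.zone:
            raise ValueError("volume and instance must be in the same zone")
        # Only one instance can have the volume mounted at a time.
        if self.attached_to is not None:
            raise RuntimeError(f"already attached to {self.attached_to}")
        self.attached_to = instance.instance_id

    def detach(self) -> None:
        self.attached_to = None


if __name__ == "__main__":
    vol = Volume(size_gb=100, zone="zone-a")
    vol.attach(Instance("i-primary", "zone-a"))
    vol.detach()  # e.g. after the primary instance fails
    vol.attach(Instance("i-standby", "zone-a"))
    print(vol.attached_to)
```

The detach-then-attach sequence at the bottom mirrors the failure-handling case: the data stays on the volume while the instance in front of it changes.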

The consistency of data written to this device is similar to that of other local and network-attached devices: it is under the control of the developer when and how to force a flush of data to disk if you want to bypass the traditional lazy-writer functionality of the operating system's file cache. Because of the session-oriented model for access to the volume, you do not need to worry about eventual consistency issues.
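As an illustration of forcing a flush past the operating system's file cache, the following sketch uses Python's standard `os.fsync`. The `durable_write` helper and the file name are hypothetical; the same pattern applies to any file on a locally mounted file system, not only to an EC2 volume.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> int:
    """Write data to path and return the byte count after requesting a flush to the device."""
    with open(path, "wb") as f:
        written = f.write(data)
        f.flush()             # push Python's user-space buffer to the OS
        os.fsync(f.fileno())  # ask the OS to write its cached pages to the device
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        n = durable_write(os.path.join(d, "journal.log"), b"commit 42\n")
        print(f"wrote {n} bytes")
```

Without the `os.fsync` call, the write may sit in the file cache until the lazy writer gets to it; with it, the application decides when the data must reach the device.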

Snapshots

Had we stopped here, that would already have been quite a solid service for developers to use. But we realized we needed to do more to make sure that developers could build truly geo-scalable applications. For that we introduced snapshot functionality: you ask EC2 to make a snapshot of your volume and store it in Amazon S3. You can use this for long-term backup, for rollback strategies, and also for (world-wide) volume re-creation.

When you create a volume you can ask for it to be created from a particular snapshot. And because this snapshot is stored in S3, which is accessible in all Availability Zones, your new volume can be created in any zone, not just the one where the snapshot originated.
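The snapshot flow described above can be sketched with a toy in-memory model. This is not the EC2 or S3 API: the `Snapshot` and `Volume` classes, the `create_volume_from` function, and the zone names are all hypothetical, and a byte string stands in for the volume's blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical point-in-time copy of a volume, stored outside any one zone."""
    source_zone: str
    blocks: bytes  # immutable copy of the volume contents


@dataclass
class Volume:
    """Hypothetical volume living in a single zone."""
    zone: str
    blocks: bytearray

    def snapshot(self) -> Snapshot:
        # Copy the contents so later writes to the volume do not alter the snapshot.
        return Snapshot(source_zone=self.zone, blocks=bytes(self.blocks))


def create_volume_from(snap: Snapshot, zone: str) -> Volume:
    # The target zone is free to differ from snap.source_zone.
    return Volume(zone=zone, blocks=bytearray(snap.blocks))


if __name__ == "__main__":
    original = Volume(zone="zone-a", blocks=bytearray(b"data-v1"))
    snap = original.snapshot()
    original.blocks[:] = b"data-v2"  # the volume keeps changing after the snapshot
    restored = create_volume_from(snap, zone="zone-b")
    print(restored.zone, bytes(restored.blocks))
```

The restored volume carries the contents as of the snapshot, which is what makes the same mechanism usable for both rollback and re-creation in another zone.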

Snapshots are an extremely powerful technology and allow for building highly fault-tolerant applications operating world-wide. Combine snapshots with Availability Zones and Elastic IPs and you have all the tools to manage and migrate even the most complex of applications.

And the great thing is that it is all done using standard technologies, so you can use it with any kind of application, middleware, or infrastructure software, whether legacy or brand new.

Early access

This new functionality is already being used privately by a handful of customers, and will be publicly available later this year. We are talking about this service at this early stage because we believe this will help many of our EC2 customers set their development priorities for this year.

You can find more information at the AWS developer’s blog.

Update: Thorsten from RightScale, who has been using the service, writes about his experiences.