Persistent Storage for Amazon EC2
I would like to introduce the newest feature of Amazon EC2: persistent local storage. This has been very high on the request list of EC2 customers, and I believe that, combined with the Availability Zones and Elastic IP Address features released earlier this month, it makes EC2 the ideal environment for building highly scalable and reliable applications.
Significant innovation has gone into this feature: instead of restricting developers to a particular (distributed) file system, we once again decided to look at what the most fundamental building block is and how we could offer it in the most scalable and reliable manner.
Persistent storage for Amazon EC2 will be offered in the form of storage volumes, which you can mount into your EC2 instance as raw block storage devices. A volume basically looks like an unformatted hard disk. Once you have mounted the volume for the first time, you can format it with any file system you want or, if you have advanced applications such as high-end database engines, use it directly as a raw device.
Developers can create any number of volumes they want, ranging in size from 1 GB to 1 TB. Each volume is created within a specified Availability Zone and is accessible by your EC2 instances running in that Availability Zone. As to be expected with a volume abstraction only one instance can have the volume mounted at any given time. Volumes can be detached and reattached to other instances if necessary for failure handling or application migration.
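To make these rules concrete, here is a minimal in-memory sketch of the volume semantics described above. It is a toy model, not the EC2 API: the class names, the zone strings, and the choice to treat 1 TB as 1024 GB are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

MIN_SIZE_GB = 1
MAX_SIZE_GB = 1024  # assumption: 1 TB is treated as 1024 GB in this sketch


@dataclass
class Instance:
    instance_id: str  # hypothetical identifier
    zone: str         # Availability Zone the instance runs in


@dataclass
class Volume:
    size_gb: int
    zone: str                          # Availability Zone the volume lives in
    attached_to: Optional[str] = None  # instance id, or None when detached

    def __post_init__(self) -> None:
        # Sizes range from 1 GB to 1 TB.
        if not MIN_SIZE_GB <= self.size_gb <= MAX_SIZE_GB:
            raise ValueError("volume size must be between 1 GB and 1 TB")

    def attach(self, instance: Instance) -> None:
        # A volume is only accessible to instances in its own zone.
        if instance.zone != self.zone:
            raise ValueError("instance and volume are in different Availability Zones")
        # Only one instance can have the volume at any given time.
        if self.attached_to is not None:
            raise RuntimeError("volume is already attached to an instance")
        self.attached_to = instance.instance_id

    def detach(self) -> None:
        # Detaching frees the volume so another instance can take it over.
        self.attached_to = None
```

In this model, failure handling amounts to calling detach() and then attach() with a replacement instance in the same zone; a second attach() without a detach() is rejected.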
The consistency of data written to this device is similar to that of other local and network-attached devices: it is up to the developer to decide when and how to force a flush of data to disk in order to bypass the traditional lazy-writer functionality of the operating system's file cache. Because of the session-oriented model for access to the volume, you do not need to worry about eventual consistency issues.
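As an illustration of forcing a flush, here is a small sketch using only the Python standard library. The function name durable_write is hypothetical, and the path is assumed to point at a file on a file system created on the volume; nothing here is specific to EC2.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and ask the OS to push it to the device before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # drain Python's user-space buffer into the OS file cache
        os.fsync(f.fileno())  # ask the OS to write its cached pages for this file to the device
```

Without the os.fsync call, the data may sit in the operating system's file cache until the lazy writer gets around to it, which is the usual trade-off between throughput and durability on any block device.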
Snapshots
If we had stopped here, that would already have been quite a solid service for developers to use. But we realized we needed to do more to make sure that developers could build truly geo-scalable applications. For that we introduced snapshot functionality: you ask EC2 to make a snapshot of your volume and store it in Amazon S3. You can use this for long-term backup purposes, for use in rollback strategies, but also for (world-wide) volume re-creation purposes.
When you create a volume you can ask it to be created from a particular snapshot. And because this snapshot is stored in S3, which is accessible in all Availability Zones, your new volume can be created in any zone, not just the one where the snapshot originated from.
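The snapshot-and-restore flow can be sketched with a toy in-memory model. This is not the EC2 or S3 API: the class names, method names, and zone strings are assumptions, and a bytes object stands in for the snapshot data stored in S3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    # Point-in-time copy of a volume's contents. In the post this lives in S3,
    # so it is not tied to any single Availability Zone.
    blocks: bytes


@dataclass
class Volume:
    zone: str
    blocks: bytearray

    def snapshot(self) -> Snapshot:
        # Copy the current contents; later writes to the volume do not affect the snapshot.
        return Snapshot(bytes(self.blocks))

    @classmethod
    def from_snapshot(cls, snap: Snapshot, zone: str) -> "Volume":
        # The new volume may be created in any zone, not only the source volume's zone.
        return cls(zone=zone, blocks=bytearray(snap.blocks))
```

The design point the model captures is that the snapshot, not the volume, is the unit that crosses zone boundaries: a volume stays in its zone, while a volume created from its snapshot can live anywhere.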
Snapshots are an extremely powerful technology and allow for building highly fault-tolerant applications operating world-wide. Combine these snapshots with Availability Zones and Elastic IPs and you have all the tools to manage and migrate even the most complex of applications.
And the great thing is that it is all done using standard technologies, so you can use this with any kind of application, middleware, or infrastructure software, whether it is legacy or brand new.
Early access
This new functionality is already being used privately by a handful of customers, and will be publicly available later this year. We are talking about this service at this early stage because we believe this will help many of our EC2 customers set their development priorities for this year.
You can find more information at the AWS developer’s blog.
Update: Thorsten from RightScale, who has been using the service, writes about his experiences.

Having tested the new storage volumes I can say only one thing: you'll love them! They really raise the EC2 offering to the next level. It will surpass non-cloud computing not only in scale and price but also in features. Yay!
More thoughts on how the storage volumes will change the game in my blog post at http://blog.rightscale.com/2008/04/13/amazon-takes-ec2-to-the-next-level-with-persistent-storage-volumes/
Thorsten - CTO RightScale
--- SNIP ---
As to be expected with a volume abstraction only one instance can have the volume mounted at any given time.
--- SNIP ---
Why? I would expect you'd be able to mount the block device in more than one place and run software like ocfs or gfs on the device. That would be great!
So what is the downside?
I mean, is it like a local file system? Can I do java.io.File file = new File();
...just hope there's some amount of "included" storage with your instance based on its size. It's pretty crappy if they make you pay extra for something like this (within reason). It should be the same space you get on the instance. Extra for extra charge...but I think we can just continue to use S3 if we need more than what comes with the instance...to go over 60, or 120 or whatever GB, you'd have to be storing videos and other media...at which point you should be using S3...and if it's NOT media, then you're running the type of site that should be able to easily cover those extra expenses to go and increase that persistent storage amount.
This is simply fantastic.
The only problem I see is that it is not available right now :)
I've been using EC2 servers/instances for a couple of months and I must say I'm very happy with the service. Persistent storage has been highly requested by EC2 customers and, as usual, Amazon is listening.
I just don't like the "later this year"...
Great Job, Werner. This is really fantastic. I've been running datacenters for 10 years, and I haven't been this excited about a new technology in a long time.
Keep up the great work, guys.
--Chris
"only one instance can have the volume mounted at any given time"
Does this mean I have no way of creating a common file system across instances? If this is true, this is not good: I need to access common files across instances. How can I do that (without using S3)?