Expanding the Cloud - Cluster Compute Instances for Amazon EC2


Today, Amazon Web Services took a very important step in unlocking the advantages of cloud computing for a very important application area. Cluster Compute Instances for Amazon EC2 are a new instance type specifically designed for High Performance Computing applications. Customers with complex computational workloads such as tightly coupled, parallel processes, or with applications that are very sensitive to network performance, can now achieve the same high compute and networking performance provided by custom-built infrastructure while benefiting from the elasticity, flexibility and cost advantages of Amazon EC2.

During my academic career, I spent many years working on HPC technologies such as user-level networking interfaces, large scale high-speed interconnects, HPC software stacks, etc. In those days, my main goal was to take the advances in building the highly dedicated High Performance Cluster environments and turn them into commodity technologies for the enterprise to use. Not just for HPC but for mission critical enterprise systems such as OLTP. Today, I am very proud to be a part of the Amazon Web Services team as we truly make HPC available as an on-demand commodity for every developer to use.

HPC and Amazon EC2

Almost immediately after the launch of Amazon EC2, our customers started to use it for High Performance Computing. Early users included some Wall Street firms who knew exactly how to balance the scale of computation against the quality of the results they needed to create a competitive edge. They have run thousands of instances of complex Monte Carlo simulations at night to determine how to be ready at market open. Other industries using Amazon EC2 for HPC-style workloads include pharmaceuticals, oil exploration, industrial and automotive design, media and entertainment, and more.

Further computationally intensive, highly parallel workloads have found their way to Amazon EC2 as businesses have explored using HPC types of algorithms for other application categories, for example to process very large unstructured data sets for Business Intelligence applications. This has led to strong growth in the popularity of Hadoop and MapReduce technologies, including Amazon Elastic MapReduce as a tool for making it easy to run these compute jobs by taking away much of the heavy lifting normally associated with running large Hadoop clusters.

As much as Amazon EC2 and Elastic MapReduce have been successful in freeing some HPC customers with highly parallelized workloads from the typical challenges of HPC infrastructure, namely capital investment and the associated heavy operational lifting, there have been several classes of HPC workloads for which the existing instance types of Amazon EC2 have not been the right solution. In particular this has been true for applications based on algorithms - often MPI-based - that depend on frequent low-latency communication and/or require significant cross-sectional bandwidth. Additionally, many high-end HPC applications take advantage of knowing their in-house hardware platforms to achieve major speedup by exploiting the specific processor architecture. There has been no easy way for developers to do this in Amazon EC2... until today.
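To see why latency matters so much for these tightly coupled workloads, a common back-of-the-envelope model treats the time to deliver one message as a fixed latency plus the message size divided by the bandwidth. The sketch below applies that model. The two "networks" and their latency and bandwidth figures are illustrative assumptions, not measurements of Amazon EC2 or of any specific interconnect.

```python
def message_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to deliver one message: fixed latency plus transfer time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def exchange_time(num_messages, size_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time for num_messages equal-sized messages sent one after another."""
    return num_messages * message_time(size_bytes, latency_s, bandwidth_bytes_per_s)


# Hypothetical networks, chosen only to illustrate the model.
HIGH_LATENCY = {"latency_s": 200e-6, "bandwidth_bytes_per_s": 125e6}  # roughly 1 Gbps
LOW_LATENCY = {"latency_s": 20e-6, "bandwidth_bytes_per_s": 1.25e9}   # roughly 10 Gbps

if __name__ == "__main__":
    for label, net in (("high latency", HIGH_LATENCY), ("low latency", LOW_LATENCY)):
        # The same 100 MB of data, sent as many small messages or as one bulk transfer.
        small = exchange_time(100_000, 1_000, **net)
        bulk = exchange_time(1, 100_000_000, **net)
        print(f"{label}: 100,000 x 1 KB = {small:.2f} s, 1 x 100 MB = {bulk:.2f} s")
```

Under these assumed numbers, moving 100 MB as 100,000 small messages on the high-latency network takes about 20.8 seconds, versus about 0.8 seconds as a single bulk transfer. Almost all of the small-message time is latency rather than bandwidth, which is the situation many MPI-style codes find themselves in.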

Introducing Cluster Compute Instances for Amazon EC2

Cluster Compute Instances are similar to other Amazon EC2 instances but have been specifically engineered to provide high performance compute and networking. Cluster Compute Instances can be grouped as a cluster using a "cluster placement group" to indicate that these are instances that require low-latency, high-bandwidth communication. When instances are placed in a cluster they have access to low-latency, non-blocking 10 Gbps networking when communicating with the other instances in the cluster.
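As a rough illustration of what grouping instances looks like from a developer's point of view, here is a minimal sketch. It assumes a client object with boto3-style `create_placement_group` and `run_instances` methods (boto3 is a present-day Python SDK and is not mentioned in this post). The function name `launch_cluster`, the default instance type name `cc1.4xlarge`, and the AMI ID in the usage note are assumptions for illustration.

```python
def launch_cluster(ec2, group_name, image_id, count, instance_type="cc1.4xlarge"):
    """Create a cluster placement group and launch `count` instances into it.

    `ec2` is any object exposing create_placement_group and run_instances
    with boto3-style keyword arguments, for example boto3.client("ec2").
    The default instance type name is an assumption, not taken from the post.
    """
    # The "cluster" strategy signals that these instances need
    # low-latency, high-bandwidth communication with each other.
    ec2.create_placement_group(GroupName=group_name, Strategy="cluster")

    # Request all instances in a single call, all tied to the same group.
    response = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=count,
        MaxCount=count,
        Placement={"GroupName": group_name},
    )
    return [instance["InstanceId"] for instance in response["Instances"]]
```

With boto3 installed and credentials configured, a call might look like `launch_cluster(boto3.client("ec2"), "hpc-demo", "ami-00000000", 8)`, where the group name and AMI ID are placeholders.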

Next, Cluster Compute Instances are specified down to the processor type so developers can squeeze optimal performance out of them using architecture-specific compiler optimizations. At launch, Cluster Compute Instances for Amazon EC2 will have two Intel Xeon X5570 (also known as quad-core i7 or Nehalem) processors.
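Knowing the processor in advance means a build script can pin the target architecture instead of compiling for a generic CPU. The sketch below assembles such a compiler invocation. The helper name `build_compile_command` and the file names are hypothetical, and the `-march=nehalem` value assumes a reasonably recent GCC; older releases use a different name for this architecture, so check your compiler's documentation.

```python
import shlex


def build_compile_command(source, output, march="nehalem", compiler="gcc"):
    """Return a compiler command line that targets one specific CPU architecture.

    The default -march value assumes a GCC release that recognizes "nehalem".
    """
    return [compiler, "-O3", f"-march={march}", source, "-o", output]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(shlex.join(build_compile_command("solver.c", "solver")))
```

The returned list can be passed directly to `subprocess.run` on a machine where the compiler is installed.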

Unlocking the benefits of the cloud for the HPC community

Cluster Compute Instances for Amazon EC2 change the HPC game for two important reasons:

Access to scalable on-demand capacity

Dedicated High Performance Compute clusters require significant capital investments and their procurement often has longer lead times than many enterprise class server systems. This means that most HPC systems are constrained in one or more dimensions by the time that they become operational. As a result, HPC systems are typically shared resources with long queues of jobs waiting to be executed, and many users have to be very careful not to exceed their capacity allocation for that period. Cluster Compute Instances for Amazon EC2 bring the advantages of cloud computing to this class of High Performance Computing, removing the need for upfront capital investments and giving users access to HPC capacity on demand at the exact scale that they require for their application.

Commoditizing the management of HPC resources

Traditionally we have thought about HPC as the domain of extreme specialism; the kind of research that only happens at places such as the US National Research Labs. These labs have access to some of the world's fastest supercomputers and have dedicated staff to keep them running and fully loaded 365 days a year. But there is a much larger category of potential HPC users beyond these top supercomputer specialists. That category even includes the more mid-range HPC workloads within these labs: Lawrence Berkeley National Laboratory, for example, has been a successful early user of the new Cluster Compute Instances.

Many organizations are using mid-range HPC systems for their enterprise and research needs. Given the specialized nature of these platforms, they require dedicated resources to maintain and operate, and they put a big burden on the IT organization. Cluster Compute Instances for Amazon EC2 remove the heavy lifting of the operational burden typically associated with HPC systems. There is no more need for hardware tinkering to keep the clusters up and running (I spent many nights doing this; there is no glory in it). These instance types are managed exactly the way other Amazon EC2 instances are managed, which allows users to capitalize on the investment they have already made in that area.

More information

If you are looking for more information on Cluster Compute Instances for Amazon EC2, visit our HPC Solutions page. Jeff Barr has additional details in his post on the AWS developer blog, and there are some great testimonials from early Cluster Compute Instances customers in the press release.


4 Comments

Bill McColl said:

Another important milestone for AWS. Great news. Cloudcel, our massively parallel realtime analytics platform on AWS has very demanding latency requirements, in contrast to offline analytics approaches such as databases and Hadoop. In a recent article "Cloud Computing: The Need For Speed"

http://cloudcomputing.sys-con.com/node/1449800

I asked "Who is going to build the low-latency cloud for enterprise customers?". Seems like, as always these days, the answer is Amazon. At Cloudscale

http://cloudscale.com

we'll certainly be evaluating this new capability, as part of our mission to bring ultra-low-latency cloud computing to the big data space.

Exciting times!

Bill McColl
CEO, Cloudscale

Chris Samuel said:

Interesting, though I'd want to see some MPI latency numbers before I'd call 10GigE low latency - no InfiniBand option?

Siah said:

Is Amazon going to have a [grad] student discount for their HPC platform? I'd like to use it but I would also like to get it for cheaper than regular EC2.

Curt Monash said:

A few questions that come to mind:

1. A developer who optimizes for a certain configuration would like to know it will be available for a while. How long do you expect to support any given hardware configuration?

2. How frequently do you expect to introduce support for new configurations?

The answers to #1 and #2 taken together go a long way toward showing how good an alternative this is to developing for hardware one actually buys.

3 & 4. How soon do you expect to bring in solid-state memory? And with what kind of interface? (I vote for some variants on "really soon" and "a lot higher bandwidth than SAS or SATA", as per http://www.dbms2.com/2010/06/25/flash-is-coming-well/)

5. How "Elastic" can one really ever be if there's Big Data involved?

OK, that last one was a little vague ... ;)

About this Entry

This page contains a single entry by Werner Vogels published on July 13, 2010 12:00 AM.
