Expanding the Cloud - Cluster Compute Instances for Amazon EC2

All Things Distributed Now Go Build! Articles @werner

Expanding the Cloud - Cluster Compute Instances for Amazon EC2

July 13, 2010 • 1040 words

Today, Amazon Web Services took very an important step in unlocking the advantages of cloud computing for a very important application area. Cluster Computer Instances for Amazon EC2 are a new instance type specifically designed for High Performance Computing applications. Customers with complex computational workloads such as tightly coupled, parallel processes, or with applications that are very sensitive to network performance, can now achieve the same high compute and networking performance provided by custom-built infrastructure while benefiting from the elasticity, flexibility and cost advantages of Amazon EC2

During my academic career, I spent many years working on HPC technologies such as user-level networking interfaces, large scale high-speed interconnects, HPC software stacks, etc. In those days, my main goal was to take the advances in building the highly dedicated High Performance Cluster environments and turn them into commodity technologies for the enterprise to use. Not just for HPC but for mission critical enterprise systems such as OLTP. Today, I am very proud to be a part of the Amazon Web Services team as we truly make HPC available as an on-demand commodity for every developer to use.

HPC and Amazon EC2

Almost immediately after the launch of Amazon EC2, our customers started to use it for High Performance Computing. Early users included some Wall Street firms who knew exactly how to balance the scale of computation against the quality of the results they needed to create a competitive edge. They have run thousands of instances of complex Monte Carlo simulations at night to determine how to be ready at market open. Other industries using Amazon EC2 for HPC-style workloads include pharmaceuticals, oil exploration, industrial and automotive design, media and entertainment, and more.

Further computationally intensive, highly parallel workloads have found their way to Amazon EC2 as businesses have explored using HPC types of algorithms for other application categories, for example to to process very large unstructured data sets for Business Intelligence applications. This has led to strong growth in the popularity of Hadoop and Map Reduce technologies, including Amazon Elastic Map Reduce as a tool for making it easy to run these compute jobs by taking away much of the heavy lifting normally associated with running large Hadoop clusters.

As much as Amazon EC2 and Elastic Map Reduce have been successful in freeing some HPC customers with highly parallelized workloads from the typical challenges of HPC infrastructure in capital investment and the associated heavy operation lifting, there were several classes of HPC workloads for which the existing instance types of Amazon EC2 have not been the right solution. In particular this has been true for applications based on algorithms - often MPI-based - that depend on frequent low-latency communication and/or require significant cross sectional bandwidth. Additionally, many high-end HPC applications take advantage of knowing their in-house hardware platforms to achieve major speedup by exploiting the specific processor architecture. There has been no easy way for developers to do this in Amazon EC2... until today.

Introducing Cluster Compute Instances for Amazon EC2

Cluster Computer Instances are similar to other Amazon EC2 instances but have been specifically engineered to provide high performance compute and networking. Cluster Compute Instances can be grouped as cluster using a "cluster placement group" to indicate that these are instances that require low-latency, high bandwidth communication. When instances are placed in a cluster they have access to low latency, non-blocking 10 Gbps networking when communicating the other instances in the cluster.

Next, Cluster Compute Instances are specified down to the processor type so developers can squeeze optimal performance out of them using compiler architecture-specific optimizations. At launch Cluster Computer Instances for Amazon EC2 will have 2 Intel Xeon X5570 (also known as quad core i7 or Nehalem) processors.

Unlocking the benefits of the cloud for the HPC community

Cluster Compute Instances for Amazon EC2 change the HPC game for two important reasons:

Access to scalable on-demand capacity

Dedicated High Performance Compute clusters require significant capital investments and their procurement often has longer lead times than many enterprise class server systems. This means that most HPC systems are constrained in one or more dimensions by the time that they become operational. As a result, HPC systems are typically shared resources with long queues of jobs waiting to be executed and many users have to be very careful not to exceed their capacity allocation for that period. Cluster Compute Instances for Amazon EC2 bring the advantages of cloud computing to this class of High Performance Computing removing the need for upfront capital investments and giving users access to HPC capacity on demand at the exact scale that they require for their application.

Commoditizing the management of HPC resources

Traditionally we have thought about HPC as the domain of extreme specialism; the kind of research that only happens at places such as the US National Research Labs. These labs have access to some of the world's fastest supercomputers and have dedicated staff to keep them running and fully loaded 365 days a year. But there is a much larger category of potential HPC users beyond these top supercomputer specialists, including even within the more mid-range HPC workloads of these labs such as Lawrence Berkeley National Labs where they have been early successful users of the new Cluster Compute Instances.

Many organizations are using mid-range HPC systems for their enterprise and research needs. Given the specialized nature of these platforms, they require dedicated resources to maintain and operate and put a big burden on the IT organization. Cluster Compute Instances for Amazon EC2 removes the heavy lifting of the operational burden typically associated with HPC systems. There is no more need for hardware tinkering to keep the clusters up and running (I spent many nights doing this; there is no glory in it). These instance types are managed exactly the way other Amazon EC2 instances are managed which allows users to capitalize on the investment they have already made in that area.

More information

If you are looking for more information on Cluster Compute Instances for Amazon EC2 visit our HPC Solutions page. Jeff Barr in his blog post on the AWS developer blog has additional details and there are some great testimonials of early Cluster Compute Instances customers in the press release.