Using the Cloud to build highly-efficient systems


These are times when many companies are focusing on the basics of their IT operations and asking how they can operate more efficiently, so that every dollar is spent wisely. This is not the first time that we have gone through this cycle, but this time there are tools available to CIOs and CTOs that help them manage their IT budgets very differently. By using infrastructure as a service, basic IT costs are moved from a capital expense to a variable cost, building clearer relationships between expenditures and revenue-generating activities. CFOs are especially excited about the premise of this shift.

In recent weeks, in my discussions with many of our Amazon Web Services customers, I have seen a heightened interest in moving functionality into the AWS cloud to get a better grasp on controlling cost. And this is across the board: from young businesses to Fortune 500 enterprises, from research labs to television networks, all are concerned about reducing the upfront costs associated with new ventures and reducing waste in existing operations. Most of them point to three properties of the Amazon Web Services model that help them become more efficient:

  1. The pay-as-you-go model. There are significant efficiency advantages to this model, as one pays only for those resources one has actually consumed. If the application scales along the right revenue-generating dimensions, these costs will be in line with the revenue being generated.

  2. Managing peak capacity. Many IT organizations need to maintain extra capacity for anticipated peak loads, capacity that sits idle most of the time. These peak loads can be driven by customer demand, as in the online world, but they can also come from essential IT tasks such as periodic document indexing, or from business tasks such as closing the books at the end of a quarter. Moving these peaks into the cloud is often the first step that our enterprise customers take to become familiar with using infrastructure as a service. After successfully running some of their peak jobs, they then start moving more permanent processing into the cloud.

    A great example in the online world is the Indy 500 organization, which normally runs 50 servers to serve its customers but during the races moves all of its processing into Amazon EC2 to handle all traffic, no matter how many hundreds of thousands of customers show up at the same time. The saving for the Indy IT budget during the races this spring was over 50%.

  3. Higher reliability at lower cost. Negotiating several contracts with different datacenter and network providers to make sure IT tasks can survive complex failure scenarios is difficult, and many organizations find it hard to achieve this in a cost-efficient manner. Amazon EC2, with its Regions and Availability Zones, gives its customers access to several high-end datacenters with highly redundant networking capabilities under a single pricing model, without any negotiations.
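To make the first two points concrete, here is a minimal sketch in Python that compares paying for peak capacity around the clock with paying only for the server-hours actually used. The demand profile, the price per server-hour, and the function names are all made up for illustration; they are not AWS prices or anyone's real traffic.

```python
def fixed_capacity_cost(hourly_demand, cents_per_server_hour):
    """Cost when enough servers for the busiest hour run the whole time."""
    peak_servers = max(hourly_demand)
    return peak_servers * len(hourly_demand) * cents_per_server_hour


def pay_as_you_go_cost(hourly_demand, cents_per_server_hour):
    """Cost when each hour is billed only for the servers it needed."""
    return sum(hourly_demand) * cents_per_server_hour


# Hypothetical day: a 50-server baseline with a 4-hour peak of 400 servers.
demand = [50] * 20 + [400] * 4
price = 10  # invented price, in cents per server-hour

fixed = fixed_capacity_cost(demand, price)
elastic = pay_as_you_go_cost(demand, price)
print(f"fixed: {fixed} cents, pay-as-you-go: {elastic} cents")
```

With these invented numbers the fixed fleet costs 96,000 cents for the day and the pay-as-you-go fleet 26,000, because 350 of the 400 servers sit idle for 20 of the 24 hours. Real savings depend entirely on how peaked the workload is.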

Amazon's efficiency principles

At Amazon we have a long history of implementing our services in a highly efficient manner. Whether these are our infrastructure services or our high-level ecommerce services, frugality is essential in our retail business. Margins in a retail business are traditionally small, and these constraints have driven major innovations in the way that we manage our IT capacity. We have developed a lot of expertise in building highly efficient architectures to support Amazon's goal of providing our customers with products at low prices. Every saving we have been able to make in our IT costs we have given back to our customers in the form of lower prices. This tradition of letting customers benefit from our cost savings is something that we also apply to our Amazon Web Services business. When earlier this year we were able to negotiate better deals with our network providers, we immediately reduced the bandwidth cost for our customers.

But we have learned at Amazon that having a low-cost infrastructure is only the starting point of being as efficient as possible. You need to make sure that your applications make use of the infrastructure in an adaptive and scalable manner to achieve a high degree of efficiency. In the Amazon architecture, being incrementally scalable is key. This means that the main way services and applications handle increasing load or larger datasets is to grow one unit at a time. A more precise definition can be found here.

All services at Amazon are built to be horizontally scalable. An efficient request routing mechanism delivers requests to services in a manner that optimizes performance at a certain efficiency point. Capacity is acquired and released on short time frames to handle increases and decreases in resource usage. To achieve this automatic scaling of our services, there are four basic components that need to work together:

  1. Elastic Compute Capacity. The basic resources required to execute our services and applications need to be able to grow and shrink at a moment's notice in a fully automated fashion. This is the fundamental premise behind Amazon EC2; whenever an Amazon service requires additional capacity it can use a simple API call to acquire additional capacity without any interference from operators or data techs, and can release it when no longer needed.

  2. Monitoring. We relentlessly measure every possible resource usage parameter, every application counter, and every customer's experience. Many gigabits per second of monitoring data flow continuously through the Amazon networks to make sure that our customers are being served at the levels they can expect and at an efficiency level the business desires. We don't really care that much about averages or medians; for us, performance at the 99.9th percentile is what matters, to make sure that all our customers get the right experience.

  3. Load balancing. Using the monitoring information we route requests intelligently, using several algorithms, to those service instances that can provide responses with the expected performance. In reality, balancing the load is a secondary task of the request routing system, as it is the customer's experience we are most driven by. The optimization quest is to deliver the right customer experience at the optimal resource utilization.

  4. Automatic scaling. Using monitoring data, load balancing and EC2, the auto-scaling service monitors service health and performance, bringing more capacity online when needed and reducing the number of instances to meet efficiency goals. It spreads instances over multiple Availability Zones and Regions to achieve the desired reliability guarantees. All of this happens without intervention by developers or operators.

These four services are the core of Amazon's highly efficient infrastructure that has allowed us to drive our IT costs to the floor for our retail operations.
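As a rough illustration of how monitoring, routing and scaling can feed each other, here is a minimal Python sketch. It is not how Amazon's internal services work: the nearest-rank percentile, the "route to the instance with the lowest tail latency" rule, the thresholds, and every function name are assumptions chosen to keep the example small.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the value at rank ceil(pct% of n) in sorted order."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def pick_instance(latencies_by_instance, pct=99.9):
    """Route to the instance whose recent tail latency is lowest (one simple rule)."""
    return min(
        latencies_by_instance,
        key=lambda name: percentile(latencies_by_instance[name], pct),
    )


def desired_instances(current, tail_ms, target_ms, low_water_ms, min_instances=1):
    """Grow or shrink one unit at a time, driven by tail latency."""
    if tail_ms > target_ms:
        return current + 1  # customers in the tail are suffering: add a unit
    if tail_ms < low_water_ms and current > min_instances:
        return current - 1  # comfortably under target: release a unit
    return current


# Hypothetical latency samples in milliseconds: most requests fast, 1% very slow.
samples = [100] * 990 + [2000] * 10
mean_ms = sum(samples) / len(samples)
tail_ms = percentile(samples, 99.9)
fleet = desired_instances(4, tail_ms, target_ms=500, low_water_ms=150)
print(f"mean={mean_ms} ms, p99.9={tail_ms} ms, fleet size 4 -> {fleet}")
```

Here the mean (119 ms) looks healthy while the 99.9th percentile (2000 ms) does not, which is why the sketch scales on the tail rather than on the average.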

[Diagram: auto-scaling, load balancing, and monitoring]

Building highly-efficient systems on AWS

To make sure our customers can also benefit from our experience in building highly efficient systems we have decided to release versions of these services on the Amazon Web Services platform. The Monitoring, Load Balancing and Auto-Scaling services will be combined with a Management Console that provides a simple, point-and-click web interface that lets you configure, manage and access your AWS cloud resources.

They will first be released in private beta and you can express your interest in that program on the AWS web site. More details can be found in the posting on the Amazon Web Services blog.

Graphic by Renato Valdés Olmos of Postmachina


7 Comments

Hello Werner,

Where can we find more information about your _global_ "availability zones and regions" and find out whether we are covered or not?

Thanks for the post.

Kemal Ispirli

tross said:

Wow, "The Monitoring, Load Balancing and Auto-Scaling services will be combined with a Management Console that provides a simple, point-and-click web interface that lets you configure, manage and access your AWS cloud resources."

How do partners in your AWS developer ecosystem like RightScale feel about this? Are you competing with your value added partners?

Phil Dewey said:

While I applaud the SLA and "beta" tag removal, it looks on the surface like AWS wants to compete with its most loyal customers - i.e. RightScale, CohesiveFT, Ylastic, SOASTA and Informascale. Does AWS intend to develop apps a la Google? Is there a business for VARs on the AWS platform? Is there something in the water in Seattle that causes large technology customers to purposely screw their ISVs?

First, thanks Werner for the exciting announcement. The pace of innovation you are driving continues to make the cloud market the most exciting corner of the technology world to be in -- especially in the tough economic times we've entered.

Second, I'd like to say thanks to all RightScale's defenders, but I don't think we need defending here ;-) We've had a long and cooperative relationship with many, many people at Amazon (including Werner) and expect that to continue. So, I'd like to clarify how we regard this news as an Amazon partner.

Really, nothing that Werner announces above is unexpected. Load balancing at the cloud infrastructure level has been on our wish list since the beginning, and we're agnostic about monitoring. In fact, we'll provide support for both those new services within RightScale when they launch. All part of our philosophy to let our customers choose.

Also, the auto-scaling described above simply connects monitored events with the launching of instances. In our view, that's important, but only part of the puzzle. The more challenging task is to provide an overall system design and architecture where the auto-launched instances configure themselves into cohesive, resilient clusters, as well as to offer a development environment that supports workflow and lifecycle features such as versioning. That is what RightScale offers through its server template architecture, and why our management platform has been much more than a dashboard for some time -- offering a complete solution with pre-packaged software stacks, training & support to give companies a quick and effective onramp to the cloud. For more detail on today's announcement in relation to our platform, check out blog.rightscale.com.

Finally, with due respect and admiration to Amazon for its truly brilliant pioneering role in establishing cloud computing and storage, as Werner's colleague, AWS VP Adam Selipsky, has stated, "Any time you have a large and attractive market, you'll have more than one winner." In other words, it's become a multi-cloud world. We expect that RightScale's role as a neutral provider of multi-cloud support and portability will preserve our distinct place in the cloud management market as an enabling platform for both customers and ISVs -- even as we continue to be a cooperative partner for Amazon.

I am not sure at what level of granularity an "application counter" operates, or whether it really counts as an application counter from the customer's point of view (especially if they are using additional middleware technology). I would expect that in the future customers will want to go beyond this and relate application activities (operations, requests, transactions) to costs, via metering and billing runtimes that tie activity to cost drivers and resource usage. Because current cloud computing cost drivers correlate somewhat with performance management, the management of the cloud is going to have to integrate the performance, capacity, value/quality, and cost management domains; otherwise there would appear to be a serious lack of control and insight.

I have tried to address this in the following short article that discusses an approach to both performance management and cost management - metering the cloud.

http://www.jinspired.com/products/jxinsight/meteringthecloud.html

William

Thanks for this article. I am working on a state-wide task force consisting of leaders in philanthropy, state officials and teachers whose task it is to come up with a set of policy suggestions for the Governor of Ohio on Preparing Students for 21st Century Skills. I would be very interested to see how your article (with slight modifications) could be directed to leaders charged with public education in the United States. I can see very clearly how your message applies to other businesses and even IT directors of Colleges and Universities. However, this level of rich detail is missing from conversations about what is required to equip public schools with technology that is appropriate for 21st Century learning. I would be very interested if you and perhaps Clayton Christensen from the Harvard Business School could come up with suggestions for the next Secretary of Education.

Ricky Ho said:

A general security challenge of running inside a cloud is that most cryptographic algorithms (including SSL) require the “private” key to be stored in a secure way. In a data center scenario, the “private” key is stored in a keystore file, under the protection of the OS, within a physical machine locked in the data center.

But now, in a cloud scenario, the machine is virtual and can run on any physical hardware that we (or even Amazon) don’t know. Where do we put our keystore in a secure manner? Can you offer some guidelines in this area?

Rgds,
Ricky

About this Entry

This page contains a single entry by Werner Vogels published on October 23, 2008 5:31 AM.
