
Every year I enjoy travelling to the South-by-South-West (SXSW) festival as it is ons of the biggest event with many Amazon customers present. Thousand of AWS customers and partners will be in Austin for SXSW Interactive and given the free flowing networking it is a very important feedback opportunity for us. But also many Amazon customers will be there for the Film and the Music festival, and I always enjoy getting feedback from those Amazon consumers and producers that are attending these festivals.
The program is always a bit in flux, but here are the events in the beginning of the week that I am taking part in:
- Sunday 3/15 1-2pm - I will give a talk at Techstars on "The History of Microcroservices at Amazon". There will also be a talk and Q&A about Amazon Lamba at noon. Following my talk there will be a reception.
- Sunday 3/15 4-5pm - I will moderate a panel at ff Massive 2015 about "Scaling a Startup" with Shane Snow, Rami Essaid, Trevor Coleman, and Jordan Kretchmer.
- Monday 3/16 9:30am-1:30pm I will be a judge at the HATCH Startup competition.
- Monday 3/16 5-6pm I will do a fireside chat with Valentin Schöndienst the CEO of Move Fast from Berlin at the German House. It is followed by a Meet The Berliners Party co-sponsored by AWS.
I hope to see you there or at one of the many other events I drop in on.

Grapevine was one of the first systems designed to be fully distributed. It was built at the famous Xerox PARC (Palo Alto Research Center) Computer Science Laboratory as an exercise in discovering what is needed as the fundamental building blocks of a distributed system; messaging, naming, discovery, location, routing, authentication, encryption, replication, etc. The origins of the system are described in Grapevine: An Exercise in Distributed Computing by researchers who all went on to become grandmasters in distributed computing: Andrew Birrell, Roy Levin, Roger Needham, and Mike Schroeder.
For this weekend's reading we will use a followup paper that focusses on the learnings with running Grapevine for several years under substantial load.
Experience with Grapevine: The Growth of a Distributed System, Michael Schroeder, Andrew Birrell, and Roger Needham, in ACM Transactions on Computer Systems, vol. 2, no. 1, February 1984.

Several problems in Distributed Systems can be seen as the challenge to determine a global state. In the classical "Time, Clocks and the Ordering of Events in a Distributed System" Lamport had laid out the principles and mechanisms to solve such problems, and the Distributed Snapshots algorithm, popularly know as the Chandy-Lamport algorithm, is an application of that work. The fundamental techniques in the Distributed Snapshot paper are the secret sauce in many distributed algorithms for deadlock detection, termination detection, consistent checkpointing for fault tolerance, global predicate detection for debugging and monitoring, and distributed simulation.
An interesting anecdote about the algorithm is told by Lamport: "The distributed snapshot algorithm described here came about when I visited Chandy, who was then at the University of Texas in Austin. He posed the problem to me over dinner, but we had both had too much wine to think about it right then. The next morning, in the shower, I came up with the solution. When I arrived at Chandy's office, he was waiting for me with the same solution."
Distributed Snapshots: Determining Global States of a Distributed System K. Mani Chandy and Leslie Lamport, ACM Transactions on Computer Systems 3(1), February 1985.

Disk arrays, which organize multiple, independent disks into a large, high-performance logical disk, were a natural solution to dealing with constraints on performance and reliability of single disk drives. The term "RAID" was invented by David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987. In their June 1988 paper "A Case for Redundant Arrays of Inexpensive Disks (RAID)" they argued that the top performing mainframe disk drives of the time could be beaten on performance by an array of the inexpensive drives that had been developed for the growing personal computer market. Although failures would rise in proportion to the number of drives, by configuring for redundancy, the reliability of an array could far exceed that of any large single drive.
In 1994 Peter Chen together with Ed Lee and the Berkeley team wrote a computer survey paper that lays out in great detail the background case for disk arrays and goes into the details of the various RAID models.
RAID: High-Performance, Reliable Secondary Storage Peter Chen, Edward Lee, Garth Gibson, Randy Katz and David Patterson, ACM Computing Surveys, Vol 26, No. 2, June 1994.
One of our guiding principles at AWS is to listen closely to our customers and the feedback that I am getting about our training and certification program is very positive. Many architects and engineers know the Cloud is the future of development and IT and the are gearing up to be as succesful as possible in this new normal.
This is why I’m excited to announce the availability of a new Professional level certification from AWS that has been high on the list of our customers. With the growing adoption of cloud computing, we see more of our customers establishing DevOps practices within their IT organizations as a way to increase IT efficiency, improve agility, and in turn innovate faster for their own customers. With increasing demand for IT professionals with cloud computing expertise and DevOps skills in particular, we are excited to offer a certification to help further our customers’ skills in this area. Last November we announced a new AWS Certification for the AWS Certified DevOps Engineer - Professional to help IT professionals validate their skills in the DevOps area and enable employers to identify qualified candidates to lead DevOps initiatives. Today, I am pleased to announce that we have released this exam publicly, as well as awarded our first DevOps Engineer certifications to those who successfully completed the beta exam. Congratulations to everyone who has achieved their AWS Certified DevOps Engineer - Professional certifications!
The AWS Certified DevOps Engineer - Professional certification is the next step in the path for AWS Certified Developers and SysOps Administrators. It validates technical expertise in provisioning, operating, and managing distributed application systems on the AWS platform. The exam is intended to identify individuals who are capable of implementing and managing systems which are highly available, scalable, and self-healing on the AWS platform. You must already be certified as an AWS Certified Developer - Associate or AWS Certified SysOps Administrator – Associate before you are eligible to take this exam. Find the Exam Guide, a practice exam, and other resources at aws.amazon.com/certification.
I’m looking forward to hearing about how this new certification has helped customers and partners!

After a year of absence I am bringing back the Back to Basic Weekend Reading Series. We'll continue to look at the fundamental works of Computer Science and Engineering, and other interesting technical works.
We will start this year with a topic that spans many sciences: that of complex networks. It is relevant to everything from biology, life sciences, social sciences to computer engineering. There is no one better suitable to teach us about the fundamentals of complex networks than Steven Strogatz, the well known author and applied mathematics professor from Cornell University. You can find much of Steven's work on his personal website. In 2001 he published a review paper on complex networks in Nature that is a must read if you want to learn about (or revisit) this topic:
Exploring Complex Networks Steven Strogatz in Nature 410, 268-276 (8 March 2001)
When AWS launched, it changed how developers thought about IT services: What used to take weeks or months of purchasing and provisioning turned into minutes with Amazon EC2. Capital-intensive storage solutions became as simple as PUTting and GETting objects in Amazon S3. At AWS we innovate by listening to and learning from our customers, and one of the things we hear from them is that they want it to be even simpler to run code in the cloud and to connect services together easily. Customers want to focus on their unique application logic and business needs – not on the undifferentiated heavy lifting of provisioning and scaling servers, keeping software stacks patched and up to date, handling fleet-wide deployments, or dealing with routine monitoring, logging, and web service front ends. So we challenged ourselves to come up with an easy way to run applications without having to manage the underlying infrastructure and without giving up on the flexibility to run the code that developers wanted. Our answer is a new compute service called AWS Lambda.
AWS Lambda makes building and delivering applications much easier by giving you a simple interface to upload your Node.js code directly to Lambda, set triggers to run the code (which can come from other AWS services like Amazon S3 or Amazon DynamoDB, to name a couple), and that’s it: you’re ready to go. AWS handles all the administration of the underlying compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling, code and security patch deployment, and code monitoring and logging. You can go from code to service in three clicks and then let AWS Lambda take care of the rest.
One of the most exciting aspects of Lambda is that it helps you create dynamic, event-driven applications in the cloud. Lambda is launching in conjunction with a new Amazon S3 feature called event notifications the generates events whenever objects are added or changed in a bucket, and our recently announced Amazon DynamoDB Streams feature that generates events when a table is updated. Now developers can attach code to Amazon S3 buckets and Amazon DynamoDB tables, and it will automatically run whenever changes occur to those buckets or tables. Developers don’t have to poll, proxy, or worry about being over or under capacity – Lambda functions scale to match the event rate and execute only when needed, keeping your costs low.
Event-driven cloud computing makes it easy to create responsive applications, often without needing to write new APIs. For example, a mobile, tablet, or web application that uploads images to Amazon S3 can automatically trigger the generation of thumbnails with a few lines of code – no servers, queues, or new APIs are needed. Logs are equally easy to process – if you already use AWS CloudTrail to track API calls made to AWS services, you now can easily audit the result just by turning on S3 event notifications for the appropriate bucket and writing a few lines of JavaScript code. Data stored in Amazon DynamoDB can be automatically verified, audited, copied, or transformed with an AWS Lambda function through the new Streams feature we announced earlier this week. AWS Lambda is also launching with support for Amazon Kinesis that makes it easy to process data in a Kinesis stream...and we’re not stopping there – keep watching for more integration points between AWS Lambda and other AWS services that make it easy to respond to events of all types.
We’re excited about event-driven computing – using AWS Lambda to extend other AWS services helps developers create applications that are simple, powerful, and inherently scalable. Lambda also excels at another challenge we hear a lot from customers: Turning some library code into a scalable, secure, and reliable cloud-based backend. With Lambda, developers can upload any library, even native (“binary”) libraries, making it easy to use a few lines of JavaScript to turn a library into an AWS-operated cloud service accessible as a Lambda function. AWS Lambda’s “stateless” programming model lets you quickly deploy and seamlessly scale to the incoming request rate, so the same code that works for one request a day also works for a thousand requests a second.
As with other AWS services, AWS Lambda can be accessed programmatically using the AWS SDK, through a RESTful web service API, from the command line interface, or through the AWS Lambda console. The console lets you edit and run code directly from a browser – you can author, debug, and experiment in real time without even needing an IDE. The AWS Lambda console can also create simulated events for Amazon S3 event notifications, Amazon DynamoDB Streams, and other event sources to help you verify how your code handles events from those sources. Once you’ve created and tested your Lambda function, you can monitor its performance and activity in the AWS Lambda console dashboard or through AWS CloudWatch, including setting alarms on latency or error rates. Logs for your Lambda functions are automatically captured as AWS CloudWatch Logs.
AWS Lambda is launching as a [Preview(http://aws.amazon.com/lambda) with support for functions written in JavaScript (more languages to come) and event integration with Amazon S3, Amazon DynamoDB , and Amazon Kinesis. Preview mode lets you try all of AWS Lambda’s features with a limit on concurrent function requests. We look forward to seeing what our customers will do with AWS Lambda and the new Amazon S3 and DynamoDB event features. We’d like to hear your thoughts on our new event-driven compute service and features, so please connect directly with the product team on the AWS Lambda forum.
Today, I am excited to announce the Preview of the Amazon EC2 Container Service, a highly scalable, high performance container management service. We created EC2 Container Service to help customers run and manage Dockerized distributed applications.
Benefits of Containers
Customers have been using Linux containers for quite some time on AWS and have increasingly adopted microservice architectures. The microservices approach to developing a single application is to divide the application into a set of small services, each running its own processes, which communicate with each other. Each small service can be scaled independently of the application and can be managed by different teams. This approach can increase agility and speed of feature releases. The compact, resource efficient footprint of containers was attractive to sysadmins looking to pack lots of different applications and tasks, such as a microservice, onto an instance. Over the past 20 months, the development of Docker has opened up the power of containers to the masses by giving developers a simple way to package applications into containers that are portable from environment to environment. We saw a lot of customers start adopting containers in their production environments because Docker containers provided a consistent and resource efficient platform to run distributed applications. They experienced reduced operational complexity and increased developer velocity and pace of releases.
Cluster Management Difficulties
Getting started with Docker containers is relatively easy, but deploying and managing containers, in the thousands, at scale is difficult without proper cluster management. Maintaining your own cluster management platform involves installing and managing your own configuration management, service discovery, scheduling, and monitoring systems. Designing the right architecture to scale these systems is no trivial task. We saw customers struggle with this over and over.
Leveraging AWS
When we started AWS, the thinking was we could use Amazon’s expertise in ultra-scalable system software and offer a set of services that could act as infrastructure building blocks to customers. Through AWS, we believed that developers would no longer need to focus on buying, building, and maintaining infrastructure but rather focus on creating new things. Today with EC2 Container Service, we believe developers no longer need to worry about managing containers and clusters. Rather, we think they can go back to creating great applications, containerize them, and leave the rest to AWS. EC2 Container Service helps you capitalize on Docker’s array of benefits by taking care of the undifferentiated heavy lifting of container and cluster management: we are providing you containers as a service. Furthermore, through EC2 Container Service, we are treating Docker containers as core building blocks of computing infrastructure and providing them many of the same capabilities that you are used to with EC2 instances (e.g., VPCs, security groups, etc) at the container level.
Sign up here for the Preview and tell us what you think. We are just getting started and have a lot planned on our roadmap. We are interested in listening to what features you all would like to use. Head over to Jeff Barr’s blog to learn more about how to use EC2 Container Service.
Automated deployments are the backbone of a strong DevOps environment. Without efficient, reliable, and repeatable software updates, engineers need to redirect their focus from developing new features to managing and debugging their deployments. Amazon first faced this challenge many years ago.
When making the move to a service-oriented architecture, Amazon refactored its software into small independent services and restructured its organization into small autonomous teams. Each team took on full ownership of the development and operation of a single service, and they worked directly with their customers to improve it. With this clear focus and control, the teams were able to quickly produce new features, but their deployment process soon became a bottleneck. Manual deployment steps slowed down releases and introduced bugs caused by human error. Many teams started to fully automate their deployments to fix this, but that was not as simple as it first appeared.
Deploying software to a single host is easy. You can SSH into a machine, run a script, get the result, and you’re done. The Amazon production environment, however, is more complex than that. Amazon web applications and web services run across large fleets of hosts spanning multiple data centers. The applications cannot afford any downtime, planned or otherwise. An automated deployment system needs to carefully sequence a software update across a fleet while it is actively receiving traffic. The system also requires the built-in logic to correctly respond to the many potential failure cases.
It didn’t make sense for each of the small service teams to duplicate this work, so Amazon created a shared internal deployment service called Apollo. Apollo’s job was to reliably deploy a specified set of software across a target fleet of hosts. Developers could define their software setup process for a single host, and Apollo would coordinate that update across an entire fleet of hosts. This made it easy for developers to “push-button” deploy their application to a development host for debugging, to a staging environment for tests, and finally to production to release an update to customers. The added efficiency and reliability of automated deployments removed the bottleneck and enabled the teams to rapidly deliver new features for their services.
Over time, Amazon has relied on and dramatically improved Apollo to fuel the constant stream of improvements to our web sites and web services. Thousands of Amazon developers use Apollo each day to deploy a wide variety of software, from Java, Python, and Ruby apps, to HTML web sites, to native code services. In the past 12 months alone, Apollo was used for 50M deployments to development, testing, and production hosts. That’s an average of more than one deployment each second.
The extensive use of Apollo inside Amazon has driven the addition of many valuable features. It can perform a rolling update across a fleet where only a fraction of the hosts are taken offline at a time to be upgraded, allowing an application to remain available during a deployment. If a fleet is distributed across separate data centers, Apollo will stripe the rolling update to simultaneously deploy to an equivalent number of hosts in each location. This keeps the fleet balanced and maximizes redundancy in the case of any unexpected events. When the fleet scales up to handle higher load, Apollo automatically installs the latest version of the software on the newly added hosts.
Apollo also tracks the detailed deployment status on individual hosts, and that information is leveraged in many scenarios. If the number of failed host updates crosses a configurable threshold, Apollo will automatically halt a deployment before it affects the application availability. On the next deployment, such as a quick rollback to the prior version, Apollo will start updating these failed hosts first, thus bringing the whole fleet to a healthy state as quickly as possible. Developers can monitor ongoing deployments and view their history to answer important questions like “When was this code deployed into production, and which hosts are currently running it?” or “What version of the application was running in production last week?”
Many of our customers are facing similar issues as they increase the rate of their application updates. They’ve asked us how we do it, because they want to optimize their processes to achieve the same rapid delivery. Since automated deployments are a fundamental requirement for agile software delivery, we created a new service called AWS CodeDeploy.
CodeDeploy allows you to plug in your existing application setup logic, and then configure the desired deployment strategy across your fleets of EC2 instances. CodeDeploy will take care of orchestrating the fleet rollout, monitoring the status, and giving you a clear dashboard to control and track all of your deployments. It simplifies and standardizes your software release process so that developers can focus on what they do best –building new features for their customers. Jeff Barr’s blog post includes a great walkthrough of using CodeDeploy, and is a good place to start to gain a deeper understanding of the service.
AWS CodeDeploy is the first in a set of ALM services designed to help customers build a productive cloud development process. We look forward to sharing more about the internal processes and tools that Amazon uses for agile software delivery. Apollo is just one piece of a larger set of solutions. We’d like to hear about the ways that you think we can help improve your delivery process, so please connect directly with the product team on the CodeDeploy forum.

I’m excited to be heading to Las Vegas in less than two weeks for our annual re:Invent conference. One of the highlights for me is being able to host an extensive lineup of startup-focused events which take place at re:Invent on Thursday, November 13.
Here’s a quick peak at the startup experience this year:
Third Annual Startup Launches
I’m excited to host this event where five AWS-powered startups will make a significant, never-before-shared launch announcement on stage. Included in the announcements are special discounts on the newly-launched products—discounts only available to session attendees. And to top it all off, we’ll have a happy hour immediately following the final launch announcement!
Founders Fireside Chat
In this session I’ll sit down with leaders who have taken their startups from an idea on a cocktail napkin to known names in a matter of a few years by harnessing the possibilities of technology and AWS. Their insights and learnings apply not only to fledgling startups and future entrepreneurs, but to enterprises seeking out ways to become more agile, responsive, and dynamic in the rapid technology race. Dan Wagner, CEO of Civis Analytics, Adam Jacob, Chief Dev Officer of Chef, and Alan Schaaf, founder of Imgur will join me in this fireside chat.
CTO-to-CTO Fireside Chat
I’ll get into the mindsets of the technical leaders behind some of the most progressive and innovative startups in the world. This is your opportunity to learn what happens behind the scenes, how pivotal technology and AWS infrastructure decisions are made, and the thinking that leads to products and services that disrupt and reshape how businesses and people use technologies day to day. I’ll chat with Chris Wanstrath, CEO of Github, Andrew Miklas, CTO of Pagerduty, and Seth Proctor, CTO of NuoDB.
VC Panel Discussion
In this session you can hear what our high-powered panel of top venture capitalists have to say about trends, pleasant and unpleasant surprises, the next big things on the horizon, and emerging startup hotspots for cloud apps and infrastructure. Greylock Partners, Draper Associates, Scale Venture Partners, Madrona Venture Group, and General Catalyst Partners will join the panel.
Startup Pub Crawl
Don’t forget about Wednesday night! In preparation for the Thursday startup track, be sure to join the AWS Startup Team for a stop along the re:Invent Pub Crawl. It will be an opportunity to network with the hottest startups, enterprises, and AWS team members. Join the team at Buddy V’s on Wednesday, November 12 from 5:30-7:30 at the Grand Canal Shoppes at the Venetian.
For venue information, and detailed session schedules, please check out the re:Invent website.
See you in Vegas!
