Today, I’m thrilled to announce several major features that significantly enhance the development experience on DynamoDB. We are introducing native support for JSON-style document models, the ability to add and remove global secondary indexes, more flexible scaling options, and an increase of the item size limit to 400KB. These improvements have been sought by many application developers, and we are happy to bring them to you. The best part is that we are also significantly expanding the free tier many of you already enjoy, increasing the storage to 25 GB and the throughput to 200 million requests per month. We designed DynamoDB to operate with at least 99.999% availability. Now that we have added support for the document model while delivering consistently fast performance, I think DynamoDB is the logical first choice for any application. Let’s look at the history behind the service and the context for the new innovations that make me think that.

NoSQL and Scale

More than a decade ago, Amazon embarked on a mission to build a distributed system that challenged conventional methods of data storage and querying. We started with Amazon Dynamo, a simple key-value store that was built to be highly available and scalable to power various mission-critical applications in Amazon’s e-commerce platform. The original Dynamo paper inspired many database solutions, which are now popularly referred to as NoSQL databases. These databases trade off complex querying capabilities and consistency for scale and availability.

In 2012, we launched Amazon DynamoDB, the successor to Amazon Dynamo. For DynamoDB, our primary focus was to build a fully managed, highly available database service with seamless scalability and predictable performance. We built DynamoDB as a fully managed service because we wanted to enable our customers, both internal and external, to focus on their application rather than being distracted by undifferentiated heavy lifting like hardware and software maintenance. The goal of DynamoDB is simple: to provide the same level of scalability and availability as the original Dynamo, while freeing developers from the burden of operating distributed datastores (cluster setup, software upgrades, hardware lifecycle management, performance tuning, security upgrades, operations, and so on). Since launch, DynamoDB has been the core infrastructure powering various AWS and Amazon services and has provided more than five nines of availability worldwide. Developers within and outside of Amazon have embraced DynamoDB because it enables them to quickly write their app and shields them from scaling concerns as their app changes or grows in popularity. This is why DynamoDB has seen widespread adoption from exciting customers like AdRoll, Scopely, Electronic Arts, Shazam, Devicescape, and Dropcam.

NoSQL and Flexibility: Document Model

A trend in both NoSQL and relational databases is the mainstream adoption of the document model. JSON has become the accepted medium of interchange between numerous Internet services. The JSON-style document model enables customers to build schema-less services. Typically, in a document-model datastore, each record and its associated data are modeled as a single document. Since each document can have a unique structure, schema migration becomes a non-issue for applications.

While you could store JSON documents in DynamoDB from the day we launched, it was hard to do anything beyond storing and retrieving these documents. Developers did not have direct access to the nested attributes embedded deep within a JSON document, and losing sight of these deeply nested attributes deprived developers of some of the incredible native capabilities of DynamoDB. They couldn’t leverage capabilities like conditional updates (the ability to make an insert into a DynamoDB table if a condition is met based on the latest state of the data across the distributed store) or Global Secondary Indexes (the ability to project one or more of the attributes of your items into a separate table for richer indexing capability). Until now, developers who wanted to store and query JSON had two choices: a) develop quickly and insert opaque JSON blobs in DynamoDB while losing access to key DynamoDB capabilities, or b) decompose the JSON objects themselves into attributes, which requires additional programming effort and a fair bit of forethought.
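To make the conditional-update idea concrete, here is a toy sketch in plain Python, using an in-memory dict rather than the DynamoDB API (the names conditional_put and ConditionalCheckFailed are illustrative, not SDK calls): a write is applied only if a predicate holds on the item's latest stored state.

```python
# Toy illustration of conditional-update semantics (in-memory dict,
# not the DynamoDB API): the write succeeds only if a predicate
# holds against the latest stored state of the item.
class ConditionalCheckFailed(Exception):
    pass

def conditional_put(table, key, new_item, condition):
    current = table.get(key)          # latest state (None if absent)
    if not condition(current):
        raise ConditionalCheckFailed(key)
    table[key] = new_item

table = {}
# Insert-if-absent succeeds the first time...
conditional_put(table, "student#1234", {"city": "seattle"}, lambda cur: cur is None)
# ...and the same condition rejects a second, conflicting insert.
try:
    conditional_put(table, "student#1234", {"city": "portland"}, lambda cur: cur is None)
except ConditionalCheckFailed:
    print("second insert rejected")
print(table["student#1234"]["city"])  # seattle
```

In DynamoDB itself, the condition is evaluated atomically against the latest replicated state of the item, which is what makes the primitive useful in a distributed setting.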

Enter Native Support for JSON in DynamoDB: Scalability and Flexibility Together at Last

The DynamoDB team is launching document model support. With today's announcement, DynamoDB developers can use the AWS SDKs to store a JSON document directly in DynamoDB. For example, imagine a record keyed by student id that maps to detailed information about the student: their name, a list of their addresses (each also represented as a map), and so on:

      {
          "id": 1234,
          "firstname": "John",
          "lastname": "Doe",
          "addresses": [
              {
                  "street": "main st",
                  "city": "seattle",
                  "zipcode": 98005,
                  "type": "current"
              },
              {
                  "street": "9th st",
                  "city": "seattle",
                  "zipcode": 98005,
                  "type": "past"
              }
          ]
      }
With JSON support in DynamoDB, you can access the city of a student's current address by simply asking for addresses[0].city on item 1234. Moreover, developers can now impose conditions on these nested attributes, and perform operations like deleting student 1234 only if their primary residence is in Seattle.
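As a local illustration of how a document path like addresses[0].city addresses into a nested structure, here is a small Python sketch over plain dicts (resolve_path is a hypothetical helper, not part of any AWS SDK):

```python
import re

def resolve_path(doc, path):
    """Resolve a DynamoDB-style document path such as
    'addresses[0].city' against a plain Python dict/list structure."""
    # Split the path into map keys ('addresses', 'city') and
    # list indexes ('[0]'); dots separating segments are skipped.
    for part in re.findall(r'[^.\[\]]+|\[\d+\]', path):
        if part.startswith('['):
            doc = doc[int(part[1:-1])]   # list index
        else:
            doc = doc[part]              # map key
    return doc

student = {
    "id": 1234,
    "firstname": "John",
    "addresses": [
        {"street": "main st", "city": "seattle", "type": "current"},
        {"street": "9th st", "city": "seattle", "type": "past"},
    ],
}

print(resolve_path(student, "addresses[0].city"))  # seattle
```

DynamoDB evaluates such paths server-side, so conditions and updates can target a single nested attribute without rewriting the whole document.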

With native support for JSON, the scalability and consistently fast performance of DynamoDB now extend to full documents. Developers no longer have to choose between datastores optimized for scalability and those optimized for flexibility. Instead, they can pick one NoSQL database, Amazon DynamoDB, that provides both.

Online Indexing: Improved Global Secondary Indexes

Global Secondary Indexes (GSIs) are one of the most popular features of DynamoDB. GSIs enable developers to create scalable secondary indexes on attributes within their JSON documents. However, when we initially launched GSI support, developers had to identify all of their secondary indexes up front, at table-creation time. As an application evolves and a developer learns more about her use cases, the indexing needs evolve as well. To minimize this up-front planning, we will be adding Online Indexing: the ability to add, modify, and delete indexes on your table on demand. As always, you maintain the ability to independently scale each GSI as the load on it evolves. We will add the Online Indexing capability soon.

Support for Larger Items

With the document model, since you store the primary record and all its related attributes in a single document, the need for bigger items becomes more pressing. We have therefore increased the maximum size of items you can store in DynamoDB. Starting today, you can store items of up to 400KB, enabling you to use DynamoDB for a wider variety of applications.

Even Faster Scaling

We have made it even easier to scale your applications up and down with a single click. Previously, DynamoDB customers were only able to double the provisioned throughput on a table with each API call. This forced customers to interact with DynamoDB multiple times to, for example, scale their table from 10 writes per second to 100,000 writes per second. Starting today, you can go directly from 10 writes per second to 100,000 writes per second (or any other number) with a single click in the DynamoDB console or a single API call. This makes it easier and faster to reduce costs by optimizing your DynamoDB table’s capacity, or to react quickly as your database requirements evolve.
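A quick back-of-the-envelope sketch shows why the old doubling limit was painful (calls_needed_doubling is an illustrative helper, not an AWS API; under the old behavior each UpdateTable call could at most double provisioned throughput):

```python
def calls_needed_doubling(current, target):
    """Number of update calls needed when each call can at most
    double provisioned throughput (the old DynamoDB limit)."""
    calls = 0
    while current < target:
        current *= 2
        calls += 1
    return calls

# Scaling from 10 to 100,000 writes per second under the old limit:
print(calls_needed_doubling(10, 100_000))  # 14 separate calls
# With today's change, the same scale-up is a single console click
# or a single API call.
```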

Mars rover image indexing using DynamoDB

We put together a demo app that indexes the metadata of NASA/JPL’s Curiosity Mars rover images. In this app, the images are stored in S3 with metadata represented as JSON documents and stored/indexed in DynamoDB.

Take a look at the application here:

Tom Soderstrom, IT Chief Technology Officer of NASA JPL, has made the point that leveraging the power of managed services like DynamoDB and S3 for these workloads allows NASA/JPL to scale applications seamlessly without having to deal with undifferentiated heavy lifting or manual effort. Having native JSON support in a database helps NASA developers write apps faster and makes it easier to share more data with global citizen scientists.

Look for a more detailed blog on the architecture of this application and its sample source code in the next few days!

Expanding the freedom to invent

Not only did we add these new capabilities, we have also expanded the free tier. Amazon DynamoDB has always provided a perpetual free tier for developers building new applications. Today we are announcing a significant expansion of that free tier: we are increasing the free storage to 25GB and giving you enough free throughput capacity to perform over 200 million requests per month. What does this mean for you as an application developer? You could build a web application with DynamoDB as a back end and handle over 200 million requests per month without paying anything for the database. You could build a new gaming application that supports 15,000 monthly active users. You could build an ad-tech platform that serves over 500,000 ad impression requests per month. We are giving you the freedom and flexibility to invent on DynamoDB.

What does this all mean?

To sum it all up, we have added support for JSON to streamline document-oriented development, increased the maximum item size to 400KB, and provided a faster, even easier way to scale DynamoDB up and down. These features are now available in four AWS Regions: US East (N. Virginia), US West (Oregon), Asia Pacific (Tokyo), and EU (Ireland), and will be available in the remaining regions shortly. We also announced the ability to add or remove Global Secondary Indexes dynamically on existing tables, which will be available soon.

DynamoDB is the logical first choice for developers building new applications. It was designed to give developers flexibility and minimal operational overhead without compromising on scale, availability, durability, or performance. I am delighted to share today’s announcements and I look forward to hearing how you use our new features!

AWS Pop-up Loft 2.0: Returning to San Francisco on October 1st


It’s an exciting time in San Francisco as the return of the AWS Loft is fast approaching. We’ve been working round-the-clock, making updates to ensure the experience is more fulfilling and educational than in June. Today we’re excited to announce that…

On Wednesday, October 1st, we’ll be returning to 925 Market Street!

The AWS Loft is all about helping you scale and grow your business by offering free AWS technical resources. You’ll have access to training, including hands-on bootcamps and labs, and 1:1 sessions with AWS Solutions Architects. Or you can plug in, hang out, get some work done, and stick around for evening presentations by innovative startups and community experts.

Take a look at the AWS Loft homepage to see full weekly schedules, and if you see something you like, go here and start filling your calendar.

Hours and Location
We’re at 925 Market Street and our doors will be open 10AM to 6PM on weekdays, with select events running until 8PM on weeknights. Be sure to check the calendar often, as new evening events will be added regularly.

What’s Happening at the AWS Loft
In October there will be an abundance of sessions, events, and coding activities focused on game and mobile app development. In addition, below is an overview of the activities taking place at the AWS Loft each week:

Ask an Architect: You can schedule a 1:1, 60 minute session with a member of the AWS technical team. Be sure to bring your questions about AWS architecture, cost optimization, services and features, and anything else AWS-related. And don’t be shy—walk-ins are welcome too.

Technical Presentations: AWS Solution Architects, Product Managers, and Evangelists will deliver technical presentations covering some of the highest-rated and best-attended sessions from recent AWS events. Topics include Introduction to AWS, Big Data, Compute & Networking, Architecture, Mobile & Gaming, Databases, Operations, Security, and more.

AWS Technical Bootcamps: Limited to twenty participants, these full-day bootcamps include hands-on lab exercises using a live environment with the AWS console. Usually these cost $600, but at the AWS Loft we are offering them for free. Bootcamps you can register for include: “Getting Started with AWS,” “AWS Essentials,” “Highly Available Apps,” and “Taking AWS Operations to the Next Level.”

Self-paced, Hands-on Labs: These online technical labs are ready and waiting for walk-ins throughout the day. We’ll offer labs for beginners through advanced users on topics that range from creating Amazon EC2 instances to launching and managing a web application with CloudFormation. Usually $30 each, we are offering these labs for free in the AWS Loft.

Customer Speaking Events and Evening Happy Hours: Innovative Bay Area startups share the technical and business journeys they undertook as their ideas came to life. CircleCI, NPM, Lyft, Librato, Cotap, Runscope and others will be featured speakers during our evening happy hours.

For example, join us next week in the Loft for this special event:

The Future of IT: Startups at the NASA Jet Propulsion Laboratory

When: October 1, 2014, 6:30 to 8:00 PM
Where: The AWS Pop-up Loft

What do startups and NASA JPL have in common? You may be surprised at just how similar the Jet Propulsion Laboratory is to a startup. In this exciting talk, Tom Soderstrom, the Chief Innovation Officer of NASA JPL, will share the way he delivers projects in a startup environment: from seating to planning to delivering. You will learn the secrets of failing fast and trying lots of techniques in the quest to extend the reach of humanity. Tom will also share the role that cloud computing has played in the exploration of space in the last decade and where we are heading next. Don't be surprised to hear about the bold vision that NASA has set for the exploration of space AND earth, and how JPL plans to go about it.

Check out the AWS Loft homepage to learn the details and see full weekly schedules.

This is an extended version of an article that appeared in the Guardian today

We are rapidly entering into an era where massive computing power, digital storage and global network connections can be deployed by anyone as quickly and easily as turning on the lights. This is the promise – and the reality – of cloud computing which is driving tremendous change in the technology industry and transforming how we do business in Europe and around the world.

Cloud computing unlocks innovation within organisations of all types and sizes. No longer do they need to spend valuable human and capital resources on procuring and maintaining expensive technology infrastructure and datacenters; they can focus their most valuable resources on what they do best: building better products and services for their customers. From Europe’s fastest growing start-ups, like Spotify, Soundcloud, Hailo, JustEat, WeTransfer and Shazam, to some of the region’s largest, and oldest, enterprises, like Royal Dutch Shell, Schneider Electric, SAP, BP and Unilever, through to governments, education and research institutes, all are using cloud computing technologies to innovate faster and better serve their customers and the citizens of Europe.

According to a study from the Centre for Economics and Business Research, the expected cumulative economic effects of cloud computing between 2010 and 2015 in the five largest European economies alone are around €763 billion. Analyst firm IDC notes the cloud economy is growing by more than 20% and could generate nearly €1 trillion in GDP and 4 million jobs by 2020. The change being driven by cloud computing has become so significant that many of Europe’s policy-makers are debating the best policy approaches to enable broad success with cloud computing across the continent.

The European Commission has taken a lead in this discussion and is recognising the benefit cloud has for the European economy and the role it can play in building a global competitive advantage, ongoing prosperity, and world-leading innovation for Europe’s commercial and public sectors. In 2012, the European Commission set up the European Cloud Partnership (ECP), an initiative that brings together technology leaders, cloud users, both private and public sector, and policy-makers to recommend how to establish a Digital Single Market with no walls for cloud computing in Europe. As a member of the steering board of the ECP, and someone who has been working with the European Commission on their cloud strategy for many years, I am privileged to help contribute to the collaboration on how to promote and shape cloud computing in the region.

With the recent publication of the ECP’s Trusted Cloud Europe vision, which encourages cloud adoption in the region, I wanted to give the AWS view of the ECP’s vision and outline, at a high level, the elements needed to continue to drive adoption of cloud computing across Europe. I believe that many of the elements needed for cloud computing to be successful in the region rest on values that are core to all of us as Europeans. As a Dutchman, I hold European values in close regard: values such as the right to a fair and democratic society, and a strong protection of privacy and freedom. Cloud computing, done right, enables broad expression and realization of these European values, especially when combined with a business model that puts customers first. One of the key themes of the ECP’s vision document is the call for a cloud computing framework that focuses on customers and empowers Europeans. As a senior member of the Amazon team, focusing on customers is something I know well.

When Amazon launched, nearly 20 years ago, it was established with the mission to be Earth's most customer-centric company. This means giving customers choice, a place where they can find and discover anything they might want to buy online, and offering the lowest possible prices, bringing products to everyone at an affordable price point. This customer focus permeates every part of the Amazon business: we will not do anything unless the customer is going to benefit directly. We also know that if we do not live up to this customer-first promise, and constantly strive to give the best service, customers are free to walk away. This puts the power in their hands and keeps us constantly focused on delighting our customers.

For cloud computing to be successful in Europe, providers must hold exceeding customer needs as a core value. The easiest way to accomplish this is to put the power in the hands of the customer, with no minimum or long-term commitments. This means customers have the freedom to walk away at any time if they don’t get the service that they expect. They also have the freedom to use as much or as little of the cloud services as they want, and to pay only for the resources used. For too long, customers have been locked into long-term service contracts, costly capital outlays that require equipment upgrades every two to three years, and expensive software licensing fees from ‘old guard’ technology vendors. Being customer focused means ridding European businesses and organizations of these handcuffs and democratizing technology so that anyone has access to the same world-class technology services on demand. This brings large amounts of the latest technology resources, something that was previously a privilege of the world’s largest companies, into the hands of organizations of all sizes.

I have also seen some antiquated thinking attempting to undermine the important work that the ECP is doing in other ways. We have heard calls in some corners to develop a cloud computing framework in Europe that protects the interests of ‘old guard’ technology vendors and the way that IT “used to be” procured, leading to the same expensive contracts, just disguised as cloud. I disagree, and think this goes against the ethos of the ECP’s focus, which is that cloud computing should serve the customers and citizens of Europe, not the shareholders of technology companies. Focusing on lowering prices for Europeans will boost the economy and the prosperity of local businesses, as more capital can be allocated to innovation, not to activities that don’t differentiate businesses, such as the overhead of managing the underlying IT infrastructure. As a result of affordable cloud resources, we are already seeing centres of innovation and excellence emerging in London, Berlin, Barcelona and Stockholm that are beginning to rival Silicon Valley. If we continue to focus cloud computing on lowering the barrier to entry and the cost of failure for customers, we will see more companies experimenting and exploring things previously not possible. More experimentation drives more invention, and ultimately more centres of innovation appear. This is vital to Europe’s ongoing leadership in the world economy.

Finally, one of the core messages we have been taking to the ECP is the call to put data protection, ownership, and control in the hands of cloud users. For cloud to succeed, and realise its potential, it is essential that customers own and control their data at all times. Recent news stories have brought this topic to the fore. Customers, governments and businesses, large and small alike, have concerns about the security, ownership and privacy of their data. If they are not addressed, these concerns have the potential to undermine the pervasive adoption of cloud computing and the resulting benefits to the European business community. At AWS, we decided on day one to put this control in the hands of our customers. They own the data; they choose where to store the data, and their data is never moved to optimise the network. This means that European customers using the AWS Cloud can choose to keep their data in Europe. We also give customers tools and techniques to encrypt their data, both at rest and in transit, and to manage their secret keys in such a way that it is the customer who completely controls who can access their data, not AWS or any other party. Content that has been encrypted is rendered useless without the applicable decryption keys.

For cloud technology to be successful, and fulfil its potential to fundamentally change the European digital landscape, it must benefit the many, not the few. We have seen this with the rapid rise of the internet and we will also see this with cloud computing if we put the power in the hands of the customer. We echo the ECP’s call to focus a cloud computing framework on customers and removing barriers and restrictions to adoption in order to pave the way for increased prosperity of European businesses and provide access to high quality, secure, and trustworthy cloud services across Europe.

Cloud computing is not a technology of the future; it is a technology of today. I commend the European Commission and the ECP for recognising the potential cloud computing has to be a job creator, a driver for the economy, and a catalyst of innovation across Europe. The launch of the Trusted Cloud Europe vision is an important milestone, as it will help accelerate cloud adoption in the region while helping to ensure customer-focused tenets at the core of cloud providers’ strategies. European customers were amongst the first to adopt AWS cloud technologies when we launched in 2006, and we look forward to continuing to work with customers and policy-makers as we help more companies in Europe reach their potential through cloud computing.

The AWS Activate CTO to CTO series on Medium


I'm excited to announce a new blog dedicated to AWS startups. We're launching it on Medium, itself a startup on AWS. I kicked off the blog with a Q&A with the Medium CTO Don Neufeld. I really enjoyed Don's answers to my questions and there are some real gems in here for startup CTOs. Check it out.

We'll be keeping this blog fresh with other startup spotlights and good technical content so follow the collection and keep up.

We launched Elastic Beanstalk in 2011 with support for Java web applications and Tomcat 6 in one region, and we've seen the service grow to 6 container types (Java/Tomcat, PHP, Ruby, Python, .NET, and Node.js) supported in 8 AWS regions around the world. The Elastic Beanstalk team spends a lot of time talking to AWS Developers, and in the last few months they've noticed a common theme in those conversations: developers tell us they're interested in Docker, and ask if we are thinking about making it easy to run and scale Docker workloads in AWS.

Several weeks ago we made it simple to yum install Docker on your EC2 Instances running Amazon Linux, and today Elastic Beanstalk introduces the ability to deploy, manage, and scale Docker Containers. Along with the native Docker functionality you're used to - including log access, volume mapping, and environment variables - enjoy the automated monitoring, provisioning, and configuration of things like Elastic Load Balancing and Auto Scaling that Elastic Beanstalk provides.

Best of Both Worlds

When developers asked us to support Docker in Elastic Beanstalk they described a 'best of both worlds' scenario: they love Docker's impact on their development workflow. Packaging applications as Docker Images makes them portable, reliable, easy to share with others, and simple to test. They wanted to make it similarly easy to deploy and operate their Docker-based applications on AWS and take advantage of features like RDS, VPC, and IAM.

Now, developers can deploy their Docker Containers to Elastic Beanstalk and enjoy the deployment, management, and automation features that come along with it, including log rotation, VPC integration, IAM Roles, and RDS (including fully managed MySQL, PostgreSQL, Oracle, and SQL Server databases).

To get started with Docker and Elastic Beanstalk, check out this short video below or see Jeff's post for a few samples.

Customer Centricity at Amazon Web Services


In the 2013 Amazon Shareholder letter, Jeff Bezos spent time explaining the decision to pursue a customer-centric way in our business.

As regular readers of this letter will know, our energy at Amazon comes from the desire to impress customers rather than the zeal to best competitors. We don’t take a view on which of these approaches is more likely to maximize business success. There are pros and cons to both and many examples of highly successful competitor-focused companies. We do work to pay attention to competitors and be inspired by them, but it is a fact that the customer-centric way is at this point a defining element of our culture.

AWS has built a reputation over the years for the breadth and depth of our services and the pace of our innovation with 280 features released in 2013. One area we don’t spend a lot of time discussing is the significant investments we’ve made in building a World Class Customer Service and Technical Support function. These are the people working behind the scenes helping customers fully leverage all of AWS’s capabilities when running their infrastructure on AWS. We launched AWS Support in 2009 and since the launch the mission has remained constant: to help customers of all sizes and technical abilities to successfully utilize the products and features provided by AWS.

Customers are frequently surprised to hear that we have a Support organization that not only helps customers via email, phone, chat or web cases, but also builds innovative software to deliver better customer experiences. In recent years, this team has released technology such as Support for Health Checks, AWS Trusted Advisor, the Support APIs, the Trusted Advisor APIs, and much more. One customer-facing feature Jeff highlighted in the Shareholder letter was AWS Trusted Advisor, a tool that our support organization built to move support from reactive help to proactive, preventative help.

I can keep going – Kindle Fire’s FreeTime, our customer service Andon Cord, Amazon MP3’s AutoRip – but will finish up with a very clear example of internally driven motivation: Amazon Web Services. In 2012, AWS announced 159 new features and services. We’ve reduced AWS prices 27 times since launching 7 years ago, added enterprise service support enhancements, and created innovative tools to help customers be more efficient. AWS Trusted Advisor monitors customer configurations, compares them to known best practices, and then notifies customers where opportunities exist to improve performance, enhance security, or save money. Yes, we are actively telling customers they’re paying us more than they need to. In the last 90 days, customers have saved millions of dollars through Trusted Advisor, and the service is only getting started. All of this progress comes in the context of AWS being the widely recognized leader in its area – a situation where you might worry that external motivation could fail. On the other hand, internal motivation – the drive to get the customer to say “Wow” – keeps the pace of innovation fast.

We’ve always focused on hiring highly skilled support engineers, with all hires going through the same technical certification process (Tech Bar Raisers) as any of our developers building services. Over the years, we have scaled the AWS Support organization to meet customer need, from a team of Linux sysadmins with strong networking skills in one location to a large global team in 17 locations around the world, with Windows sysadmins, networking engineers, DBAs, security specialists, developers, and many more specializations. In 2013, we spent a lot of time developing sophisticated internal tools that make supporting our customers more efficient, including intelligent skills-based case routing tools that provide in-depth technical information to the engineers addressing customer needs. Our customers tell us that the service is 78% better than it was 3 years ago.

Our customers can feel confident that AWS will work every day to deliver World Class Support for our customers. I wanted to share a new video with you where our customers discuss this critical behind the scenes function in a little more detail. After watching, I believe that you will have a better perspective on our mission, our support options, and the benefits that our customers derive from their use of AWS Support:

Updated Lampson's Hints for Computer Systems Design


This year I have not been able to publish many back-to-basics readings, so I will not close the year with a recap of those. Instead, I have a video of a wonderful presentation by Butler Lampson in which he talks about the lessons of the past decades that helped him update his excellent 1983 "Hints for Computer System Design".

The presentation was part of the Heidelberg Laureate Forum held in September of this year. At the Forum, many of the Abel, Fields and Turing laureates gave presentations. Famous computer scientists like Fernando Corbató, Stephen Cook, Edward Feigenbaum, Juris Hartmanis, John Hopcroft, Alan Kay, and Vinton Cerf were all at the Forum. You can find a list of selected video presentations here.

For me the highlight was Butler's presentation on Hints and Principles for Computer System Design. I include it here as it is absolutely worth watching.

We launched DynamoDB last year to address the need for a cloud database that provides seamless scalability, irrespective of whether you are doing ten transactions or ten million transactions, while providing rock-solid durability and availability. Our vision from the day we conceived DynamoDB was to fulfil this need without limiting the query functionality that people have come to expect from a database. However, we also knew that building a distributed database that has unlimited scale and maintains predictably high performance while providing rich and flexible query capabilities is one of the hardest problems in database development, and would take a lot of effort and invention from our team of distributed database engineers to solve. So when we launched in January 2012, we provided simple query functionality that used hash primary keys or composite primary keys (hash + range). Since then, we have been working on adding flexible querying. You saw the first iteration in April 2013 with the launch of Local Secondary Indexes (LSI). Today, I am thrilled to announce a fundamental expansion of the query capabilities of DynamoDB with the launch of Global Secondary Indexes (GSI). This new capability allows indexing any attribute (column) of a DynamoDB table and performing high-performance queries at any table scale.

Going beyond Key-Value

Advanced key-value data stores such as DynamoDB achieve high scalability on loosely coupled clusters by using the primary key as the partitioning key to distribute data across nodes. Even though the resulting query functionality may appear more limited than that of a relational database on a cursory examination, it works exceedingly well for a wide range of applications, as is evident from DynamoDB's rapid growth and adoption by customers like Electronic Arts, Scopely, HasOffers, SmugMug, AdRoll, Dropcam, Digg, and by many teams at Amazon (Cloud Drive, Retail). DynamoDB continues to be embraced for workloads in gaming, ad-tech, mobile, web apps, and other segments where scale and performance are critical. At Amazon, we increasingly default to DynamoDB instead of relational databases when we don't need complex queries, table joins, or transactions, as it offers a more available, more scalable, and ultimately lower-cost solution.

For non-primary-key access in advanced key-value stores, a user has to resort to either maintaining a separate index table or some form of scatter-gather query across partitions. Both of these options are less than ideal. Maintaining a separate table for indexes forces users to keep it consistent with the primary key table themselves. With a scatter-gather query, as the dataset grows, the query must be scattered across more and more partitions, resulting in degraded performance over time. DynamoDB's new Global Secondary Indexes remove this fundamental restriction by allowing "scaled out" indexes without ever requiring any book-keeping on the part of the developer. Now you can run queries on any item attribute (column) in your DynamoDB table. Moreover, a GSI's performance is designed to meet DynamoDB's single-digit millisecond latency - you can add items to a Users table for a gaming app with tens of millions of users with UserId as the primary key, but retrieve them based on their home city, with no reduction in query performance.
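The contrast between the two workarounds can be sketched in plain Python. This is a toy model, not DynamoDB's implementation; the `partitions` and `city_index` structures, the `City` attribute, and the user names are all made up for illustration:

```python
from collections import defaultdict

NUM_PARTITIONS = 4

def partition_of(user_id):
    # Assign an item to a partition by hashing its primary key.
    return hash(user_id) % NUM_PARTITIONS

partitions = defaultdict(list)   # primary table, split across hash partitions
city_index = defaultdict(list)   # hypothetical separate index table on "City"

def put_user(user_id, city):
    item = {"UserId": user_id, "City": city}
    partitions[partition_of(user_id)].append(item)
    city_index[city].append(item)   # the developer must keep this consistent

def scatter_gather_by_city(city):
    # Without an index, every partition must be queried; cost grows with data.
    return [i for p in partitions.values() for i in p if i["City"] == city]

def index_lookup_by_city(city):
    # With an index, only the matching entries are touched.
    return list(city_index[city])

for uid, c in [("alice", "Seattle"), ("bob", "Dublin"), ("carol", "Seattle")]:
    put_user(uid, c)

scatter_results = scatter_gather_by_city("Seattle")
index_results = index_lookup_by_city("Seattle")
```

Both paths return the same items, but the scatter-gather path touches every partition on every query, which is exactly the cost that grows with the table, while the index-table path shifts the consistency burden onto the application.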

DynamoDB Refresher

DynamoDB stores information in database tables, which are collections of individual items. Each item is a collection of data attributes. The items are analogous to rows in a spreadsheet, and the attributes are analogous to columns. Each item is uniquely identified by a primary key, which is composed of one or two designated attributes: a hash attribute and an optional range attribute. DynamoDB queries refer to the hash and range attributes of the items you'd like to access. These query capabilities so far have been based on the default primary index and optional local secondary indexes of a DynamoDB table:

  • Primary Index: Customers can choose from two types of keys for primary index querying: Simple Hash Keys and Composite Hash Key / Range Keys. Simple Hash Key gives DynamoDB the Distributed Hash Table abstraction. The key is hashed over the different partitions to optimize workload distribution. For more background on this please read the original Dynamo paper. Composite Hash Key with Range Key allows the developer to create a primary key that is the composite of two attributes, a “hash attribute” and a “range attribute.” When querying against a composite key, the hash attribute needs to be uniquely matched but a range operation can be specified for the range attribute: e.g. all orders from Werner in the past 24 hours, or all games played by an individual player in the past 24 hours.
  • Local Secondary Index: Local Secondary Indexes allow the developer to create indexes on non-primary key attributes and quickly retrieve records within a hash partition (i.e., items that share the same hash value in their primary key): e.g. if there is a DynamoDB table with PlayerName as the hash key and GameStartTime as the range key, you can use local secondary indexes to run efficient queries on other attributes like “Score.” The query “Show me John’s all-time top 5 scores” will return results automatically ordered by score.
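The two query patterns above can be sketched in plain Python. This is a toy model of the query semantics, not the DynamoDB API; the player name, timestamps, and scores are made up:

```python
from collections import defaultdict

table = defaultdict(list)  # hash key (PlayerName) -> items sorted by range key

def put_item(player, start_time, score):
    items = table[player]
    items.append({"PlayerName": player,
                  "GameStartTime": start_time,
                  "Score": score})
    items.sort(key=lambda i: i["GameStartTime"])   # keep range-key order

def query(player, since):
    # Composite-key query: exact match on the hash attribute,
    # range condition on the range attribute.
    return [i for i in table[player] if i["GameStartTime"] >= since]

def top_scores(player, limit):
    # LSI-style query: still within one hash partition, but ordered
    # by a non-key attribute ("Score").
    return sorted(table[player], key=lambda i: i["Score"], reverse=True)[:limit]

put_item("John", 10, 1200)
put_item("John", 20, 3400)
put_item("John", 30, 2100)

recent = query("John", since=20)    # e.g. games in the past 24 hours
best = top_scores("John", limit=2)  # John's top 2 scores
```

Note that both patterns require the hash attribute to be matched exactly; neither can answer a question that spans hash partitions, which is the gap GSIs fill.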

What are Global Secondary Indexes?

Global secondary indexes allow you to efficiently query over the whole DynamoDB table, not just within a partition as local secondary indexes do, using any attributes (columns), even as the DynamoDB table horizontally scales to accommodate your needs. Let’s walk through another gaming example. Consider a table named GameScores that keeps track of users and scores for a mobile gaming application. Each item in GameScores is identified by a hash key (UserId) and a range key (GameTitle). The following diagram shows how the items in the table would be organized. (Not all of the attributes are shown.)

Now suppose that you wanted to write a leaderboard application to display top scores for each game. A query that specified the key attributes (UserId and GameTitle) would be very efficient; however, if the application needed to retrieve data from GameScores based on GameTitle only, it would need to use a Scan operation. As more items are added to the table, scans of all the data would become slow and inefficient, making it difficult to answer questions such as:

  • What is the top score ever recorded for the game "Meteor Blasters"?
  • Which user had the highest score for "Galaxy Invaders"?
  • What was the highest ratio of wins vs. losses?
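The Scan bottleneck can be sketched in plain Python (a toy model, not the DynamoDB Scan API; the UserIds and scores are made up):

```python
# GameScores keyed by (UserId, GameTitle); data values are made up.
game_scores = [
    {"UserId": "u1", "GameTitle": "Meteor Blasters", "TopScore": 1000},
    {"UserId": "u2", "GameTitle": "Galaxy Invaders", "TopScore": 500},
    {"UserId": "u3", "GameTitle": "Meteor Blasters", "TopScore": 750},
]

def scan_top_score(game_title):
    # A Scan touches every item in the table, however large it grows,
    # because GameTitle alone is not part of the primary key.
    matches = [i for i in game_scores if i["GameTitle"] == game_title]
    return max(i["TopScore"] for i in matches) if matches else None

best = scan_top_score("Meteor Blasters")
```

The filter examines all three items to find the two that match; on a table with millions of items, the same question costs millions of item reads.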

To speed up queries on non-key attributes, you can specify global secondary indexes. For example, you could create a global secondary index named GameTitleIndex, with a hash key of GameTitle and a range key of TopScore. Since the table's primary key attributes are always projected into an index, the UserId attribute is also present. The following diagram shows what the GameTitleIndex would look like:

Now you can query GameTitleIndex and easily obtain the scores for "Meteor Blasters". The results are ordered by the range key, TopScore.
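A sketch of GameTitleIndex in plain Python may make this concrete (again a toy model, not the DynamoDB API; the items mirror the made-up GameScores data above):

```python
from collections import defaultdict

# Made-up GameScores items, keyed in the table by (UserId, GameTitle).
game_scores = [
    {"UserId": "u1", "GameTitle": "Meteor Blasters", "TopScore": 1000},
    {"UserId": "u2", "GameTitle": "Galaxy Invaders", "TopScore": 500},
    {"UserId": "u3", "GameTitle": "Meteor Blasters", "TopScore": 750},
]

# Build the index: hash key GameTitle, range key TopScore; the table's
# primary key (UserId) is always projected into the index entries.
game_title_index = defaultdict(list)
for item in game_scores:
    game_title_index[item["GameTitle"]].append(
        {"GameTitle": item["GameTitle"],
         "TopScore": item["TopScore"],
         "UserId": item["UserId"]})
for entries in game_title_index.values():
    entries.sort(key=lambda e: e["TopScore"])   # range-key order

def query_index(game_title):
    # Touches only the matching index partition, never the whole table.
    return game_title_index[game_title]

meteor = query_index("Meteor Blasters")   # results ordered by TopScore
```

The query now reads only the "Meteor Blasters" entries, and they come back already sorted by the range key, which is exactly what a leaderboard needs.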

Efficient Queries

Traditionally, databases have been scaled as a whole, tables and indexes together. While this may appear simple, it masked the underlying complexity of the varying needs of different types of queries, and consequently different indexes, which resulted in wasted resources. With global secondary indexes in DynamoDB, you can now have many indexes and tune their capacity independently. These indexes also provide query/cost flexibility, allowing a custom level of clustering to be defined per index. Developers can specify which attributes should be “projected” to the secondary index, allowing faster access to frequently accessed data, while avoiding extra read/write costs for other attributes.
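The projection choices can be sketched in plain Python. The projection names (KEYS_ONLY, INCLUDE, ALL) come from the DynamoDB developer guide, but the `project` helper and the item's attributes are made up for illustration:

```python
# A made-up GameScores item with some extra attributes.
item = {"UserId": "u1", "GameTitle": "Meteor Blasters",
        "TopScore": 1000, "Wins": 12, "Losses": 3}

TABLE_KEYS = ("UserId", "GameTitle")     # table primary key
INDEX_KEYS = ("GameTitle", "TopScore")   # GSI hash + range key

def project(item, projection, include=()):
    # Build the entry that would be stored in the index for this item.
    if projection == "ALL":
        return dict(item)                      # every attribute copied
    keep = set(TABLE_KEYS) | set(INDEX_KEYS)   # key attributes always kept
    if projection == "INCLUDE":
        keep |= set(include)
    return {k: v for k, v in item.items() if k in keep}

keys_only = project(item, "KEYS_ONLY")
with_wins = project(item, "INCLUDE", include=("Wins",))
everything = project(item, "ALL")
```

Projecting only what a query reads keeps index storage and write amplification down; projecting everything makes the index self-sufficient for reads at the cost of duplicating every attribute.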

Start with DynamoDB

The enhanced query flexibility that global and local secondary indexes provide means DynamoDB can support an even broader range of workloads. When designing a new application that will operate in the AWS cloud, first take a look at DynamoDB when selecting a database. If you don’t need the table join capabilities of relational databases, you will be better served from a cost, availability and performance standpoint by using DynamoDB. If you need support for transactions, use the recently released transaction library. You can also use GSI features with DynamoDB Local for offline development of your application. As your application becomes popular and goes from being used by thousands of users to millions or even tens of millions of users, you will not have to worry about the typical performance or availability bottlenecks applications face from relational databases that require application re-architecture. You can simply dial up the provisioned throughput that your app needs from DynamoDB and we will take care of the rest without any impact on the performance of your app.

Dropcam tells us that they adopted DynamoDB for seamless scalability and performance as they continue to innovate on their cloud-based monitoring platform, which has grown to become one of the largest video platforms on the internet today. With GSIs, they do not have to choose between scalability and query flexibility and instead can get both out of their database. Guerrilla Games, the developer of Killzone Shadow Fall, uses DynamoDB for online multiplayer leaderboards and game settings. They will be leveraging GSIs to add more features and increase database performance. Also, Bizo, a B2B digital marketing platform, uses DynamoDB for audience targeting. GSIs will enable lookups using evolving criteria across multiple datasets.

These are just a few examples where GSIs can help and I am looking forward to our customers building scalable businesses with DynamoDB. I want application writers to focus on their business logic, leaving the heavy-lifting of maintaining consistency across look-up attributes to DynamoDB. To learn more see Jeff Barr’s blog and the DynamoDB developer guide.

As I discussed in my re:Invent keynote earlier this month, I am now happy to announce the immediate availability of Amazon RDS Cross Region Read Replicas, which is another important enhancement for our customers using or planning to use multiple AWS Regions to deploy their applications. Cross Region Read Replicas are available for MySQL 5.6 and enable you to maintain a nearly up-to-date copy of your master database in a different AWS Region. In case of a regional disaster, you can simply promote your read replica in a different region to a master and point your application to it to resume operations. Cross Region Read Replicas also enable you to serve read traffic for your global customer base from regions that are nearest to them.

About 5 years ago, I introduced you to AWS Availability Zones, which are distinct locations within a Region that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low-latency network connectivity to other Availability Zones in the same region. Availability Zones have since become the foundational elements for AWS customers to create a new generation of highly available distributed applications in the cloud that are designed to be fault tolerant from the get-go. We also made it easy for customers to leverage multiple Availability Zones to architect the various layers of their applications with a few clicks on the AWS Management Console with services such as Amazon Elastic Load Balancing, Amazon RDS and Amazon DynamoDB. In addition, Amazon S3 redundantly stores data in multiple facilities and is designed for 99.999999999% durability and 99.99% availability of objects over a given year. Our SLAs offer even more confidence to customers running applications across multiple Availability Zones. Amazon RDS offers a monthly uptime percentage SLA of 99.95% per Multi-AZ database instance. Amazon EC2 and EBS offer a monthly uptime percentage SLA of 99.95% for instances running across multiple Availability Zones.

As AWS expanded to 9 distinct AWS Regions and 25 Availability Zones across the world during the last few years, many of our customers started to leverage multiple AWS Regions to further enhance the reliability of their applications for disaster recovery. For example, when a disastrous earthquake hit Japan in March 2011, many customers in Japan came to AWS to take advantage of the multiple Availability Zones. In addition, they also backed up their data from the AWS Tokyo Region to the AWS Singapore Region as an additional measure for business continuity. In a similar scenario here in the United States, Milind Borate, the CTO of Druva, an enterprise backup company using AWS, told me that after hurricane Sandy, he got an enormous amount of interest from his customers in the northeastern US to replicate their data to other parts of the US for disaster recovery.

Up until AWS and the Cloud, reliable disaster recovery had largely remained cost prohibitive for most companies except for large enterprises. It traditionally involved the expense and headaches associated with procuring new co-location space, negotiating pricing with a new vendor, adding racks, setting up network links and encryption, taking backups, initiating a transfer, and monitoring it until the operation completed. While the infrastructure costs for basic disaster recovery could be very high, the associated system and database administration costs could be just as much or more. Even after incurring these costs, given the complexity, customers could still find themselves in a situation where the restoration process does not meet their recovery time objective and/or recovery point objective. AWS provides several easy-to-use and cost-effective building blocks to make disaster recovery very accessible to customers. Using the S3 copy functionality, you can copy the objects/files that are used by your application from one AWS Region to another. You can use the EC2 AMI copy functionality to make your server images available in multiple AWS Regions. In the last 12 months, we launched EBS Snapshot Copy, RDS Snapshot Copy, DynamoDB Data Copy and Redshift Snapshot Copy, all of which help you easily restore the full stack of your application environments in a different AWS Region for disaster recovery. Amazon RDS Cross Region Read Replica is another important enhancement for supporting these disaster recovery scenarios.

We have heard from Joel Callaway from Zoopla, a property listing and house prices website in the UK that attracts over 20 million visits per month, that they are using the RDS Snapshot Copy feature to easily transfer hundreds of GB of their RDS databases from the US East Region to the EU West (Dublin) Region every week using a few simple API calls. Joel told us that prior to using this feature, it used to take them several days and manual steps to set up a similar disaster recovery process. Joel also told us that he is looking forward to using Cross Region Read Replicas to further enhance their disaster recovery objectives.

AWS customers come from over 190 countries and a lot of them in turn have global customers. Cross Region Read Replicas also make it even easier for our global customers to scale database deployments to meet the performance demands of high-traffic, globally dispersed applications. This feature enables our customers to better serve read-heavy traffic from an AWS Region closer to their end users to provide a faster response time. Medidata delivers cloud-based clinical trial solutions using AWS that enable physicians to look up patient records quickly and avoid prescribing treatments that might counteract the patient’s clinical trial regimen. Isaac Wong, VP of Platform Architecture with Medidata, told us that their clinical trial platform is global in scope and the ability to move data closer to the doctors and nurses participating in a trial anywhere in the world through Cross Region Read Replicas enables them to shorten read latencies and allows their health professionals to serve their patients better. Isaac also told us that using the Cross Region Replication features of RDS, he is able to ensure that life-critical services of their platform are not affected by regional disruption. These are great examples of how many of our customers are very easily and cost effectively able to implement disaster recovery solutions as well as design globally scalable web applications using AWS.

Note that building a reliable disaster recovery solution requires that every component of your application architecture, be it a web server, load balancer, application, cache or database server, is able to meet the recovery point and time objectives you have for your business. If you are going to take advantage of Cross Region Read Replicas of RDS, make sure to monitor the replication status through DB Event Notifications and the Replica Lag metric through CloudWatch to ensure that your read replica is always available and keeping up. Refer to the Cross Region Read Replica section of the Amazon RDS User Guide to learn more.

AWS re:Invent 2013


Today we are kicking off AWS re:Invent 2013. Over the course of the next three days, we will host more than 200 sessions, training bootcamps, and hands-on labs taught by expert AWS staff as well as dozens of our customers.

This year’s conference kicks off with a keynote address by AWS Senior Vice President Andy Jassy, followed by my keynote on Thursday morning. Tune in to hear the latest from AWS and our customers.

If you’re not already here in Vegas with us, you can sign up to watch the keynotes on live stream here.

Outside of the keynotes, there are an incredible number of sessions offering a tailored experience whether you are a developer, startup, executive, partner, or other. You can see the full session catalog here. I’m impressed by the scale and technical depth of what’s offered to attendees.

After my keynote on Thursday I will host two fireside chat sessions with cloud innovators and industry influencers:

First, I’ll talk with three technical startup founders.

In the second session, I will talk with three startup influencers.

I will follow those two sessions with Startup Launches, where five companies will either launch their business or a significant feature entirely built on AWS. It will be a busy, fun, and informative afternoon!

Look forward to seeing you around the conference.