Document Model Support in DynamoDB: Flexibility, Availability, Performance, and Scale...Together at last

All Things Distributed

October 08, 2014

Today, I’m thrilled to announce several major features that significantly enhance the development experience on DynamoDB. We are introducing native support in DynamoDB for document models such as JSON, adding the ability to add and remove global secondary indexes, offering more flexible scaling options, and increasing the item size limit to 400KB. These improvements have been sought by many application developers, and we are happy to be bringing them to you. The best part is that we are also significantly expanding the free tier many of you already enjoy by increasing the storage to 25GB and the throughput to 200 million requests per month. We designed DynamoDB to operate with at least 99.999% availability. Now that we have added support for the document model while still delivering consistent, fast performance, I think DynamoDB is the logical first choice for any application. Let’s now look at the history behind the service and the context for the new innovations that lead me to that conclusion.

NoSQL and Scale

More than a decade ago, Amazon embarked on a mission to build a distributed system that challenged conventional methods of data storage and querying. We started with Amazon Dynamo, a simple key-value store that was built to be highly available and scalable to power various mission-critical applications in Amazon’s e-commerce platform. The original Dynamo paper inspired many database solutions, which are now popularly referred to as NoSQL databases. These databases trade off complex querying capabilities and consistency for scale and availability.

In 2012, we launched Amazon DynamoDB, the successor to Amazon Dynamo. For DynamoDB, our primary focus was to build a fully managed, highly available database service with seamless scalability and predictable performance. We built DynamoDB as a fully managed service because we wanted to enable our customers, both internal and external, to focus on their applications rather than being distracted by undifferentiated heavy lifting like dealing with hardware and software maintenance. The goal of DynamoDB is simple: to provide the same level of scalability and availability as the original Dynamo, while freeing developers from the burden of operating distributed datastores (cluster setup, software upgrades, hardware lifecycle management, performance tuning, security upgrades, operations, etc.). Since launch, DynamoDB has been the core infrastructure powering various AWS and Amazon services and has provided more than five nines of availability worldwide. Developers within and outside of Amazon have embraced DynamoDB because it enables them to write their apps quickly and shields them from scaling concerns as those apps change or grow in popularity. This is why DynamoDB has seen widespread adoption by exciting customers like AdRoll, Scopely, Electronic Arts, Amazon.com, Shazam, Devicescape, and Dropcam.

NoSQL and Flexibility: Document Model

A trend in both NoSQL and relational databases is the mainstream adoption of the document model. JSON has become the accepted medium for interchange between numerous Internet services. A JSON-style document model enables customers to build services that are schema-less. Typically, in a datastore based on the document model, each record and its associated data are modeled as a single document. Since each document can have a unique structure, schema migration becomes a non-issue for applications.

While you could store JSON documents in DynamoDB from the day we launched, it was hard to do anything beyond storing and retrieving these documents. Developers did not have direct access to the nested attributes embedded deep within a JSON document, and losing sight of these deeply nested attributes deprived developers of some of the incredible native capabilities of DynamoDB. They couldn’t leverage capabilities like conditional updates (the ability to make an insert into a DynamoDB table if a condition is met based on the latest state of the data across the distributed store) or Global Secondary Indexes (the ability to project one or more of the attributes of your items into a separate table for richer indexing capability). Until now, developers who wanted to store and query JSON had two choices: a) develop quickly and insert opaque JSON blobs in DynamoDB while losing access to key DynamoDB capabilities, or b) decompose the JSON objects themselves into attributes, which requires additional programming effort and a fair bit of forethought.
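To make option (b) concrete, here is a minimal Python sketch of what decomposing a nested JSON document into flat, top-level attributes might look like. The `flatten` helper and its underscore-joined naming scheme are assumptions made for illustration; they are not part of DynamoDB or the AWS SDKs.

```python
import json


def flatten(value, prefix=""):
    """Decompose nested dicts/lists into a flat {attribute_name: scalar} dict.

    Hypothetical helper: joining path parts with "_" is an illustrative
    naming choice, not a DynamoDB convention.
    """
    if isinstance(value, dict):
        children = value.items()
    elif isinstance(value, list):
        children = enumerate(value)
    else:
        # Scalar leaf: store it under the accumulated attribute name.
        return {prefix: value}

    flat = {}
    for key, child in children:
        name = f"{prefix}_{key}" if prefix else str(key)
        flat.update(flatten(child, name))
    return flat


document = json.loads(
    '{"id": 1234, "addresses": [{"city": "seattle", "type": "current"}]}'
)
attributes = flatten(document)
print(attributes)
```

Even in this small form, the forethought mentioned above is visible: the application has to pick a naming scheme, keep it stable, and reverse it when reading the item back.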

Enter Native Support for JSON in DynamoDB: Scalability and Flexibility Together at Last

The DynamoDB team is launching document model support. With today’s announcement, DynamoDB developers can now use the AWS SDKs to insert a JSON map into DynamoDB. For example, imagine a map keyed by student ID that holds detailed information about each student: their name, a list of their addresses (each also represented as a map), etc.

{
    "id": 1234,
    "firstname": "John",
    "lastname": "Doe",
    "addresses": [
        {
            "street": "main st",
            "city": "seattle",
            "zipcode": 98005,
            "type": "current"
        },
        {
            "street": "9th st",
            "city": "seattle",
            "zipcode": 98005,
            "type": "past"
        }
    ]
}

With JSON support in DynamoDB, you can access the city of a student’s current address by simply asking for students.1234.addresses[0].city. Moreover, developers can now impose conditions on these nested attributes and perform operations like "delete student 1234 if their current address is in Seattle."
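The semantics of a nested path lookup and a conditional delete can be sketched in plain Python. This is an in-memory illustration only: the `students` dict stands in for a table, and `resolve` and `delete_if` are hypothetical helpers, not AWS SDK calls.

```python
import re

# In-memory stand-in for a table keyed by student id (illustrative assumption).
students = {
    1234: {
        "id": 1234,
        "firstname": "John",
        "lastname": "Doe",
        "addresses": [
            {"street": "main st", "city": "seattle", "zipcode": 98005, "type": "current"},
            {"street": "9th st", "city": "seattle", "zipcode": 98005, "type": "past"},
        ],
    }
}

# A path token is either an attribute name or a list index such as [0].
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve(document, path):
    """Follow a path such as 'addresses[0].city' through nested maps and lists."""
    current = document
    for name, index in _TOKEN.findall(path):
        current = current[name] if name else current[int(index)]
    return current


def delete_if(table, key, path, expected):
    """Delete table[key] only if the nested attribute at `path` equals `expected`."""
    item = table.get(key)
    if item is None or resolve(item, path) != expected:
        return False
    del table[key]
    return True


city = resolve(students[1234], "addresses[0].city")
deleted_wrong = delete_if(students, 1234, "addresses[0].city", "portland")
deleted_right = delete_if(students, 1234, "addresses[0].city", "seattle")
print(city, deleted_wrong, deleted_right)
```

In the service itself, the condition is evaluated on the server as part of the write, so the check and the delete happen as one operation rather than as two client-side steps like the ones shown here.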

With native support for JSON, we have unleashed the scalability and consistent fast performance capabilities of DynamoDB to provide deeper support for JSON. Now, developers do not have to choose between datastores that are optimized for scalability and those that are optimized for flexibility. Instead they can pick one NoSQL database, Amazon DynamoDB, that provides both.

Online Indexing: Improved Global Secondary Indexes

Global Secondary Indexes (GSIs) are one of the most popular features of DynamoDB. GSIs enable developers to create scalable secondary indexes on attributes within their JSON documents. However, when we initially launched GSI support, developers had to identify all of their secondary indexes up front, at table creation time. As an application evolves and a developer learns more about her use cases, the indexing needs evolve as well. To minimize this up-front planning, we will be adding the ability to add or remove indexes on existing tables. This means that you will be able to add, modify, and delete indexes on your table on demand. As always, you maintain the ability to scale each GSI independently as the load on it evolves. We will add the Online Indexing capability soon.

Support for Larger Items

With the document model, since you store the primary record and all its related attributes in a single document, the need for bigger items is more critical. We have increased the size of items you are able to store in DynamoDB. Starting today, you can store 400KB objects in DynamoDB, enabling you to use DynamoDB for a wider variety of applications.

Even Faster Scaling

We have made it even easier to scale your applications up and down with a single click. Previously, DynamoDB customers were only able to double the provisioned throughput on a table with each API call. This forced customers to interact with DynamoDB multiple times to, for example, scale their table from 10 writes per second to 100,000 writes per second. Starting today, you can go directly from 10 writes per second to 100,000 writes per second (or any other number) with a single click in the DynamoDB console or a single API call. This makes it easier and faster to reduce costs by optimizing your DynamoDB table’s capacity, or to react quickly as your database requirements evolve.
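To put a number on the old behavior, here is a short Python sketch that counts how many update calls the doubling rule would have required for the example above. It assumes each call could at most double the current value; `calls_when_doubling` is an illustrative helper, not an AWS API.

```python
def calls_when_doubling(current, target):
    """Count update calls needed if each call can at most double throughput."""
    calls = 0
    while current < target:
        # Each call raises throughput to at most twice its current value.
        current = min(current * 2, target)
        calls += 1
    return calls


doubling_calls = calls_when_doubling(10, 100_000)
print(doubling_calls)  # 14
```

Under that assumption, going from 10 to 100,000 writes per second took 14 separate calls, which now collapse into one.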

Mars rover image indexing using DynamoDB

We put together a demo app that indexes the metadata of NASA/JPL’s Curiosity Mars rover images. In this app, the images are stored in S3 with metadata represented as JSON documents and stored/indexed in DynamoDB.

Take a look at the application here: http://dynamodb-msl-image-explorer.s3-website-us-east-1.amazonaws.com/

Tom Soderstrom, IT Chief Technology Officer of NASA JPL, has made the point that leveraging the power of managed services like DynamoDB and S3 for these workloads allows NASA/JPL to scale applications seamlessly without having to deal with undifferentiated heavy lifting or manual effort. Having native JSON support in a database helps NASA developers write apps faster and makes it easier to share more data with global citizen scientists.

Look for a more detailed blog on the architecture of this application and its sample source code in the next few days!

Expanding the freedom to invent

Not only did we add these new capabilities, we have also expanded the free tier. Amazon DynamoDB has always provided a perpetual free tier for developers to build their new applications. Today we are announcing a significant expansion of that free tier: we are increasing free storage to 25GB and giving you enough free throughput capacity to perform over 200 million requests per month. What does this mean for you as an application developer? You could build a web application with DynamoDB as a back end and handle over 200 million requests per month without paying anything for the database. You could build a new gaming application that supports 15,000 monthly active users. You could build an ad-tech platform that serves over 500,000 ad impression requests per month. We are giving you the freedom and flexibility to invent on DynamoDB.
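For a rough sense of scale, the monthly figure can be converted into an average request rate. The sketch below assumes a 30-day month and traffic spread evenly over time; both are simplifications made here, not statements about how the free tier is provisioned.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_requests_per_second(requests_per_month, days_in_month=30):
    """Average rate if the monthly total were spread evenly (an assumption)."""
    return requests_per_month / (days_in_month * SECONDS_PER_DAY)


rate = average_requests_per_second(200_000_000)
print(f"{rate:.1f} requests/second on average")
```

That works out to roughly 77 requests per second sustained around the clock.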

What does this all mean?

To sum it all up, we have added support for JSON to streamline document-oriented development, raised the maximum size of items you can store in tables to 400KB, and provided a faster and even easier way to scale DynamoDB. These features are now available in four of our AWS Regions and will be available in the remaining regions shortly. Today they are available in US East (N. Virginia), US West (Oregon), Asia Pacific (Tokyo), and EU (Ireland). We also announced the ability to add or remove Global Secondary Indexes dynamically on existing tables, which will be available soon.

DynamoDB is the logical first choice for developers building new applications. It was designed to give developers flexibility and minimal operational overhead without compromising on scale, availability, durability, or performance. I am delighted to share today’s announcements and I look forward to hearing how you use our new features!