Over a year ago the AWS team opened a "pop-up loft" in San Francisco at 925 Market Street. The goal of the loft was to give developers an opportunity to get in-person support and education on AWS, to network, get some work done, or just hang out with peers. It has been a great success; every time I visit the loft there is a great buzz, with people getting advice from our solution architects, attending training, or taking in talks and demos. It became such a hit among developers that we decided to reopen the loft in August of last year after its initial four-week run, making sure everyone would have continued access to this important resource.
Building on the success of the San Francisco loft, the team will now also open a pop-up loft in New York City on June 24 at 350 West Broadway. It extends the concept pioneered in SF to give NYC developers access to great AWS resources:
- Ask an Architect: You can schedule a 1:1, 60-minute session with a member of the AWS technical team. Bring your questions about AWS architecture, cost optimization, services and features, and anything else AWS related. And don’t be shy — walk-ins are welcome too.
- Technical Presentations: AWS solution architects, product managers, and evangelists deliver technical presentations covering some of the highest-rated and best-attended sessions from recent AWS events. Talks cover solutions and services including Amazon Echo, Amazon DynamoDB, mobile gaming, Amazon Elastic MapReduce, and more.
- AWS Technical Bootcamps: Limited to 25 participants per bootcamp, these full-day bootcamps include hands-on lab exercises that use a live environment with the AWS console. Usually these cost $600, but at the AWS Pop-up Loft we are offering them for free. Bootcamps you can register for include “Getting Started with AWS — Technical,” “Store and Manage Big Data in the Cloud,” “Architecting Highly Available Apps,” and “Taking AWS Operations to the Next Level.”
- Self-paced, Hands-on Labs: Beginners through advanced users can attend labs on topics that range from creating Amazon EC2 instances to launching and managing a web application with AWS CloudFormation. Usually $30 each, these labs are offered for free in the AWS loft.
You are all invited to join us for the grand opening party at the loft on June 24 at 7 PM. There will be food, drinks, a DJ, and free swag. The event will be packed, so RSVP today if you want to come and mingle with hot startups, accelerators, incubators, VCs, and our AWS technical experts. Entrance is on a first-come, first-served basis.
I have signed up to do two loft events:
- Fireside Chat with AWS Community Heroes: On June 16 starting at 6 PM, Jeremy Edberg (Reddit/Netflix) and Valentino Volonghi (AdRoll) will join me at the San Francisco loft for a fireside chat about startups, technology, entrepreneurship, and more. Jeremy and Valentino have been recognized by AWS as Community Heroes, an honor reserved for developers who have had a real impact within the community. Following the talk, we’ll kick off a unique networking social with specialty cocktails, beer, wine, food, and party swag!
- Fireside Chat with NYC Founders: On July 7, a number of startup founders who have gone through NYC accelerators will join me for a conversation about trends in the New York startup scene.
I hope to see you there!
Today was a big day for the Amazon Web Services teams, as a whole range of new services and functionality was delivered to our customers. Here is a brief recap:
The Amazon Machine Learning Service
As I wrote last week, machine learning is becoming an increasingly important tool for building advanced data-driven applications. At Amazon we have hundreds of teams using machine learning, and by making use of the Machine Learning service they can significantly speed up the time it takes to bring their technologies into production. And you no longer need to be a machine learning expert to be able to use it.
Amazon Machine Learning is a service that makes it easy to build predictive applications, including fraud detection, demand forecasting, and click prediction. Amazon ML uses powerful algorithms to create machine learning models by finding patterns in your existing data, and then uses these patterns to make predictions from new data as it becomes available. The Amazon ML console and API provide data and model visualization tools, as well as wizards to guide you through the process of creating models, measuring their quality, and fine-tuning the predictions to match your application's requirements. Once the models are created, you can get predictions for your application by using the simple API, without having to implement custom prediction-generation code or manage any infrastructure. Amazon ML is highly scalable: it can generate billions of predictions, and serve those predictions in real time and at high throughput. With Amazon ML there is no setup cost and you pay as you go, so you can start small and scale as your application grows. Details on the AWS Blog.
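The core idea behind the service, learning a pattern from labeled examples and then scoring new data, can be sketched in a few lines of plain Python. This is a conceptual illustration only (a hypothetical 1-nearest-neighbor click predictor), not the Amazon ML API:

```python
# Toy supervised prediction: learn from labeled examples, then
# predict labels for new data using 1-nearest-neighbor.
# Conceptual sketch only; not the Amazon ML API.

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict(training_data, features):
    """Return the label of the closest training example."""
    _, label = min(training_data, key=lambda ex: distance(ex[0], features))
    return label

# Hypothetical features: (ad_relevance, page_views) -> did the user click?
training = [
    ((0.9, 12), "click"),
    ((0.8, 30), "click"),
    ((0.1, 2),  "no-click"),
    ((0.2, 5),  "no-click"),
]

print(predict(training, (0.85, 20)))  # nearest examples are clicks
```

A real service like Amazon ML replaces this toy lookup with trained models, managed infrastructure, and a prediction API, but the learn-then-score pattern is the same.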
The Amazon Elastic File System
AWS has been offering a range of storage solutions (objects, block storage, databases, archiving, etc.) for a while already. Customers have been asking us to add file system functionality to this set of solutions, as much of their traditional software requires an EC2-mountable shared file system. When we designed Amazon EFS we decided to build it along the AWS principles: elastic, scalable, highly available, consistent in performance, secure, and cost-effective.
Amazon EFS is a fully-managed service that makes it easy to set up and scale shared file storage in the AWS Cloud. With a few clicks in the AWS Management Console, customers can use Amazon EFS to create file systems that are accessible to EC2 instances and that support standard operating system APIs and file system semantics. Amazon EFS file systems can automatically scale from small file systems to petabyte-scale without needing to provision storage or throughput. Amazon EFS can support thousands of concurrent client connections with consistent performance, making it ideal for a wide range of uses that require on-demand scaling of file system capacity and performance. Amazon EFS is designed to be highly available and durable, storing each file system object redundantly across multiple Availability Zones. With Amazon EFS, there is no minimum fee or setup costs, and customers pay only for the storage they use. Details on the AWS Blog.
The Amazon EC2 Container Service
Containers are an important building block in modern software development, and since the launch of Amazon ECS in November of last year it has become a very important tool for architects and developers. Today Amazon ECS moves into General Availability (GA), so you can now use it for your production systems.
With GA, Amazon ECS also delivers a new scheduler to support long-running applications; see my detailed blog post, State Management and Scheduling with the Amazon EC2 Container Service. Also read the details on the AWS Blog.
AWS Lambda
One of the most exciting technologies we have built lately at AWS is AWS Lambda. Developers have flocked to this serverless programming technology to build event-driven services. A great example at the AWS Summit today was Valentino Volonghi's talk on how AdRoll uses Lambda to deliver real-time updates around the world to their DynamoDB tables. Today AWS Lambda is entering General Availability. Two areas where Lambda is driving a lot of innovation at our customers are mobile and the Internet of Things (IoT). We have taken feedback from our customers and extended the service with great new functionality:
- Synchronous Events – You can now create AWS Lambda functions that respond to events in your application in real time (synchronously) as well as asynchronously. Synchronous requests allow mobile and IoT apps to move data transformations and analysis to the cloud, and make it easy for any application or web service to use Lambda to create back-end functionality. Synchronous events operate with low latency so you can deliver dynamic, interactive experiences to your users. To learn more about using synchronous events, read Getting Started: Handling Synchronous Events in the AWS Lambda Developer Guide.
- AWS Mobile SDK support for AWS Lambda (Android, iOS) – AWS Lambda is now included in the AWS Mobile SDK, making it easy to build mobile applications that use Lambda functions as their app backend. When invoked through the mobile SDK, the Lambda function automatically has access to data about the device, app, and end user identity, making it easy to create rich, personalized responses to in-app activity. To learn more, visit the AWS Mobile SDK page.
- Target, Filter, and Route Amazon SNS Notifications with AWS Lambda – You can now invoke a Lambda function by sending it a notification in Amazon SNS, making it easy to modify or filter messages before routing them to mobile devices or other destinations. Apps and services that already send SNS notifications, such as Amazon CloudWatch, gain automatic integration with AWS Lambda through SNS messages without needing to provision or manage infrastructure.
- Apply Custom Logic to User Preferences and Game State – Amazon Cognito makes it easy to save user data, such as app preferences or game state, in the AWS Cloud and synchronize it among all the user’s devices. You can now use AWS Lambda functions to validate, audit, or modify data as it is synchronized and Cognito will automatically propagate changes made by your Lambda function to the user’s devices. End user identities created using Amazon Cognito are also included in Lambda events, making it easy to store or search for customer-specific data in a mobile, IoT, or web backend.
- AWS CloudTrail Integration – Lambda now supports AWS CloudTrail logging for API requests. You can also use AWS Lambda to automatically process CloudTrail events to add security checks, auditing, or notifications for any AWS API call.
- Enhanced Kinesis Stream Management – You can now add, edit and remove Kinesis streams as event sources for Lambda functions using the AWS Lambda console, as well as view existing event sources for your Lambda functions. Multiple Lambda functions can now respond to events in a single Kinesis or DynamoDB stream.
- Increased Default Limits – Lambda now offers 100 concurrent executions and 1,000 TPS as default limits, and you can contact customer service to have these limits quickly raised to match your production needs.
- Enhanced Metrics and Logging – In addition to viewing the number of executions of your Lambda function and its error rate and duration, you can now also see throttled attempts through a CloudWatch metric for each function. Amazon CloudWatch Logs now also support time-based sorting, making it easier to search Lambda logs and correlate them with CloudWatch metrics. API enhancements make it easier to distinguish problems in your code (such as uncaught top-level exceptions or timeouts) from errors you catch and return yourself.
- Simplified Access Model and Cross-Account Support – Lambda now supports resource policies and cross-account access, making it easier to configure event sources such as Amazon S3 buckets and allowing the bucket owner to be in a separate AWS account. Separate IAM roles are no longer required to invoke a Lambda function, making it faster to set up event sources.
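Whichever event source is used, a Lambda function is ultimately just a handler that receives the event document. As an illustration of the SNS integration, here is a minimal handler that filters notifications before routing them onward. The Records/Sns/Message nesting is the standard SNS event shape; the severity-based filtering rule is a hypothetical example:

```python
import json

# Minimal AWS Lambda-style handler that filters SNS notifications
# before routing them onward. The event shape (Records -> Sns -> Message)
# follows the standard SNS delivery format; the "severity" rule is
# a hypothetical filtering policy.

def handler(event, context):
    forwarded = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        # Drop low-severity alarms; forward everything else.
        if message.get("severity") != "low":
            forwarded.append(message)
    return {"forwarded": forwarded}

# Exercising the handler locally with a synthetic SNS event:
event = {"Records": [
    {"Sns": {"Message": json.dumps({"severity": "high", "alarm": "cpu"})}},
    {"Sns": {"Message": json.dumps({"severity": "low", "alarm": "disk"})}},
]}
print(handler(event, None))  # only the high-severity alarm is forwarded
```

Deployed to Lambda and subscribed to an SNS topic, the same function would run on every notification without any servers to manage.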
We will also launch Java as a programming language for Lambda in a few weeks. More details on the AWS Blog.
Last November, I had the pleasure of announcing the preview of Amazon EC2 Container Service (ECS) at re:Invent. At the time, I wrote about how containerization makes it easier for customers to decompose their applications into smaller building blocks resulting in increased agility and speed of feature releases. I also talked about some of the challenges our customers were facing as they tried to scale container-based applications including challenges around cluster management. Today, I want to dive deeper into some key design decisions we made while building Amazon ECS to address the core problems our customers are facing.
Running modern distributed applications on a cluster requires two key components - reliable state management and flexible scheduling. These are challenging problems that engineers building software systems have been trying to solve for a long time. In the past, many cluster management systems assumed that the cluster was going to be dedicated to a single application or would be statically partitioned to accommodate multiple users. In most cases, the applications you ran on these clusters were limited and set by the administrators. Your jobs were often put in job queues to ensure fairness and increase cluster utilization. For modern distributed applications, many of these approaches break down, especially in the highly dynamic environment enabled by Amazon EC2 and Docker containers. Our customers expect to spin up a pool of compute resources for their clusters on demand and dynamically change the resources available as their jobs change over time. They expect these clusters to span multiple Availability Zones, and increasingly want to distribute multiple applications - encapsulated in Docker containers - without the need to statically partition the cluster. These applications are typically a mix of long-running processes and short-lived jobs with varying levels of priority. Perhaps most importantly, our customers told us that they wanted to be able to start with a small cluster and grow over time as their needs grew, without adding operational complexity.
A modern scheduling system demands better state management than is available with traditional cluster management systems. Customers running Docker containers across a cluster of Amazon EC2 instances need to know where those containers are running and whether they are in their desired state. They also need information about the resources in use and the remaining resources available, as well as the ability to respond to failures, including the possibility that an entire Availability Zone may become unavailable. This requires customers to store the state of their cluster in a highly available and distributed key-value store. Our customers have told us that scaling and operating these data storage systems is very challenging. Furthermore, they felt that this was undifferentiated heavy lifting and would rather focus their energy on running their applications and growing their businesses. Let's dive into the innovations in Amazon ECS that address these problems and remove much of the complexity and "muck" of running a high-performance, highly scalable, Docker-aware cluster management system.
State Management with Amazon ECS
At Amazon, we have built a number of core distributed systems primitives to support our needs. Amazon ECS is built on top of one of these primitives - a Paxos-based transaction journal that maintains a history of state transitions. These transitions are offered and accepted using optimistic concurrency control, and accepted offers are then replicated, allowing for a highly available and highly scalable ACID-compliant datastore. We then expose this state management behind a simple set of APIs. You call the Amazon ECS List and Describe APIs to access the state of your cluster. These APIs give the details of all the instances in your cluster and all the tasks running on those instances. The Amazon ECS APIs respond quickly whether you have a cluster with one instance and a few containers, or a dynamic cluster with hundreds of instances and thousands of containers. There is nothing to install and no database to manage.
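The optimistic concurrency control at the heart of the journal can be illustrated with a tiny in-memory version: a writer offers a transition against the version of the state it last read, and the journal accepts it only if no other writer committed in between. This is a conceptual sketch of the idea, not the actual ECS implementation:

```python
class TransactionJournal:
    """Toy optimistic-concurrency journal: a state transition is accepted
    only if it was proposed against the current version of the state.
    Conceptual sketch, not the ECS internals (which are Paxos-replicated)."""

    def __init__(self, state):
        self.version = 0
        self.state = state
        self.history = []  # accepted transitions, available for replay

    def read(self):
        """Return the current version and a snapshot of the state."""
        return self.version, dict(self.state)

    def offer(self, based_on_version, updates):
        """Offer a transition; reject it if another writer won the race."""
        if based_on_version != self.version:
            return False  # conflict: state moved on since the read
        self.state.update(updates)
        self.version += 1
        self.history.append(updates)
        return True

journal = TransactionJournal({"task-1": "PENDING"})
v, _ = journal.read()
assert journal.offer(v, {"task-1": "RUNNING"})      # accepted
assert not journal.offer(v, {"task-1": "STOPPED"})  # stale offer, rejected
```

Because rejected writers simply re-read and retry, many schedulers can propose transitions concurrently without locks, which is what enables the parallel scheduling described below.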
Scheduling with Amazon ECS
The state management system underlying Amazon ECS enables us to provide our customers with very powerful scheduling capabilities. Amazon ECS operates as a shared-state cluster management system, allowing schedulers full visibility into the state of the cluster. The schedulers compete for the resources they need, and our state management system resolves conflicts and commits serializable transactions to ensure a consistent and highly available view of cluster state. These transactional guarantees are required to ensure that changes in state are not lost, a very important property for ensuring your jobs have the resources they require. This allows scheduling decisions to be made in parallel by multiple schedulers, letting you move quickly to create the distributed applications that are becoming increasingly common.
Amazon ECS includes schedulers for common workloads like long-running services or run-once jobs, and customers can write custom schedulers to meet their unique business or application requirements. This means there is no job queue, no waiting for tasks to start while locks are in place, and your most important applications get the resources they need.
I hope this post has given you some insight into how and why we built Amazon ECS. Developing a system with these capabilities is hard and requires a lot of experience in building, scaling, and operating distributed systems. With Amazon ECS, these capabilities are available to you with just a few API calls. Building modern applications has never been easier. For a walkthrough of the new Amazon ECS service scheduler and other features, please read Jeff Barr's post on the AWS Blog, and for a full list of features and capabilities, read Chris Barclay's post on the AWS Compute Blog.
Machine learning is a scientific discipline that explores the construction and study of algorithms that can learn from data. Such algorithms operate by building a model from example inputs and using that to make predictions or decisions, rather than following strictly static program instructions.
Machine learning is playing an increasingly important role in many areas of our businesses and our lives. ML is used for predictive analytics and predictive modeling, e.g. making predictions about the likelihood that a certain event will happen (will this customer be interested in this item? is this message spam?). At Amazon, machine learning has been key to many of our business processes, from recommendations to fraud detection, from inventory levels to book classification to abusive-review detection. And there are many more application areas: search, autonomous cars (and drones), text and speech recognition, game play, etc.
As is the case with most of computer science, machine learning is not new. Its roots are in the late '50s and early '60s, although one can even claim that Turing was the first to discuss the topic. For this weekend's reading, instead of going back to the early days, I have picked two survey papers on the two major categories of machine learning: supervised and unsupervised learning.
But first I suggest you read Professor Pedro Domingos' paper to understand the context of machine learning and what the prerequisites are for it to be successful.
A Few Useful Things to Know about Machine Learning, Pedro Domingos, Communications of the ACM, 55 (10), 78-87, 2012.
Unsupervised machine learning:
Data clustering: a review, A.K. Jain, M.N. Murty, and P.J. Flynn, ACM Computing Surveys, 31, 3 (September 1999)
Supervised machine learning:
Machine learning: a review of classification and combining techniques, S. B. Kotsiantis, I. D. Zaharakis, and P. E. Pintelas, Artificial Intelligence Review 26:159–190 (2006)
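To make the two categories concrete: supervised methods learn from labeled examples, while clustering techniques like those surveyed by Jain et al. discover structure in unlabeled data. A bare-bones k-means in plain Python, purely as an illustration of the unsupervised case:

```python
import random

def kmeans(points, k, iterations=20):
    """Bare-bones 1-D k-means: group unlabeled points around k centroids.
    Illustrative sketch only; real implementations handle far more."""
    centroids = random.sample(points, k)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

data = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]  # two obvious groups
print(kmeans(data, 2))  # centroids settle near 1.0 and 10.0
```

No labels are given; the algorithm discovers the two groups on its own, which is exactly the distinction between clustering and the classification techniques covered in the Kotsiantis et al. survey.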
If all of this gets you excited and you want to learn more, I suggest you take Professor Domingos' machine learning class on Coursera:
Machine Learning - Why write programs when the computer can instead learn them from data? In this class you will learn how to make this happen, from the simplest machine learning algorithms to quite sophisticated ones. Enjoy!
For those of you who are interested in a more popular treatment of prediction I suggest you read Nate Silver's book The Signal and the Noise: Why So Many Predictions Fail--but Some Don't
As you all know, security, privacy, and protection of our customers' data is our number one priority, and as such we work very closely with regulators to ensure that customers can be assured they are getting the right protections when processing and storing data in the AWS Cloud. I am especially pleased that the group of European Union (EU) data protection authorities known as the Article 29 Working Party has approved the AWS Data Processing Agreement (DPA), assuring customers that it meets the high standards of EU data protection laws. The media alert below that went out today gives the details:
European Union Data Protection Authorities Approve Amazon Web Services’ Data Processing Agreement
Customers All Over the World Are Assured that AWS Agreement Meets Rigorous EU Privacy Laws
Brussels – March 31, 2015 – Amazon Web Services (AWS) today announced that the group of European Union (EU) data protection authorities known as the Article 29 Working Party has approved the AWS Data Processing Agreement (DPA), assuring customers that it meets the high standards of EU data protection laws. The approval of the AWS DPA, which embodies the Standard Contractual Clauses (often referred to as Model Clauses), means that AWS customers wishing to transfer personal data from the European Economic Area (EEA) to other countries can do so with even more knowledge that their content on AWS will be given the same high level of protection it receives in the EEA. For more detail on the approval from the Article 29 Working Party, visit the Luxembourg Data Protection Authority webpage here: http://www.cnpd.public.lu/en/actualites/international/2015/03/AWS/index.html
The AWS cloud is already being used extensively across the EU by startups, government agencies, educational institutions and leading enterprises such as Réseau Ferré de France and Veolia, in France, St James’s Place and Shell in the UK and Talanx and Hubert Burda Media in Germany. AWS customers have always had the freedom to choose the location where they store and process their content with the assurance that AWS will not move it from their chosen region. Customers have access to 11 AWS regions around the globe, including two in the EU – Ireland (Dublin) and Germany (Frankfurt) – which are comprised of multiple Availability Zones for customers to build highly secure and available applications. The DPA with Model Clauses gives AWS customers more choice when it comes to data protection and assures them that their content receives the same high levels of data protection, in accordance with European laws, no matter which AWS infrastructure region they choose around the world. The DPA is now available on request to all customers that require it.
“The security, privacy, and protection of our customer’s data is our number one priority,” said Dr Werner Vogels, Chief Technology Officer, Amazon.com. “Providing customers a DPA that has been approved by the EU data protection authorities is another way in which we are giving them assurances that they will receive the highest levels of data protection from AWS. We have spent a lot of time building tools, like security controls and encryption, to give customers the ability to protect their infrastructure and content. We will always strive to provide the highest level of data security for AWS customers in the EU and around the world.”
In the letter issued to AWS, the Article 29 Working Party said, “The EU Data Protection Authorities have analysed the arrangement proposed by Amazon Web Services” and “have concluded that the revised Data Processing Addendum is in line with Standard Contractual Clause 2010/87/EU and should not be considered as ‘ad-hoc’ clauses.” This means customers can sign the AWS Data Processing Addendum with Model Clauses without the need for authorization from data protection authorities, as would be necessary for contract clauses intended to address EU privacy rules that have not been approved, known as “ad hoc clauses.”
As well as having a DPA that has been approved by the Article 29 Working Party, AWS is fully compliant with all applicable EU data protection laws and maintains robust global security standards, such as ISO 27001, SOC 1, 2, 3 and PCI DSS Level 1. In 2013, the AWS Cloud was approved by De Nederlandsche Bank for use in the Dutch financial services sector, opening the door for financial services firms in The Netherlands to store confidential data and run mission-critical applications on AWS. AWS has teams of Solutions Architects, Account Managers, Trainers and other staff in the EU expertly trained on cloud security and compliance to assist AWS customers as they move their applications to the cloud. AWS also helps customers meet local security standards and has launched a Customer Certification Workbook, developed by independent certification body TÜV TRUST IT, providing customers with guidance on how to become certified for BSI IT Grundschutz in Germany. A copy of the workbook can be found at: http://aws.amazon.com/compliance/
“The EU has the highest data protection standards in the world and it is very important that European citizens' data is protected,” said Antanas Guoga, Member of the European Parliament. “I believe that the Article 29 Working Party decision to approve the data processing agreement put forward by Amazon Web Services is a step forward to the right direction. I am pleased to see that AWS puts an emphasis on the protection of European customer data. I hope this decision will also help to drive further innovation in the cloud computing sector across the EU.”
“For us, like many companies, data privacy is paramount,” said JP Schmetz, Chief Scientist at Hubert Burda Media. “One of the reasons we chose AWS is the fact that they put so much emphasis on maintaining the highest levels of security and privacy for all of their customers. This is why we are moving mission critical workloads to AWS.”
For more information on AWS Model Clauses please visit: http://aws.amazon.com/compliance/eu-data-protection More information on AWS’ data protection practices can be found on the AWS Data Protection webpage at: http://aws.amazon.com/compliance/data-privacy-faq/. A full list of compliance certifications and a list of the robust controls in place at AWS to maintain security and data protection for customers can be found on the AWS compliance webpage at: http://aws.amazon.com/compliance/.
Cloud computing is enabling amazing new innovations in both consumer and enterprise products as it becomes the new normal for organizations of all sizes. So many exciting new areas are being empowered by the cloud that it is fascinating to watch. That AWS is enabling innovations in areas such as healthcare, automotive, life sciences, retail, media, energy, and robotics is mind-boggling and humbling.
Despite all of the amazing innovations we have already seen, we are still on Day One in the cloud; at AWS we will continue to use our inventive powers to build new tools and services that enable even more exciting innovations by our customers, innovations that will touch every area of our lives. Many of these will have a significant analytics component or may even be completely driven by it. For example, many of the Internet of Things innovations that have come to life on AWS in the past years have a significant analytics component to them.
I have seen our customers do so many radical new things with the analytics tools that we and our partners make available that I would like to share a few observations with you.
Cloud analytics are everywhere. There is almost no consumer or business area that is not impacted by cloud-enabled analytics. Often it is hidden from the consumer's eye, as it empowers applications rather than being the end game, but analytics is becoming ever more prevalent. From retail recommendations to genomics-based product development, from financial risk management to start-ups measuring the effect of their new products, from digital marketing to fast processing of clinical trial data, all are taken to the next level by cloud-based analytics.
For AWS we have seen evidence of this as Amazon Redshift, our data warehouse service, has become the fastest-growing cloud service in the history of the company. We even see that for many businesses Amazon Redshift is the first cloud service they ever use. Adoption is really starting to explode in 2015 as more and more businesses understand the power analytics has to empower their organizations. The integration with many standard analytics tools, such as Tableau, Jaspersoft, and Pentaho, makes Redshift extremely powerful.
Cloud enables self-service analytics. In the past, analytics within an organization was the pinnacle of old-style IT: a centralized data warehouse running on specialized hardware. In the modern enterprise this scenario is no longer acceptable. Analytics plays a crucial role in helping business units become more agile and move faster to respond to the needs of the business and build products customers really want. But they are still bogged down by this centralized, oversubscribed, old-style data warehouse model. Cloud-based analytics changes this completely.
A business unit can now go out and create its own data warehouse in the cloud, of a size and speed that exactly match what it needs and is willing to pay for. It can be a small, two-node data warehouse that runs during the day, a big 1,000-node data warehouse that runs for just a few hours on a Thursday afternoon, or one that runs during the night to give personnel the data they need when they come into work in the morning.
A great example of this is the work the global business publication The Financial Times (FT) is doing with analytics. The FT is over 120 years old, and it has been using the cloud to run Business Intelligence (BI) workloads that completely revolutionize how it offers content to customers: running analytics on all its stories, personalizing the paper, and giving readers a more tailored reading experience. With the new BI system the company is able to run analytics across 140 stories per day, in real time, and has increased its agility for completing analytics tasks from months to days. As part of this, the FT has also expanded its BI to better target advertising to readers. By using Amazon Redshift it is able to process 120 million unique events per day and integrate its internal logs with external data sources, which is helping to create a more dynamic paper for its readers. All of this while cutting data warehouse costs by 80%.
Cloud Analytics will enable everything to become smart. These days everything has the ability to become "smart" - a smart watch, smart clothes, a smart TV, a smart home, a smart car. However, in almost all cases this "smartness" runs in software in the cloud, not in the object or the device itself.
Whether it is the thermostat in your home, the activity tracker on your wrist, or the smart movie recommendations on your beautiful ultra HD TV, all are powered by analytics engines running in the cloud. Because all the intelligence of these smart products lives in the cloud, it is spawning a new generation of devices. A good example here is the work Philips is doing to make street lighting smart with their CityTouch product.
Philips CityTouch is an intelligent light management system for city-wide street lighting. It offers connected street lighting solutions that allow entire suburbs and cities to actively control street lighting and manage the after-dark environment in real time. This allows local councils to keep certain streets well lit to accommodate high foot traffic, to bring on lighting during adverse weather when ambient light dims to a dangerous level, or even to turn lighting down, for example in an industrial estate where there are no people. This technology is already being used in places like Prague and in suburbs of London. CityTouch uses the cloud as the backend technology to run the system and to extract business value from the large amounts of data collected from sensors installed in the street lights. This data is allowing councils to better understand their cities after dark, employ more efficient light management programmes, and avoid excessive light pollution, which can have an adverse effect on residents and wildlife around cities.
Cloud Analytics improves city life. Related to the above is the ability of cloud analytics to take information from the city environment and improve living conditions for citizens around the world. A good example is the work the Urban Center for Computation and Data of the City of Chicago is doing. The City of Chicago is one of the first to deploy sensors throughout the city that will permanently measure air quality, light intensity, sound volume, heat, precipitation, wind, and traffic. The data from these sensors streams into the cloud, where it is analyzed to find ways to improve the life of the city's citizens. The collected datasets from Chicago’s “Array of Things” will be made publicly available in the cloud for researchers to find innovative ways to analyze the data.
Many cities have already expressed interest in following Chicago’s lead in using the cloud to improve city life, and some in Europe, such as Peterborough City Council in the UK, are beginning to do the same. Peterborough City Council is making public data sets available to outsource innovation to the local community. The different data sets from the council are being mashed together: people are mapping, for example, crime data against weather patterns to help the council understand whether there are more burglaries when it is hot and how it should resource the local police force, or mapping hospital admission data against weather to identify trends and patterns. This data is being made open and available to everyone to drive innovation, thanks to the power of the cloud.
Cloud Analytics enables the Industrial Internet of Things. Often when we think about the Internet of Things (IoT) we focus on what this will mean for the consumer. But we are already seeing the rise of a different IoT: the Industrial Internet of Things. Industrial machinery is instrumented and connected to the Internet to stream data into the cloud, where it is analyzed to gain usage insights, improve efficiencies, and prevent outages.
Whether it is General Electric instrumenting their gas turbines, Shell dropping sensors in their oil wells, Kärcher with fleets of industrial cleaning machines, or construction sites enabled with sensors from Deconstruction, all of these send continuous data streams into the cloud for real-time analysis.
Cloud enables video analytics. For a long time video was recorded to be archived, played back and watched. With the unlimited processing power of the cloud there is a new trend arising: treating video as a data stream to be analyzed. This is being called Video Content Analysis (VCA) and it has many application areas from retail to transportation.
A common area of application is in locations where video cameras are present such as malls and large retail stores. Video is analyzed to help stores understand traffic patterns. Analytics provide the numbers of customers moving as well as dwell times, and other statistics. This allows retailers to improve their store layouts and in-store marketing effectiveness.
Another popular area is that of real time crowd analysis at large events, such as concerts, to understand movement throughout the venue and remove bottlenecks before they occur in order to improve visitor experience. Similar applications are used by transport departments to regulate traffic, detect stalled cars on highways, detect objects on high speed railways, and other transport issues.
Another innovative example that has taken VCA into the consumer domain is Dropcam. Dropcam analyzes video streamed by Internet-enabled video cameras to provide its customers with alerts. Dropcam is currently the largest video producer on the Internet, ingesting more video data into the cloud than YouTube.
VCA is also becoming a crucial tool in sports management. Teams are using video analysis to process many different angles on the players. For example the many video streams recorded during a Premier League match are used by teams to improve player performance and drive specific training schemes.
In the US, video analytics is being used by MLB baseball teams to provide augmented real-time analytics on video screens around the stadium, while the NFL is using VCA to create automatically condensed versions of American football games, bringing the run time down by 60%-70%.
Cloud transforms health care analytics. Data analytics is quickly becoming central to analyzing health risk factors and improving patient care. With healthcare under pressure to reduce costs and speed up patient care, the cloud is playing a crucial role in helping healthcare go digital.
Cloud powers innovative solutions such as Philips HealthSuite, a platform that manages healthcare data and provides support for doctors as well as patients. The Philips HealthSuite digital platform analyzes and stores 15 PB of patient data gathered from 390 million imaging studies, medical records, and patient inputs to provide healthcare providers with actionable data, which they can use to directly impact patient care. This is reinventing healthcare for billions of people around the world. As we move through 2015 and beyond we can expect to see cloud play even more of a role in the advancement of the field of patient diagnosis and care.
Cloud enables secure analytics. With analytics enabling so many new areas, from online shopping to healthcare to home automation, it becomes paramount that the analytics data is kept secure and private. Deep integration of encryption into the storage and the analytics engines, with users able to bring their own encryption keys, ensures that only the users of these services have access to the data and no one else.
In Amazon Redshift, data blocks, system metadata, partial results from queries, and backups are encrypted with randomly generated keys, and this set of keys is in turn encrypted with a master key. This encryption is standard operating practice; customers do not need to do anything. If customers want full control over who can access their data, they can use their own master key to encrypt the data block keys. Customers can use the AWS Key Management Service to securely manage their own keys, which are stored in Hardware Security Modules, ensuring that only the customer has access to the keys and that only the customer controls who has access to their data.
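The key hierarchy described above is the envelope-encryption pattern: data is encrypted with a randomly generated data key, and the data key itself is wrapped with a master key that only the customer controls. The toy sketch below illustrates just that pattern; the XOR/SHA-256 stream cipher is a stand-in for illustration only (real systems use AES via services like KMS), and all names are made up.

```python
# Toy illustration of envelope encryption: NOT real cryptography.
# Real systems encrypt with AES; here a SHA-256-based keystream stands in
# so the sketch stays self-contained.
import hashlib
import secrets

def keystream(key: bytes, length: int) -> bytes:
    # Derive a pseudo-random byte stream from the key and a counter.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_crypt(key: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying it twice recovers the plaintext.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# 1. Generate a random per-block data key.
data_key = secrets.token_bytes(32)
# 2. Encrypt the data block with the data key.
block = b"customer query results"
encrypted_block = xor_crypt(data_key, block)
# 3. Wrap the data key with the customer's master key.
master_key = secrets.token_bytes(32)
wrapped_key = xor_crypt(master_key, data_key)
# 4. Only a holder of the master key can unwrap the data key and decrypt.
recovered_key = xor_crypt(master_key, wrapped_key)
assert xor_crypt(recovered_key, encrypted_block) == block
```

The point of the hierarchy is that rotating or revoking access only requires re-wrapping the small data keys, never re-encrypting the data blocks themselves.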
Cloud enables collaborative research analytics. As Jim Gray predicted in his Fourth Paradigm, much of the research world is shifting from computational models to data-driven science. We already see this as many researchers make their datasets available for collaborative real-time analytics in the cloud. Whether these data sets are streamed from Mars or from the bottom of the ocean, the cloud is the place to ingest, store, organize, analyze, and share this data.
An interesting commercial example is the connected sequencing systems from Illumina: sequence data is streamed directly to the cloud, where the customer has access to BaseSpace, a cloud-based marketplace for algorithms that can be used to process their data.
At AWS we are proud to power the progress that puts analytic tools in the hands of everyone. We are humbled by what our customers are already doing with our current toolset. But it is still Day One; we will continue to innovate in this space so that our customers can go on to do even greater things.
Every year I enjoy travelling to the South by Southwest (SXSW) festival, as it is one of the biggest events with many Amazon customers present. Thousands of AWS customers and partners will be in Austin for SXSW Interactive, and given the free-flowing networking it is a very important feedback opportunity for us. Many Amazon customers will also be there for the Film and Music festivals, and I always enjoy getting feedback from the Amazon consumers and producers attending those as well.
The program is always a bit in flux, but here are the events at the beginning of the week that I am taking part in:
- Sunday 3/15 1-2pm - I will give a talk at Techstars on "The History of Microservices at Amazon". There will also be a talk and Q&A about AWS Lambda at noon. Following my talk there will be a reception.
- Sunday 3/15 4-5pm - I will moderate a panel at ff Massive 2015 about "Scaling a Startup" with Shane Snow, Rami Essaid, Trevor Coleman, and Jordan Kretchmer.
- Monday 3/16 9:30am-1:30pm I will be a judge at the HATCH Startup competition.
- Monday 3/16 5-6pm I will do a fireside chat with Valentin Schöndienst, the CEO of Move Fast from Berlin, at the German House. It is followed by a Meet The Berliners party co-sponsored by AWS.
I hope to see you there or at one of the many other events I drop in on.
Grapevine was one of the first systems designed to be fully distributed. It was built at the famous Xerox PARC (Palo Alto Research Center) Computer Science Laboratory as an exercise in discovering the fundamental building blocks of a distributed system: messaging, naming, discovery, location, routing, authentication, encryption, replication, etc. The origins of the system are described in Grapevine: An Exercise in Distributed Computing by researchers who all went on to become grandmasters in distributed computing: Andrew Birrell, Roy Levin, Roger Needham, and Mike Schroeder.
For this weekend's reading we will use a follow-up paper that focuses on the lessons learned from running Grapevine for several years under substantial load.
Experience with Grapevine: The Growth of a Distributed System, Michael Schroeder, Andrew Birrell, and Roger Needham, in ACM Transactions on Computer Systems, vol. 2, no. 1, February 1984.
Several problems in Distributed Systems can be seen as the challenge of determining a global state. In the classic "Time, Clocks and the Ordering of Events in a Distributed System" Lamport laid out the principles and mechanisms to solve such problems, and the Distributed Snapshots algorithm, popularly known as the Chandy-Lamport algorithm, is an application of that work. The fundamental techniques in the Distributed Snapshot paper are the secret sauce in many distributed algorithms for deadlock detection, termination detection, consistent checkpointing for fault tolerance, global predicate detection for debugging and monitoring, and distributed simulation.
An interesting anecdote about the algorithm is told by Lamport: "The distributed snapshot algorithm described here came about when I visited Chandy, who was then at the University of Texas in Austin. He posed the problem to me over dinner, but we had both had too much wine to think about it right then. The next morning, in the shower, I came up with the solution. When I arrived at Chandy's office, he was waiting for me with the same solution."
Distributed Snapshots: Determining Global States of a Distributed System K. Mani Chandy and Leslie Lamport, ACM Transactions on Computer Systems 3(1), February 1985.
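The core of the algorithm is simple: the initiator records its own state and sends a marker on its outgoing channels; every process, on first seeing a marker, records its state and relays the marker, and records as "in flight" any messages that arrive on a channel before that channel's marker does. Below is a minimal simulation of that idea for two processes connected by FIFO channels; the process names, message values, and scripted ordering are all invented for illustration, not from the paper.

```python
# Minimal sketch of the Chandy-Lamport snapshot algorithm, simulated for
# two processes with one FIFO channel in each direction. With only one
# incoming channel per process, the "record other incoming channels"
# step of the full algorithm collapses into recording that single channel.
from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self):
        self.state = 0               # application state: sum of values received
        self.recorded_state = None   # local state captured by the snapshot
        self.recorded_channel = []   # in-flight messages captured on the channel
        self.recording = False

    def initiate_snapshot(self, out_channel):
        # Record own state, then send a marker before any further messages.
        self.recorded_state = self.state
        self.recording = True        # start recording the incoming channel
        out_channel.append(MARKER)

    def receive(self, msg, out_channel):
        if msg == MARKER:
            if self.recorded_state is None:
                # First marker seen: record state and relay the marker.
                # The channel the marker arrived on is recorded as empty.
                self.recorded_state = self.state
                out_channel.append(MARKER)
            # A marker closes the recording of the channel it arrived on.
            self.recording = False
        else:
            self.state += msg
            if self.recording:
                self.recorded_channel.append(msg)

# FIFO channels: chan_pq carries P->Q traffic, chan_qp carries Q->P.
chan_pq, chan_qp = deque(), deque()
p, q = Process(), Process()

chan_qp.append(5)             # Q sends 5; still in flight when snapshot starts
p.initiate_snapshot(chan_pq)  # P records state 0 and emits a marker
chan_qp.append(3)             # Q sends 3 before it has seen the marker
q.receive(chan_pq.popleft(), chan_qp)  # Q gets the marker, records state 0

while chan_qp:                # P drains its channel: 5, 3, then the marker
    p.receive(chan_qp.popleft(), chan_pq)

# The snapshot is consistent: the recorded states (0 and 0) plus the
# recorded in-flight messages [5, 3] account for P's final state of 8.
print(p.recorded_state, q.recorded_state, p.recorded_channel)
```

Note how the snapshot never pauses the system: the messages 5 and 3 are delivered and applied normally, and FIFO ordering alone guarantees they are captured in exactly one place, the channel record.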
Disk arrays, which organize multiple, independent disks into a large, high-performance logical disk, were a natural solution to dealing with constraints on performance and reliability of single disk drives. The term "RAID" was invented by David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987. In their June 1988 paper "A Case for Redundant Arrays of Inexpensive Disks (RAID)" they argued that the top performing mainframe disk drives of the time could be beaten on performance by an array of the inexpensive drives that had been developed for the growing personal computer market. Although failures would rise in proportion to the number of drives, by configuring for redundancy, the reliability of an array could far exceed that of any large single drive.
In 1994 Peter Chen, together with Ed Lee and the Berkeley team, wrote a comprehensive survey paper that lays out in great detail the background case for disk arrays and goes into the details of the various RAID models.
RAID: High-Performance, Reliable Secondary Storage Peter Chen, Edward Lee, Garth Gibson, Randy Katz and David Patterson, ACM Computing Surveys, Vol 26, No. 2, June 1994.
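The reliability argument sketched above can be made concrete with some back-of-envelope arithmetic in the spirit of the original paper's analysis. The numbers below are illustrative assumptions, not figures from the paper, and failures are assumed independent and repaired at a fixed rate.

```python
# Back-of-envelope RAID reliability arithmetic. All numbers are
# illustrative assumptions; failures are assumed independent.
mttf_disk = 30_000   # hours: mean time to failure of one inexpensive disk
n_data = 100         # data disks in the array
group_size = 11      # disks per redundancy group: 10 data + 1 parity
n_groups = 10
mttr = 1.0           # hours to replace a failed disk and rebuild its data

# Without redundancy, any single disk failure loses data, so the mean
# time to data loss shrinks linearly with the number of disks:
mttf_no_redundancy = mttf_disk / n_data

# With one parity disk per group, data is lost only when a second disk
# in the SAME group fails during the repair window of the first:
n_total = n_groups * group_size              # data + parity disks (110)
time_between_failures = mttf_disk / n_total  # any disk, anywhere
p_second_failure = (group_size - 1) * mttr / mttf_disk
mttf_with_parity = time_between_failures / p_second_failure

print(f"no redundancy: {mttf_no_redundancy:,.0f} hours")
print(f"with parity:   {mttf_with_parity:,.0f} hours")
```

Even though adding parity disks increases the number of components that can fail, the array's mean time to data loss jumps from a few hundred hours to hundreds of thousands, which is the paper's central point: redundancy lets an array of cheap drives far exceed the reliability of any single large drive.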