Amazon on AWS: Seamlessly integrating physical and emerging digital technologies

The convergence of the physical and digital worlds

One area that personally fascinates me is how digital technologies are increasingly shaping the physical spaces around us, such as our homes and workplaces. Amazon Alexa is a great example of this—an on-demand AI assistant that exists in the cloud but that we can access with our voices to control the lighting in our homes, run our sprinklers, and lock our doors. This is the embodiment of our physical environment evolving due to enhancements provided by digital technologies. The natural language processing, machine learning models, speech synthesis, and all of the other complexity is performed in a digital system that sits beyond the walls of your home but is able to connect to that door lock and perform a physical action on your behalf. For an end user, the beauty of Alexa is that they don’t have to know how any of this works, which parts are physical or digital; it just makes their lives better.

Now, as a technologist, I'm always trying to look at the bigger picture, and when I think about things like Alexa, I also like to think about how these same concepts are applied in other areas. What’s more, you need to understand not just the technology itself, but how it serves your customers and how it fits into and affects its environment.

With these points in mind, let’s move away from our homes and look at industrial use-cases like production lines and the complex operations and machinery that may exist in these environments. Robots are a great example. Robotics is a field that has inspired our imaginations for decades through pop culture, where we see and read about robots accomplishing feats that are impossible for humans. However, as thought-provoking and fun as movies may make them, most robotic applications today are fairly practical and don’t live up to the hype that many people have built up in their minds. In fact, for most implementations, present-day robots operate on basic instruction sets and are incapable of doing anything beyond automating simple, repeatable tasks.

A robot that does just one thing is not a robot, it is just automation.

– Joseph Engelberger – The father of robotics

Now, that’s not because the technology isn’t there for them to do more. Technologies such as IoT, computer vision, machine learning, and digital twins are all very real and having a very big impact in other areas, and they can be applied to make robots more adaptable. In the coming years, we will see a rapid evolution of robotic capabilities, reliability, and safety, all driven by the seamless integration of emerging technologies, cloud, and newly connected hardware. To be successful in this evolution, as with Alexa, the best robots should work simply and in harmony with the people using them, without those individuals having to worry about what’s happening behind the scenes.

Recently, our Amazon Robotics team published a blog that talked about robotic arms that we use in our Amazon facilities to take the heavy lifting off of our employees and improve safety. The post itself is interesting and well worth the read. But what intrigues me about a system like this is the behind-the-scenes technology that makes things like this tick. While the backend systems for something as fascinating as advanced robotics may seem mundane to the ordinary observer, they’re absolutely essential to making everything work harmoniously.

So today I want to go deeper into this story and look at how Amazon Robotics is using AWS cloud technologies to deliver these amazing innovations.

The next evolution of Amazon robots

At surface level there’s nothing new about using robots in industrial settings. People have been using them to automate undifferentiated, repetitive tasks since the mid-20th century. However, traditional robots have been limited in capability and have generally required extremely consistent conditions. A simple change in workflow or environment could mean complete reconfigurations with significant downtime on a production line, for example. In today’s fast-paced industrial facilities, at Amazon or otherwise, there are many different situations where the items that robots interact with are constantly moving or changing, and unplanned equipment downtime is costly. In Amazon facilities, packages come in different shapes, sizes and conditions, with the information that needs to be scanned, such as on shipping labels, in different places. The packages, from soft mailers to small boxes, can even overlap or rest on top of one another as they move down the conveyor belt. A conventional robot might struggle to differentiate these packages as multiple items.

Unlike the robots of the past, the Robin (Robotic Induct) intelligent robotic arm, described in the Amazon Robotics article referenced above, has enhanced capabilities designed to tackle these challenges. It’s equipped with cutting-edge machine learning algorithms, cameras, computer vision, sensors and grippers to handle situations that are never precisely the same, like randomly overlapping packages. Another robot used by Amazon, the palletizer/depalletizer, calculates how to stack a stable pallet full of differently shaped packages in real time. These next-generation robots are part of what enables us to scale operations and deliver goods with fewer mistakes.

Powered by AWS

These next-generation robots will contribute to our goal of delivering the right goods, to the right customers, at the right time. To do this, they’re using AWS technologies such as AI for neural networks and computer vision, data streaming, storage and analytics from the edge to the cloud. But how do we make this technology “just work” for the people who are operating or working alongside the robots?

One of the key behind-the-scenes components is what the Amazon Robotics team calls Comprehensive Device Management, or CDM. It’s the underpinning of how Amazon Robotics and their many development teams are able to rapidly grow, monitor and manage their fleets at scale. And rather than the sometimes flashy and most talked about technologies, it’s backend capabilities like this that truly enable scale.

The vision of CDM is to orchestrate the deployment and management of existing device fleets and simplify on-boarding new device types. Robin’s robotic arm and the palletizer/depalletizer are only two examples; there are many different types of robotic, human, and hybrid stations within Amazon Robotics’ purview, as well as edge compute devices (station computers) and other peripherals. You can think of CDM as the heart of what makes Amazon’s robotic operations truly smart, a leap in evolution over the robotic devices of the past that require consistent conditions.

CDM’s impact is felt across three challenges that are quite common among the many industrial customers that I’ve talked to: safely streamlining device provisioning and management (this is particularly challenging in robotics when robots are “always on”), raising security posture, and increasing the pace of industrial innovation, all at massive scale.

The Amazon Robotics team began working on CDM in 2018, and after experimenting with several technologies, it landed on an architecture that uses AWS IoT. They chose AWS IoT because it gave them a scalable framework to base the entire CDM system on, with several must-haves: a mature and lightweight messaging protocol (MQTT), a messaging broker, the ability to track individual device state (shadows), the ability to easily obtain credentials and the ability to run on-demand jobs such as firmware updates. In a complex environment such as the one Amazon Robotics builds for, all of these capabilities are critical components for continuous improvement and innovation.
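To make those building blocks a little more concrete, here is a minimal Python sketch of how a device might format a shadow report and a job status update. The `$aws/things/...` patterns are AWS IoT's reserved shadow and jobs topics. The thing name, the state fields, and the helper functions are hypothetical and are not taken from CDM. Nothing is sent over the network here; the sketch only builds topics and payloads.

```python
import json

# Job execution states a device is allowed to report to AWS IoT Jobs.
DEVICE_JOB_STATUSES = {"IN_PROGRESS", "SUCCEEDED", "FAILED", "REJECTED"}


def shadow_update_topic(thing_name: str) -> str:
    """Reserved topic for publishing an update to a thing's classic shadow."""
    return f"$aws/things/{thing_name}/shadow/update"


def job_update_topic(thing_name: str, job_id: str) -> str:
    """Reserved topic for reporting the status of one job execution."""
    return f"$aws/things/{thing_name}/jobs/{job_id}/update"


def reported_state_payload(state: dict) -> str:
    """Wrap device state in the shadow document's state.reported section."""
    return json.dumps({"state": {"reported": state}}, sort_keys=True)


def job_status_payload(status: str) -> str:
    """Build a job status update, rejecting states a device cannot set."""
    if status not in DEVICE_JOB_STATUSES:
        raise ValueError(f"unsupported job status: {status}")
    return json.dumps({"status": status})
```

A device agent would publish these payloads through whichever MQTT client it already uses. The field names inside the reported state (for example, a `mode` key) are up to the fleet owner.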

For example, one of the biggest challenges that Amazon Robotics had to solve was managing intrusive operations. An intrusive operation, such as upgrading to new software versions, getting new credentials or anything else that could disrupt the robot’s operation, has to be done at the right time. Just like in any manufacturing or asset-intensive industry, we strive to reduce equipment downtime. When equipment is down, it’s not producing value. And with robots working alongside people, as they do at Amazon, intrusive operations can even have safety implications if not properly managed, like in the case of a robot stopping at an inopportune time to receive an update.

CDM’s device agent needs to know when it’s safe and optimal to perform the intrusive operation, and a distinguishing feature of CDM is that it decouples common device management functions from the robotic applications themselves. The system interfaces through a set of agreed-upon MQTT topics and message schemas. CDM provides distinct strategies for how to orchestrate the running of intrusive jobs to maximize throughput and never interrupt work in progress. Using AWS IoT jobs and shadows, each individual device can actually make a request to the application control plane based on its own status to schedule a job for it, instead of having updates pushed down to the fleet of devices from the cloud.

This is an important consideration for devices other than Robin, such as Amazon Robotics’ fleet of mobile robots, which also use CDM for maintenance tasks. These mobile robots, which we call Drive Units, add a layer of complexity due to the need to schedule these maintenance jobs at different times in order to maintain optimal performance of the larger Drive Unit fleet operations. Using the CDM control plane, missions can be scheduled when it makes sense for each individual Drive Unit. The Drive Units will move to a safe place at a safe time (i.e., when not operating) to perform the job. When the Drive Unit is ready, it can go into maintenance mode and then execute the job on its own behalf. When it is done, it will inform the control plane and continue to receive production missions.
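The device-driven pattern described above can be illustrated with a small Python simulation. This is a sketch under my own assumptions, not CDM's implementation: the `DeviceAgent` class, its fields, and the job identifiers are all hypothetical, and the real system exchanges these signals over MQTT rather than through in-process method calls.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    PRODUCTION = "production"
    MAINTENANCE = "maintenance"


@dataclass
class DeviceAgent:
    """Hypothetical agent that decides for itself when a job may run."""
    device_id: str
    busy: bool = False                       # True while work is in progress
    mode: Mode = Mode.PRODUCTION
    pending_jobs: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def offer_job(self, job_id: str) -> None:
        """The control plane offers a job; the device only queues it."""
        self.pending_jobs.append(job_id)
        self.log.append(f"queued {job_id}")

    def tick(self) -> Optional[str]:
        """Run at most one queued job, and only when the device is idle."""
        if not self.pending_jobs:
            return None
        if self.busy:
            self.log.append("deferred: work in progress")
            return None
        job_id = self.pending_jobs.pop(0)
        self.mode = Mode.MAINTENANCE         # stop taking production work
        self.log.append(f"running {job_id}")
        # The intrusive operation itself (e.g. a firmware update) would go here.
        self.mode = Mode.PRODUCTION          # report done, resume missions
        self.log.append(f"completed {job_id}")
        return job_id
```

The design point is that `offer_job` never interrupts anything. The job runs only when the device's own `tick` finds it idle, which mirrors the idea of a device requesting its maintenance window instead of having an update pushed onto it.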

Another challenge is that, at Amazon, we are operating at large scale and have multiple individual robotics teams working on different problems. To help these teams innovate faster, we needed to remove undifferentiated heavy lifting, such as developing and managing provisioning services. Because CDM provides a single pane of glass to the shared infrastructure and functionality (e.g., certificate and credential management), each decentralized team of developers can spend more time focusing on their unique business logic and building differentiated services, while maintaining control and ownership of their devices. CDM saves these teams time by providing a standardized provisioning and management process, letting them deploy devices into production and continue to manage them with minimal touch points.

And last but not least, CDM improves the overall security posture of Amazon Robotics systems. Previously, robotic application teams had different solutions for credential rotation with different frequency requirements. Sometimes, they required manual actions, which are prone to error and can make your assets more vulnerable. CDM uses AWS IoT managed services to automate and standardize device management and credential rotation in accordance with security best practices. With CDM, Amazon Robotics deploys and rotates credentials, such as for device provisioning or wireless authentication, on a per-device basis where nothing is shared, reducing potential attack surfaces for devices in Amazon facilities. Because this is managed by AWS IoT, Amazon Robotics is able to mitigate security challenges that could potentially take a lot more resources and effort if they were to build it themselves.
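As a rough illustration of what a standardized, per-device rotation policy might check, here is a short Python sketch. It is an assumption-laden example, not a description of CDM or of any AWS IoT API: the `DeviceCredential` record, the maximum-age policy, and both helper functions are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class DeviceCredential:
    """Hypothetical record of one credential issued to one device."""
    device_id: str
    credential_id: str
    issued_at: datetime


def due_for_rotation(credentials: Iterable[DeviceCredential],
                     max_age: timedelta,
                     now: datetime) -> List[str]:
    """Return the devices whose credential has reached the maximum age."""
    return [c.device_id for c in credentials if now - c.issued_at >= max_age]


def shared_credential_ids(credentials: Iterable[DeviceCredential]) -> List[str]:
    """Return credential IDs that appear on more than one device."""
    counts = Counter(c.credential_id for c in credentials)
    return sorted(cid for cid, n in counts.items() if n > 1)
```

With a single policy like this applied to every device, the rotation frequency no longer varies from team to team, and an empty result from `shared_credential_ids` is a quick check that nothing is shared across devices.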

Scaling out

With the Amazon Robotics team just starting on CDM in 2018, they’re still in the process of rolling out CDM and these advanced robots into more facilities, scaling capabilities and improving how safely and reliably Amazon can get products to customers. Already, though, they’re seeing the fruits of their labor by utilizing cloud technologies such as AWS IoT. To handle the increase in data that’s getting collected, processed and analyzed, Amazon is relying on AWS’s ability to scale to even the largest customers’ needs for some of the most complex use cases. Jule Slootbeek, an Amazon Robotics software engineer, may have said it best when he observed, “CDM enables our customers to deploy to the edge as easily as they deploy to the cloud. By providing a single, multi-tenanted, centralized management layer to handle shared functionality, we empower tens to hundreds of customers to deploy thousands to millions of devices.”

Amazon facilities move millions of packages every day, and it’s truly amazing what Amazon Robotics is doing to transform the way we can safely and reliably deliver these to our customers. To achieve that, there is so much going on digitally behind the scenes, like CDM, to make it all happen efficiently at scale, and with continued innovation and improvement. When the physical and digital worlds merge, it all just looks like magic.