April 2006 Archives
In the A Word On Scalability posting I tried to write down a more precise definition of scalability than is commonly used. There were good comments about the definition at the posting as well as in a discussion at The ServerSide.
To recap in a less precise manner I stated that
- A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added
- An always-on service is said to be scalable if adding resources to facilitate redundancy does not result in a loss of performance.
- A scalable service needs to be able to handle heterogeneity of resources.
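One way to make "proportional to resources added" measurable is to compare the performance gain with the resource gain. The snippet below is only a sketch: the function name and the sample numbers are made up for illustration, and "performance" stands for whatever metric your SLA actually cares about.

```python
def scaling_efficiency(base_resources, base_perf, new_resources, new_perf):
    """Ratio of performance growth to resource growth.

    1.0 means performance grew in proportion to the resources added;
    values below 1.0 mean the added resources were only partly effective.
    All inputs are hypothetical measurements (e.g. hosts and requests/sec).
    """
    if base_resources <= 0 or base_perf <= 0:
        raise ValueError("baseline values must be positive")
    perf_growth = new_perf / base_perf
    resource_growth = new_resources / base_resources
    return perf_growth / resource_growth


# Doubling from 4 to 8 hosts, with throughput going from 1000 to 1500 req/s
print(scaling_efficiency(4, 1000, 8, 1500))
```

A value that stays close to 1.0 as you keep adding resources matches the first bullet above; a value that drops off quickly suggests a bottleneck somewhere in the system.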
There were quite a few comments about the use of performance in the definition. This is how I reason about performance in this context: I am assuming that each service has an SLA contract that defines what the expectations of your clients/customers are (SLA = Service Level Agreement). What exactly is in that SLA depends on the kind of service business you are in; quite a few of the services that contribute to an Amazon.com website have an SLA that is latency driven. This latency will have a certain distribution and you pick a number of points on the distribution as representatives for measuring your SLA. For example at Amazon we also track the latency at the 99.9% mark to make sure all of our customers are getting an experience at SLA or better.
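To illustrate why a point far out on the distribution matters, here is a minimal sketch of checking a latency SLA at several percentiles. The sample latencies, the SLA limits and the helper names are all hypothetical; the percentile uses the simple nearest-rank method, which is one of several common definitions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # round() guards against tiny floating point errors before ceil()
    rank = math.ceil(round(pct * len(ordered) / 100, 9))
    return ordered[max(rank, 1) - 1]


def meets_sla(samples, targets):
    """targets maps a percentile to a maximum latency in ms (assumed SLA)."""
    return all(percentile(samples, p) <= limit for p, limit in targets.items())


# 1000 hypothetical request latencies in ms: mostly fast, a few slow outliers
latencies_ms = [20] * 990 + [150] * 8 + [900] * 2
sla = {50: 50, 99.9: 300}  # made-up limits

for p in (50, 99, 99.9):
    print(p, percentile(latencies_ms, p))
print("SLA met:", meets_sla(latencies_ms, sla))
```

In this made-up data set the median and even the 99% mark look healthy, while the 99.9% mark exposes the slow requests that a small group of customers actually experiences.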
This SLA needs to be maintained if you grow your business. Growing can mean increasing the number of requests, increasing the number of items you serve, increasing the amount of work you do for each request, etc. But no matter along which axis you grow, you will need to make sure you can always meet your SLA. Growth along some axis can be served by scaling up to faster CPUs and larger memories, but if you keep growing there is an end to what you can buy and you will need to scale out. Given that scaling up is often not cost effective, you might as well start by working on scaling out, as you will have to go that path eventually.
I have not seen many SLAs that are purely throughput driven. It is often a combination of the amount of work that needs to be done, the distribution in which it will arrive and when that work needs to be finished, that will lead to a throughput driven SLA. Latency does play a role here as it is often a driver for what throughput is necessary to achieve the output distribution. If you have a request arrival distribution that is non-uniform you can play various games with buffering and capping the throughput at lower than your peak load as long as you are willing to accept longer latencies. Often it is the latency distribution that you try to achieve that drives your throughput requirements.
There were some other points made with respect to what should be part of a scalability definition, among others by Gideon Low @ the serverside thread (I tried to link to his individual response but seem to fail) who makes some good points.
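The trade-off between capped throughput and latency can be seen in a toy simulation. This is a sketch under simplifying assumptions: time advances in discrete ticks, requests are served first-in first-out, the buffer is unbounded, and the arrival pattern and capacities are invented for the example.

```python
from collections import deque


def simulate(arrivals_per_tick, capacity_per_tick):
    """Serve a bursty arrival pattern FIFO with a fixed per-tick capacity.

    Returns the waiting time (in ticks) of every request. Requests that
    cannot be served in the tick they arrive in stay in the buffer.
    """
    queue = deque()  # holds the arrival tick of each buffered request
    waits = []
    tick = 0
    while tick < len(arrivals_per_tick) or queue:
        if tick < len(arrivals_per_tick):
            queue.extend([tick] * arrivals_per_tick[tick])
        for _ in range(min(capacity_per_tick, len(queue))):
            waits.append(tick - queue.popleft())
        tick += 1
    return waits


# Hypothetical load: steady 2 requests per tick with one burst of 10
arrivals = [2, 2, 10, 2, 2, 2, 2, 2]
for capacity in (10, 4, 3):
    waits = simulate(arrivals, capacity)
    print(f"capacity={capacity} max_wait={max(waits)} "
          f"mean_wait={sum(waits) / len(waits):.2f}")
```

Provisioning for the peak keeps every wait at zero; capping capacity closer to the average load still gets all the work done, but the burst is absorbed by the buffer and shows up as longer latencies.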
- Operationally efficient – It takes less human resources to manage the system as the number of hardware resources scales up.
- Resilient – Increasing the number of resources will also increase the probability of failure of one of those resources, but the impact of such a failure should be reduced as the number of resources grows.
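The resilience point can be put into numbers with a small sketch. It assumes, purely for illustration, that resources fail independently with the same probability and that load is spread evenly across them; the 1% failure probability is a made-up figure.

```python
def p_any_failure(n, p):
    """Probability that at least one of n resources fails, assuming each
    fails independently with probability p (a simplifying assumption)."""
    return 1 - (1 - p) ** n


def capacity_lost_by_one_failure(n):
    """Fraction of total capacity lost when a single resource fails,
    assuming load is spread evenly over n identical resources."""
    return 1 / n


for n in (1, 10, 100):
    print(f"n={n} P(any failure)={p_any_failure(n, 0.01):.3f} "
          f"capacity lost per failure={capacity_lost_by_one_failure(n):.2%}")
```

As the number of resources grows, a failure somewhere becomes close to certain, while the share of capacity taken out by any single failure shrinks. A resilient system is one that actually realizes that second effect.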
These two points combined with a discussion about cost/capacity/efficiency should be part of a definition of a scalable service. I’ll be thinking a bit about what the right wording should be and will post a proposal later.
I don't want to turn this into a corporate announcement weblog, but I believe some of my new readers may be interested in this.
Go check out Amazon Wire, a podcast about books, music, movies and those who create them.
It is nice that people thought yesterday’s posting was funny. There is however a very serious aspect to it. From the reactions I understand that people think that posting IM handles is too far-out to even consider seriously, which made it a good April Fool’s joke. For me, I do not believe it is ridiculous for a company to engage in real-time interaction.
Amazon continuously innovates the way it interacts with its customers, trying to deliver a better customer experience. From the beginning that has meant engaging our customers to participate in ways that were not as common as they are now. Whether it is about discovering products that you may be interested in or about being as informed as possible about choices, about expressing your opinions or providing relationships, there is always a continuous stream of innovations happening at Amazon to improve the ways that customers can get value out of our services. In many ways this is still day-one; Search-Inside-The-Book is not an endpoint, it is a beginning. For example we now use book content analysis to provide relationships (sips, stats, citations, etc.) between books that were not possible before. And the possibilities to do more are endless.
In my eyes, instant messenger, weblogs, forums, tagging, wikis, etc., are infrastructure. Infrastructure in the sense that it is interesting in what you do with it, in how you use it to provide value. How it can be a vehicle to effect change. Amazon Connect looks to some people like weblogs-for-authors; to others it does not meet all the weblog definition criteria. Whether it does or not is completely irrelevant; what is important is that we wanted authors to have new ways in which they could interact with their readers, and in that way this program delivers the extra value our customers (authors and readers alike) look for at Amazon. The program is very successful and I am sure we will be improving it in ways to make it deliver even more value. I don’t think anybody cares whether new innovations around Amazon Connect will make it look more weblog-like, more IM-like, or be something completely new.
For me, everything is fair game in making Amazon.com the best place for people to look for products, to provide their opinions about them, and to help other customers make good choices. If the best way to achieve that would include posting IM handles, or developing completely new real-time interaction techniques that match Amazon better, we would certainly do so. More importantly, we will continue to push the envelope in any area to deliver a better service.
PS. I didn’t write this to start yet another debate, I just wanted to clarify that there was more to it than just a joke …
In order to get closer to their customers, humanize Amazon, increase sales, and stay modern, Amazon.com has decided to make all Instant Messenger (IM) handles of its employees public. This way Amazon.com customers will get unprecedented access to the talented engineers at Amazon to answer all their questions, or just to have an interesting conversation about a new book or that old sci-fi movie. If you want to know why the shipping prediction date was not really clear, feel free to IM Justin Rudd, and get the details behind the algorithms he used to give Amazon.com customers a fast estimate on when they can expect their purchases.
As we have come to expect by now, Amazon.com is once again revolutionizing the industry with how customers are being served. It is expected other companies will be scrambling to imitate their success, and provide access to their employees also. Industry specialists and sociology professors alike have lauded Amazon.com for really “getting it”, for understanding that IM is the way that future generations will want to communicate.
When asked whether their employees would be given special training to handle the new forms of conversations, a company spokesperson responded that Amazon.com’s CTO, Werner Vogels, will be used as an example of a warm and fuzzy communication style. Rumors are that an early document written by Vogels titled “How to use blunt questions to get to the truth” was rejected as being too effective.
Update: This was indeed for some part an April Fool's joke, but there is also a very serious note to it. For that, read part II.
