Performance and Scalability
In the A Word On Scalability posting I tried to write down a more precise definition of scalability than is commonly used. There were good comments about the definition on the posting as well as in a discussion at The ServerSide.
To recap, in a less precise manner, I stated that:
- A service is said to be scalable if, when we increase the resources in a system, it results in increased performance in a manner proportional to the resources added.
- An always-on service is said to be scalable if adding resources to facilitate redundancy does not result in a loss of performance.
- A scalable service needs to be able to handle heterogeneity of resources.
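One way to make the first point measurable is a scaling-efficiency ratio. This is only a rough sketch: it assumes throughput is the performance measure of interest, and the function name and the node and request numbers are made up for illustration.

```python
def scaling_efficiency(base_resources, base_throughput,
                       new_resources, new_throughput):
    """Ratio of throughput growth to resource growth.

    1.0 means performance grew in proportion to the resources added;
    values below 1.0 mean part of the added capacity was lost.
    """
    if base_resources <= 0 or base_throughput <= 0:
        raise ValueError("baseline values must be positive")
    resource_ratio = new_resources / base_resources
    throughput_ratio = new_throughput / base_throughput
    return throughput_ratio / resource_ratio


# Hypothetical measurement: doubling from 4 to 8 nodes raised
# throughput from 4000 to 7200 requests/second.
efficiency = scaling_efficiency(4, 4000.0, 8, 7200.0)
print(f"scaling efficiency: {efficiency:.2f}")
```

An efficiency that stays close to 1.0 as resources are added is what the first bullet describes; a ratio that keeps dropping suggests a bottleneck that more hardware will not fix.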
There were quite a few comments about the use of performance in the definition. This is how I reason about performance in this context: I am assuming that each service has an SLA (Service Level Agreement) that defines what the expectations of your clients/customers are. What exactly is in that SLA depends on the kind of service business you are in; quite a few of the services that contribute to the Amazon.com website have an SLA that is latency driven. This latency will have a certain distribution, and you pick a number of points on the distribution as representatives for measuring your SLA. For example, at Amazon we also track the latency at the 99.9% mark to make sure nearly all customers are getting an experience at SLA or better.
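To illustrate why points far out on the distribution matter, here is a small sketch of an SLA check at several percentiles. It is not how Amazon measures anything: the nearest-rank percentile method, the function names, the sample latencies, and the target values are all assumptions chosen for the example.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at
    least pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_sla(latencies_ms, targets_ms):
    """targets_ms maps a percentile to the maximum latency (ms)
    allowed at that percentile."""
    return all(percentile(latencies_ms, pct) <= limit
               for pct, limit in targets_ms.items())


# Hypothetical sample: 995 fast requests and 5 very slow ones.
latencies = [20.0] * 995 + [900.0] * 5

print(percentile(latencies, 50))    # median
print(percentile(latencies, 99))    # 99th percentile
print(percentile(latencies, 99.9))  # 99.9th percentile
print(meets_sla(latencies, {50: 50.0, 99: 100.0, 99.9: 300.0}))
```

In this made-up sample the median and the 99th percentile both look healthy, and only the 99.9% mark exposes the slow requests, which is the reason for tracking it.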
This SLA needs to be maintained if you grow your business. Growing can mean increasing the number of requests, increasing the number of items you serve, increasing the amount of work you do for each request, etc. But no matter along which axis you grow, you will need to make sure you can always meet your SLA. Growth along some axis can be served by scaling up to faster CPUs and larger memories, but if you keep growing there is an end to what you can buy and you will need to scale out. Given that scaling up is often not cost effective, you might as well start by working on scaling out, as you will have to go that path eventually.
I have not seen many SLAs that are purely throughput driven. It is often a combination of the amount of work that needs to be done, the distribution in which it will arrive, and when that work needs to be finished that will lead to a throughput driven SLA. Latency does play a role here, as it is often a driver for what throughput is necessary to achieve the output distribution. If you have a request arrival distribution that is non-uniform, you can play various games with buffering and capping the throughput at lower than your peak load, as long as you are willing to accept longer latencies. Often it is the latency distribution that you try to achieve that drives your throughput requirements.
There were some other points made with respect to what should be part of a scalability definition, among others by Gideon Low in The ServerSide thread (I tried to link to his individual response but seem to have failed), who makes some good points:
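The buffering trade-off can be seen in a toy simulation. This is a sketch under simplifying assumptions: time moves in discrete ticks, requests are served first-in first-out, and the arrival pattern and capacities are invented for illustration.

```python
from collections import deque


def simulate(arrivals_per_tick, capacity_per_tick):
    """Return the wait, in ticks, of every request when at most
    capacity_per_tick requests are served per tick and the rest
    are buffered in FIFO order."""
    if capacity_per_tick <= 0:
        raise ValueError("capacity must be positive")
    queue = deque()  # arrival tick of each buffered request
    waits = []
    tick = 0
    while tick < len(arrivals_per_tick) or queue:
        if tick < len(arrivals_per_tick):
            queue.extend([tick] * arrivals_per_tick[tick])
        for _ in range(min(capacity_per_tick, len(queue))):
            waits.append(tick - queue.popleft())
        tick += 1
    return waits


# Hypothetical bursty load: a steady 2 requests/tick with one burst of 10.
arrivals = [2, 2, 10, 2, 2, 0, 0, 0]

for capacity in (10, 4):
    waits = simulate(arrivals, capacity)
    print(f"capacity {capacity}/tick: max wait {max(waits)} ticks")
```

Provisioning for the peak of 10 per tick means no request ever waits; capping at 4 per tick still serves every request, but the burst is absorbed by the buffer and some requests wait a couple of ticks. Whether that is acceptable depends on the latency distribution the SLA asks for.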
- Operationally efficient – It takes less human resources to manage the system as the number of hardware resources scales up.
- Resilient – Increasing the number of resources will also increase the probability of failure of one of those resources, but the impact of such a failure should be reduced as the number of resources grows.
These two points combined with a discussion about cost/capacity/efficiency should be part of a definition of a scalable service. I’ll be thinking a bit about what the right wording should be and will post a proposal later.
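The resilience point can be made concrete with a little arithmetic. The sketch below rests on two assumptions that are not part of the original comment: resources fail independently with the same probability, and load is spread evenly so that one failure removes 1/n of the capacity. The function names and the 1% failure probability are illustrative.

```python
def prob_any_failure(n, p):
    """Probability that at least one of n resources fails, assuming
    independent failures with probability p each."""
    return 1.0 - (1.0 - p) ** n


def capacity_lost_per_failure(n):
    """Fraction of total capacity lost when one of n equally
    loaded resources fails."""
    return 1.0 / n


# Hypothetical per-resource failure probability over some period.
p = 0.01
for n in (1, 10, 100):
    print(f"n={n:3d}  P(any failure)={prob_any_failure(n, p):.3f}  "
          f"capacity lost per failure={capacity_lost_per_failure(n):.1%}")
```

Under these assumptions a failure somewhere becomes close to routine as the system grows, while each individual failure costs a smaller share of capacity, provided the service is built to tolerate it.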

An operational definition of scalability was provided by Michael D. Kersey on an old newsgroup thread:
http://groups.google.com/group/microsoft.public.inetserver.asp.components/msg/d9846b908f678f15?hl=en&
To quote from that newsgroup post:
“IMO a reasonable definition of scalability for a given platform P and application A is
S(A,P) = R(A,P) / C(A,P)
where
R = Maximum number of requests processed per second by application A on platform P,
C = Cost of hardware and software to develop and support application A on platform P.
I’ve assumed 100% availability for the purposes of this discussion. Availability could be added as an input to the definition if desired. This term displays the expected behavior shown by common usage of the term “scalability”:
1. As throughput R increases, scalability increases,
2. As cost C increases, scalability decreases,
3. Different platforms and different software may be compared using this definition,
4. You can use this definition to estimate costs of a proposed system, given an anticipated user load.
5. Both R and C can be estimated using known techniques.
So using this definition, scalability’s dimensions would be “requests processed per second per dollar”. Given the following known values for a single application Z:
running on platform X:
R(Z) = 1000 requests/second,
C(Z) = $40,000
S(Z) = 1000 requests/second / $40,000 = 0.025
running on not-so-fast but less expensive platform Y:
R(Z) = 500 requests/second,
C(Z) = $10,000
S(Z) = 500 requests/second / $10,000 = 0.05
While platform Y’s throughput (performance) is much less than that of platform X, Y is much more scalable than (in fact is twice as scalable as) platform X when running application Z.
This definition can also be used to estimate the utility of using various software methodologies. For example, heavy use of components or object technology may or may not change each factor in the definition: the degree to which each is changed determines whether the resultant system is more or less scalable.”
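The arithmetic in the quoted definition can be reproduced with a short sketch. The function name is an assumption; the throughput and cost figures are the ones given in the quote for application Z on platforms X and Y.

```python
def scalability(requests_per_second, cost_dollars):
    """S = R / C, in requests per second per dollar."""
    if cost_dollars <= 0:
        raise ValueError("cost must be positive")
    return requests_per_second / cost_dollars


# Figures from the quoted newsgroup post.
s_x = scalability(1000, 40_000)  # platform X
s_y = scalability(500, 10_000)   # platform Y

print(f"S on X: {s_x} requests/second/dollar")
print(f"S on Y: {s_y} requests/second/dollar")
print(f"Y relative to X: {s_y / s_x:.1f}x")
```

Note that this measure compares cost efficiency at a single operating point; it does not say how R changes as more money is spent on either platform.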
I posted a follow-on to your original post here http://integralpath.blogs.com/thinkingoutloud/2006/04/vogels_on_scala.html
where I talk a little bit about evaluating scalability of an architecture. One of the things I mention is the "Marginal Cost of Scale." Many moons ago, a wise gentleman named Al Goerner impressed the value of this concept upon me. I think it is really relevant to the point you allude to at the end of your post about the importance of cost, efficiency, and so on. Thanks for the really interesting posts.
I posted a blog page to explain where the performance goes (what it is wasted on) when it does not increase with the resources added. I also explored the attempts to reclaim that performance, and why this is challenging: In Depth look at Data-Driven Cluster (On Clustering Part VI of VII)