These are the old pages from the weblog as they were published at Cornell. Visit for up-to-date entries.

July 15, 2004

The world is asynchronous

I find that many developers interpret the terms 'loosely coupled' and 'asynchronous', used as concepts for structuring scalable systems, to mean 'using non-blocking techniques' to build the individual software components. The two are somewhat related but in essence different concepts.

Blocking and non-blocking in communication are programming paradigms, not essential distributed systems concepts. All communication as we know it today is non-blocking or asynchronous. Any notion of ‘blocking’ or synchrony is a programming construct that was introduced as a developer-friendly programming paradigm to handle request/response patterns or to achieve guarantees such as reliability or stability. But in reality, at the lowest level, messages are always non-blocking.

Both blocking and non-blocking programming styles have their applications, and the choice is often a matter of design preference combined with the limitations of the target operating system and its thread packages. As shown in Capriccio, you can have many threads with blocking operations and still have a well-controlled system. And sometimes it is just easier to fork off a process if your thread system is not good enough. In the Flash web server, for example, helper processes do the path parsing using the lstat system call, which is only available in a blocking fashion. It was easier to put this in a helper process than to convert it into the pure event-driven asynchronous structure of the main server.

The clear notion that all operations are in essence asynchronous is obscured by the fact that some operating systems only provide synchronous interfaces to their asynchronous core. If your OS only gives you a blocking ‘connect’ call, you’re seriously handicapped in making good design decisions. We then often see developers construct a non-blocking layer over the blocking system calls, just to get back to the true asynchronous events as they occurred in the core of the operating system.
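A minimal sketch of such a non-blocking layer, written in Python purely for illustration. It assumes a thread pool as the helper mechanism (Flash used helper processes, so this is a stand-in, not its implementation), and the names `lstat_async`, `run` and `completions` are made up for this example. The blocking `os.lstat` call runs in a helper thread, and its completion is handed back to the main loop as an event.

```python
import os
import queue
from concurrent.futures import ThreadPoolExecutor

# Completion events from the helper threads arrive on this queue.
# (Hypothetical name, used only in this sketch.)
completions = queue.Queue()

def lstat_async(pool, path):
    """Start a blocking os.lstat in a helper thread and return immediately."""
    future = pool.submit(os.lstat, path)
    # When the blocking call finishes, post a completion event for the main loop.
    future.add_done_callback(lambda f: completions.put((path, f)))

def run(paths):
    """Issue all lstat requests, then handle completions as they arrive."""
    results = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        for path in paths:
            lstat_async(pool, path)      # never blocks the main loop
        for _ in paths:                  # the "event loop": one event per request
            path, future = completions.get()
            error = future.exception()
            results[path] = error if error is not None else future.result()
    return results
```

Calling `run([".", "some/missing/file"])` returns a dictionary that maps each path either to its stat result or to the `OSError` the blocking call raised. The main loop only ever waits for the next completion event, never for one particular call.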

Whether you have an explicit request producer and a response consumer, or whether you have layered a synchronous software layer on top of this with the help of some parallelism, is a programming decision. And the level of synchrony can be completely application-defined. Consider, for example, the case where a document gets submitted to the printer: whether the initiator is not blocked at all, is blocked until the printer has accepted the job, or is blocked until the printer has completed the job is entirely up to the designer/programmer. There is nothing in the communication with the printer that forces the caller to wait.
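The printer case can be sketched in a few lines of Python. Everything here is hypothetical (the `Printer` and `Job` classes, the `wait_for` parameter, and upper-casing as a stand-in for printing). The point is only that the exchange with the printer is the same in all three cases, and the caller alone chooses how long to wait.

```python
import queue
import threading

class Job:
    """A print job with two events the caller may choose to wait on."""
    def __init__(self, document):
        self.document = document
        self.accepted = threading.Event()
        self.completed = threading.Event()
        self.output = None

class Printer:
    """Hypothetical printer that works through its job queue in its own thread."""
    def __init__(self):
        self._jobs = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            job = self._jobs.get()
            job.accepted.set()                  # printer has taken the job
            job.output = job.document.upper()   # stand-in for actual printing
            job.completed.set()                 # printer has finished the job

    def send(self, document):
        """Hand over the job; this never waits for the printer."""
        job = Job(document)
        self._jobs.put(job)
        return job

def submit(printer, document, wait_for="nothing"):
    """Same message to the printer every time; only the waiting differs."""
    job = printer.send(document)
    if wait_for == "accepted":
        job.accepted.wait()
    elif wait_for == "completed":
        job.completed.wait()
    return job
```

`submit(printer, doc)` returns at once, `submit(printer, doc, wait_for="accepted")` returns once the printer has taken the job, and `submit(printer, doc, wait_for="completed")` returns only when the output is ready.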

In my view there is, however, a serious disadvantage to using synchronous programming techniques, even if they are just layered over a purely asynchronous system and enough parallelism is used to ensure that the overall application will never block. There is the misguided desire to make remote and local operations transparent, which is possible if you program using synchronous techniques. If you want to pretend that a remote call is identical to a local call then you run into trouble, because there are performance and failure scenarios that are not covered if you are not aware of the distributed nature of the application. So for me the main disadvantage is that synchronous operations tend to obscure the distributed aspects of a system. I have written about this before (see "six misconceptions about reliable distributed computing"), but it is also the conclusion of papers such as "A Note on Distributed Computing" by Jim Waldo and friends.

The appeal of asynchronous, message-oriented middleware is that it makes distribution explicit. And if you allow the asynchronous message paradigm to be used as a structuring methodology for all of your application, then it is easy to make all stakeholders aware that this is a distributed application, not a local application with distribution hidden under the covers.

Sometimes your process will just have to wait for another (remote) resource to complete an operation; there is no way around that. To do "check_creditcard" you can use multiple threads plus a blocking call, or "send message + back to eventloop + process response"; both will work. Most important is that you are aware of what is happening under the covers when you issue the call.
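Both styles can be sketched side by side in Python. This is an illustration under assumptions, not a real payment API: the `card_service` function stands in for the remote resource, the queues stand in for the network, and the "16 digits" rule is a made-up validity check. In both styles the messages exchanged with the service are identical.

```python
import queue
import threading

def card_service(requests):
    """Hypothetical remote service: answers each request on its reply queue."""
    while True:
        message = requests.get()
        if message is None:          # shutdown signal for this sketch
            break
        request_id, card_number, reply_to = message
        valid = card_number.isdigit() and len(card_number) == 16  # made-up rule
        reply_to.put((request_id, valid))

def check_creditcard_blocking(requests, card_number):
    """Style 1: send the request, then block this thread until the reply arrives."""
    reply = queue.Queue()
    requests.put((0, card_number, reply))
    _, valid = reply.get()           # the caller waits here
    return valid

def check_creditcards_eventloop(requests, card_numbers):
    """Style 2: send all requests, return to the loop, process responses as events."""
    inbox = queue.Queue()
    pending = {}
    for request_id, number in enumerate(card_numbers):
        pending[request_id] = number
        requests.put((request_id, number, inbox))   # send and move on
    results = {}
    while pending:                                  # the event loop
        request_id, valid = inbox.get()
        results[pending.pop(request_id)] = valid
    return results
```

To try it, start `card_service` in a thread with a shared `queue.Queue()` and call either function with that queue. With the blocking style you would run one thread per outstanding check; with the event loop a single thread tracks all pending requests itself.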

But above all: blocking and non-blocking are programming paradigms, not networking technologies or immutable distributed systems components. And a problem is that we often let our programming style dictate the way we see functionality or component interactions.

To build scalable systems we have to step away from what our programming preferences dictate to us and involve ourselves in a thinking adventure about how components really need to and can interact on a global scale. I have read many papers and articles on this topic, but I consider Pat Helland to be almost the only person who really grasps what the complexity of enterprise systems design is going to be like if you really need to scale systems. See for example his "SOA is like the night sky" article. This is the kind of visionary thinking that should motivate all of us to drop our RPC or multicast battle axes and start thinking about what large systems really should look like.

Posted by Werner Vogels at July 15, 2004 12:34 PM

re: When to use ASMX, ES or Remoting
Weblog: On the road to Indigo
Tracked: July 22, 2004 05:59 PM
re: Requirement Analysis defines your application architecture
Weblog: Klaus Aschenbrenner - Looking into a smart future...
Tracked: January 23, 2005 07:38 PM

In the 80's there was a distributed system called V (from Stanford I think) that had synchronous calls, and it had performance similar to an asynchronous system because it was so much simpler.

Posted by: Bill Petheram on July 16, 2004 07:32 AM

It's an interesting way to look at things. However, I always arranged these concepts in a different way in my head.

Both blocking & non-blocking “programming paradigms” can be deemed to be synchronous. In both these scenarios our piece of code essentially decides when to perform some action; it may or may not wait for the completion.

On the other hand, asynchronism is possible only in push-based/interrupt-based systems. Some other component informs our component/piece of code to possibly initiate some work. (Though it could be argued that nothing is truly interrupt-driven; everything is eventually some kind of polling at a lower level.)

Posted by: Aseem Bajaj on July 18, 2004 11:39 PM