These are the old pages from the weblog as they were published at Cornell.

November 03, 2003

The Politics of Distribution Transparency

On the last day of the PDC, Jim Gray and I were walking back from lunch and Jim said something like 'you must be very happy now that this non-transparency thing is finally catching on'. I have always been a strong advocate of making distribution explicit in all its forms. Although I mainly target the server side, I am willing to sacrifice client/server transparency as well if necessary.

Jim never agreed; his motto is 'users want transparency' and thus we must do our utmost to give it to them. My counter-argument has always been that transparency only works if you can make it perfect. You cannot say it is transparent except in these 5% of your operations. That is a recipe for trouble.

One of the earliest areas in which we encountered this problem was the handling of transparently replicated distributed objects. You would use a state machine approach to model the input to the objects and make sure that two or more object instances processed all the requests in the same order. It would work if the objects were really simple (even though the runtime has to do a lot of complex failure handling), but it would go completely awry when the objects became more complex. In many production deployments we saw that the replicated objects were front-ends to other tiers that had no built-in support for the replication, and a call to the objects would trigger invocation chains through layers that eventually would hit some common objects. Big trouble. It was like introducing multi-threading into a program that was previously single-threaded, without telling anybody about it.
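The failure mode above can be sketched in a few lines. This is a minimal illustration, not any particular system's code; all class names (CounterReplica, FrontEndReplica, Backend) are hypothetical. Two deterministic replicas applying the same ordered command log stay consistent, but the moment the replicated object calls out to a shared tier that knows nothing about replication, that tier sees every request once per replica:

```python
class CounterReplica:
    """A trivially simple replicated object: deterministic, self-contained."""
    def __init__(self):
        self.value = 0

    def apply(self, command):
        op, arg = command
        if op == "add":
            self.value += arg


class Backend:
    """A shared, non-replicated tier that the objects call into."""
    def __init__(self):
        self.calls = 0

    def record(self, arg):
        self.calls += 1


class FrontEndReplica:
    """A more realistic object: its commands have side effects on a
    backend that has no built-in support for the replication."""
    def __init__(self, backend):
        self.value = 0
        self.backend = backend

    def apply(self, command):
        op, arg = command
        if op == "add":
            self.value += arg
            self.backend.record(arg)  # the invocation chain escapes the replica

log = [("add", 1), ("add", 2), ("add", 3)]  # the totally ordered input

# Simple self-contained replicas stay consistent:
r1, r2 = CounterReplica(), CounterReplica()
for cmd in log:
    r1.apply(cmd)
    r2.apply(cmd)
assert r1.value == r2.value == 6

# But two replicas of the front-end hit the shared backend once each
# per command: 6 calls for 3 logical operations.
backend = Backend()
f1, f2 = FrontEndReplica(backend), FrontEndReplica(backend)
for cmd in log:
    f1.apply(cmd)
    f2.apply(cmd)
print(backend.calls)  # prints 6: the backend saw every request twice
```

The duplicated side effects on the backend are exactly the 'big trouble': the replication was transparent to the object's callers, but not to the tiers behind it.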

We know that transparently switching from single-threaded to multi-threaded doesn't work, and neither does transparently switching from local to distributed processing. Maybe in 95% of the cases you can make it go right, but you will seriously screw up in the other cases. If you want things to scale or run over heterogeneous networks it gets worse: suddenly you have to account for procedure calls or method invocations that have very special failure and performance semantics.
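Those special failure semantics are easy to sketch. In this hypothetical example (the RemoteError/Timeout names and the remote_call wrapper are mine, for illustration only), a call that crosses the network can fail in a way a local call never does: it may time out without the caller knowing whether the operation ran at all. Making distribution explicit forces the caller to confront that ambiguity instead of hiding it behind a transparent invocation:

```python
import random


class RemoteError(Exception):
    """Failure modes that only exist once a call crosses the network."""


class Timeout(RemoteError):
    """No reply within the deadline; the call may or may not have executed."""


def remote_call(fn, *args, fail_rate=0.0):
    """Invoke fn as if over a network: it may time out instead of returning."""
    if random.random() < fail_rate:
        raise Timeout("no reply within deadline")
    return fn(*args)


def increment(x):
    return x + 1

# The explicit model makes the caller handle the ambiguous outcome:
try:
    result = remote_call(increment, 41, fail_rate=0.0)
except Timeout:
    result = None  # retry? give up? the failure model is now in the open

assert result == 42
```

A transparent local-call syntax has no place to put the Timeout branch, which is precisely why the 5% of cases where transparency breaks down cannot be papered over.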

So, yes, I am somewhat satisfied that Indigo seems to take the position that distributed operations should be explicit, that reliability considerations are explicit, and that local operations can be grouped to have transactional semantics and thus an explicit failure model. But I am curious to see whether the Indigo team can withstand the pressure to make things simple and transparent again.

In 1994/1995 I was actively involved with the programming model that later became the Virtual Interface Architecture, and Microsoft contributed a lot of brain cycles to building that model. Only to develop Winsock Direct some years later, because of the argument that users didn't mind change, they just didn't like changing, and this new explicitly distributed programming model made it way too difficult to port applications to the new system. So a lot of the performance and power of the interface was stripped for the sake of transparency. And it hasn't made VIA that much more successful.

Let's see whether the Indigo team can withstand the political pressure in this case. Only then will I be satisfied, and it will be up to Jim to buy me some beers.

Posted by Werner Vogels at November 3, 2003 12:41 PM