These are the old pages from the weblog as they were published at Cornell. Visit www.allthingsdistributed.com for up-to-date entries.

April 21, 2003

Concurrency Now and Then

Concurrent programming is the single most difficult systems subject to teach. Not because the concepts are so difficult: almost every student gets the basics quickly, even continuations and closures. What makes it difficult is that building large, complex systems using parallelism is in general a nightmare to test and debug. Ask a programmer about the nastiest bug he or she ever had to fix and there is a good chance it involved a concurrency problem: memory corruption that occurred only when a certain number of threads were active or interrupts were firing, and that could never be replayed in a debugger; or a race condition that only happens when it is a full moon, it is raining, and Jupiter is in the house of Mars. Some of the most difficult projects I have been involved with had to introduce concurrency into a previously single-threaded program. Programs that were not designed to deal with concurrency explicitly are the most difficult to get right, and just slapping a single monitor around a piece of code doesn't cut it. Yet today, asynchrony and thread pools are common concepts that programmers have to deal with from the start.
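The classic lost-update race is a minimal sketch of why these bugs are so hard to replay. Below (in Python, purely as an illustration) the unguarded version may or may not lose increments depending on thread scheduling — which is exactly why it can't be reliably demonstrated in a debugger — while the version with a lock is deterministic:

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    # Read-modify-write with no synchronization: two threads can read the
    # same value and both write back value+1, silently losing an increment.
    # Whether it actually happens depends on scheduling -- the "full moon" bug.
    global counter
    for _ in range(n):
        counter += 1

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:          # the lock makes the read-modify-write atomic
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 -- deterministic only because of the lock
```

Note that a single lock fixes this one counter; it does not make a larger program correct, which is the "slapping a single monitor around a piece of code" point above.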

I started thinking some more about this after reading Paul Graham's PyCon keynote "The Hundred-Year Language", which appeared as an article last week. Basically, he states that we have failed to exploit parallelism in our systems, although he does not give any reasons why. He goes on to state that in the future, parallelism will become an optimization late in the development of a program:

And this will (like asking for specific implementations of data structures,) be something that you do fairly late in the life of a program when you try to optimize it. Version 1s will ordinarily ignore any advantages to be had from parallel computation, just as they will ignore advantages to be had from specific representations of data.

This is absolutely the wrong approach to take. If you want to do parallelism correctly, you have to make it explicit in every phase of the design and development process. Every action needs to be examined for the impact of concurrency. Beyond the potential bugs and roadblocks, small, subtle mistakes in concurrent programming can kill all the performance gains you were looking for. Just as with introducing distribution into a program, the only way to deal with potential problems and mistakes is to make it explicit, starting in the design phase. Hiding it is a recipe for disaster: a remote procedure call is different from a local call, both in performance and in failure semantics. And it is not only these systems issues; there is also semantic correctness. For some programs it doesn't matter whether request #1 executes before request #2, but for others the ordering has to be carefully examined.
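Making ordering explicit can be as simple as choosing an API that states its guarantee. A hedged sketch (the `handle` function is hypothetical): handlers run concurrently in whatever order the scheduler picks, but `ThreadPoolExecutor.map` explicitly promises results in submission order, so the ordering requirement is part of the design rather than an accident of timing:

```python
from concurrent.futures import ThreadPoolExecutor

def handle(request):
    # Hypothetical request handler standing in for real work.
    return request.upper()

requests = ["req1", "req2", "req3", "req4"]

with ThreadPoolExecutor(max_workers=4) as pool:
    # The handlers may execute in any interleaving, but map() returns
    # results in the order the requests were submitted -- an explicit
    # ordering guarantee, not something we merely hope for.
    results = list(pool.map(handle, requests))

print(results)  # ['REQ1', 'REQ2', 'REQ3', 'REQ4']
```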

This doesn't mean we cannot develop better tools for programmers to examine the impact of concurrency. When Sam Ruby writes about parallelism done right, his hope is for a language that will do the right thing. My hope is for everyday tools that help a programmer who is using an extra thread for UI responsiveness see which data structures that thread will touch. It is tools for debugging and testing that will allow us to deal with parallelism in a practical way.

[BTW I don't like the fact the CLR does not allow me to wire a thread to a processor. Jim Miller's response was: 'We do not want you to have this level of control, and why would you want to do that anyway?']
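Operating systems do expose this kind of control even where runtimes hide it. As a hedged illustration, Linux offers `sched_setaffinity`, which Python wraps in the `os` module (this is a sketch of the OS facility, not of anything the CLR provides):

```python
import os

def pin_to_cpu(cpu):
    """Pin the calling process to a single CPU, where the OS exposes affinity."""
    if not hasattr(os, "sched_setaffinity"):   # Linux-only in the os module
        return None                            # e.g. macOS: no affinity control here
    try:
        os.sched_setaffinity(0, {cpu})         # pid 0 means "this process"
        return os.sched_getaffinity(0)         # the set we are now confined to
    except OSError:                            # that CPU isn't available to us
        return None

affinity = pin_to_cpu(0)
print(affinity)
```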

Posted by Werner Vogels at April 21, 2003 11:04 AM
TrackBacks

Comments

Current microprocessors parallelize instruction streams, even executing some instructions speculatively. All without any foreknowledge required by compilers or authors.

Current programming languages manage heaps of storage automatically without requiring explicit code to identify when storage is to be freed.

From my perspective, hiding such concepts is a recipe for success, and making it explicit is a recipe for disaster.

Posted by: Sam Ruby on April 21, 2003 11:28 AM

I agree that there are different levels of granularity for concurrency and that not all of them make sense to expose. Operating systems and compilers virtualize processors into an abstraction that can be expressed in programming languages and exploited by programmers. In general these are abstractions managed by the OS, not by the processor itself. This is the level that I would like to remain exposed to programmers, not hidden and introduced as an afterthought or optimization as Graham suggests.

With respect to the gc example: many systems give the programmer the ability to tightly control when the gc runs, to register object-specific cleanup routines that run when the gc frees instances, to control object placement (in the case of multiple heaps), or to access/import non-gc memory. So the ability to control and manipulate memory allocation remains available as a fundamental service.
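CPython's `gc` and `weakref` modules are one concrete example of these hooks (a sketch of that one runtime, not a general claim): collection can be paused and triggered explicitly, and cleanup callbacks can be attached to individual objects.

```python
import gc
import weakref

class Resource:
    pass

gc.disable()           # take explicit control of when cyclic collection runs

finalized = []
r = Resource()
weakref.finalize(r, lambda: finalized.append(True))  # per-object cleanup hook

del r                  # in CPython, refcounting reclaims it immediately
gc.collect()           # run a cyclic collection at a moment we choose
gc.enable()

print(finalized)       # [True] once the object has been reclaimed
```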

Posted by: Werner Vogels on April 21, 2003 11:49 AM

So why do you want to wire a thread to a processor? I can think of a couple of reasons -- ensuring responsiveness, optimizing cache usage -- but both would be better handled automatically, assuming we ever get to the point where they can be. So what's your motivation?

And by the way, I wonder whether the idea that you should leave concurrency for last might be a lot more tenable if you're dealing with a strict functional language. This would presumably allow you to automatically parallelize across threads on the same machine. It wouldn't deal with distributed failure, however (e.g. the network going down). I suspect Paul wasn't thinking about distributed systems, just parallel systems.

Posted by: Kimberley Burchett on April 27, 2003 12:26 PM

I study in the faculty of science and am majoring in computer science. I don't understand about projects. What is a project? Is it difficult?
Thanks a lot

Posted by: Sasitorn Yongyai on December 18, 2003 02:43 AM