My paper to read this weekend was the Alan Demers' seminal paper on epidemic techniques for database replication. I realized that in 2004, before my Amazon days, I already wrote a blog post about the fundamental publications in the area of epidemics, so this seems like a good moment to revisit that with updated links, etc.
History of Epidemics
In the past 6-8 years we have been using various epidemic techniques in building our
reliable and scalable distributed systems
with great success. Now that industry is starting to deal with issues of scale
that can almost only be solved by using epidemic techniques, it becomes
important to produce some basic pointers to the origin of the use of epidemics
in distributed systems.
In a nutshell: An epidemic style of communication or state
sharing gives you a highly robust medium for distributed interaction. The main
- Probabilistic model. This doesn't mean that it gives less
guarantees than a deterministic model, but that we now have a good framework
for reasoning about the spread of information through a system over time.
- A-synchronous communication pattern. Any good epidemic communication
system allows you to operate in a 'fire-and -forget' mode, where, even if the
initial sender fails, all surviving nodes will receive the update (or none
- Autonomous & decentralized actions. Epidemics techniques enable you to take actions
based on the data you received without the need for additional communication to reach
agreement with your partners; you can take decisions autonomously.
- Robust with respect to message loss & node failures. Once a message
has been received by at least one of your peers it is almost impossible to
prevent the spread of the information through the system. In the most popular
demo I give, the system still operates under 90% message loss with limited or no
loss in functionality.
- Rigorous mathematical underpinnings. Finally we have a set
protocols where we can use rigorous mathematical techniques to reason about
the operation of the protocols under all sorts of conditions
techniques have a long history in science, but mainly in biology and
epidemiology, and in mathematics. The bible of epidemics from a theorectic point
of view is:
Epidemic Theory of Infectious Diseases and its Applications N.T.J.
Bailey Hafner Press Second Edition, 1957
This is not a computer science text, but it explains the real
fundamentals. If you interested in more CS oriented texts than the following
list has some general papers that deal with the basics of epidemic communication
or 'gossip' and are a good starting point to learn about fundamentals.
I purposely left the Cornell papers off this list, as not to appear too
self-serving. I believe the work at Cornell has been ground-breaking in bringing
the techniques to a larger CS audience and applying it to building robust and
scalable distributed systems. I'll give a reading list of Cornell work in a
- B. Baker, R. Shostak, Gossips and telephones, Discrete Mathematics
2(1972), pp. 191--193.
- A. Demers, D. Greene, A. Hauser, W.
Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and
Epidemic algorithms for replicated database maintenance.
In Proc. ACM Symp. on the Principles of Distr. Computing,
pages 1--12, August 1987.
- R. Golding and K. Taylor. Group membership in the epidemic style. Technical
Report UCSC-CRL-92-13, University of California, Santa Cruz,
- D. Agrawal, A. El-Abbadi, and R. Steinke. Epidemic algorithms in replicated databases. In
Proc. 16th ACM SIGACT-SIGMOD Symp. Princip. of Database
Systems (PODS), Tucson, Arizona, May 1997
- R. Karp, C. Schindelhauer, S. Shenker, B.
Vocking, Randomized rumor spreading, Proc. IEEE Symp.
Foundations of Computer Science, 2000.