These are the old pages from the weblog as they were published at Cornell. Visit www.allthingsdistributed.com for up-to-date entries.

June 19, 2003

Day 2 at Middleware 2003 - session 2

The session was on middleware for web-based systems (talk #3 has an exciting result).

The session started with a very interesting talk by Sameh Elnikety of Rice/EPFL on a performance comparison of 4 different styles of systems for building dynamic content:

  1. Apache/PHP + MySQL
  2. Apache/Tomcat + MySQL
  3. Apache + Tomcat (on a separate machine) + MySQL
  4. Apache + Tomcat + EJB (JOnAS) + MySQL

The benchmarks used resemble an online book store and an online auction site. The paper contains interesting numbers with respect to locking at the database versus the servlet/EJB levels. The systems as well as the benchmark code can be downloaded.

The second presentation was on a machine learning strategy for modeling users' web surfing behavior, combined with a prefetching strategy. It was given by Daby Sow of IBM Research (Watson). It shows the effectiveness of different learning strategies and how they can drive the prefetch strategy. The prefetch engine is implemented at the proxy server. The paper shows that the effectiveness depends entirely on how well a user's behavior can be predicted. It seems that for the majority of users a significant improvement in latency can be achieved.
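To make the idea of prediction-driven prefetching a bit more concrete, here is a minimal sketch. It is not the learning strategy from the paper: it assumes a simple first-order Markov model over one user's page history, and the names `NextPagePredictor`, `train` and `min_confidence` are made up for illustration. A proxy would prefetch only when the predicted next page is likely enough.

```python
from collections import Counter, defaultdict


class NextPagePredictor:
    """Hypothetical first-order Markov model of one user's page transitions."""

    def __init__(self, min_confidence=0.5):
        # transitions[page] counts which pages were requested right after it
        self.transitions = defaultdict(Counter)
        self.min_confidence = min_confidence

    def observe(self, prev_page, next_page):
        self.transitions[prev_page][next_page] += 1

    def predict(self, current_page):
        """Return the page worth prefetching, or None if the user is too unpredictable."""
        counts = self.transitions.get(current_page)
        if not counts:
            return None
        page, hits = counts.most_common(1)[0]
        confidence = hits / sum(counts.values())
        return page if confidence >= self.min_confidence else None


def train(predictor, history):
    """Feed an ordered list of requested pages into the predictor."""
    for prev_page, next_page in zip(history, history[1:]):
        predictor.observe(prev_page, next_page)


predictor = NextPagePredictor(min_confidence=0.5)
train(predictor, ["/home", "/news", "/home", "/news", "/home", "/sports"])
candidate = predictor.predict("/home")  # the proxy would fetch this page ahead of time
```

The confidence threshold reflects the point made in the talk: prefetching only pays off for users whose behavior can be predicted well, so an unpredictable user simply gets no prefetch.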

The 3rd talk was by Willy Zwaenepoel, who used to be at Rice but is now dean of the information and communication school at EPFL. This work looked at how you can use MySQL on clustered backend machines to avoid having to buy an expensive Oracle SMP server. The goal is to achieve strong transactional guarantees (one-copy serializability) with good performance. This goes against the seminal paper by Jim Gray, who argues that this cannot be done. Jim's argument is that the number of conflicts grows with the cluster size and thus the global wait time increases.

Willy's approach reduces the number of conflicts through a combination of concurrency control and query scheduling. The solution uses a versioning scheme: versions are assigned to each of the operations and to the database tables. The scheduler is aware of the progress of each of the nodes and assigns new operations to the nodes with the fewest conflicts. The experimental numbers show that it is much better than the scheme on which the Gray paper is based and, more surprisingly, that the performance is close to that of the loose consistency mechanisms from Kai Shen's USITS paper. TPC-W benchmarks are used in the experiments. Go read this paper!
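A rough sketch of what version-aware scheduling could look like, under my own simplifying assumptions rather than the algorithm from the paper: every write bumps a per-table version at the scheduler, each replica reports which version of each table it has applied, and a read goes to the replica that is least behind on the tables it touches. The class and method names (`Scheduler`, `write`, `ack`, `lag`, `route_read`) are hypothetical.

```python
class Scheduler:
    """Hypothetical scheduler that tracks per-table versions on each replica."""

    def __init__(self, replicas, tables):
        # Latest version the scheduler has assigned to each table
        self.latest = {t: 0 for t in tables}
        # Version of each table that every replica has applied so far
        self.applied = {r: {t: 0 for t in tables} for r in replicas}

    def write(self, table):
        """Assign the next version number to a write on this table."""
        self.latest[table] += 1
        return self.latest[table]

    def ack(self, replica, table, version):
        """A replica reports that it has applied the table up to this version."""
        current = self.applied[replica][table]
        self.applied[replica][table] = max(current, version)

    def lag(self, replica, tables):
        """Number of pending versions on this replica for the given tables."""
        return sum(self.latest[t] - self.applied[replica][t] for t in tables)

    def route_read(self, tables):
        """Pick the replica that would wait on the fewest pending versions."""
        return min(self.applied, key=lambda r: self.lag(r, tables))


sched = Scheduler(replicas=["a", "b"], tables=["item", "orders"])
sched.write("item")
version = sched.write("item")
sched.ack("b", "item", version)      # replica b has caught up on "item"
target = sched.route_read(["item"])  # replica a still has two versions pending
```

The point of the sketch is only that a scheduler which knows each node's progress can steer an operation away from nodes where it would block; ties here simply go to the first replica listed.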

Posted by Werner Vogels at June 19, 2003 11:42 AM