These are the old pages from the weblog as they were published at Cornell. Visit www.allthingsdistributed.com for up-to-date entries.

May 09, 2003

Comparing CLR, Mono, SSCLI and JAVA Performance - Part IV

Good micro-benchmarks are useful for isolating particular issues and for comparing those features across platforms. They rarely serve as a good predictor of how applications using those features will perform, because overall performance is driven by the complexity of the code's dynamics. This is also the case in high-performance computing: even though low-level benchmarks tell you something about integer arithmetic or the standard Math library, they do not predict what happens when all of these are put together.

In our benchmark we have a collection of about 15 application 'kernels' that represent the most common HPC application strategies. We have converted these to C# from other languages and are currently validating their correctness. The first converted benchmark suite that has passed validation is SciMark, which consists of five kernels:

  • Fast Fourier Transform (FFT): Performs a one-dimensional forward transform of 4K complex numbers. This kernel exercises complex arithmetic, shuffling, non-constant memory references and trigonometric functions. The first section performs the bit-reversal portion (no flops) and the second performs the actual N log N computational steps.
  • Jacobi Successive Over-Relaxation (SOR): On an NxN grid, exercises typical access patterns in finite difference applications; for example, solving Laplace's equation in 2D with Dirichlet boundary conditions. The algorithm exercises basic "grid averaging" memory patterns, where each A(i,j) is assigned an average weighting of its four nearest neighbors.
  • Monte Carlo: A financial simulation using Monte Carlo techniques to price products derived from the price of an underlying asset. The integration approximates the value of Pi by computing the integral of the quarter circle y = sqrt(1 - x^2) on [0,1]. The algorithm exercises random-number generators, synchronized function calls, and function inlining.
  • Sparse Matrix Multiply: This kernel uses an unstructured sparse matrix stored in compressed-row format with a prescribed sparsity structure. It exercises indirect addressing and irregular memory references.
  • LUFact: Computes the LU factorization of a dense NxN matrix using partial pivoting. Exercises linear algebra kernels (BLAS) and dense matrix operations. The algorithm is the right-looking version of LU with rank-1 updates. Also known as the LINPACK benchmark.
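To make the FFT kernel's flop-free first phase concrete, here is a minimal sketch of a bit-reversal permutation on an array of interleaved complex values. The class and method names are mine, not SciMark's, and this is an illustration of the technique rather than the benchmark's actual code.

```java
class BitReversal {
    // Reorder an array of n = 2^k complex values (stored as
    // interleaved re,im pairs) into bit-reversed index order --
    // the flop-free first phase of a radix-2 FFT.
    static void permute(double[] data) {
        int n = data.length / 2; // number of complex elements
        int j = 0;
        for (int i = 0; i < n - 1; i++) {
            if (i < j) { // swap complex elements i and j
                double tr = data[2 * i], ti = data[2 * i + 1];
                data[2 * i] = data[2 * j];
                data[2 * i + 1] = data[2 * j + 1];
                data[2 * j] = tr;
                data[2 * j + 1] = ti;
            }
            // advance j to the bit-reversal of i + 1
            int m = n >> 1;
            while (m >= 1 && j >= m) { j -= m; m >>= 1; }
            j += m;
        }
    }
}
```

Note that the inner loop does no floating-point work at all; the kernel's first section measures pure shuffling and index arithmetic.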
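The SOR kernel's "grid averaging" pattern can be sketched as a single relaxation sweep over the interior of a grid. This is a hedged illustration (class name, grid size and relaxation factor are my own choices), not SciMark's implementation.

```java
class SorSweep {
    // One Successive Over-Relaxation sweep over the interior of an
    // N x N grid: each cell moves toward the average of its four
    // nearest neighbors, scaled by the relaxation factor omega.
    // Boundary cells (row/column 0 and N-1) are left untouched,
    // which is where Dirichlet boundary values would live.
    static void sweep(double[][] g, double omega) {
        int n = g.length;
        for (int i = 1; i < n - 1; i++) {
            for (int j = 1; j < n - 1; j++) {
                g[i][j] = omega * 0.25 * (g[i - 1][j] + g[i + 1][j]
                        + g[i][j - 1] + g[i][j + 1])
                        + (1.0 - omega) * g[i][j];
            }
        }
    }
}
```

The stencil reads the two neighboring rows on every update, so the kernel stresses exactly the strided memory patterns the description mentions.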
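The Monte Carlo kernel's quarter-circle integration is simple enough to sketch in a few lines. This is a minimal illustration of the technique (names and sample count are mine), not the benchmark code, which also exercises synchronized calls around its random-number generator.

```java
import java.util.Random;

class MonteCarloPi {
    // Estimate Pi by sampling points uniformly in the unit square
    // and counting those under the quarter circle y = sqrt(1 - x^2),
    // i.e. points with x^2 + y^2 <= 1. The fraction under the curve
    // approximates Pi/4.
    static double integrate(int numSamples, long seed) {
        Random rng = new Random(seed);
        int underCurve = 0;
        for (int i = 0; i < numSamples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) {
                underCurve++;
            }
        }
        return 4.0 * (double) underCurve / numSamples;
    }
}
```

With a million samples the estimate is typically within a few thousandths of Pi; the kernel's cost is dominated by the random-number generator, which is why RNG quality and call overhead show up so strongly in this score.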
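The compressed-row storage the sparse kernel uses is worth spelling out, since the indirect addressing is the whole point of the benchmark. Below is a minimal sparse matrix-vector multiply sketch under that format (the array names are conventional, not SciMark's).

```java
class SparseMatVec {
    // y = A * x for a matrix in compressed-row (CRS) format:
    //   val    -- the nonzero values, stored row by row
    //   col    -- the column index of each nonzero
    //   rowPtr -- rowPtr[i]..rowPtr[i+1]-1 delimits row i's entries
    static double[] multiply(double[] val, int[] col, int[] rowPtr,
                             double[] x) {
        int rows = rowPtr.length - 1;
        double[] y = new double[rows];
        for (int i = 0; i < rows; i++) {
            double sum = 0.0;
            for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++) {
                sum += val[k] * x[col[k]]; // indirect access through col[]
            }
            y[i] = sum;
        }
        return y;
    }
}
```

The load `x[col[k]]` depends on data read just beforehand, so the processor cannot predict the access pattern; this is the "irregular memory reference" behavior the kernel measures.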
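Finally, the right-looking LU scheme with rank-1 updates can be sketched compactly. This is an illustrative in-place factorization with partial pivoting, assuming a nonsingular input matrix; it is not the benchmark's tuned code, which works through BLAS-style inner kernels.

```java
class LuFactor {
    // In-place right-looking LU factorization with partial pivoting:
    // at step k, find the largest pivot in column k, swap that row up,
    // scale the column below the pivot, then apply a rank-1 update to
    // the trailing submatrix. On return, a holds L (unit lower, below
    // the diagonal) and U (upper); piv records the row swaps.
    static int[] factor(double[][] a) {
        int n = a.length;
        int[] piv = new int[n];
        for (int k = 0; k < n; k++) {
            int p = k;
            for (int i = k + 1; i < n; i++) {
                if (Math.abs(a[i][k]) > Math.abs(a[p][k])) p = i;
            }
            piv[k] = p;
            double[] tmp = a[k]; a[k] = a[p]; a[p] = tmp; // row swap
            for (int i = k + 1; i < n; i++) {
                a[i][k] /= a[k][k]; // multiplier, stored in L
                for (int j = k + 1; j < n; j++) {
                    a[i][j] -= a[i][k] * a[k][j]; // rank-1 update
                }
            }
        }
        return piv;
    }
}
```

The triple loop over the trailing submatrix is where nearly all the flops occur, which is why this kernel tracks dense linear algebra (LINPACK-style) performance.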

SciMark has become interesting over the years because it also produces a single composite score from these five kernels, which can be used as an HPC performance predictor for a particular platform. It is run with two different memory models (small and large).

The graphs below are self-explanatory and answer one of our research questions: the commercial CLI implementation, the Microsoft .NET CLR 1.1, is competitive with the best JVMs when executing complex HPC benchmarks.




The other parts in this series of postings cover basic arithmetic and JIT code generation, loop overhead and exception handling, and the performance of the standard Math library. Posted by Werner Vogels at May 9, 2003 10:34 AM

TrackBacks

Comments

Hello there,

In my opinion, the SSCLI results you present are doubtful. I do not believe there is a fundamental difference between the commercial .NET runtime and SSCLI. You probably ran the tests on a checked or fastchecked build, rather than a release build.

P.S. It would be much more convenient if you published your pages on the standard HTTP port. Port 9000 is not always accessible (e.g., it is blocked at my workplace).

Posted by: Vladimir Nesterovsky on May 13, 2003 12:49 AM

I've been running the SciMark tests on JVMs. The recently introduced JDK 1.4.2 beta outperforms even IBM's JDK 1.3 by over 28%.

http://www.freeroller.net/page/ceperez/20030521#jdk_1_4_2_beta

Posted by: Carlos E. Perez on May 30, 2003 12:12 PM