Good micro-benchmarks are useful for isolating particular issues and for comparing features across platforms. They rarely serve as good predictors of how applications using those features will perform, because performance is generally driven by the complexity of the code's dynamics. This is also the case in high-performance computing: even though low-level benchmarks tell us something about integer arithmetic or the standard Math library, they do not predict what happens when all of these are put together.

In our benchmark suite we have a collection of about 15 application 'kernels' that represent the most common HPC application strategies. We have converted these to C# from other languages and are currently validating their correctness. The first converted benchmark suite that has passed validation is SciMark, which consists of five kernels:

*Fast Fourier Transform* (FFT) performs a one-dimensional forward transform of 4K complex numbers. This kernel exercises complex arithmetic, shuffling, non-constant memory references, and trigonometric functions. The first section performs the bit-reversal portion (no flops) and the second performs the actual *N log(N)* computational steps.

*Jacobi Successive Over-Relaxation* (SOR) on an *NxN* grid exercises typical access patterns in finite difference applications, for example solving Laplace's equation in 2D with Dirichlet boundary conditions. The algorithm exercises basic "grid averaging" memory patterns, where each *A(i,j)* is assigned an average weighting of its four nearest neighbors.

*Monte Carlo* is a financial simulation using Monte Carlo techniques to price products derived from the price of an underlying asset. The integration approximates the value of pi by computing the integral of the quarter circle *y = sqrt(1 - x^2)* on *[0,1]*. The algorithm exercises random-number generators, synchronized function calls, and function inlining.

*Sparse Matrix Multiply* uses an unstructured sparse matrix stored in compressed-row format with a prescribed sparsity structure. This kernel exercises indirect addressing and non-regular memory references.

*LUFact* computes the LU factorization of a dense *NxN* matrix using partial pivoting. It exercises linear algebra kernels (BLAS) and dense matrix operations. The algorithm is the right-looking version of LU with rank-1 updates. This is also known as the LINPACK benchmark.
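To make the Monte Carlo kernel concrete, here is a minimal sketch of the quarter-circle integration it describes (in Java rather than the C# of the actual suite; the class and method names are hypothetical, not SciMark's own):

```java
import java.util.Random;

// Sketch of the Monte Carlo kernel: estimate pi by sampling random points
// in the unit square and counting those that fall under y = sqrt(1 - x^2).
public class MonteCarloPi {
    static double estimatePi(int samples, long seed) {
        Random rng = new Random(seed); // exercises the random-number generator
        int inside = 0;
        for (int i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) inside++; // point lies in the quarter circle
        }
        // The quarter circle has area pi/4, so scale the hit ratio by 4.
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(1_000_000, 42L));
    }
}
```

With a million samples the estimate typically lands within a few thousandths of pi; the real SciMark kernel is structured similarly but routes the sampling through a synchronized generator to exercise call overhead and inlining.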

SciMark has become interesting over the years because it also produces a single composite score from these five kernels, which can be used as an HPC performance predictor for a particular platform. It is run with two different memory models (small and large).

The graphs below are self-explanatory and answer one of our research questions: the commercial CLI implementation, the Microsoft .NET CLR 1.1, is competitive with the best JVMs when executing complex HPC benchmarks.

The other posts in this series cover basic arithmetic & JIT code generation, loop overhead and exception handling, and the performance of the standard Math library.
Posted by Werner Vogels at May 9, 2003 10:34 AM

TrackBacks

Hello there,

The test results you provide for the SSCLI seem very doubtful to me. I do not believe there is a fundamental difference between the commercial .NET runtime and the SSCLI. You probably ran the tests on a checked or fastchecked build rather than a release build.

P.S. It would be much more convenient if you published your pages on the standard HTTP port. It is not always possible to access port 9000 (e.g., it is closed at my workplace).

I've been running the SciMark tests on JVMs. The recently introduced JDK 1.4.2 beta beats even IBM's JDK 1.3 by over 28%.

http://www.freeroller.net/page/ceperez/20030521#jdk_1_4_2_beta

Contact info

Seattle, WA, 98104

weblog [at] vogels.net

History

These are the historical pages from the All Things Distributed weblog as they were published at weblogs.cs.cornell.edu. For up-to-date information, new entries, atom/rss feeds, and more, visit www.allthingsdistributed.com.