What performance does KLAT2 really get?
It really gets over 64 GFLOPS on 32-bit ScaLAPACK. Using an “untuned” 80/64-bit version, KLAT2 gets a very respectable 22.8 GFLOPS. These aren’t theoretical numbers; they are the real thing. The theoretical we-will-never-see-that peaks are 179 GFLOPS for 32-bit and 89 GFLOPS for 80/64-bit floating point. Yes, we know ScaLAPACK is only one application, and not a very general one at that. We have other codes running as well. In fact, we submitted an entry for a Gordon Bell price/performance prize based on running a complete CFD package on KLAT2. The only code in common between ScaLAPACK and the CFD package is the LAM MPI library, which we modified to understand KLAT2’s FNN.
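To put the measured and theoretical figures side by side, here is a minimal Python sketch that works out what fraction of peak each ScaLAPACK run achieves. It uses only the four numbers quoted above. The dictionary names and the fraction_of_peak helper are made up for illustration, and because the 32-bit result is stated as "over 64 GFLOPS", the 32-bit fraction is a lower bound.

```python
# Figures quoted in the answer above, in GFLOPS.
# The 32-bit measurement is given as "over 64", so 64.0 is a lower bound.
MEASURED = {"32-bit": 64.0, "80/64-bit": 22.8}
PEAK = {"32-bit": 179.0, "80/64-bit": 89.0}


def fraction_of_peak(measured, peak):
    """Return measured performance as a fraction of theoretical peak."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return measured / peak


# Hypothetical helper table: precision label -> fraction of peak achieved.
efficiency = {p: fraction_of_peak(MEASURED[p], PEAK[p]) for p in MEASURED}

for precision, frac in efficiency.items():
    print(f"{precision}: {frac:.1%} of peak")
```

By this arithmetic, the 32-bit run reaches at least about 36% of peak and the untuned 80/64-bit run about 26%.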