
2.7 Performance Models     continued...

The term ``theoretical peak MFLOPS'' refers to how many floating point operations per second would be possible if the machine did nothing but numerical operations. It is obtained by calculating the time it takes to perform one operation and then computing how many such operations could be done in one second. For example, if it takes 8 cycles to do one floating point multiplication, the cycle time on the machine is 20 nanoseconds, and arithmetic operations are not overlapped with one another, it takes 8 x 20ns = 160ns for one multiplication, and

\[ \frac{1\ \mbox{operation}}{160 \times 10^{-9}\ \mbox{seconds}} = 6.25 \times 10^{6}\ \mbox{operations per second} \]

so the theoretical peak performance is 6.25 MFLOPS. Of course, programs are not just long sequences of multiply and add instructions, so a machine rarely comes close to this level of performance on any real program. Most machines will achieve less than 10% of their peak rating, but vector processors or other machines with internal pipelines that have an effective CPI near 1.0 can often achieve 70% or more of their theoretical peak on small programs.
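
The arithmetic above can be sketched in a few lines of Python. The function names and parameters are illustrative assumptions, not part of the text, and the sketch assumes, as in the example, that arithmetic operations are not overlapped. The achieved rate of 0.5 MFLOPS is a hypothetical figure chosen only to show how a fraction of peak would be computed.

```python
def peak_mflops(cycles_per_op, cycle_time_ns):
    """Theoretical peak in MFLOPS, assuming operations are not overlapped."""
    # Time for one operation in nanoseconds (8 cycles * 20 ns = 160 ns).
    time_per_op_ns = cycles_per_op * cycle_time_ns
    # 1 op per time_per_op_ns is 1e9 / time_per_op_ns ops per second,
    # which is 1e3 / time_per_op_ns million ops per second.
    return 1000.0 / time_per_op_ns


def fraction_of_peak(achieved_mflops, peak):
    """Ratio of a measured rate to the theoretical peak."""
    return achieved_mflops / peak


peak = peak_mflops(cycles_per_op=8, cycle_time_ns=20)
print(f"{peak:.2f} MFLOPS")  # prints "6.25 MFLOPS"

# Hypothetical measured rate, for illustration only.
achieved = 0.5
print(f"{fraction_of_peak(achieved, peak):.0%} of peak")
```

A machine with an effective CPI near 1.0 would correspond to calling `peak_mflops` with `cycles_per_op=1`, which raises the peak by the same factor that the cycle count drops.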

Comparisons of machines based on metrics such as CPI, MIPS, or MFLOPS depend heavily on the programs used to measure execution times. A benchmark is a program written specifically for this purpose. There are several well-known collections of benchmarks. One that is particularly interesting to computational scientists is LINPACK, which contains a set of linear algebra routines written in Fortran. MFLOPS ratings based on LINPACK performance are published regularly [8]. Two collections that cover a wider range of programs are SPEC (System Performance Evaluation Cooperative) and the Perfect Club, which is oriented toward parallel processing. Both include widely used programs such as a C compiler and a text formatter, not just small special purpose subroutines, and are useful for comparing systems such as high performance workstations that will be used for other jobs in addition to computational science modelling.
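
To make the idea of a benchmark-derived MFLOPS rating concrete, here is a minimal sketch that times a dot product and divides the known operation count by the elapsed time. It is not LINPACK or any published benchmark. The kernel, the function names, and the default problem size are all assumptions made for illustration, and in interpreted Python the resulting figure mostly reflects interpreter overhead rather than the floating point hardware.

```python
import time


def dot(x, y):
    """Dot product: one multiply and one add per element."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


def measure_mflops(n=100_000, repeats=5):
    """Achieved MFLOPS for the dot product, using the fastest of several runs."""
    x = [1.0] * n
    y = [2.0] * n
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dot(x, y)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    flops = 2 * n  # n multiplications plus n additions
    return flops / best / 1e6


if __name__ == "__main__":
    print(f"achieved: {measure_mflops():.1f} MFLOPS")
```

Taking the fastest of several runs reduces the influence of other activity on the machine. Changing the kernel or the problem size changes the rating, which is why the choice of benchmark programs matters when comparing machines.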