
2.7 Performance Models     continued...

A second reason clock rate by itself is an inadequate measure of performance is that it doesn't take into account what happens during a clock cycle. This is especially true when comparing systems with different instruction sets. It is possible that a machine might have a lower clock rate, but because it requires fewer cycles to execute the same program it would have higher performance. For example, consider two machines, A and B, that are almost identical except that A has a multiply instruction and B does not. A simple loop that multiplies a vector by a scalar (the constant 3 in this example) is shown in the table below. The number of cycles for each instruction is given in parentheses next to the instruction.

Table 3: View.

The first instruction loads an element of the vector into an internal processor register X. Next, machine A multiplies the vector element by 3, leaving the result in the register. Machine B does the same operation by shifting and adding, i.e. 3x = 2x + x: B copies the contents of X to another register Y, shifts X left one bit (which multiplies it by 2), and then adds Y, again leaving the result in X. Both machines then store the result back into the vector in memory and branch back to the top of the loop if the vector index is not at the end of the vector (the comparison and branch are done by the dbr instruction). Machine A's clock might be slightly slower than B's, but since A takes fewer cycles it will execute the loop faster. For example, if A's clock rate is 9 MHz (0.11 µs per cycle) and B's clock rate is 10 MHz (0.10 µs per cycle), A will execute one pass through the loop in 1.1 µs but B will require 1.2 µs.
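The comparison can be checked with a short Python sketch. It is illustrative only: the function and constant names are hypothetical, and the per-pass cycle counts (10 for A, 12 for B) are not read from the table but inferred from the figures quoted in the text (1.1 / 0.11 and 1.2 / 0.10). Times are in microseconds, since a 9 MHz clock has a period of about 0.11 µs.

```python
def times3_multiply(x: int) -> int:
    """Machine A's approach: a single multiply instruction."""
    return x * 3


def times3_shift_add(x: int) -> int:
    """Machine B's approach: copy X to Y, shift X left one bit, add Y."""
    y = x          # copy X to Y
    x = x << 1     # shift left one bit, so X now holds 2x
    return x + y   # add Y, so X holds 2x + x = 3x


def loop_time_us(cycles_per_pass: int, clock_mhz: float) -> float:
    """Time for one pass through the loop, in microseconds.

    A clock of f MHz completes f cycles per microsecond,
    so time = cycles / f.
    """
    return cycles_per_pass / clock_mhz


# Assumed cycle counts, inferred from the timings quoted in the text.
CYCLES_A = 10
CYCLES_B = 12

time_a = loop_time_us(CYCLES_A, 9.0)    # machine A at 9 MHz
time_b = loop_time_us(CYCLES_B, 10.0)   # machine B at 10 MHz

if __name__ == "__main__":
    print(f"A: {time_a:.2f} us per pass, B: {time_b:.2f} us per pass")
```

Under these assumptions A finishes a pass sooner than B despite its slower clock, because the extra two cycles B spends on the copy, shift, and add outweigh its 10% faster clock.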