3 High Performance Computer Architecture

As described in Section 3.6, the performance of a computer system is defined by three factors. The time to execute a program is a function of the number of instructions to execute, the average number of clock cycles required per instruction, and the clock cycle time:

    T = IC × CPI × t_c

where T is the execution time, IC is the number of instructions executed, CPI is the average number of clock cycles per instruction, and t_c is the clock cycle time.

Lowering the clock cycle time is mostly a matter of engineering, through the use of more advanced materials or production techniques that allow the construction of smaller (and thus faster and more efficient) circuits. In this section we will survey several techniques for designing architectures that improve the other two factors.
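
To make the relationship concrete, the short Python sketch below evaluates the equation for two hypothetical machine configurations; the instruction count, CPI, and clock cycle values are illustrative assumptions, not measurements from any real system.

    # Hypothetical illustration of the performance equation T = IC * CPI * t_c.
    # All figures below are made-up example values, not measurements.

    def execution_time(instruction_count, cpi, cycle_time_s):
        """Return the time (in seconds) to execute a program."""
        return instruction_count * cpi * cycle_time_s

    # Baseline machine: 1 ns clock cycle (1 GHz), averaging 2.0 cycles per instruction.
    baseline = execution_time(instruction_count=10**9, cpi=2.0, cycle_time_s=1e-9)

    # Parallel machine: same clock and instruction count, but architectural
    # parallelism (e.g. multiple functional units) lowers the average CPI to 0.5.
    parallel = execution_time(instruction_count=10**9, cpi=0.5, cycle_time_s=1e-9)

    print(f"baseline: {baseline:.1f} s, parallel: {parallel:.1f} s, "
          f"speedup: {baseline / parallel:.1f}x")

Under these assumed numbers, lowering the average CPI from 2.0 to 0.5 cuts the execution time from 2.0 s to 0.5 s, a fourfold speedup. Reductions of this kind, in the instruction count or the cycles per instruction, are precisely what the architectural techniques surveyed here aim for.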

The common thread that runs through all these techniques is parallelism, which is achieved by replicating basic components in the system. For example, an architect may use four adder/multiplier units instead of one inside the CPU, connect two or more memories to the CPU to increase bandwidth, connect two or more processors to one memory to increase the number of instructions executed per unit time, or even replicate the entire computer (processor, memory, and I/O connections) in a network of machines that all work together on the same program. Parallelism has existed in the minds of computer architects since the time of Charles Babbage in the early 19th century, and it has been manifested in a large number of machines in a variety of ways that can be classified into distinct levels [13]: