The advantages of programming in a high level language include abstraction and portability. Abstraction means programmers can describe algorithms in a ``high level'' notation that is independent of details about the machine that will execute the algorithm. Portability is a byproduct of abstraction that allows programs to be run on a wide variety of computers as long as there is a compiler that will translate them for each machine.
In most programming situations reality is close to the ideal. Compilers for many high level languages are very good at generating efficient and portable code for typical computer systems, so programmers are able to express algorithms in high level languages and expect them to run efficiently on almost any machine. There may be a few isolated places where a programmer who invests a lot of effort may be able to write a more efficient routine in assembly language (the native language of the machine), but it is hardly ever worth the effort to write an entire program in assembly language. Obviously when all or part of a program is written in assembler it is not as abstract, since assembly language is the language of the machine and not the language of the application, and it is no longer portable from one machine to another.
It is not necessary to write a lower level program in order to
compromise abstraction and portability. As a simple example,
suppose a program multiplies a value x by 2. The obvious way
to write this in C is ``2*x''. But if a programmer knows the
machine that will execute this program has a slow multiplication
instruction, and knows that integers are represented in binary,
she can use the expression ``x << 1''
instead.
The resulting program is less abstract than
the original, since one
of its operations is defined in machine level terms (shifting a
pattern of bits) instead of mathematical terms (multiplication).
It is less portable, since it now runs only on machines that use
binary to represent integers (a pretty safe bet) and is efficient
only on machines that shift bits in a single operation.
There are situations where programmers need to use knowledge of the underlying computer system in order to optimize programs written in high level languages, and computational scientists will often find themselves in these situations. If a program runs for days, or even weeks, an optimization that improves execution by just a few percent can save many hours, which translates into real savings if the program runs at a supercomputer center where the scientist pays for CPU time. Another factor is that high performance computers used by computational scientists are much more complicated than other machines, and a compiler may not be able to translate a program efficiently without a little help from the programmer. A common situation is a loop written in Fortran, which, if written carefully, can be translated into a single instruction for a vector processor. Computational scientists often use the newest machines, and these are the machines most likely to have immature compilers. It takes several years experience with real programs for compiler writers to learn how to develop optimizations that will more fully exploit the capabilities of the underlying machine, and in many cases the theory behind the optimizations has yet to be worked out. For example, techniques for optimal mapping of independent pieces of a parallel program so they can be executed simultaneously on different nodes in a parallel processor is an active area of research in computer science. Programmers who use parallel processors often need to allocate tasks themselves using system- dependent library routines to send information from one task to another. Knowing how processors are interconnected will have an impact on how efficiently messages are passed.
Computer scientists used three related terms to describe the general area of low-level machine organization. Computer architecture is the study of the components that make up computer systems and how they are interconnected. Computer organization is concerned with the implementation of a computer architecture. As an example of the difference between architecture and implementation, consider the vector supercomputers from Cray Research. These machines have very similar architectures from a programmer's point of view: processors in these systems have the same number of internal registers (temporary storage) for both vector and scalar data, they have the same basic instruction set, and operands in main memory have the same formats. The systems have very different organizations, however, since they may have a different number of processors, memory sizes may vary, operands are transferred from memory to the processor in different ways, and the time to execute an instruction varies from one system to another. Computer engineering refers to the actual construction of a system: lengths of wires, sizes of circuits, cooling and electrical requirements, etc. Programmers often use knowledge of a system's architecture, and sometimes organization, to optimize performance of their programs, but rarely, if ever, are they concerned with engineering aspects.
The goal of this chapter is to introduce the basic concepts of computer architecture and organization in order to allow computational scientists to recognize when programs are not as efficient as they could be and to transform them so they make better use of the underlying machine. We cannot hope to present a comprehensive collection of performance tips for languages and systems likely to be used in computational science. Rather we aim to give enough background information on common structures such as vector processors and cache memories so you will be able to (a) recognize when your program is not performing near the capacity of your system, (b) understand performance improvement techniques recommended by the compiler writers and/or system architects of your system, and (c) decide whether the benefits of increased performance are worth sacrificing abstraction and portability.
Another, closely related, goal is to provide the necessary
background in computer architecture to evaluate competing
algorithms to decide which is likely to be the most efficient for
a given machine, even before they are expressed in a programming
language. Performance will depend on several factors, for example
how data is laid out in memory and the patterns in which it is
accessed. In many cases an algorithm with worse asymptotic
complexity may turn out to be the best for a certain class of
machines because it requires fewer processors or less memory
(see Numerical Linear Algebra
).
The chapter begins with an overview of basic computer architecture. It describes the main components of a typical system and how they interact. Section 3, is on the architecture and implementation of modern high performance systems, including vector processors and parallel processors; readers who are already familiar with basic computer architecture may want to skip directly to this section. In both sections we will describe the basic concepts, introduce performance models, and discuss factors that limit efficient use of the machines. Programming issues, for example organizing loops so they can be executed more efficiently by a vector processor or accessing data structures in ways that are most efficient for certain memory systems, are discussed in the chapter on programming languages