Fundamental physical phenomena, such as thermal generation/dissipation properties and electronic signal speeds, place theoretical and practical limits on the computation speeds of single-processor systems. Though these limits are currently roughly in the ``gigaflop'' (a billion floating-point operations per second) range, some contemporary applications of computational science require substantially greater speeds, as will most applications on the scale of grand challenge problems. It is becoming more feasible to scale up computational capacity by adding processors than by increasing single-processor speed. It is therefore likely that all future applications involving massive amounts of computation will make significant use of parallelism.
Applications may employ either or both of two principal forms of parallelism, which here will be termed ``data parallelism'' and ``process parallelism''. Data parallelism involves performing a similar computation on many data objects simultaneously. The prototypical case, especially for computational science applications, is a simultaneous operation on all the elements of an array; for example, dividing each element of the array by a given value (e.g., normalizing the pivot row in matrix reduction).
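
As a minimal sketch of the data-parallel pattern just described, the following Python fragment normalizes a pivot row by splitting it into chunks and dividing the elements of each chunk by the pivot value independently. The names normalize_pivot_row and scale_chunk and the workers parameter are illustrative assumptions, not part of any particular system. A thread pool stands in for the processors; in standard CPython this shows the structure of the computation rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_chunk(chunk, divisor):
    # Each element is processed independently of all the others,
    # which is what makes the operation data parallel.
    return [x / divisor for x in chunk]


def normalize_pivot_row(row, pivot_col, workers=4):
    """Divide every element of `row` by the pivot element row[pivot_col]."""
    pivot = row[pivot_col]
    if pivot == 0:
        raise ZeroDivisionError("pivot element is zero")
    # Split the row into contiguous chunks, roughly one per (hypothetical) worker.
    size = max(1, -(-len(row) // workers))  # ceiling division
    chunks = [row[i:i + size] for i in range(0, len(row), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        parts = pool.map(scale_chunk, chunks, [pivot] * len(chunks))
        return [x for part in parts for x in part]
```

For instance, normalize_pivot_row([2.0, 4.0, 6.0, 8.0, 10.0], 0) returns [1.0, 2.0, 3.0, 4.0, 5.0]. Since no chunk depends on the result of another, the chunks could in principle be assigned to separate processors without any coordination beyond distributing the pivot value.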