Thus in both cases the parallel version involves more scalar operations than does the sequential version, but the number of parallel operations is astoundingly low in comparison. The effective cost of a parallel operation, measured in scalar operations, currently varies widely from system to system, but the trend appears to be asymptotic toward the cost of a scalar operation (and certainly this is consistent with both theoretical possibility and the march of technology). Viewed in these terms, the data-parallel version of Gaussian elimination is indeed attractive.

Finally, a word on the Fortran 90 intrinsic function SPREAD, used in the primary reduction operation in both Simple_Gauss and Pivot_Gauss. SPREAD replicates (spreads) a scalar into a one-dimensional array, or replicates an n-dimensional array into an n+1-dimensional array. The scalar-to-one-dimensional array form is that used here, and is just what the doctor ordered to convert the scalar operation G(i,j)=G(i,L)*G(L,j) into a whole-array operation on G. L is "constant" in this expression, in the loops over i and j, and thus must be "spread" in these places to fill out the array for the whole-array operation. Understanding this is the key to, and the most difficult part of, getting a good feel for the data-parallel versions of this algorithm. SPREAD has three arguments: the first is the scalar or array to be spread, the second is the dimension over which the spreading occurs (and must be one for spreading a scalar), and the third is the number of replications (N or N+1 in these cases).
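As a small illustration of how SPREAD fills out an array for a whole-array operation, the following sketch compares the loop form of the product G(i,L)*G(L,j) with a whole-array form built from SPREAD. It is not taken from gauss90.f90: the program name spread_demo, the array size, the sample values, and the variables prod_loop and prod_spread are all assumptions made for this example, and it spreads one-dimensional sections of G rather than a scalar, which may differ from what Simple_Gauss and Pivot_Gauss actually do.

```fortran
program spread_demo
  implicit none
  integer, parameter :: n = 3          ! illustrative size (assumption)
  integer :: i, j, L
  real :: g(n, n), prod_loop(n, n), prod_spread(n, n)

  ! Arbitrary sample values, stored column by column (assumption).
  g = reshape((/ 2.0, 4.0, 6.0, 1.0, 3.0, 5.0, 7.0, 8.0, 9.0 /), (/ n, n /))
  L = 1

  ! Scalar form: replicate one scalar into a rank-1 array of n copies.
  ! The second argument (the dimension) must be 1 for a scalar source.
  print *, spread(g(L, L), 1, n)

  ! Loop form: L is fixed while i and j vary.
  do j = 1, n
    do i = 1, n
      prod_loop(i, j) = g(i, L) * g(L, j)
    end do
  end do

  ! Whole-array form: column L is replicated along dimension 2
  ! (n copies side by side), row L is replicated along dimension 1
  ! (n copies stacked), and the two n-by-n arrays are multiplied
  ! element by element.
  prod_spread = spread(g(:, L), 2, n) * spread(g(L, :), 1, n)

  ! Largest difference between the two forms; they should agree.
  print *, maxval(abs(prod_loop - prod_spread))
end program spread_demo
```

In the spread form, element (i,j) of the first factor is G(i,L) for every j, and element (i,j) of the second factor is G(L,j) for every i, which is exactly what the nested loops compute one element at a time.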

Retrieve the Fortran 90 codes for Simp_Gauss and Pivot_Gauss and a sample transcript of a compilation: gauss90.f90, gauss90.compilation.