To illustrate realistic uses of data parallelism, this example presents two forms of the classic Gauss elimination algorithm for solving systems of linear equations. This particular example is chosen because of the near-universal familiarity with Gaussian elimination, so that maximum attention can be paid to the data parallel techniques with a minimum of distraction from becoming familiar with the problem. One form of the example, called Simple_Gauss, marches the pivot down the main diagonal of the matrix (called Grid or G in the code below); the other form, called Pivot_Gauss, implements the more complicated but more robust maximum pivot strategy for Gaussian elimination.
Both Simple_Gauss and Pivot_Gauss have two versions - a scalar sequential version and a data-parallel version. As presented, the data-parallel version would compile and run; the sequential version is commented out with ``!!" at the beginning of these lines. If these comment characters are removed, and the lines ending with ``!!!!" comments are commented out, the sequential version would compile and run.
The sequential versions contain no array operations (except for the initialization of G) and are characterized by the familiar scalar do-loops over the matrix. The data-parallel replacements for these loops immediately follow so that the sequential and parallel versions can be conveniently compared. Some liberty (though not much) has been taken with the presentation of these examples in an attempt to make these comparisons easy and most useful. Also in order to facilitate these comparisons, the entire matrix is reduced for each pivot, rather than just those columns needing reduction, so that each version of each algorithm involves about twice as many (scalar element) operations as are really necessary; with some loss of clarity the algorithms can easily be adjusted to limit the number of operations accordingly.
An analysis of these algorithms shows that the sequential version of Simple_Gauss has about scalar operations, and the parallel version has about scalar operations in 10 parallel operations. The sequential version of Pivot_Gauss has about scalar operations, and the parallel version has about scalar operations in parallel operations. For reasonably large values of N, these results are summarized in Table 6. The last column of the table (idealistically) assumes that a data-parallel operation takes the same time as a scalar operation.
Thus in both cases the parallel version involves more scalar operations than does the sequential version, but the number of parallel operations is astoundingly low in comparison. The effective cost of a parallel operation, in terms of a scalar operation, currently varies widely from system to system, but the trend appears to be (and certainly this is not inconsistent with theoretical possibility and the inexorable march of technology) asymtotic toward scalar operation costs. Viewed in these terms, the data-parallel version of Gaussian elimination is indeed attractive.
Finally, a word on the Fortran 90 intrinsic function SPREAD, used in the primary reduction operation in both Simple_Gauss and Pivot_Gauss. SPREAD replicates (spreads) a scalar into a one-dimensional array, or replicates an n-dimensional array into an n+1-dimensional array. The scalar-to-one- dimensional array form is that used here, and is just what the doctor ordered to convert the scalar operation G(i,j)=G(i,L)*G(L,j) into a whole-array operation on G. L is ``constant" in this expression, in the loops over i and j, and thus must be ``spread" in these places to fill out the array for the whole-array operation. Understanding this is key to, and the most difficult part of, assimilating a good feel for the data-parallel versions of this algorithm. SPREAD has three arguments: the first is the scalar or array to be spread, the second is the dimension over which the spreading occurs (and must be one for spreading a scalar), and the third is the number of replications (N or N+1 is these cases).
Retrieve the Fortran 90 codes for Simp_Gauss and Pivot_Gauss and a sample transcript of a compilation: gauss90.f90, gauss90.compilation.