next up previous

8.4 Parallelism in Sparse Factorization     continued...

We see from this discussion that the elimination tree serves to characterize the parallelism that is unique to sparse factorization. In particular, the height of the elimination tree gives a rough measure of the parallel computation time, and the width of the elimination tree gives a rough measure of the degree or multiplicity of large-grain parallelism. These measures are only very rough, however, since the medium level parallelism also plays a major role in determining overall performance. Still, we can see that short, bushy elimination trees are more advantageous then tall, slender ones in terms of the large-grain parallelism available. And just as the fill in the Cholesky factor is very sensitive to the ordering of the matrix, so is the structure of the elimination tree. This suggests that we should choose an ordering to enhance parallelism, and indeed this is possible (see, e.g., [62,63,64]) but such an objective may conflict to some degree with preservation of sparsity. Roughly speaking, sparsity and parallelism are largely compatible, since the large-grain parallelism is due to sparsity in the first place. However, these two criteria are by no means coincident, as we will see by example below.

We now illustrate these concepts using a series of simple examples. Figure 6 shows a small one-dimensional mesh with a `natural' ordering of the nodes, the nonzero patterns of the corresponding tridiagonal matrix A and its Cholesky factor L, and the resulting elimination tree . On the positive side, the Cholesky factor suffers no fill at all and the total work required for the factorization is minimal. However, we see that the elimination tree is simply a chain, and therefore there is no large-grain parallelism available. Each column of L depends on the immediately preceding one, and thus they must be computed sequentially. This behavior is typical of orderings that minimize the bandwidth of a sparse matrix: they tend to inhibit rather than enhance large-grain parallelism in the factorization. (As previously discussed in Section 7, there is in fact little parallelism of any kind to be exploited in solving a tridiagonal system in this natural order. The operations involve only a couple of flops each, so that even the `medium-grain' tasks are actually rather small in this case.)

Figure 6: One-dimensional grid and corresponding tridiagonal matrix (left), with Cholesky factor and elimination tree (right).