8.4 Parallelism in Sparse Factorization

We now examine in greater detail the opportunities for parallelism in sparse Cholesky factorization and various algorithms for exploiting it. One of the most important issues in designing any parallel algorithm is selecting an appropriate level of granularity, by which we mean the size of the computational subtasks that are assigned to individual processors. The optimal choice of task size depends on the tradeoff between communication costs and the load balance across processors. We follow Liu [61] in identifying three potential levels of granularity in a parallel implementation of Cholesky factorization:

(1) fine-grain, in which each task consists of only one or two floating point operations, such as a multiply-add pair,
(2) medium-grain, in which each task is a single column operation, such as cdiv(j) or cmod(j,k) (see the sketch after this list),
(3) large-grain, in which each task is the computation of an entire group of columns in a subtree of the elimination tree.
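To make the medium-grain column operations concrete, here is a minimal Python/NumPy sketch (not from the original text) of cdiv and cmod, together with a left-looking column Cholesky driver built from them. Dense columns are used for clarity; a real sparse code would store and update only the nonzeros of each column.

    import numpy as np

    def cdiv(L, j):
        """cdiv(j): take the square root of the diagonal entry of
        column j and scale the rest of the column by it."""
        L[j, j] = np.sqrt(L[j, j])
        L[j+1:, j] /= L[j, j]

    def cmod(L, j, k):
        """cmod(j,k): update column j by a multiple of column k, k < j."""
        L[j:, j] -= L[j, k] * L[j:, k]

    def left_looking_cholesky(A):
        """Compute the lower triangular Cholesky factor of a symmetric
        positive definite matrix A, one column at a time. Each column j
        is first updated by all prior columns (cmod), then scaled
        (cdiv); these column tasks are the medium-grain units."""
        n = A.shape[0]
        L = np.tril(A).astype(float)
        for j in range(n):
            for k in range(j):   # in a sparse code, only k with L[j,k] != 0
                cmod(L, j, k)
            cdiv(L, j)
        return L

On a small test matrix, np.allclose(L @ L.T, A) verifies the factor. In the sparse case the inner loop over k shrinks to the nonzero structure of row j, which is exactly what the elimination tree encodes.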

Fine-grain parallelism, at the level of individual floating point operations, is available in either the dense or sparse case. It can be exploited effectively by a vector processing unit or a systolic array, but would incur far too much communication overhead to be exploited profitably on most current-generation parallel computers. In particular, the communication latency of these machines is too great for such frequent communication of small messages to be feasible.
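To make the latency argument concrete, the following back-of-the-envelope sketch compares the cost of sending one fine-grain task's operands against the cost of the multiply-add itself. All hardware parameter values here are illustrative assumptions, not figures from the text.

    # Illustrative cost model for fine-grain tasks on a message-passing
    # machine; parameter values are assumptions for the sake of example.
    alpha = 10e-6   # assumed message startup (latency) time: 10 microseconds
    beta  = 1e-8    # assumed per-word transfer time: 10 nanoseconds
    gamma = 1e-8    # assumed time per floating point operation: 10 nanoseconds

    # A multiply-add pair needs roughly two words of input; its
    # communication time is dominated by the startup cost alpha.
    t_comm = alpha + 2 * beta
    t_comp = 2 * gamma      # one multiply and one add

    print(f"communication/computation ratio: {t_comm / t_comp:.0f}x")

With these assumed numbers the ratio is on the order of 500x, so each message must be amortized over hundreds of floating point operations to be worthwhile; this is precisely what the coarser medium-grain and large-grain formulations accomplish.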