We begin by considering the main features of sparse Cholesky factorization that affect its performance on serial machines. Algorithm 8.1 gives a standard, column-oriented formulation in which the Cholesky factor L overwrites the initial matrix A, and only the lower triangle is accessed:
The outer loop in Algorithm 8.1
is over successive columns of A. The
current column (indexed by j) is modified by a multiple of each
prior column (indexed by k); we refer to such an operation as
. The computation performed by the inner loop (indexed by
i) is a saxpy. After all its modifications have been
completed, column j is then scaled by the reciprocal of the square
root of its diagonal element; we refer to this operation as
.
As usual, this is but one of the 3! ways of ordering the triple-nested
loop that embodies the factorization.
The inner loop in Algorithm 8.1
has no effect, and thus may as well be
skipped, if . For a dense matrix A, such an event is too
unlikely to offer significant advantage. The fundamental difference
with a sparse matrix is that
is in fact very often zero, and
computational efficiency demands that we recognize this situation and
take advantage of it. Another way of expressing this condition is
that column j of the Cholesky factor L does not depend on prior
column k if
, which not only provides a computational
shortcut, but also suggests an additional source of parallelism that
we will explore in detail later.