We begin by considering the main features of sparse Cholesky factorization that affect its performance on serial machines. Algorithm 8.1 gives a standard, column-oriented formulation in which the Cholesky factor L overwrites the initial matrix A, and only the lower triangle is accessed:
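Algorithm 8.1 itself is a standard column-oriented (left-looking) Cholesky factorization. The following Python/NumPy sketch shows that organization under the assumptions stated in the text (L overwrites A, only the lower triangle is accessed); it is an illustrative reconstruction, not the book's algorithm verbatim, and the function name `chol_column` is ours:

```python
import numpy as np

def chol_column(A):
    """Column-oriented Cholesky sketch: L overwrites the lower triangle of A.

    Illustrative reconstruction of the organization described in the text:
    column j is first modified by a multiple of every prior column k,
    then scaled by the reciprocal of the square root of its diagonal.
    """
    A = A.copy()
    n = A.shape[0]
    for j in range(n):
        for k in range(j):                # modify column j by column k
            for i in range(j, n):         # inner loop: a saxpy on column j
                A[i, j] -= A[i, k] * A[j, k]
        A[j, j] = np.sqrt(A[j, j])        # scale column j by 1/sqrt(diagonal)
        for i in range(j + 1, n):
            A[i, j] /= A[j, j]
    return np.tril(A)                     # only the lower triangle is meaningful
```

For a symmetric positive definite input, the result agrees with any standard Cholesky routine, e.g. `np.linalg.cholesky`.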
The outer loop in Algorithm 8.1 is over successive columns of A. The current column (indexed by j) is modified by a multiple of each prior column (indexed by k); we refer to such an operation as cmod(j, k). The computation performed by the inner loop (indexed by i) is a saxpy. After all its modifications have been completed, column j is scaled by the reciprocal of the square root of its diagonal element; we refer to this operation as cdiv(j). As usual, this is but one of the 3! ways of ordering the triple-nested loop that embodies the factorization.
The inner loop in Algorithm 8.1 has no effect, and thus may as well be skipped, if a_jk = 0. For a dense matrix A, such an event is too unlikely to offer significant advantage. The fundamental difference with a sparse matrix is that a_jk is in fact very often zero, and computational efficiency demands that we recognize this situation and take advantage of it. Another way of expressing this condition is that column j of the Cholesky factor L does not depend on prior column k if l_jk = 0, which not only provides a computational shortcut, but also suggests an additional source of parallelism that we will explore in detail later.
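The shortcut can be illustrated by guarding the column-modification step on a zero multiplier. The sketch below (our own, with a hypothetical name `chol_column_skip`) tests entries of a dense array for simplicity; a genuine sparse code would instead consult a compressed storage structure so that zero multipliers are never even visited. Since L overwrites A, the entry A[j, k] holds l_jk by the time column j is being computed:

```python
import numpy as np

def chol_column_skip(A):
    """Column Cholesky that skips the modification of column j by column k
    whenever the multiplier l_jk (stored in A[j, k]) is zero.

    Illustrative sketch only: testing entries of a dense array mimics the
    shortcut; a real sparse code would use compressed column storage.
    Returns the factor L and the number of skipped column modifications.
    """
    A = A.copy()
    n = A.shape[0]
    skipped = 0
    for j in range(n):
        for k in range(j):
            if A[j, k] == 0.0:        # l_jk = 0: column j does not depend on k
                skipped += 1
                continue
            for i in range(j, n):     # the saxpy inner loop
                A[i, j] -= A[i, k] * A[j, k]
        A[j, j] = np.sqrt(A[j, j])    # scale column j by 1/sqrt(diagonal)
        for i in range(j + 1, n):
            A[i, j] /= A[j, j]
    return np.tril(A), skipped
```

On a tridiagonal positive definite matrix, for example, every column of L depends only on its immediate predecessor, so all but one modification per column is skipped while the computed factor is unchanged.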