Medium-grain parallelism, at the level of operations on entire columns,
is also available in either the dense or the sparse case. This level
of granularity accounts for essentially all of the parallel speedup in
dense factorization on current-generation parallel machines, and it is
an extremely important source of parallelism for sparse factorization
as well. This parallelism is due primarily to the fact that many
column operations can be computed simultaneously by different
processors. For many problems, such a level of granularity provides a
good balance between communication and computation, but scaling up to
very large problems and/or very large numbers of processors may
necessitate that the tasks be further broken up into chunks based on a
two-dimensional partitioning of the columns. One must keep in mind,
however, that in the sparse case an entire column operation may require
only a few floating point operations involving the sparsely populated
nonzero elements in the column. For a matrix of order n having a
planar graph, for example, the largest embedded dense submatrix to be
factored is roughly of order √n, and thus a sparse problem must
be extremely large before a two-dimensional partitioning becomes
essential.
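The column-level tasks described above can be made concrete with a small dense, sequential sketch of column-oriented Cholesky factorization. The cdiv/cmod task names are the customary ones from the parallel factorization literature, not taken from this text, and dense storage is assumed purely for brevity; the point is that once column k is finalized, the cmod updates it triggers on distinct target columns touch disjoint data and are exactly the tasks that different processors can execute simultaneously.

```python
import numpy as np

def cdiv(L, k):
    """Finalize column k: take the square root of the diagonal
    entry and scale the subdiagonal part of the column by it."""
    L[k, k] = np.sqrt(L[k, k])
    L[k+1:, k] /= L[k, k]

def cmod(L, j, k):
    """Update target column j with a multiple of completed column k
    (k < j). Each cmod writes only into column j, so the cmod(j, k)
    tasks for different j are independent of one another."""
    L[j:, j] -= L[j, k] * L[j:, k]

def column_cholesky(A):
    """Right-looking column Cholesky of a symmetric positive
    definite matrix A, expressed as cdiv/cmod column tasks.

    The inner loop over j is a pool of independent column tasks:
    this is the medium-grain parallelism discussed in the text."""
    L = np.tril(A).astype(float)   # work on the lower triangle
    n = L.shape[0]
    for k in range(n):
        cdiv(L, k)
        for j in range(k + 1, n):  # independent tasks, one per column
            cmod(L, j, k)
    return L
```

In the sparse case the j-loop would run only over the columns whose sparsity structure actually depends on column k, which is what makes each task potentially very cheap.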
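A back-of-the-envelope sketch of the closing claim: if the largest dense submatrix for a planar problem has order roughly √n, then n must grow enormously before that block is big enough to justify a two-dimensional partitioning. The 1,000-column threshold below is an illustrative assumption, not a figure from the text.

```python
import math

def densest_block_order(n):
    # Separator-based heuristic from the text: for a sparse matrix of
    # order n with a planar graph, the largest embedded dense
    # submatrix to be factored is roughly of order sqrt(n).
    return math.isqrt(n)

# Assume (hypothetically) that a 2-D partitioning starts to pay off
# only once the dense block reaches ~1,000 columns: the sparse
# problem must then have order around one million before that happens.
for n in (10_000, 1_000_000, 100_000_000):
    print(f"n = {n:>11,}  largest dense block ~ {densest_block_order(n):,}")
```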