Medium-grain parallelism, at the level of operations on entire columns,
is also available in either the dense or the sparse case. This level
of granularity accounts for essentially all of the parallel speedup in
dense factorization on current generation parallel machines, and it is
an extremely important source of parallelism for sparse factorization
as well. This parallelism arises primarily because many of the column
operations are independent of one another and can therefore be computed
simultaneously by different processors. For many problems, such a level
of granularity provides a good balance between communication and
computation, but scaling up to very large problems and/or very large
numbers of processors may necessitate breaking the tasks up further,
based on a two-dimensional partitioning of the matrix. One must keep in
mind, however, that in the sparse case an entire column operation may
require only a few floating point operations involving the sparsely
populated nonzero elements of the column. For a matrix of order n having
a planar graph, for example, the largest embedded dense submatrix to be
factored is roughly of order $\sqrt{n}$, and thus a sparse problem must
be extremely large before a two-dimensional partitioning becomes
essential.
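To make the column-level granularity concrete, the following is a
minimal sketch (not the implementation discussed above) of a
left-looking column Cholesky factorization, written here densely in
Python/NumPy for clarity. The names cmod (modify column j by an earlier
column k) and cdiv (scale column j by the square root of its diagonal)
follow common usage in this literature; each such column operation is a
natural unit of work to assign to a processor, and in the sparse case a
cmod touches only the nonzeros of the two columns involved.

```python
import numpy as np

def column_cholesky(A):
    """Left-looking column Cholesky: compute L with A = L @ L.T.

    Column j is formed by cmod(j, k) updates from each earlier
    column k, followed by a single cdiv(j) scaling. The cmod
    operations targeting different columns are the medium-grain
    tasks that independent processors can execute concurrently.
    """
    n = A.shape[0]
    L = np.tril(np.array(A, dtype=float, copy=True))
    for j in range(n):
        for k in range(j):
            # cmod(j, k): update column j using completed column k
            L[j:, j] -= L[j, k] * L[j:, k]
        # cdiv(j): scale column j by sqrt of its (updated) diagonal
        L[j:, j] /= np.sqrt(L[j, j])
    return L
```

In a sparse code the inner cmod loop runs only over the columns k whose
nonzero structure actually intersects column j, which is why a single
column operation may amount to just a handful of floating point
operations.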