High Performance Fortran (HPF) [27]
permits the user to define a virtual
array of processors, align actual data structures like matrices and arrays with
this virtual array (and so with respect to each other), and then lay out the
virtual processor array on an actual machine. We describe the
layout functions **f**
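offered for this last step. The range of **f** is a
rectangular array of processors
numbered from $(0,0)$ up to $(p_1-1,\,p_2-1)$.
Then all **f** can be parameterized
by two integer parameters $b_1$ and $b_2$ as follows:

$$f_{b_1,b_2}(i,j) = \left( \left\lfloor \frac{i}{b_1} \right\rfloor \bmod p_1,\; \left\lfloor \frac{j}{b_2} \right\rfloor \bmod p_2 \right)$$

Suppose the matrix **A** (or virtual processor array) is $m \times n$.
Then choosing $b_2 = n$
yields a column of processors, each containing some number
of complete rows of **A**.
Choosing $b_1 = m$ yields a row of processors. Choosing $b_1 = m/p_1$ and
$b_2 = n/p_2$ yields a *blocked layout*, where **A** is
broken into $b_1 \times b_2$ subblocks,
each of which resides on a single processor.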
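
The three special cases above can be checked with a small sketch. It assumes the layout function has the block-cyclic form (floor(i/b1) mod p1, floor(j/b2) mod p2) with 0-based indices; the names `layout`, `b1`, `b2`, `p1`, `p2`, and the 8-by-8 matrix on a 2-by-2 processor grid are illustrative choices, not HPF syntax.

```python
def layout(i, j, b1, b2, p1, p2):
    """Map 0-based matrix entry (i, j) to a 0-based processor coordinate.

    Assumed form: (floor(i / b1) mod p1, floor(j / b2) mod p2).
    """
    return ((i // b1) % p1, (j // b2) % p2)


m, n = 8, 8      # matrix dimensions (illustrative)
p1, p2 = 2, 2    # processor grid dimensions (illustrative)
entries = [(i, j) for i in range(m) for j in range(n)]

# b2 = n: every entry lands in processor column 0, so each processor
# that is used holds complete rows of the matrix.
column_of_procs = {layout(i, j, m // p1, n, p1, p2) for i, j in entries}

# b1 = m: every entry lands in processor row 0.
row_of_procs = {layout(i, j, m, n // p2, p1, p2) for i, j in entries}

# b1 = m/p1, b2 = n/p2: blocked layout, one contiguous subblock per processor.
blocked = [[layout(i, j, m // p1, n // p2, p1, p2) for j in range(n)]
           for i in range(m)]
```

In the blocked case each processor owns a single contiguous 4-by-4 subblock; for instance, entry (3, 4) lies in the upper-right subblock.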
This is the simplest two-dimensional layout one could
imagine (we used it in the previous section),
and by having large subblocks stored on each processor
it makes using the BLAS on
each processor attractive.
However, for straightforward matrix algorithms that process
the matrix from left to right
(including
Gaussian elimination, *QR* decomposition, reduction to tridiagonal form,
and so on), the
leftmost processors will become idle early in the computation
and make load balance poor.
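
The idling effect can be illustrated with a rough sketch. It assumes a simplified left-to-right algorithm in which, at step k, only columns k through n-1 are still being worked on, and it assumes a blocked column layout in which column j lives on processor column floor(j/b2) mod p2. The function name `busy_processor_columns` and the sizes used are hypothetical.

```python
def busy_processor_columns(k, n, b2, p2):
    """Processor columns that still own at least one active column j >= k."""
    return {(j // b2) % p2 for j in range(k, n)}


n, p2 = 16, 4    # matrix columns and processor columns (illustrative)
b2 = n // p2     # blocked layout: each processor column owns b2 adjacent columns

# Count busy processor columns each time the algorithm crosses a block boundary.
busy_counts = [len(busy_processor_columns(k, n, b2, p2)) for k in range(0, n, b2)]

for k in range(0, n, b2):
    print(k, sorted(busy_processor_columns(k, n, b2, p2)))
```

Under these assumptions one processor column drops out every b2 steps, starting from the left, so by the final quarter of the computation only the rightmost processor column has work left.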