
5 Data Layouts on Distributed Memory Machines (continued)

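High Performance Fortran (HPF) [27] permits the user to define a virtual array of processors, align actual data structures like matrices and arrays with this virtual array (and so with respect to each other), and then lay out the virtual processor array on an actual machine. We describe the layout functions $f$ offered for this last step. The range of $f$ is a $p_r \times p_c$ rectangular array of processors numbered from $(0,0)$ up to $(p_r-1,\, p_c-1)$. Then all $f$ can be parameterized by two integer parameters $b_r$ and $b_c$ as follows:

$$f(i,j) = \left( \left\lfloor \frac{i}{b_r} \right\rfloor \bmod p_r,\; \left\lfloor \frac{j}{b_c} \right\rfloor \bmod p_c \right),$$

where $(i,j)$ is a 0-based index into the matrix (or virtual processor array) and $f(i,j)$ is the processor that stores that entry.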
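
The layout map sends entry $(i,j)$ to processor $(\lfloor i/b_r \rfloor \bmod p_r,\ \lfloor j/b_c \rfloor \bmod p_c)$. The following is a minimal sketch of that map, assuming 0-based indices; the function names `owner` and `owner_map` are hypothetical, chosen only for illustration, and are not part of HPF.

```python
from typing import List, Tuple


def owner(i: int, j: int, br: int, bc: int, pr: int, pc: int) -> Tuple[int, int]:
    """Processor (row, col) owning entry (i, j); hypothetical helper, 0-based indices.

    Integer division by the block size picks the block, and mod wraps the
    block index onto the pr x pc processor grid.
    """
    return ((i // br) % pr, (j // bc) % pc)


def owner_map(m: int, n: int, br: int, bc: int,
              pr: int, pc: int) -> List[List[Tuple[int, int]]]:
    """Owner of every entry of an m x n matrix under the layout above."""
    return [[owner(i, j, br, bc, pr, pc) for j in range(n)] for i in range(m)]


if __name__ == "__main__":
    # A 4 x 6 matrix on a 2 x 2 processor grid with 2 x 3 blocks.
    # Each printed token "rc" is the owning processor's (row, col).
    for row in owner_map(4, 6, 2, 3, 2, 2):
        print(" ".join(f"{r}{c}" for r, c in row))
    # 00 00 00 01 01 01
    # 00 00 00 01 01 01
    # 10 10 10 11 11 11
    # 10 10 10 11 11 11
```

Because the block sizes and grid shape are plain parameters, the same two-line map covers every layout in this family; only the choice of $b_r$ and $b_c$ changes.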

Suppose the matrix $A$ (or virtual processor array) is $m \times n$. Then choosing $b_c = n$ maps every column index to processor column 0, which yields a column of processors, each containing some number of complete rows of $A$. Choosing $b_r = m$ similarly yields a row of processors. Choosing $b_r = \lceil m/p_r \rceil$ and $b_c = \lceil n/p_c \rceil$ yields a blocked layout, where $A$ is broken into $b_r \times b_c$ subblocks, each of which resides on a single processor. This is the simplest two-dimensional layout one could imagine (we used it in the previous section), and because each processor stores large subblocks, it makes using the BLAS on each processor attractive. However, for straightforward matrix algorithms that process the matrix from left to right (including Gaussian elimination, QR decomposition, reduction to tridiagonal form, and so on), the leftmost processors become idle early in the computation: once the columns they own have been processed, they have no further work, so load balance is poor.
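
The load-imbalance argument can be checked by counting processors that still hold work. The sketch below rests on a simplified model, which is an assumption: step $k$ of a left-to-right algorithm such as Gaussian elimination touches only the trailing submatrix $A[k{:}n,\, k{:}n]$, so only the owners of that submatrix are busy. The helper names are hypothetical. It also checks the $b_c = n$ special case, where all data lands in processor column 0.

```python
from typing import Set, Tuple


def owner(i: int, j: int, br: int, bc: int, pr: int, pc: int) -> Tuple[int, int]:
    # Layout map from the text: block index via integer division, wrapped mod grid size.
    return ((i // br) % pr, (j // bc) % pc)


def active_processors(n: int, k: int, br: int, bc: int,
                      pr: int, pc: int) -> Set[Tuple[int, int]]:
    """Owners of the trailing submatrix A[k:n, k:n] of an n x n matrix.

    Simplified model (assumption): only these processors have work at step k.
    """
    return {owner(i, j, br, bc, pr, pc)
            for i in range(k, n) for j in range(k, n)}


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


if __name__ == "__main__":
    n, pr, pc = 8, 2, 2
    # Blocked layout: one large block per processor.
    br, bc = ceil_div(n, pr), ceil_div(n, pc)
    counts = [len(active_processors(n, k, br, bc, pr, pc)) for k in range(n)]
    print(counts)  # [4, 4, 4, 4, 1, 1, 1, 1]

    # b_c = n: every column index j < n maps to processor column 0,
    # so the data occupies a single column of processors.
    m = 6
    col_owners = {owner(i, j, 2, n, pr, pc) for i in range(m) for j in range(n)}
    print(sorted(col_owners))  # [(0, 0), (1, 0)]
```

In the blocked case, three of the four processors are idle for the second half of the steps, which is the imbalance described above.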