To understand the speedups of LAPACK over LINPACK in Table 1, and in general to be able to understand which algorithms are fast and which are slow, we need to understand memory hierarchies. Memory hierarchies were discussed in some detail in the chapter on Computer Architecture , and we just review them briefly here, emphasizing their importance to algorithm design.
Computer memories are built as hierarchies, with a series of different kinds of memories ranging from very fast, expensive, and therefore small memory at the top of the hierarchy, down to slow, cheap and very large memory at the bottom. For example, registers typically form the fastest memory, then cache, main memory, disks, and finally tape as the slowest, largest and cheapest. Useful floating point operations can only be done on data at the top of the hierarchy, in the registers. Since an entire large matrix cannot fit in the registers, it must be moved up and down through the hierarchy, transferred up to the registers when work needs to be done, and transferred back down to the main memory (or disk or tape) when it is no longer needed. It takes time to move between levels in the memory hierarchy, and moving is slower the farther down in the hierarchy one goes. Indeed, one such data movement takes far longer than performing a floating point operation. So the limiting factor for many algorithms is not the time needed for floating point operations, but the time to move data in the memory hierarchy. Therefore, a clever algorithm will attempt to minimize the number of these memory moves (even if it might mean doing a few more floating point operations).