In an interleaved memory, the memory is divided into a set of banks. An interleaved memory with $n$ banks is said to be $n$-way interleaved. One way of allocating virtual addresses to memory modules is to divide the memory space (the set of all possible addresses a processor can generate) into contiguous blocks. If there are $n$ banks of $m$ locations each, memory location $i$ would reside in bank number $i / m$ (ignoring remainders). In an interleaved memory, however, consecutive addresses reside in different banks: memory location $i$ is in bank number $i \bmod n$. For example, suppose there are 4 banks, each containing 256 bytes. The block-oriented scheme would assign virtual addresses 0 to 255 to the first bank, 256 to 511 to the second bank, and so on. The interleaved scheme would assign addresses 0, 4, 8, ... to the first bank, 1, 5, 9, ... to the second bank, etc. (Figure 6).
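The two allocation schemes can be compared with a short sketch. The function names `block_bank` and `interleaved_bank` are illustrative only, and the constants mirror the 4-bank, 256-byte example above; banks are numbered from 0, so "the first bank" is bank 0.

```python
def block_bank(addr: int, bank_size: int) -> int:
    """Block-oriented scheme: each run of bank_size consecutive
    addresses lives in one bank (integer division drops the remainder)."""
    return addr // bank_size


def interleaved_bank(addr: int, n_banks: int) -> int:
    """Interleaved scheme: consecutive addresses rotate through the banks."""
    return addr % n_banks


N_BANKS = 4      # number of banks in the example
BANK_SIZE = 256  # bytes per bank in the example

print("addr  block  interleaved")
for addr in (0, 1, 4, 5, 255, 256, 511, 512):
    print(f"{addr:4d}  {block_bank(addr, BANK_SIZE):5d}  "
          f"{interleaved_bank(addr, N_BANKS):11d}")
```

Under the block-oriented mapping, addresses 0 and 255 share bank 0 while 256 starts bank 1; under the interleaved mapping, addresses 0, 4, and 8 share bank 0 while 1, 5, and 9 share bank 1.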
Regardless of how the memory space is split up among the banks, requests sent to two different banks can be handled simultaneously. The processor can request a transfer from location $i$ on one cycle, and on the next cycle request information from location $j$. If $i$ and $j$ are in different banks, the information will be returned on successive cycles. Note that the latency of the request, i.e. the number of cycles a processor has to wait before receiving the contents of location $i$, is not affected. The bandwidth, however, is improved: if there are enough banks, the memory system can potentially send information at a rate of one word per processor cycle, regardless of the memory cycle time.
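The distinction between latency and bandwidth can be illustrated with a toy timing model. This is a sketch under simplifying assumptions that are not taken from the text: the processor issues at most one request per cycle, in order; a bank that accepts a request stays busy for `cycle_time` processor cycles and delivers the data at the end of that period; and addresses are interleaved across the banks. The function name `completion_times` is hypothetical.

```python
def completion_times(addresses, n_banks, cycle_time):
    """Return the cycle at which each request's data arrives.

    Assumed model: one request issued per processor cycle at most,
    in program order; a bank is busy for cycle_time cycles per request.
    """
    bank_free = [0] * n_banks  # earliest cycle each bank can accept a request
    next_issue = 0             # earliest cycle the processor can issue again
    done = []
    for addr in addresses:
        bank = addr % n_banks                      # interleaved mapping
        issue = max(next_issue, bank_free[bank])   # stall while the bank is busy
        bank_free[bank] = issue + cycle_time
        done.append(issue + cycle_time)
        next_issue = issue + 1
    return done


# Memory cycle time of 4 processor cycles, 8 consecutive addresses.
print(completion_times(range(8), n_banks=4, cycle_time=4))
# -> [4, 5, 6, 7, 8, 9, 10, 11]

# The same memory with a single bank, 4 consecutive addresses.
print(completion_times(range(4), n_banks=1, cycle_time=4))
# -> [4, 8, 12, 16]
```

In both runs the first word arrives after 4 cycles, so latency is unchanged. With 4 banks a word then arrives on every cycle, whereas with a single bank words arrive only once per memory cycle.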
The decision to allocate addresses as contiguous blocks or in interleaved fashion depends on how one expects information to be accessed. Programs are compiled so that instructions reside in successive addresses, so there is a high probability that after a processor executes the instruction at location $i$ it will execute the instruction at location $i+1$ (Section 2.2). Compilers can also allocate vector elements to successive addresses, so operations on entire vectors can take advantage of interleaving. For these reasons, vector processors universally have some form of interleaved memory. Shared memory multiprocessors, however, use the block-oriented scheme, since memory referencing patterns in an MIMD system are quite different. There the goal is to connect a processor to a single memory and use as much information as possible from that memory before switching to another memory.
Systems often provide some flexibility in fetching vector elements. In some systems it is possible to load every $s$th element; for example, when fetching elements of a vector $v$ that is stored in consecutive memory cells with $s$ as the interval, the memory would return $v_0$, $v_s$, $v_{2s}$, ... The interval between elements is known as the stride. One interesting use of this feature is in accessing matrices. If the stride is set to one more than the number of rows, a single memory request will return the diagonal elements (assuming column-major layout with the columns stored contiguously). Using a stride may cancel any benefits of interleaving if programmers are not careful. In an extreme case, setting the stride to the degree of interleaving means every item is fetched from the same bank, and the time between successive elements will be the memory cycle time.
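Both uses of the stride can be checked with a small sketch. The helper names `strided_addresses` and `banks_touched` are illustrative, the 4-way interleaving matches the earlier example, and the 3 x 3 matrix is made-up data in which element (r, c) holds the value 10r + c so that each entry identifies its own position.

```python
def strided_addresses(base, stride, count):
    """Addresses of count elements starting at base, stride cells apart."""
    return [base + k * stride for k in range(count)]


def banks_touched(addresses, n_banks):
    """Bank number of each address under the interleaved mapping."""
    return [a % n_banks for a in addresses]


N_BANKS = 4  # degree of interleaving assumed for this example

# Unit stride rotates through all of the banks.
print(banks_touched(strided_addresses(0, 1, 6), N_BANKS))
# -> [0, 1, 2, 3, 0, 1]

# A stride equal to the degree of interleaving hits one bank every time.
print(banks_touched(strided_addresses(0, N_BANKS, 6), N_BANKS))
# -> [0, 0, 0, 0, 0, 0]

# Column-major 3 x 3 matrix: element (r, c) is stored at index c * rows + r.
rows, cols = 3, 3
flat = [10 * r + c for c in range(cols) for r in range(rows)]

# A stride of rows + 1 walks down the main diagonal.
diag = [flat[a] for a in strided_addresses(0, rows + 1, min(rows, cols))]
print(diag)
# -> [0, 11, 22]
```

The second printout is the extreme case described above: every request lands in bank 0, so successive elements can arrive no faster than one per memory cycle.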