2.2 Memories




Memories are characterized by their function, capacity, and response times. Operations on memories are called reads and writes, defined from the perspective of a processor or other device that uses a memory: a read transfers information from the memory to the other device, and a write transfers information into the memory. A memory that performs both reads and writes is often just called a RAM, for random access memory. The term "random access" means that if location M[x] is accessed at time $t$, there are no restrictions on the address of the item accessed at time $t+1$. Other types of memories commonly used in systems are read-only memory, or ROM, and programmable read-only memory, or PROM (information in a ROM is set when the chips are designed; information in a PROM can be written later, one time only, usually just before the chips are inserted into the system). For example, the Apple Macintosh, shown in Figure 1, had a PROM called the "toolbox" that contained code for commonly used operating system functions.
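The differences between RAM, ROM, and PROM come down to which operations each one permits. The toy Python classes below are a minimal sketch of those rules; the class names are hypothetical, and modeling a PROM as "each location may be written once" is a simplification of how such chips are actually programmed.

```python
class RAM:
    """Random access memory: any address may be read or written, in any order."""
    def __init__(self, size):
        self.cells = [0] * size

    def read(self, addr):
        # Read: transfer information from the memory to the requesting device.
        return self.cells[addr]

    def write(self, addr, value):
        # Write: transfer information from the device into the memory.
        self.cells[addr] = value


class ROM:
    """Read-only memory: contents are fixed when the chip is made."""
    def __init__(self, contents):
        self.cells = list(contents)

    def read(self, addr):
        return self.cells[addr]

    def write(self, addr, value):
        raise PermissionError("ROM contents cannot be written")


class PROM:
    """Programmable ROM, simplified: each location may be written exactly once."""
    def __init__(self, size):
        self.cells = [None] * size  # None marks a location not yet programmed

    def read(self, addr):
        return self.cells[addr]

    def write(self, addr, value):
        if self.cells[addr] is not None:
            raise PermissionError(f"PROM location {addr} is already programmed")
        self.cells[addr] = value


ram = RAM(8)
for addr in (5, 0, 7, 2):   # random access: no restriction on the order of addresses
    ram.write(addr, addr * 10)
print(ram.read(7))          # 70

prom = PROM(4)
prom.write(0, 0xFF)         # the first write to location 0 succeeds; a second would raise
```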

The smallest unit of information is a single bit, which can have one of two values. The capacity of an individual memory chip is often given in bits; for example, one might have a memory built from 64Kb (64 kilobit) chips. When discussing the capacity of an entire memory system, however, the preferred unit is the byte, which is commonly accepted to be 8 bits of information. Memory sizes in modern systems range from 4MB (megabytes) in small personal computers up to several billion bytes (gigabytes, or GB) in large high-performance systems. Note the convention that lowercase b is the abbreviation for bit and uppercase B is the abbreviation for byte.
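Converting between chip capacity in bits and system capacity in bytes is simple arithmetic. The sketch below assumes the usual binary prefixes for memory (1K = 2^10, 1M = 2^20) and ignores any extra chips a real system might use for parity or error correction; the function name is illustrative.

```python
KILO = 2**10
MEGA = 2**20
BITS_PER_BYTE = 8

def chips_needed(memory_bytes, chip_bits):
    """Number of chips of capacity chip_bits needed to hold memory_bytes bytes."""
    total_bits = memory_bytes * BITS_PER_BYTE
    return -(-total_bits // chip_bits)  # ceiling division: a partial chip still counts

# A 4MB memory built from 64Kb chips: 2**25 bits / 2**16 bits per chip.
print(chips_needed(4 * MEGA, 64 * KILO))  # 512
```

The answer, 512 chips, is consistent with the observation below that a complete memory system has hundreds of individual chips.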

The performance of a memory system is defined by two different measures, the access time and the cycle time. Access time, also known as response time or latency, refers to how quickly the memory can respond to a read or write request. Several factors contribute to the access time of a memory system. The main factor is the physical organization of the memory chips used in the system; the access time of the chips themselves ranges from about 80 ns for chips used in personal computers to 10 ns or less for chips used in caches and buffers (small, fast memories used for temporary storage, described in more detail below). Other factors are harder to measure. They include the overhead involved in selecting the right chips (a complete memory system will have hundreds of individual chips), the time required to forward a request from the processor over the bus to the memory system, and the time spent waiting for the bus to finish a previous transaction before initiating the processor's request. The bottom line is that the response time of a memory system is usually much longer than the access time of its individual chips.
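To see why the system response time exceeds the chip access time, it helps to add up the components listed above. In this sketch only the 80 ns chip access time comes from the text; the chip-select, bus-transfer, and bus-wait figures are hypothetical values chosen purely for illustration.

```python
# Response time of a memory system, in ns, as a sum of the components in the text.
CHIP_ACCESS_NS = 80  # chip access time from the text; all other values are hypothetical

def system_response_ns(chip_access, chip_select, bus_transfer, bus_wait):
    """Time the processor waits: the chip's own access time plus system overheads."""
    return chip_access + chip_select + bus_transfer + bus_wait

# Hypothetical overheads; bus_wait is 0 when no earlier bus transaction is in progress.
idle_bus = system_response_ns(CHIP_ACCESS_NS, chip_select=15, bus_transfer=20, bus_wait=0)
busy_bus = system_response_ns(CHIP_ACCESS_NS, chip_select=15, bus_transfer=20, bus_wait=40)
print(idle_bus, busy_bus)  # 115 155
```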

Memory cycle time refers to the minimum period between two successive requests. For various reasons a memory cannot always begin a new request the moment it has answered the previous one; a memory with an 80 ns access time, for example, cannot necessarily satisfy a request every 80 ns. A simple, if old, example of a memory with a long cycle time relative to its access time is the magnetic core memory used in early mainframe computers. To read the value stored in a core, an electronic pulse was sent along a wire threaded through it. If the core was in a given state, the pulse induced a signal on a second wire. Unfortunately, the pulse also erased the information stored in the core, i.e. the memory had destructive read-out. To get around this problem, designers built memory systems so that each time something was read, a copy was immediately written back. During this write the memory cell was unavailable for further requests, and thus the memory had a cycle time roughly twice as long as its access time. Some modern semiconductor memories also have destructive reads, and there may be several other reasons why the cycle time of a memory is longer than its access time.
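The effect of destructive read-out on cycle time can be shown with a toy model. The sketch below assumes, as a simplification, that the write-back takes exactly as long as the read, so the cycle time is twice the access time; the class name and the 80 ns figure are illustrative, not measurements of real core memory.

```python
class CoreMemory:
    """Toy destructive-readout memory with write-back (hypothetical model).

    A read delivers its value access_ns after it starts. The write-back that
    follows is assumed to take another access_ns, during which the memory
    cannot start a new request, so the cycle time is 2 * access_ns.
    """
    def __init__(self, bits, access_ns):
        self.cells = list(bits)
        self.access_ns = access_ns
        self.free_at = 0  # earliest time (ns) a new request can start

    def read(self, addr, request_time):
        start = max(request_time, self.free_at)  # wait for any write-back in progress
        value = self.cells[addr]
        self.cells[addr] = 0                     # the read pulse erases the core
        ready = start + self.access_ns           # value reaches the requester
        self.cells[addr] = value                 # write the copy back
        self.free_at = ready + self.access_ns    # busy until the write-back finishes
        return value, ready


mem = CoreMemory([1, 0, 1, 1], access_ns=80)
# Four requests all issued at time 0: each must wait for the previous write-back.
print([mem.read(a, 0)[1] for a in range(4)])  # [80, 240, 400, 560]
```

Answers arrive every 160 ns rather than every 80 ns, even though each individual read still completes 80 ns after it starts.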

Although processors are free to access items in a RAM in any order, in practice the pattern of references is not random; it has a structure that can be exploited to improve performance. One source of regularity is that instructions are stored sequentially in memory (recall that unless there is a branch, the PC is incremented by one each time through the fetch-decode-execute cycle). This means that if a processor requests an instruction from location $x$ at time $t$, there is a high probability that it will request an instruction from location $x+1$ in the near future, at time $t+1$. References to data show a similar pattern; for example, if a program updates every element of a vector v inside a small loop, the data references will be to v[0], v[1], and so on. This observation that memory references tend to cluster in small groups is known as locality of reference.
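Locality of reference is easy to see in an address trace. The sketch below generates the references a small vector-update loop might make; the code and data addresses, the three-instruction loop body, and the function name are all hypothetical choices made for illustration.

```python
def loop_trace(n, code_base=100, loop_len=3, v_base=1000):
    """Hypothetical address trace for a loop that updates v[0], ..., v[n-1].

    Each iteration fetches loop_len consecutive instructions starting at
    code_base, then reads and writes element i, stored at v_base + i.
    Returns a list of (kind, address) pairs in the order they are issued.
    """
    trace = []
    for i in range(n):
        for k in range(loop_len):
            trace.append(("ifetch", code_base + k))  # sequential instruction fetches
        trace.append(("read", v_base + i))           # load v[i]
        trace.append(("write", v_base + i))          # store the updated v[i]
    return trace


trace = loop_trace(4)
print([a for kind, a in trace if kind == "read"])  # [1000, 1001, 1002, 1003]
print(len(trace), len({a for _, a in trace}))      # 20 7
```

Twenty references touch only seven distinct addresses, and those addresses are consecutive: the instructions are reused on every iteration, and the data references step through adjacent locations.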

Locality of reference can be exploited in the following way. Instead of building the entire memory out of the same material, construct a hierarchy of memories, each with different capacities and access times. At the top of the hierarchy there will be a small memory, perhaps only a few KB, built from the fastest chips. The bottom of the hierarchy will be the largest but slowest memory. The processor will be connected to the top of the hierarchy, i.e. when it fetches an instruction it will send its request to the small, fast memory. If this memory contains the requested item, it will respond, and the request is satisfied. If a memory does not have an item, it forwards the request to the next lower level in the hierarchy.
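The lookup-and-forward behavior described above can be sketched as a chain of levels, each of which either answers a request or passes it down. This is a hypothetical toy model: it has no capacity limits or replacement policy, and it assumes that a request costs the access time of whichever level supplies the item, the same assumption used in the effective-access-time example later in this section.

```python
class Level:
    """One level of a toy memory hierarchy (no capacity limit, no replacement)."""
    def __init__(self, name, access_ns, lower=None):
        self.name = name
        self.access_ns = access_ns
        self.lower = lower   # next lower (larger, slower) level, or None at the bottom
        self.contents = {}   # address -> value

    def read(self, addr):
        """Return (value, time_ns, name of the level that supplied the item)."""
        if addr in self.contents:
            return self.contents[addr], self.access_ns, self.name  # item is here
        if self.lower is None:
            raise KeyError(f"address {addr} is not in memory")
        value, ns, source = self.lower.read(addr)  # forward the request downward
        self.contents[addr] = value                # keep a copy at this level
        return value, ns, source


main = Level("main", access_ns=100)
main.contents.update({a: a * a for a in range(64)})
cache = Level("cache", access_ns=10, lower=main)

print(cache.read(5))  # (25, 100, 'main')   first request must go to main memory
print(cache.read(5))  # (25, 10, 'cache')   repeat request is answered by the cache
```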

The key idea is that when the lower levels of the hierarchy send a value from location $x$ to the next level up, they also send the contents of $x+1$, $x+2$, etc. If locality of reference holds, there is a high probability there will soon be a request for one of these other items; if there is, that request will be satisfied immediately by the upper level memory.
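The payoff from sending neighboring items along with the requested one can be measured by counting how many requests the upper level satisfies itself. This sketch follows the text in sending the requested item and the items that follow it (real caches usually transfer fixed, aligned blocks instead) and models no capacity limit or replacement; the function name is hypothetical.

```python
def count_hits(trace, block_size):
    """Count requests satisfied by the upper level when each miss on address x
    brings up x, x+1, ..., x+block_size-1 from the level below."""
    present = set()
    hits = 0
    for addr in trace:
        if addr in present:
            hits += 1                                       # satisfied immediately
        else:
            present.update(range(addr, addr + block_size))  # x and its successors
    return hits


sequential = list(range(16))          # e.g. 16 consecutive instruction fetches
print(count_hits(sequential, 1))      # 0: only the requested item is sent up
print(count_hits(sequential, 4))      # 12: misses only at 0, 4, 8, 12
```

With one item per transfer, a purely sequential trace never hits; sending four items at a time turns three out of every four requests into hits.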

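The following terminology is used when discussing hierarchical memories:

cache: the small, fast memory at the top of the hierarchy.
hit: a request for an item that is present in the level it is sent to.
miss: a request for an item that is not present in that level, so it must be forwarded to the next lower level.
hit ratio (or hit rate): the fraction of requests that are hits.
block (or line): the group of adjacent items transferred between levels as a unit.
replacement policy: the rule that decides which item to remove from a level when space is needed for a new one.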

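The performance of a hierarchical memory is defined by the effective access time, which is a function of the hit ratio and the relative access times between successive levels of the hierarchy. For example, suppose the cache access time is 10 ns, main memory access time is 100 ns, and the cache hit rate is 98%. Then, writing $h$ for the hit ratio, the average time for the processor to access an item in memory is

$$ t_{\mathrm{eff}} = h\,t_{\mathrm{cache}} + (1-h)\,t_{\mathrm{main}} = 0.98 \times 10\ \mathrm{ns} + 0.02 \times 100\ \mathrm{ns} = 9.8\ \mathrm{ns} + 2.0\ \mathrm{ns} = 11.8\ \mathrm{ns} $$
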
Over a long period of time the system performs as if it had a single large memory with an 11.8 ns access time, hence the term "effective access time." With a 98% hit rate the system performs nearly as well as if the entire memory were built from the fast chips used to implement the cache, even though most of the memory uses less expensive technology with an access time of 100 ns.
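The effective access time of the two-level example is a weighted average, which the short sketch below computes. It charges hits the cache access time and misses the main memory access time, matching the worked example in the text; the function name is illustrative.

```python
def effective_access_time(hit_ratio, t_cache, t_main):
    """Average access time of a two-level hierarchy: hits cost t_cache,
    misses cost t_main (the model used in the worked example)."""
    return hit_ratio * t_cache + (1 - hit_ratio) * t_main


print(round(effective_access_time(0.98, 10, 100), 1))  # 11.8 (ns)

# How quickly performance degrades as the hit ratio falls:
for h in (0.99, 0.98, 0.95, 0.90):
    print(h, round(effective_access_time(h, 10, 100), 1))
```

Other accounting is possible: if a miss is charged the cache lookup plus the main memory access, $0.98 \times 10 + 0.02 \times (10 + 100)$ gives 12 ns instead, which leads to the same conclusion.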

Although a memory hierarchy adds to the complexity of a memory system, it does not necessarily add to the latency for any particular request. There are efficient hardware algorithms for the logic that looks up addresses to see if items are present in a memory and to help implement replacement policies, and in most cases these circuits can work in parallel with other circuits so the total time spent in the fetch-decode-execute cycle is not lengthened.






verena@csep1.phy.ornl.gov