Explain various high-speed memories such as interleaved memories and caches.
  • The main purpose of introducing a memory hierarchy is to present the CPU with a single large memory space that has a small access time. If the main memory of a computer is arranged as a collection of physically separate banks, each with its own address buffer register (ABR) and data buffer register (DBR), memory access can continue in more than one bank at a time.

Interleaved Memory:

  • Interleaved memory is a design made to compensate for the relatively slow speed of dynamic random-access memory (DRAM). This is done by spreading memory addresses evenly across memory banks, so that contiguous memory reads and writes use each memory bank in turn. The result is higher memory throughput, because there is less waiting for memory banks to become ready for the desired operations.

  • The idea of interleaved memory is shown in Figure 9 below:

[Figure 9: Interleaved memory organization]

  • As shown in Figure 9, the lower-order k bits of the address select the module (memory bank), and the higher-order m bits give a unique memory location within the bank selected by those k bits. In this way, consecutive memory locations are stored in different memory banks.

  • Whenever requests are made to access consecutive memory locations, several memory banks are kept busy at any point in time. This results in faster access to a block of data and higher overall utilization of the memory system as a whole. If k bits are allotted for selecting the bank, as shown in the diagram, there must be a total of 2^k banks; this ensures that there are no gaps of nonexistent memory locations.
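The address split described above can be sketched as follows (a minimal illustration, assuming k = 2 bank-select bits, i.e. 4 banks; the names are made up for the example):

```python
# Low-order interleaving: with k bank-select bits there are 2**k banks,
# and consecutive addresses map to consecutive banks.
K = 2                      # number of bank-select bits (illustrative choice)
NUM_BANKS = 2 ** K         # 4 banks

def map_address(addr):
    """Split an address into (bank number, location within that bank)."""
    bank = addr % NUM_BANKS        # lower-order k bits select the bank
    location = addr // NUM_BANKS   # higher-order m bits index within the bank
    return bank, location

# Consecutive addresses land in different banks:
for addr in range(8):
    print(addr, map_address(addr))
```

Running the loop shows addresses 0, 1, 2, 3 going to banks 0, 1, 2, 3 and address 4 wrapping back to bank 0, one location deeper.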

  • Consider an example to understand the effect of memory interleaving:

    Cache blocks are of 8 words each.

    On a read miss in the cache the DRAM is accessed. Now consider it takes 1 clock cycle to send the address to the main memory. The first word access in the main memory takes 8 cycles and subsequent accesses take 4 clock cycles per word. Calculate access time of one block (8 words from the main memory).

    Case 1: No memory interleaving that is only one memory bank/module

    Access time = 1 + 8 + (7 × 4) + 1 = 38 cycles.

    Case 2: Memory interleaving having 4 memory banks

    Once the address arrives at the main memory, in the first 8 clock cycles the 4 memory banks have 4 words ready to be sent. They are sent one word at a time over the next 4 clock cycles. During this time the next 4 words are being accessed simultaneously in each bank, and sending them takes another 4 cycles (1 cycle/word).

    Access time = 1 + 8 + 4 + 4 = 17 cycles.

    Thus block transfer time is reduced by more than a factor of 2.
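The two cases above can be checked with a short calculation (the cycle counts are the ones given in the example; the variable names are illustrative):

```python
ADDR_CYCLES = 1      # cycles to send the address to main memory
FIRST_ACCESS = 8     # cycles for the first word access
NEXT_ACCESS = 4      # cycles for each subsequent word access
WORDS = 8            # words per cache block

# Case 1: one bank — every word is fetched serially,
# plus 1 final cycle, as in the example's total of 38.
no_interleave = ADDR_CYCLES + FIRST_ACCESS + (WORDS - 1) * NEXT_ACCESS + 1

# Case 2: 4 interleaved banks — after the first 8 cycles, 4 words are
# ready and stream out over 4 cycles while the banks fetch the next 4,
# which stream out in another 4 cycles.
interleave_4 = ADDR_CYCLES + FIRST_ACCESS + 4 + 4

print(no_interleave)   # 38 cycles
print(interleave_4)    # 17 cycles
```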

Caches:

  • Caches are high-speed memories, realized using SRAM technology, that store a small subset of the data in the main memory and that the CPU can access directly and with minimal delay. Data is transferred in and out of the cache as and when the CPU requires it.

  • Caches nowadays are included on the processor chip itself. This is mainly because transferring data between different chips (i.e., if the cache were present as a separate chip) introduces considerable delays.

  • However, since the processor chip cannot grow beyond a certain size, the cache on the processor chip is usually very limited in capacity. All high-performance processors have some form of on-chip cache memory, no matter how small. This cache can be divided into separate instruction and data caches, called a split cache. A combined cache offers a greater hit rate because it allows more flexibility in mapping new information into the cache; split caches, on the other hand, make it possible to access both caches simultaneously. This leads to increased performance but needs more complicated circuitry for the parallel access.

  • High-performance processors also include multiple levels of cache. The L1 cache is present on chip, while the L2 cache is implemented using SRAM technology external to the processor. However, a smaller L2 cache can be included on chip as well, giving a multi-level on-chip cache.

  • If two levels of cache are used, then since L1 is closer to the processor, it is important to design the L1 cache in a way that promotes fast access, because its access time has a large effect on the clock rate of the processor. A cache, even if it is on chip, cannot be accessed at the same speed as the CPU registers: the cache is much bigger and more complex.

  • Accesses to the cache can be sped up by reading multiple words at the same time and then transferring them one by one to the processor for execution. This technique is used in many processors.

  • The second and subsequent levels of cache can be much slower than the L1 cache, but they have to be big enough to achieve a high hit rate.
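The trade-off in the last two points can be made concrete with the standard average-memory-access-time (AMAT) formula for a two-level cache. The hit times, miss rates, and memory latency below are made-up numbers chosen only to illustrate the calculation:

```python
# Two-level cache AMAT sketch (all figures are illustrative assumptions):
L1_HIT = 1            # cycles: small, fast on-chip L1
L2_HIT = 10           # cycles: larger but slower L2
MEM = 100             # cycles: main-memory access
L1_MISS_RATE = 0.05   # fraction of accesses that miss in L1
L2_MISS_RATE = 0.20   # fraction of L2 accesses that miss in L2

# AMAT = L1 hit time + L1 miss rate * (L2 hit time + L2 miss rate * memory time)
amat = L1_HIT + L1_MISS_RATE * (L2_HIT + L2_MISS_RATE * MEM)
print(amat)   # 1 + 0.05 * (10 + 20) = 2.5 cycles
```

Even with these rough numbers, the point of the text is visible: a fast L1 dominates the average, while L2 only needs to be large enough to keep its miss rate low.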
