Cache Memory is a small, fast memory that holds a fraction of the overall contents of main memory. Its mathematical model is defined by its size, number of sets, associativity, block size, sub-block size, fetch strategy, and write strategy. Any node in the cache hierarchy can contain a unified cache or two separate caches for instructions and data.
- First-Level-Cache :
This is the cache closest to the CPU and is called the L1 cache. If this cache is split into separate instruction and data caches, they are referred to as L1I for instructions and L1D for data.
- Second-Level Cache :
This is also called the secondary cache when looking globally at the placement of caches between the CPU and main memory.
- Main Memory :
This is the last level of memory. This is the last place the CPU will look for data.
- Memory Hierarchy :
When there is only one cache between the CPU and main memory, it is difficult to call the structure a multi-level cache hierarchy; for it to be called so, there should usually be at least two caches between the CPU and main memory. Caches closer to the CPU are called upstream or predecessor caches, and caches closer to main memory are called downstream or successor caches.
- Block :
The unit of data for which there is an address tag is called a block; it is also called a line. The tag indicates which portion of main memory currently occupies that block of the cache.
- Set :
This is a collection of blocks whose tags are checked in parallel. If there is only one set, the cache is fully associative. Usually, a contiguous field of address bits selects the set in which a memory block can reside.
- Associativity :
The number of blocks in a set is called the degree of associativity of the cache. If that number is one, the cache is direct-mapped.
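The set-selection and tag-matching mechanics described above can be sketched in a few lines. The geometry below (32 KiB capacity, 4-way associative, 64-byte blocks) is an assumed example, not a specific real design:

```python
# Illustrative address decomposition for a set-associative cache.
# The geometry here (32 KiB, 4-way, 64-byte blocks) is assumed for
# the example; a contiguous field of address bits selects the set.

CACHE_SIZE = 32 * 1024   # total capacity in bytes
ASSOCIATIVITY = 4        # blocks per set (degree of associativity)
BLOCK_SIZE = 64          # bytes per block (line)

NUM_SETS = CACHE_SIZE // (ASSOCIATIVITY * BLOCK_SIZE)  # 128 sets

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1   # 6 bits of block offset
INDEX_BITS = NUM_SETS.bit_length() - 1      # 7 bits of set index

def decompose(address):
    """Split an address into (tag, set index, block offset)."""
    offset = address & (BLOCK_SIZE - 1)
    index = (address >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print([hex(v) for v in decompose(0x1234ABCD)])  # tag 0x91a5, set 0x2f, offset 0xd
```

On a reference, the cache checks all `ASSOCIATIVITY` tags in set `index` in parallel against `tag`; a direct-mapped cache is simply the case where `NUM_SETS` equals the number of blocks.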
- Sub-block :
It is a unit of data with which a valid bit is associated. Its size is less than or equal to the block size.
- Fetch Size :
The maximum amount of memory that can be fetched from the next level of the hierarchy is called the fetch size. It is a multiple of the sub-block size and can be smaller or larger than the block size.
- Read :
A read request to a cache is a request to present a consecutive collection of words of a predefined length starting at a given address. The CPU generates instruction fetches and load references, both of which are reads.
- Write :
A write request contains an address, a predefined number of sub-blocks, and a mask indicating which words are actually to be written.
- Read Miss :
It is a read request for data not completely contained in the cache. A miss occurs either when none of the tags in the appropriate set matches the high-order address bits of the request, or when one or more of the requested sub-blocks in a matching block is invalid.
- Local (Read) Miss Ratio :
The number of read misses in a cache divided by the total number of read requests to that cache.
- Global (Read) Miss Ratio :
The number of read misses to that cache divided by the number of read requests generated by the CPU.
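The distinction between the local and global ratios can be made concrete with a hypothetical two-level hierarchy; the counts below are invented solely to illustrate the definitions:

```python
# Hypothetical event counts for a two-level (L1, L2) hierarchy.
# Every L1 read miss becomes a read request to L2.

cpu_reads = 1000   # read requests generated by the CPU
l1_misses = 50     # L1 read misses (= read requests reaching L2)
l2_misses = 10     # L2 read misses

l2_local_miss_ratio = l2_misses / l1_misses    # 10 / 50   = 0.20
l2_global_miss_ratio = l2_misses / cpu_reads   # 10 / 1000 = 0.01

print(l2_local_miss_ratio, l2_global_miss_ratio)
```

The local ratio makes L2 look poor because it only sees the references L1 already failed on, while the global ratio shows how rarely the CPU's reads reach main memory.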
- Solo (Read) Miss Ratio :
The miss ratio of the cache in a memory hierarchy when it is the only cache in the hierarchy.
- (Local) Read Traffic Ratio :
This is the number of words fetched from the next level in the hierarchy divided by the number of words fetched from this cache by the previous level. The global read traffic ratio uses the same numerator divided by the number of words fetched by the CPU.
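A small numeric sketch of the two read traffic ratios, again for an assumed two-level hierarchy with invented word counts:

```python
# Hypothetical word counts seen at the L2 cache; the figures are
# invented only to illustrate the ratio definitions.

words_fetched_by_cpu = 4000    # demand words fetched by the CPU
words_fetched_from_l2 = 1600   # words L1 fetches from L2 on its misses
words_fetched_from_mem = 400   # words L2 fetches from main memory

l2_local_read_traffic = words_fetched_from_mem / words_fetched_from_l2   # 0.25
l2_global_read_traffic = words_fetched_from_mem / words_fetched_by_cpu   # 0.10

print(l2_local_read_traffic, l2_global_read_traffic)
```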
- (Local) Write Traffic Ratio :
This is the ratio of the number of words written out by a cache to the number of words written into it by the previous level.
- Write Strategy :
The basic choices are write-through and write-back, but they must be accompanied by a selection of write buffering (width and depth) and a strategy for dealing with a write miss.
- Replacement Strategy :
The most common choices are Random and Least Recently Used (LRU). For a direct-mapped cache, there is only one block per set, so there is no choice.
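LRU replacement within one set can be sketched compactly; the 4-way associativity below is an assumed example, and an ordered dictionary stands in for the hardware's recency-tracking state:

```python
from collections import OrderedDict

class LRUSet:
    """Minimal sketch of LRU replacement within a single cache set."""

    def __init__(self, associativity=4):
        self.assoc = associativity
        self.blocks = OrderedDict()  # tag -> block, ordered oldest-first

    def access(self, tag):
        """Return True on a hit; on a miss, fill the block, evicting LRU if full."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # mark as most recently used
            return True
        if len(self.blocks) == self.assoc:
            self.blocks.popitem(last=False)  # evict the least recently used
        self.blocks[tag] = None
        return False

s = LRUSet()
hits = [s.access(t) for t in [1, 2, 3, 4, 1, 5, 2]]
# Four cold misses fill the set; 1 hits; 5 evicts tag 2 (the LRU);
# the final access to 2 therefore misses again.
print(hits)  # [False, False, False, False, True, False, False]
```

For a direct-mapped cache (`associativity=1`) every miss evicts the single resident block, which is why no replacement policy is needed there.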