Cache Memory Performance
Types of Caches :
- L1 Cache : Cache built into the CPU itself is known as L1 or Level 1 cache. It holds the most recently used data, so when that data is required again the microprocessor checks this cache first and does not need to go to the Level 2 cache or main memory. The underlying principle is “locality of reference”, according to which a location just accessed by the CPU has a higher probability of being required again.
- L2 Cache : Also known as Level 2 cache, this type of cache traditionally resides on a separate chip next to the CPU. It stores recently used data that cannot be found in the L1 cache. Some CPUs have both L1 and L2 cache built in and designate the separate cache chip as Level 3 (L3) cache.
Cache that is built into the CPU is faster than separate cache, and separate cache is faster than RAM. Built-in cache runs at the speed of the microprocessor.
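- Disk Cache : It contains the most recently read data from the hard disk. It is typically held in main memory or in a buffer on the drive, so it is far faster than the disk itself but much slower than the CPU caches.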
- Instruction Cache Vs Data Cache : An instruction cache (I-cache) stores only instructions, while a data cache (D-cache) stores only data. Separating the two recognizes the different access patterns of instructions and data. For example, instruction fetches involve few write accesses, and they often exhibit more temporal and spatial locality than the data they process.
- Unified Cache Vs Split Cache : A cache that stores both instructions and data is referred to as a unified cache. A split cache, on the other hand, consists of two associated but largely independent units: an I-cache and a D-cache. A split cache can also be designed so that the two units are handled differently.
The performance of cache memory is measured in terms of a quantity called the hit ratio. When the CPU refers to memory and finds the word in the cache, a hit is said to have occurred. If the word is not found in the cache, the CPU refers to main memory for the desired word, and this is called a cache miss.
- Hit Ratio (h) :
Hit Ratio (h) = Number of Hits / Total CPU references to memory = Number of hits / ( Number of Hits + Number of Misses )
The hit ratio is the probability that a memory reference made by the CPU is a hit, so its range is 0 <= h <= 1.
- Average Access Time ( tavg ) :
tavg = h X tc + ( 1- h ) X ( tc + tm ) = tc + ( 1- h ) X tm
Here tc, h and tm denote the cache access time, the cache hit ratio and the main memory access time, respectively.
Average memory access time = Hit Time + Miss Rate X Miss Penalty
Miss Rate : The fraction of accesses that are not found in the cache, i.e. (1 - h).
Miss Penalty : The additional clock cycles needed to service a miss, i.e. the extra time needed to bring the required information into the cache from main memory.
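The formulas above can be checked with a few lines of Python. This is only a minimal sketch: the function names and the sample numbers (950 hits, 50 misses, tc = 2 cycles, tm = 100 cycles) are assumptions chosen for illustration and do not come from any particular machine.

```python
def hit_ratio(hits, misses):
    """h = hits / (hits + misses)."""
    total = hits + misses
    if total == 0:
        raise ValueError("no memory references recorded")
    return hits / total


def average_access_time(h, tc, tm):
    """tavg = h*tc + (1 - h)*(tc + tm), which simplifies to tc + (1 - h)*tm."""
    return tc + (1.0 - h) * tm


# Hypothetical counts and timings, in clock cycles.
h = hit_ratio(hits=950, misses=50)
tavg = average_access_time(h, tc=2.0, tm=100.0)

print(h)               # 0.95
print(round(tavg, 6))  # 7.0
```

In the "Hit Time + Miss Rate X Miss Penalty" form, the same example has a hit time of 2 cycles, a miss rate of 0.05 and a miss penalty of 100 cycles, which again gives about 7 cycles.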
Types of Cache misses :
- Compulsory Miss (Cold-Start Miss or First-Reference Miss) : Occurs on the first access to a block. The block has never been in the cache, so it must be brought in.
- Capacity Miss : Occurs when the program's working set is larger than the cache capacity. The cache cannot hold all the blocks needed during execution, so some blocks are discarded and must be fetched again later.
- Conflict Miss (Collision Miss or Interference Miss) : Occurs mainly with set-associative or direct-mapped block placement, when several blocks map to the same set or block frame and evict one another.
- Coherence Miss (Invalidation) : Occurs when another processor or device (e.g. I/O) updates memory, invalidating the cached copy of the block.
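The first three miss types can be told apart with a small simulation. The sketch below assumes a direct-mapped cache and compares it against a fully associative LRU cache of the same size. A miss on a never-seen block is counted as compulsory, a miss that the fully associative cache also suffers as capacity, and the rest as conflict. The function name `classify_misses` and the block trace are hypothetical, and coherence misses are not modeled.

```python
from collections import OrderedDict


def classify_misses(block_trace, num_frames):
    """Classify the misses of a direct-mapped cache with `num_frames` frames."""
    direct = [None] * num_frames   # frame index -> resident block number
    fully_assoc = OrderedDict()    # reference cache, least recently used first
    seen = set()                   # blocks referenced at least once
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for block in block_trace:
        # Reference model: fully associative cache with LRU replacement.
        fa_hit = block in fully_assoc
        if fa_hit:
            fully_assoc.move_to_end(block)
        else:
            if len(fully_assoc) == num_frames:
                fully_assoc.popitem(last=False)  # evict least recently used
            fully_assoc[block] = True

        # Cache under study: direct mapped, frame = block mod num_frames.
        frame = block % num_frames
        if direct[frame] == block:
            counts["hit"] += 1
        else:
            if block not in seen:
                counts["compulsory"] += 1
            elif not fa_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
            direct[frame] = block
        seen.add(block)

    return counts


# Hypothetical trace of block numbers; blocks 0 and 4 share frame 0.
print(classify_misses([0, 4, 0, 4, 1, 2, 3, 5, 0, 0], num_frames=4))
# {'hit': 1, 'compulsory': 6, 'capacity': 1, 'conflict': 2}
```

The two conflict misses come from blocks 0 and 4 repeatedly evicting each other from frame 0 while the other frames are still empty. A fully associative cache of the same size would have hit on both references.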
CPU Performance :
CPU time is divided into the clock cycles spent executing the program and the clock cycles spent waiting for the memory system. Cache hit time is counted as part of the regular CPU execution cycles.
CPU time = ( CPU execution clock cycles + memory stall clock cycles ) X Clock Cycle time
1. Memory Stall Clock cycles ( for write-back cache ) :
- Memory Stall Clock-cycles = Read-Stall Cycles + Write-Stall Cycles
- Read-Stall Cycles = ( Reads/Program ) X Read miss rate X Read miss penalty
- Write-Stall Cycles = ( Writes/Program ) X Write miss rate X Write miss penalty + Write Buffer Stalls
2. Memory Stall Clock cycles ( for write-through cache ) :
- Assume write buffer stalls are negligible, so every access (read or write) is treated alike.
- Memory Stall Clock-cycles = ( Memory Accesses/Program ) X Miss Rate X Miss Penalty
- Memory Stall Clock-cycles = ( Instructions/Program ) X ( Misses/Instruction ) X Miss Penalty
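As a rough worked example of the CPU time and stall-cycle formulas above, the sketch below uses the combined form with a single miss rate and miss penalty. All of the numbers are hypothetical: 1,000,000 instructions, a base CPI of 2, 1.5 memory accesses per instruction, a 2% miss rate, a 100-cycle miss penalty and a 1 ns clock cycle. The function names are also only illustrative.

```python
def memory_stall_cycles(memory_accesses, miss_rate, miss_penalty):
    """Memory stall cycles = (memory accesses / program) * miss rate * miss penalty."""
    return memory_accesses * miss_rate * miss_penalty


def cpu_time(execution_cycles, stall_cycles, clock_cycle_time):
    """CPU time = (CPU execution cycles + memory stall cycles) * clock cycle time."""
    return (execution_cycles + stall_cycles) * clock_cycle_time


# Hypothetical program and machine parameters.
instructions = 1_000_000
base_cpi = 2                    # cycles per instruction with no memory stalls
accesses_per_instruction = 1.5  # instruction fetch plus data accesses
miss_rate = 0.02
miss_penalty = 100              # clock cycles
clock_cycle_time = 1e-9         # seconds (1 GHz clock)

execution_cycles = instructions * base_cpi
stalls = memory_stall_cycles(instructions * accesses_per_instruction,
                             miss_rate, miss_penalty)
total_time = cpu_time(execution_cycles, stalls, clock_cycle_time)

print(f"stall cycles: {stalls:.0f}")         # about 3,000,000
print(f"CPU time: {total_time * 1e3:.3f} ms")  # about 5 ms
```

With these assumed values the memory stalls (about 3 million cycles) exceed the execution cycles (2 million), so the memory system accounts for the larger share of CPU time.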
Measuring and Improving Cache Performance :
1. Techniques used to minimize the average memory access time :
- Reducing hit time, miss penalty or miss rate.
- Reducing the product Miss penalty X Miss rate.
2. Techniques for reducing Hit time :
- Small and Simple cache.
- Trace caches and pipelined cache access
- Avoid time loss in address translation.
3. Techniques for reducing Miss Penalty :
- Usage of Multi-level cache.
- Giving priority to read misses over write.
- Victim Caches
4. Techniques for reducing Miss Rate :
- Increased Block size
- Higher Associativity.
- Compiler optimization
- Large Cache.
5. Techniques for reducing ( Miss Rate X Miss Penalty ) :
- Non- blocking cache
- Hardware pre-fetching
- Compiler controlled pre-fetching
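To see why a multi-level cache reduces the miss penalty, the average memory access time formula can be applied recursively: the miss penalty of L1 becomes the average access time of L2. The sketch below is only illustrative. The function names and all latencies and miss rates are assumptions, and the L2 miss rate is taken as a local miss rate (misses in L2 divided by accesses that reach L2).

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


def amat_two_level(l1_hit_time, l1_miss_rate,
                   l2_hit_time, l2_local_miss_rate, memory_penalty):
    """The L1 miss penalty is the average time to service the access from L2."""
    l1_miss_penalty = amat(l2_hit_time, l2_local_miss_rate, memory_penalty)
    return amat(l1_hit_time, l1_miss_rate, l1_miss_penalty)


# Hypothetical values, in clock cycles.
without_l2 = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)
with_l2 = amat_two_level(l1_hit_time=1, l1_miss_rate=0.05,
                         l2_hit_time=10, l2_local_miss_rate=0.2,
                         memory_penalty=100)

print(round(without_l2, 3))  # 6.0
print(round(with_l2, 3))     # 2.5
```

Under these assumptions, adding the L2 cache cuts the effective L1 miss penalty from 100 cycles to about 30 cycles, which lowers the average access time from about 6 cycles to about 2.5 cycles.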