
Factors affecting Cache Memory Performance

A computer is made of three primary blocks: a CPU, a memory, and an I/O system. The performance of a computer system depends heavily on how quickly the CPU can fetch instructions and data from memory and write results back to it. Cache memory bridges the gap between the rate at which the processor can execute instructions and the time it takes to fetch instructions and operands from main memory.

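The time a program takes to execute on a system with a cache depends on the number of instructions executed, the average number of cycles each instruction takes (including cycles spent on memory references), and the cycle time.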



When engineering any product or feature, the generic structure of the device stays the same; what changes is the specific part that needs to be optimized to meet the client's requirements. How does an engineer go about improving the design? We start by building a mathematical model that connects the inputs to the outputs.

Execution Time = Instruction Count x Cycles per Instruction x Cycle Time
               = Instruction Count x (CPU Cycles per Instr. + Memory Cycles per Instr.) x Cycle Time
               = Instruction Count x [CPU Cycles per Instr. + (References per Instr. x Cycles per Reference)] x Cycle Time
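As a rough illustration of how the terms of this equation interact, the sketch below evaluates it for one set of inputs. The function name, parameter names, and sample values are hypothetical and chosen only to make the arithmetic easy to follow; they do not describe any particular processor.

```python
def execution_time(instruction_count, cpu_cycles_per_instr,
                   refs_per_instr, cycles_per_ref, cycle_time_s):
    """Evaluate the execution-time equation from the text.

    All arguments are averages over the whole program; cycle_time_s is in seconds.
    """
    # Memory cycles per instruction = references per instruction x cycles per reference
    memory_cycles_per_instr = refs_per_instr * cycles_per_ref
    # Total cycles per instruction = CPU part + memory part
    cycles_per_instr = cpu_cycles_per_instr + memory_cycles_per_instr
    return instruction_count * cycles_per_instr * cycle_time_s


# Hypothetical workload: 1 billion instructions on a 1 GHz clock (1 ns cycle),
# 1 CPU cycle per instruction, 1.3 memory references per instruction.
fast_memory = execution_time(1e9, 1.0, 1.3, 2.0, 1e-9)   # 2 cycles per reference
slow_memory = execution_time(1e9, 1.0, 1.3, 10.0, 1e-9)  # 10 cycles per reference
print(f"fast memory: {fast_memory:.2f} s, slow memory: {slow_memory:.2f} s")
```

With these made-up numbers, only the cycles-per-reference term changes between the two calls, yet the total time changes severalfold, which is why the memory term receives so much attention.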



Four areas influence the terms of this equation: the instruction set architecture, compiler technology, the CPU implementation, and the cache and memory hierarchy. Each is a major pain point that can be addressed to change the machine's performance significantly, for better or for worse. The first term of the equation, the number of instructions needed to perform a function, depends on the instruction set architecture and is the same across all implementations of that architecture. It also depends on how well the compiler generates efficient code: an optimizing compiler that performs the same function with fewer executed instructions is desirable.

CPU cycles per instruction also depend on compiler optimizations, since the compiler can choose instructions that are less CPU-intensive and have a shorter path length. Efficient pipelining of instructions also improves this parameter by keeping the hardware resources as busy as possible.

The average number of memory references per instruction and the average number of cycles per memory reference multiply to give the average number of memory cycles per instruction. The former is a function of the architecture and of the compiler's instruction-selection algorithms, and is constant across implementations of the architecture. The latter depends on the cache and memory hierarchy.
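To make the "cycles per memory reference" term more concrete, the sketch below uses the common hit/miss model, in which every reference pays the cache hit cost and a miss adds a penalty for going to main memory. This model and the names `hit_rate`, `hit_cycles`, and `miss_penalty_cycles` are assumptions introduced here for illustration; the article itself does not state them.

```python
def cycles_per_reference(hit_rate, hit_cycles, miss_penalty_cycles):
    """Average cycles per memory reference under a simple hit/miss model.

    hit_rate is the fraction of references served by the cache (0.0 to 1.0).
    Every reference costs hit_cycles; a miss costs miss_penalty_cycles on top.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return hit_cycles + miss_rate * miss_penalty_cycles


# Hypothetical cache: 1-cycle hit, 100-cycle miss penalty.
for rate in (0.99, 0.95, 0.80):
    print(f"hit rate {rate:.2f}: {cycles_per_reference(rate, 1, 100):.1f} cycles/reference")
```

Under these assumed costs, a small drop in hit rate raises the average cost per reference sharply, and that cost is then multiplied by the number of references per instruction in the execution-time equation.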

Instruction Set Architecture :

Compiler Technology :

CPU Implementation :

The micro-architecture depends on the design philosophy and methodology of the engineers involved in the process. Take a simple example: a circuit that takes input from a common jack, passes it through an amplifier, and then stores the data in a buffer.

Two approaches can be taken to solve the problem. The first places a buffer at the input and routes the current through one of two amplifiers, which makes sense if two different types of signal must be amplified or if the saturation regions of the amplifiers differ slightly. The second uses a single common current path and introduces a time dependence on the buffer in which the data is stored, thereby eliminating the need for the additional buffer.

Minute differences like these in the VLSI microarchitecture of a processor create large timing differences between implementations of the same instruction set by two different companies.

Cache and Memory Hierarchy :

This again depends on the use case for which the system was built. A general-purpose computer, also called a personal computer, can perform a wide variety of calculations and produce results that are reasonably accurate for non-real-time work, but using one in a hard real-time system would be very unwise.

A major difference between such systems is the time taken to access data in the cache.

A simple experiment can be run on your own computer: find the cache size of your processor model, then time accesses to the elements of arrays whose sizes are around that cache size. A marked slowdown is observed once the array becomes larger than the cache.
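One possible way to try this experiment is sketched below. It builds a randomly ordered chain of indices inside an array and follows it, so that each access depends on the previous one and is hard for the hardware to predict. The helper names (`make_cycle`, `ns_per_access`, `run_experiment`) and the sizes used are hypothetical. Python's interpreter overhead hides part of the effect, so the numbers only indicate a trend; a compiled language would show the jump more sharply.

```python
import random
import time
from array import array


def make_cycle(n, seed=0):
    """Return an array where following nxt[i] visits all n slots in one random cycle."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    nxt = array("q", [0]) * n          # n signed 8-byte integers
    for k in range(n):
        nxt[order[k]] = order[(k + 1) % n]
    return nxt


def ns_per_access(n, steps=200_000):
    """Average time in nanoseconds to follow one link in an n-element chain."""
    nxt = make_cycle(n)
    i = 0
    start = time.perf_counter()
    for _ in range(steps):
        i = nxt[i]                     # each load depends on the previous one
    elapsed = time.perf_counter() - start
    return elapsed * 1e9 / steps


def run_experiment(sizes_kib):
    """Time the chain walk for each array size (given in KiB of array data)."""
    results = []
    for kib in sizes_kib:
        n = kib * 1024 // 8            # 8 bytes per element
        results.append((kib, ns_per_access(n)))
    return results
```

For example, `run_experiment([16, 256, 4096, 32768])` returns a list of (size in KiB, nanoseconds per access) pairs. Choose sizes that straddle the cache sizes reported for your processor; building the larger arrays takes a few seconds.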
