Factors affecting Cache Memory Performance

Computers are made of three primary blocs. A CPU, a memory, and an I/O system. The performance of a computer system is very much dependent on the speed with which the CPU can fetch instructions from the memory and write to the same memory. Computers are using cache memory to bridge the gap between the processor’s ability to execute instructions and the time it takes to fetch operations from main memory.

Time taken by a program to execute with a cache depends on

The number of instructions needed to perform the task.
The average number of CPU cycles needed to perform the desired task.
The CPU’s cycle time.

While Engineering any product or feature the generic structure of the device remains the same what changes the specific part of the device which needs to be optimized because of client requirements. How does an engineer go about improving the design? Simple we start by making a mathematical model connecting the inputs to the outputs.

Execution Time = Instruction Count x Cycles per Instruction x Cycle Time
=Instruction Count x (CPU Cycles per Instr. + Memory Cycles per Instr.) x Cycle Time
=Instruction Count x [CPU Cycles per Instr. +(References per Instr. x Cycles per References)] x Cycle Time

These four boxes represent four major pain points that can be addressed to have a significant performance change either positive or negative on the machine. The first element of the equation the number of instructions needed to perform a function is dependent on the instruction set architecture and is the same across all implementations. It is also dependent on the compiler’s design to produce efficient code. Optimizing compilers to execute functions with fewer executed instructions is desired.

CPU cycles per instructions are also dependent on compiler optimizations as the compiler can be made to choose instructions that are less CPU intensive and have a shorter path length. Pipelining instructions efficiently also improve this parameter which makes instructions maximize hardware resource optimization.

The average number of memory references per instruction and the average number of cycles per memory reference combine to form the average number of cycles per instruction. The former is a function of architecture and instruction selection algorithms of the compiler. This is constant across implementations of the architecture.

Instruction Set Architecture :

Reduced Instruction Set Computer (RISC) –
Reduced Instruction Set Computer (RISC) is one of the most popular instruction set. This is used by ARM processors and those are one of the most widely used chips for products.
Complex Instruction Set Computer (CISC) –
Complex Instruction Set Computer (CISC) is an instruction set architecture for the very specialized operation which has been researched and studied upon so thoroughly that even the processor microarchitecture is built for that specific purpose only.
Minimal instruction set computers (MISC) –
Minimal instruction set computers (MISC) the 8085 may be considered in this category compared to modern processors.
Explicitly parallel instruction computing (EPIC) –
Explicitly parallel instruction computing (EPIC) is an instruction set that is widely used in supercomputers.
One instruction set computer (OISC) –
One instruction set computer (OISC) uses assembly only.
Zero instruction set computer (ZISC) –
This is a neural network on a computer.

Compiler Technology :

Single Pass Compiler –
This source code is directly converted into machine code.
Two-Pass Compiler –
Source code is converted onto an intermediate representation which is converted into machine code.
Multipass Compiler –
In this source code is converted into intermediate code from the front end then it is converted into intermediate code after the middle-end then passed to the back end which is converted into machine code.

CPU Implementation :

The micro-architecture is dependent upon the design philosophy and methodology of the Engineers involved in the process. Take a simple example of making a circuit to take input from a common jack passing it through an amplifier then storing the data in a buffer.

Two approaches can be taken to solve the problem which is either putting a buffer in the beginning and putting two amplifiers and bypassing the current through either which would make sense if two different types of signals are supposed to be amplified or if there is a slight difference in the saturation region of the amplifiers. Or we could make a common current path and introduce a temporal dependence upon the buffer in which data is stored thereby eliminating the need for buffers altogether.

Minute differences like these in the VLSI microarchitecture of the processor create massive timing differences in the same Instruction Set Implementations by two different companies.

Cache and Memory Hierarchy :

This is again dependent upon the use case for which the system was built. Using a general-purpose computer also called a Personal Computer which can perform a wide variety of mathematical calculations and produce wide results but reasonably accurate for non-real-time systems in a hard real-time system will be very unwise.

A very big difference will be the time taken to access data in the cache.

A simple experiment may be run on your computer whereby you may find the cache size of your particular model of processor and try to access elements of an array around that array a massive speed down will be observed while trying to access an array greater than the cache size.

Article Tags :

Compiler Design

Computer Organization and Architecture

GATE CS