
Latency and Throughput in System Design

Last Updated : 19 Mar, 2024

Latency is the time it takes for data or a signal to travel from one point in a system to another. It encompasses various delays, such as processing time, transmission time, and response time. Latency is an important topic in System Design: performance optimization is a common goal, and reducing latency is a core part of it. In this article, we will discuss what latency is, how it works, and how to measure it, with examples.


1. What is Latency?


Latency refers to the time it takes for a request to travel from its point of origin to its destination and receive a response.

  • Latency represents the delay between an action and its corresponding reaction.
  • It can be measured in various units such as seconds, milliseconds, or nanoseconds, depending on the system and application.

What does it involve?

Latency involves several components, such as processing time, time to travel over the network between components, and queueing time.

  • Round Trip Time: This includes the time taken for the request to travel to the server, processing time at the server, and the response time back to the sender.
  • Different Components: Processing time, transmission time (over the network or between components), queueing time (waiting in line for processing), and even human reaction time can all contribute to overall latency.
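As a rough first-order model, these components simply add up. A minimal sketch, with hypothetical values in milliseconds:

```python
def total_latency_ms(processing_ms, transmission_ms, queueing_ms):
    """Overall latency modeled as the sum of its main components."""
    return processing_ms + transmission_ms + queueing_ms

# Illustrative numbers only; real values depend on the system.
print(total_latency_ms(processing_ms=5.0, transmission_ms=20.0, queueing_ms=2.0))  # 27.0
```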

2. How does Latency work?

The time taken for each step—transmitting the action, server processing, transmitting the response, and updating your screen—contributes to the overall latency.

Example: Consider a player in an online game firing a weapon. If your latency is high:

  • You press “fire.”
  • The command travels through the internet to the server, which takes time.
  • The server processes the shot.
  • The result travels back to your device.
  • Your screen updates with the result.

During this time, another player might have moved or shot you, but their actions haven’t reached your device yet due to latency. This can result in what’s called “shot registration delay.” Your actions feel less immediate, and you might see inconsistencies between what you’re seeing and what’s happening in the game world.

The working of latency can be understood in two ways:

  • Network Latency
  • System Latency

2.1 What is Network Latency?


Network latency is a type of latency in system design; it refers to the time it takes for data to travel from one point in a network to another.

Take email as an example: think of it as the delay between hitting send and the recipient actually receiving the message. Like overall latency, it is measured in milliseconds, or even microseconds for real-time applications.

Problem Statement:

Imagine sending a letter to a friend across the country. The time from dropping the letter in the mailbox to its arrival in your friend’s hands is analogous to network latency.

However, instead of physical transportation, data travels as packets through cables, routers, and switches, in the following way:

  • Initiation: You click send on an email, triggering the creation of data packets.
  • Encapsulation: The data gets divided into packets with routing headers and checksums.
  • Transmission: Packets travel through various network devices like routers and switches.
  • Processing: Each device checks headers, routes packets through the network, and adds some delay.
  • Propagation: Data travels as electrical or light signals through cables or wireless frequencies.
  • Destination: Packets reach your friend’s computer, get reassembled, and the email content is displayed.
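One common way to estimate the network portion of this journey is to time a TCP handshake, which costs roughly one round trip. A self-contained sketch, using a throwaway local server so it runs anywhere (the host and port are illustrative):

```python
import socket
import threading
import time

def tcp_connect_latency_ms(host, port):
    """Time a TCP handshake: roughly one network round trip plus OS overhead."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=3):
        pass
    return (time.perf_counter() - start) * 1000

# Throwaway local server, just so the sketch is runnable end to end.
server = socket.socket()
server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=server.accept, daemon=True).start()

rtt_ms = tcp_connect_latency_ms("127.0.0.1", port)
print(f"loopback connect latency: {rtt_ms:.3f} ms")
```

On loopback the number is tiny; against a real remote host it reflects the physical distance and the hops in between.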

2.2 What is System Latency?

System latency refers to the overall time it takes for a request to go from its origin in the system to its destination and receive a response.

Think of Latency as the “wait time” in a system.

Problem Statement:

Clicking a button on a website, say a Login or Sign-Up button.

  • User action: You click the button, sending a request to the web server.
  • Processing: The server receives the request, processes it (database access, calculations, etc.).
  • Response: The server generates a response and sends it back to your browser.
  • Rendering: Your browser receives the response, parses it, and updates the webpage accordingly.

The time between clicking and seeing the updated webpage is the system latency. It includes processing time on both client and server, network transfers, and rendering delays.
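The end-to-end wait can be measured directly in code by timestamping around the request. A minimal sketch, where the server-side work is simulated with a short sleep (the handler and its 5 ms cost are hypothetical):

```python
import time

def handle_request():
    """Stand-in for server-side work (database access, calculations, etc.)."""
    time.sleep(0.005)  # pretend the server needs ~5 ms
    return "response"

start = time.perf_counter()
response = handle_request()
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"observed latency: {elapsed_ms:.1f} ms")
```

In a real application the same pattern wraps the full round trip, so the measurement also captures network transfer and rendering delays.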

3. How does High Latency occur?

The causes of latency vary depending on the context, but here are some general points:

  • Physical Distance: The farther apart sender and receiver are, the longer data packets take to travel, increasing latency.
  • Network Congestion: When many devices use the network simultaneously, data packets can get stuck in queues, leading to delays.
  • Inefficient Network Infrastructure: Outdated equipment, overloaded cables, and inefficient routing protocols can contribute to slower data transfer.
  • Wireless Interference: Signal interference in Wi-Fi networks can cause delays and packet loss, impacting latency.
  • Slow Hardware: Processors, storage devices, and network cards with limited processing power can bottleneck performance and increase latency.
  • Software Inefficiency: Unoptimized code, inefficient algorithms, and unnecessary background processes can slow down system responsiveness.
  • Database Access: Complex database queries or overloaded databases can take longer to process and generate responses, affecting system latency.
  • Resource Competition: When multiple applications or users share resources like CPU or memory, they can introduce delays waiting for their turn, increasing overall latency.

4. How to measure Latency?

There are various ways to measure latency. Here are some common methods:

  • Ping: This widely used tool sends data packets to a target server and measures the round-trip time (RTT), providing an estimate of network latency between two points.
  • Traceroute: This tool displays the path data packets take to reach a specific destination, revealing which network hops contribute the most to overall latency.
  • MTR (traceroute with ping): Combines traceroute and ping functionality, showing both routing information and RTT at each hop along the path.
  • Network monitoring tools: Dedicated network monitoring tools offer comprehensive analysis of network performance, including latency metrics for different components and connections.
  • Time stamps: Inserting time stamps at various points within a system’s code can measure the time it takes for specific operations to occur, revealing bottlenecks and areas for optimization.
  • Performance profiling tools: Specialized profiling tools track resource usage and execution times within a system, providing detailed insights into system latency contributors.
  • Application performance monitoring (APM) tools: Similar to network monitoring tools for networks, APM tools monitor the performance of applications, including response times and latency across various components.
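The “time stamps” method above can be sketched with a small helper that records how long each named stage takes, which reveals the bottleneck (the stage names and sleep durations are illustrative):

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(stage):
    """Record how long a named stage takes, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = (time.perf_counter() - start) * 1000

with timed("parse"):
    time.sleep(0.002)   # pretend parsing takes ~2 ms
with timed("db_query"):
    time.sleep(0.010)   # pretend the query takes ~10 ms

slowest = max(timings, key=timings.get)
print(f"bottleneck: {slowest} ({timings[slowest]:.1f} ms)")
```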

Tips for accurate measurement:

  • Repeat measurements: Latency can fluctuate due to various factors, so performing multiple measurements and averaging the results can provide a more accurate picture.
  • Control variables: Minimize external influences by controlling variables like network load or system resource usage during measurements.
  • Use appropriate tools: Choose tools specific to the type of latency you’re measuring and ensure they are accurate and calibrated for your environment.

5. Examples of calculating Latency

5.1 Problem Statement

Calculate the round-trip time (RTT) latency for a data packet traveling between a client in New York City and a server in London, UK, assuming a direct fiber-optic connection with a propagation speed of 200,000 km/s.

  • Distance between NYC and London: 5570 km
  • Propagation speed: 200,000 km/s 
  • Constraints: Assume no network congestion or processing delays.
  • Desired Output: RTT latency in milliseconds.
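A worked solution, using only the given distance and propagation speed (propagation delay only, per the stated constraints):

```python
distance_km = 5570          # NYC to London
speed_km_per_s = 200_000    # signal speed in fiber

one_way_ms = distance_km / speed_km_per_s * 1000
rtt_ms = 2 * one_way_ms
print(f"RTT: {rtt_ms:.1f} ms")  # RTT: 55.7 ms
```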

5.2 Problem Statement

Calculate the average latency for a user clicking a button on a web application hosted on a server with a 5 ms processing time. Assume a network latency of 20 ms between the user’s device and the server.

  • Network latency: 20 ms
  • Server processing time: 5 ms 
  • Constraints: Assume no additional processing delays on the client-side. 
  • Desired Output: Average latency in milliseconds.
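A worked solution, assuming the 20 ms network figure already covers the full round trip (the problem does not say whether it is one-way, so this is a stated assumption):

```python
network_latency_ms = 20   # given network latency, treated as round trip
processing_ms = 5         # given server processing time

total_ms = network_latency_ms + processing_ms
print(f"average latency: {total_ms} ms")  # 25 ms
```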

6. Use Cases of Latency

6.1 Latency in Transactions

In the context of transactions, latency refers to the time it takes for a request (e.g., initiating a payment) to be processed and the response (e.g., confirmation or completion) to be received. Steps Involved in this process are:

  • Initiating Payment: You tap your phone to make a payment using Google Pay.
  • Network Transmission: The payment request is sent to the payment processor through the Internet.
  • Payment Processing: The processor verifies the transaction details and your account balance.
  • Response Transmission: The result (approved or declined) is sent back to your phone.
  • Confirmation Screen: Your phone displays the payment confirmation.

Example: Google Pay (GPay), Paytm

Note: High latency here might cause a delay between your payment initiation and the confirmation on your screen. However, the delay is usually short, especially for contactless payment systems designed for quick transactions.

6.2 Latency in Gaming

In the context of gaming, latency refers to the delay between a player’s action and the corresponding response they see on their screen. Let’s take the example of shooting a gun in an online shooter game:

Steps Involved:

  • Player Action: You press the “fire” button to shoot your virtual gun.
  • Network Transmission: The information about your action is sent to the game server via the internet.
  • Server Processing: The game server receives your action, processes it, and determines the result (hit or miss).
  • Response Transmission: The result of your action is sent back to your device.
  • Your Screen Update: Your device displays the outcome of your shot.

Further read: High Latency vs Low Latency

7. What is Throughput?

Throughput refers to the rate at which a system, process, or network can transfer data or perform operations in a given period of time. It is often measured in bits per second (bps), bytes per second, transactions per second, etc. It is calculated by dividing the number of operations or items processed by the amount of time taken.

For example, if an ice-cream factory produces 50 ice-creams in an hour, the throughput of the factory is 50 ice-creams/hour.
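The calculation is straightforward to express in code; the byte counts below are illustrative:

```python
def throughput(completed, elapsed):
    """Operations or items completed per unit of elapsed time."""
    return completed / elapsed

# Ice-cream factory: 50 ice-creams in 1 hour
print(throughput(50, 1))           # 50.0 ice-creams/hour
# Network: 10,000,000 bytes transferred in 2 seconds
print(throughput(10_000_000, 2))   # 5000000.0 bytes/s
```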


Here are a few contexts in which throughput is commonly used:

  1. Network Throughput: In networking, throughput refers to the amount of data that can be transmitted over a network in a given period. It’s an essential metric for evaluating the performance of communication channels.
  2. Disk Throughput: In storage systems, throughput measures how quickly data can be read from or written to a storage device, usually expressed in terms of bytes per second.
  3. Processing Throughput: In computing, especially in the context of CPUs or processors, throughput is the number of operations completed in a unit of time. It could refer to the number of instructions executed per second.

8. Difference between Throughput and Latency (Throughput vs. Latency)






| Aspect | Throughput | Latency |
| --- | --- | --- |
| Definition | The number of tasks completed in a given time period. | The time it takes for a single task to be completed. |
| Measurement Unit | Typically measured in operations per second or transactions per second. | Measured in time units such as milliseconds or seconds. |
| Relationship | Inversely related to latency. Higher throughput often corresponds to lower latency. | Inversely related to throughput. Lower latency often corresponds to higher throughput. |
| Example | A network with high throughput can transfer large amounts of data quickly. | Low latency in gaming means minimal delay between user input and on-screen action. |
| Impact on System | Reflects the overall system capacity and ability to handle multiple tasks simultaneously. | Reflects the responsiveness and perceived speed of the system from the user’s perspective. |

9. Factors affecting Throughput

  1. Network Congestion:
    • High levels of traffic on a network can lead to congestion, reducing the available bandwidth and impacting throughput.
    • Solutions may include load balancing, traffic prioritization, and network optimization.
  2. Bandwidth Limitations:
    • The maximum capacity of the network or communication channel can constrain throughput.
    • Upgrading to higher bandwidth connections can address this limitation.
  3. Hardware Performance:
    • The capabilities of routers, switches, and other networking equipment can influence throughput.
    • Upgrading hardware or optimizing configurations may be necessary to improve performance.
  4. Software Efficiency:
    • Inefficient software design or poorly optimized algorithms can contribute to reduced throughput.
    • Code optimization, caching strategies, and parallel processing can enhance software efficiency.
  5. Protocol Overhead:
    • Communication protocols introduce overhead, affecting the efficiency of data transmission.
    • Choosing efficient protocols and minimizing unnecessary protocol layers can improve throughput.
  6. Latency:
    • High latency can impact throughput, especially in applications where real-time data processing is crucial.
    • Optimizing routing paths and using low-latency technologies can reduce delays.
  7. Data Compression and Encryption:
    • While compression can reduce the amount of data transmitted, it may introduce processing overhead.
    • Similarly, encryption algorithms can impact throughput, and balancing security needs with performance is crucial.

10. Methods to improve Throughput

  1. Network Optimization:
    • Utilize efficient network protocols to minimize overhead.
    • Implement Quality of Service (QoS) policies to prioritize critical traffic.
    • Optimize routing algorithms to reduce latency and packet loss.
  2. Load Balancing:
    • Distribute network traffic evenly across multiple servers or paths.
    • Prevents resource overutilization on specific nodes, improving overall throughput.
  3. Hardware Upgrades:
    • Upgrade network devices, such as routers, switches, and NICs, to higher-performing models.
    • Ensure that servers and storage devices meet the demands of the workload.
  4. Software Optimization:
    • Optimize algorithms and code to reduce processing time.
    • Minimize unnecessary computations and improve code efficiency.
  5. Compression Techniques:
    • Use data compression to reduce the amount of data transmitted over the network.
    • Decreases the time required for data transfer, improving throughput.
  6. Caching Strategies:
    • Implement caching mechanisms to store and retrieve frequently used data locally.
    • Reduces the need to fetch data from slower external sources, improving response times and throughput.
  7. Database Optimization:
    • Optimize database queries and indexes to improve data retrieval times.
    • Use connection pooling to efficiently manage database connections.
  8. Concurrency Control:
    • Employ effective concurrency control mechanisms to manage simultaneous access to resources.
    • Avoid bottlenecks caused by contention for shared resources.
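As an illustration of the caching strategy above, memoizing a slow lookup raises effective throughput dramatically; the function below and its 10 ms “database query” are simulated stand-ins:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def fetch_profile(user_id):
    """Stand-in for a slow database lookup (hypothetical)."""
    time.sleep(0.01)  # pretend the query takes ~10 ms
    return {"id": user_id, "name": f"user-{user_id}"}

start = time.perf_counter()
for _ in range(100):
    fetch_profile(42)          # only the first call hits the "database"
elapsed = time.perf_counter() - start
print(f"100 lookups in {elapsed * 1000:.1f} ms")
```

Without the cache, 100 lookups would take around a second; with it, only the first call pays the query cost, so the sustained lookup rate (throughput) is far higher.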


Latency is a pivotal factor in system design: it impacts user experience and the performance of applications at scale. Managing latency effectively, especially when scaling systems, is essential to ensure a responsive and seamless experience for users across applications and services.
