
Designing Zoom | System Design

Last Updated : 18 Mar, 2024

Building an app like Zoom may look simple from the user’s perspective, but in reality it is a complex task that takes hundreds of software engineers years of work. Zoom, like other similar apps, requires careful planning and design to provide seamless video conferencing worldwide. This article explains how Zoom works and how its design handles one-to-one calls, group calls, screen sharing, and recording.

1. Requirements of Zoom System Design

1.1 Functional Requirements of Zoom System Design

  • One-to-one calling: The platform should let one user call another user directly.
  • Group video calling: The platform should support group calls, audio or video, in which participants can see each other’s video and shared screens.
  • Audio/Video/Screen share: The platform should support audio calls and video calls, and participants should be able to share their screens. Video and screen sharing are fundamentally implemented in the same way: each is a stream of video content, and only the input source differs.
  • Recording: The platform should be able to record a call and make the recording available to users so that they can watch it afterward.

1.2 Non-Functional Requirements of Zoom System Design

  • Super fast (ultra-low latency): The platform has to be real-time. Low latency in the streaming sense is not good enough: a system like YouTube can buffer video on a poor connection and viewers accept it, but on a video call any lag results in a bad user experience.
  • High availability: The system should be highly available. In terms of hardware, it should be fault tolerant and also distributed geographically, to provide low latency and more durability.
  • Data loss: Some data loss is acceptable. When we watch a video and miss a couple of frames, we generally do not notice, and on a video call a few dropped frames make little difference.

2. Capacity Estimation

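Let’s assume 1 billion users making 100 million group video calls daily. That averages about 1,160 new calls per second. To provide a scalable backend, the Zoom app needs to handle approximately 58,000 requests per second, which is consistent with these numbers if each call generates roughly 50 backend requests (an assumed figure).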

  • Storage Estimation: Approximately 2.32 GB of storage is required per day for the data generated by user activity, messages, and other non-video content in the app. For recordings, assuming 100,000 recordings per day of up to 100 MB each, video storage grows by about 10 TB per day, so the design must plan for storing and managing recordings at that volume.
  • Traffic Estimation: Group calls can have up to 100 people per video chat. Overall, we assume about 10 TB of video-call data per day, and the network must handle that bandwidth.
  • Latency Estimation: Latency of around 100 ms already causes a bad experience, so the target is 64 ms for a one-way trip and 128 ms for the round trip. Meeting this target requires optimizing network routing.
  • Memory Estimation: For a normal connection, assume about 1 GB of data per hour, which is approximately 24 GB per day for a continuously active connection. This is an average estimate.
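The arithmetic behind these estimates can be checked in a few lines. This is a minimal sketch that uses only the figures quoted in this section; the "requests per call" value is derived from the quoted 58,000 requests per second and is not a measured number.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

calls_per_day = 100_000_000     # group calls per day, from the estimate above
recordings_per_day = 100_000    # recordings per day, from the estimate above
recording_size_mb = 100         # upper bound per recording, from the estimate above
quoted_rps = 58_000             # request rate quoted above

calls_per_second = calls_per_day / SECONDS_PER_DAY
recording_tb_per_day = recordings_per_day * recording_size_mb / 1_000_000  # MB -> TB (decimal)
requests_per_call = quoted_rps * SECONDS_PER_DAY / calls_per_day

print(f"new calls per second: {calls_per_second:,.0f}")              # 1,157
print(f"recording storage per day: {recording_tb_per_day:.0f} TB")   # 10 TB
print(f"requests per call implied by the quoted rate: {requests_per_call:.0f}")  # 50
```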

3. High-Level Design of Zoom System Design

At the heart of Zoom’s success is its robust infrastructure, which includes key components like Zoom clients, distributed data centers, web infrastructure, and technologies like HTTP tunnels. Let’s explore how each component has contributed to Zoom’s growth and how it has overcome the challenges involved.

High-Level-Design-of-Zoom-System-Design

Zoom Client

The Zoom client acts as a gateway for users to access virtual sessions. Its ease of use and easy integration across devices have made it a favorite of millions of users around the world. With the rise of remote work, the Zoom client played a key role in ensuring seamless communication even in bandwidth-restricted environments.

Distributed data centers

Zoom’s distributed data centers form the backbone of its business, handling the processing and storage of large volumes of conference data. This decentralized approach not only ensures scalability but also improves reliability by reducing the risk of outages. As demand increased, Zoom rapidly scaled its data center infrastructure to accommodate users while maintaining performance.

Web infrastructure

Zoom’s web infrastructure complements its client application, providing users with a consistent experience across browsers. This allows participants to join sessions without having to download additional software. As demand increased, Zoom strengthened its network infrastructure to handle the extra traffic and keep the service accessible to users worldwide.

HTTP Tunnel

To handle firewall and proxy restrictions, Zoom uses an HTTP tunneling mechanism, which wraps Zoom traffic in HTTP requests and responses. This allows users to join Zoom sessions even in restricted network environments.

3.1 How does data flow in the Zoom client?

Data-Flow-in-Zoom-Application

TCP

In TCP, data is sent as a sequence of numbered packets from client to server.

  • Suppose the client wants to send packet P1. It labels the packet with sequence number 1, marking it as the first packet of this connection, and sends it to the server.
  • The server then sends an acknowledgement back to the client saying that it has received P1.
  • If the client does not receive an acknowledgement within a fixed timeout, it sends P1 again. The client then sends P2, P3, and so on; each packet carries its sequence number (2, 3, ...) so that the server can reconstruct the order in which the data was sent.
  • TCP does its best to ensure that no data is lost between client and server.
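The acknowledge-and-resend behaviour described above can be illustrated with a toy simulation. This is a simplified stop-and-wait sketch, not how a real TCP stack is implemented (real TCP keeps many packets in flight using a sliding window); the function name and the `drop_first_attempt` parameter are made up for this example.

```python
def send_reliably(packets, drop_first_attempt):
    """Simulate stop-and-wait delivery: resend each packet until it is acknowledged.

    drop_first_attempt is a set of sequence numbers whose first transmission is lost.
    Returns (payloads delivered in order, total number of transmissions).
    """
    delivered = []
    transmissions = 0
    for seq, payload in enumerate(packets, start=1):
        attempt = 0
        acked = False
        while not acked:
            attempt += 1
            transmissions += 1
            lost = attempt == 1 and seq in drop_first_attempt
            if not lost:
                delivered.append(payload)  # receiver accepts the packet and acknowledges it
                acked = True
            # otherwise the timer expires without an ACK and the loop resends

    return delivered, transmissions


delivered, transmissions = send_reliably(["P1", "P2", "P3"], drop_first_attempt={2})
print(delivered, transmissions)  # ['P1', 'P2', 'P3'] 4
```

Nothing is lost, but P3 cannot be sent until the retransmission of P2 has been acknowledged, which is exactly the kind of delay a live call cannot afford.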

In our case, TCP is a poor choice for Zoom’s video traffic.

Let us see why:

  • The non-functional requirements accept some data loss but demand fast communication. TCP needs a three-way handshake to establish a connection and then sequences every packet, so a lost packet holds back the data behind it until it is retransmitted.
  • The TCP header is at least 20 bytes, compared with 8 bytes for UDP. This overhead is carried on every packet, and we should try to minimize it.
  • Congestion control can throttle how fast the client itself sends packets. For a video-chat application, this degrades the user experience.

The better option for video traffic is UDP.

UDP

UDP is also a transport-layer protocol like TCP. They run at the same layer but are functionally different: UDP is a lossy (unreliable) protocol.

udp-new

Let us see how UDP Works:

Imagine a client that wants to send information to a server. UDP keeps sending packets regardless of whether the server is receiving them.

For example:

Suppose we have packets P1, P2, and P3 to send. The client sends P1 and the server receives it. The client sends P2, but the server never receives it because of congestion or packet loss. The client does not send P2 again; it moves on to P3, which the server receives.
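The same P1, P2, P3 scenario can be reproduced with real UDP sockets on the loopback interface. This is a minimal sketch: the loss of P2 is simulated by simply not sending it, and the 4-byte sequence-number prefix is an assumption of this example, since UDP itself carries no sequence numbers.

```python
import socket
import struct

# Receiver: bind to an OS-assigned port on the loopback interface.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(1.0)
address = receiver.getsockname()

# Sender: no connect(), no handshake, no waiting for acknowledgements.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for seq in (1, 2, 3):
    if seq == 2:
        continue  # stand-in for P2 being lost in the network
    sender.sendto(struct.pack("!I", seq) + b"frame-data", address)

received = []
try:
    while True:
        datagram, _ = receiver.recvfrom(2048)
        (seq,) = struct.unpack("!I", datagram[:4])
        received.append(seq)
except socket.timeout:
    pass  # nothing more arrived; UDP itself never reports the gap
finally:
    sender.close()
    receiver.close()

print(received)  # [1, 3]
```

The sender never learns that P2 is missing. Detecting the gap is left to the application, which is why the example adds its own sequence number to each datagram.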

TCP vs UDP

  • Reliability: TCP ensures reliable data delivery with error detection, retransmission of lost packets, and in-order delivery. UDP does not guarantee reliability or packet ordering.
  • Connection: TCP establishes a connection before data transfer, ensuring a reliable, bidirectional channel. UDP is connectionless, allowing data to be sent without prior setup.
  • Overhead: TCP has higher overhead due to its reliability features, including acknowledgment packets and flow control mechanisms. UDP has lower overhead since it lacks these features.
  • Congestion Control: TCP includes congestion control mechanisms to prevent network congestion by adjusting data transmission rates. UDP does not have built-in congestion control, leaving it to applications to handle congestion.

Drawbacks of UDP

  • Packets may arrive out of order: the server might receive P3 first and then P1. TCP, by contrast, delivers data in order.
  • For video conferencing apps, however, it is better to miss or reorder a couple of frames than to compromise on speed.
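One common way for a receiver to cope with reordering is to render only frames that are newer than the last one shown and to discard anything that arrives late. The following is a hedged sketch of that idea; the function name is hypothetical, and real clients typically use a jitter buffer with timing information rather than this bare rule.

```python
def playable_frames(arrivals):
    """Return the frames a receiver would render, given sequence numbers in arrival order.

    A frame is rendered only if it is newer than the last rendered frame;
    late or duplicate frames are discarded instead of being waited for.
    """
    rendered = []
    newest = 0
    for seq in arrivals:
        if seq > newest:
            rendered.append(seq)
            newest = seq
    return rendered


# Frame 2 arrives after frame 3, frame 4 is duplicated, frame 5 never arrives.
print(playable_frames([1, 3, 2, 4, 4, 6]))  # [1, 3, 4, 6]
```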

Note: We will use TCP for all communication between a client and a server that does not involve video transfer. Only the video transfer itself happens over UDP.

4. Low-Level Design of Zoom System Design

Let us now discuss the low-level design of Zoom.

low-Level-Design-of-Zoom-System-Design-copy

Below is the explanation of the above low-level design image:

  • Everything in orange is a user interface. Most likely these will be mobile applications.
  • Things in grey are the load balancer, reverse proxy, and authentication/authorization layer.
  • Things in blue are the web services or the UDP services that we have developed.
  • And things in pink would be the databases, data stores or some kind of clusters that we will use.

When a user, let’s say U1, wants to start a call with another user, U2, the process involves several backend components working together seamlessly.

1. WebSocket Handler

  • It maintains live connections with active users and facilitates bidirectional communication between users and the server.
  • It uses WebSocket for persistent connections, handles incoming messages, and routes them to the appropriate recipients.
  • Multiple WebSocket Handlers can be deployed behind a load balancer to distribute incoming connections evenly. Connection pooling and efficient message routing help them handle high traffic.

2. WebSocket Manager

  • It manages the mapping between users and the WebSocket Handler machines they are connected to.
  • It ensures that messages are routed correctly between users and WebSocket Handlers.
  • It maintains a distributed data store for these mappings and uses consistent hashing or another suitable algorithm to distribute them evenly.
  • It should be designed for horizontal scalability to handle increasing numbers of users and WebSocket Handlers.
  • It should implement strategies for fault tolerance and recovery in case of failures.
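To make the consistent-hashing idea concrete, here is a minimal sketch of a hash ring that decides which data-store node holds the mapping for a given user. The class, the node names, and the choice of 100 virtual nodes per machine are assumptions made for illustration, not details of Zoom’s implementation.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring that assigns each user ID to one of a set of nodes."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual nodes per physical node, for an even spread
        self._ring = []           # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, user_id):
        if not self._ring:
            raise LookupError("no nodes registered")
        # First ring position clockwise from the key's hash, wrapping around at the end.
        index = bisect.bisect(self._ring, (self._hash(user_id), ""))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["store-1", "store-2", "store-3"])
users = [f"user-{n}" for n in range(1000)]
before = {u: ring.lookup(u) for u in users}

ring.remove("store-2")  # simulate a node failure
after = {u: ring.lookup(u) for u in users}

moved = [u for u in users if before[u] != after[u]]
print(all(before[u] == "store-2" for u in moved))  # True
```

Only the users whose mappings lived on the removed node are reassigned; everyone else stays where they were, which is the property that makes this scheme attractive when nodes are added or fail.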

3. Signalling Service

  • The Signalling Service initiates and coordinates communication between users.
  • It checks call conditions, coordinates with the User Service, and implements APIs for call initiation, termination, and status updates.
  • It integrates with the User Service for user authentication and authorization.
  • It must handle concurrent call requests efficiently, and it uses asynchronous processing and message queues for scalability and fault tolerance.

4. User Service

  • The User Service is the repository for user data. It handles user authentication, authorization, and access control.
  • It uses a database to store user information securely.
  • It implements APIs for user registration, login, and profile management.
  • It can be designed for horizontal scalability to handle a growing user base, and caching can reduce database load and improve performance.

5. Connector (STUN Server)

  • It helps users discover their publicly accessible IP addresses, which facilitates establishing peer-to-peer connections.
  • It implements the STUN protocol for IP address discovery and integrates with the WebSocket Handler and Signalling Service for communication.
  • Multiple instances of the STUN server are deployed for redundancy and load distribution, and resources are monitored and scaled based on demand to ensure availability.
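The address discovery step can be sketched at the byte level. The message layout below follows the STUN specification (RFC 5389): a 20-byte header with a fixed magic cookie, and an XOR-MAPPED-ADDRESS attribute in the response. The helper `fake_binding_response` is a test stand-in invented for this example so that it runs without contacting a server, and only IPv4 is handled.

```python
import ipaddress
import os
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Build a 20-byte STUN Binding Request with no attributes."""
    transaction_id = transaction_id or os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def parse_binding_response(message):
    """Return (ip, port) from the XOR-MAPPED-ADDRESS attribute (IPv4 only)."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN binding success response")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return str(ipaddress.IPv4Address(addr)), port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    raise ValueError("no XOR-MAPPED-ADDRESS attribute found")


def fake_binding_response(transaction_id, ip, port):
    """Test helper (hypothetical): encode what a STUN server would send back."""
    x_port = port ^ (MAGIC_COOKIE >> 16)
    x_addr = int(ipaddress.IPv4Address(ip)) ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, x_port, x_addr)
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr


request = build_binding_request()
response = fake_binding_response(request[8:20], "203.0.113.7", 54321)
print(parse_binding_response(response))  # ('203.0.113.7', 54321)
```

In a real client the request would be sent over UDP to a STUN server, and the returned public address would be passed to the other user through the Signalling Service.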

6. Handshake for Connection Details

  • The two users exchange information about available bitrate, codec support, and bandwidth.
  • The handshake defines the protocols and message formats used to exchange these connection details.
  • This ensures compatibility between different clients and devices.
  • Message formats and protocols can be optimized for efficiency to handle high message throughput.
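The outcome of this handshake can be sketched as a small negotiation function. This is an illustrative simplification: WebRTC clients exchange this information as SDP offers and answers, whereas the `Capabilities` class, the codec lists, and the bitrate values here are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class Capabilities:
    codecs: list           # supported codecs, in order of preference
    max_bitrate_kbps: int  # highest bitrate the client can sustain


def negotiate(caller, callee):
    """Pick the caller's most-preferred codec that the callee also supports,
    capped at the lower of the two bitrate limits."""
    for codec in caller.codecs:
        if codec in callee.codecs:
            return codec, min(caller.max_bitrate_kbps, callee.max_bitrate_kbps)
    raise ValueError("no common codec")


u1 = Capabilities(codecs=["AV1", "VP9", "H264"], max_bitrate_kbps=2500)
u2 = Capabilities(codecs=["H264", "VP9"], max_bitrate_kbps=1200)
print(negotiate(u1, u2))  # ('VP9', 1200)
```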

7. Establishing Peer-to-Peer Connection

  • A direct connection is established between the users so that packets for the video call can be exchanged in real time.
  • It uses WebRTC for peer-to-peer communication.
  • It implements NAT traversal techniques for connectivity across different network configurations.
  • The connection is monitored, and resources are adjusted dynamically to maintain performance.

8. Fallback to TURN Server

  • The TURN server acts as an intermediary that relays traffic between users when a peer-to-peer connection fails.
  • TURN server instances are deployed to relay this traffic.
  • The fallback mechanism integrates with the WebSocket Handler and Signalling Service.
  • TURN server instances are deployed in geographically distributed locations for low latency.
  • Server load should be monitored, and resources scaled as needed to handle increased traffic.
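The fallback decision itself is simple and can be sketched as follows. The connection functions and the relay address are hypothetical placeholders; a real client would attempt ICE connectivity checks rather than call plain Python functions.

```python
def choose_media_path(direct_connect, relay_connect):
    """Try the direct peer-to-peer path first; fall back to the relay if it fails."""
    try:
        return "peer-to-peer", direct_connect()
    except ConnectionError:
        return "turn-relay", relay_connect()


def blocked_direct():
    # Stand-in for a NAT or firewall that prevents a direct route.
    raise ConnectionError("no direct route between peers")


def relay():
    return "relay.example.net:3478"  # hypothetical TURN server address


print(choose_media_path(blocked_direct, relay))
# ('turn-relay', 'relay.example.net:3478')
```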

9. Handling Bandwidth Changes

  • When bandwidth fluctuates during a call, events are logged into Kafka for processing.
  • Event schemas and topics are defined for logging bandwidth changes.
  • Kafka producers publish the events.
  • Partitioning and replication strategies provide fault tolerance and high availability.
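An event schema and a partitioning rule for these events might look like the sketch below. It uses only the standard library and builds the record a producer would publish, without calling any Kafka client. The field names, topic name, and partition count are assumptions, and the CRC32-based partitioner is a stand-in: a real Kafka client applies its own partitioner to the record key.

```python
import json
import time
import zlib
from dataclasses import asdict, dataclass


@dataclass
class BandwidthChanged:
    call_id: str
    user_id: str
    old_kbps: int
    new_kbps: int
    timestamp_ms: int


def partition_for(key, num_partitions):
    """Stable key -> partition mapping, so all events of one call stay in order."""
    return zlib.crc32(key.encode()) % num_partitions


def to_record(event, topic="bandwidth-events", num_partitions=12):
    """Build the topic, key, and serialized value a producer would publish."""
    return {
        "topic": topic,
        "partition": partition_for(event.call_id, num_partitions),
        "key": event.call_id.encode(),
        "value": json.dumps(asdict(event)).encode(),
    }


now_ms = int(time.time() * 1000)
first = to_record(BandwidthChanged("call-42", "u1", 2500, 800, now_ms))
second = to_record(BandwidthChanged("call-42", "u2", 1200, 600, now_ms))
print(first["partition"] == second["partition"])  # True
```

Keying by call ID keeps every event for one call on the same partition, so a consumer sees that call’s bandwidth changes in the order they were published.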

Important Scenarios

  1. Group Conversations:
    • Small groups use peer-to-peer connections; large groups go through a Call Server. Streams are transcoded for different user bandwidths and codecs, analytics events are logged, and bandwidth is adjusted dynamically.
    • Clients can switch dynamically from peer-to-peer to the Call Server.
  2. Recording:
    • A logger service records the conversation in chunks. The file is created and stored in a distributed file system, and users receive a notification with a link to the recording.
  3. For Live video:
    • Video and audio inputs from cameras and microphones are aggregated, and transcoders convert the input streams for different devices.
    • Call Servers receive the transcoded streams and distribute them via Content Delivery Networks (CDNs). They also handle session management, user authentication, and so on.
    • The WebSocket Manager coordinates communication between different Call Servers and manages the WebSocket connections between clients and servers.
    • It provides real-time data exchange, along with fault tolerance and load balancing to maintain performance.

The Call Server is placed close to the users so that latency is minimized. There can be many hops between the Call Server and the users, and we want to minimize them because this is where the data is replicated multiple times.
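The reason large groups move from peer-to-peer to a Call Server becomes clear when counting the streams each client handles. This is a back-of-the-envelope sketch; the `mesh_limit` threshold of four participants is an assumed value, not a figure from Zoom.

```python
def streams_per_client(participants, topology):
    """Return (uploads, downloads) each client handles for one media type."""
    others = participants - 1
    if topology == "mesh":
        return others, others  # send a copy to every peer, receive one from each
    if topology == "call-server":
        return 1, others       # send once to the server, which fans it out
    raise ValueError(f"unknown topology: {topology}")


def choose_topology(participants, mesh_limit=4):
    """Use peer-to-peer for small groups and a Call Server beyond the limit."""
    return "mesh" if participants <= mesh_limit else "call-server"


for n in (3, 100):
    topology = choose_topology(n)
    print(n, topology, streams_per_client(n, topology))
# 3 mesh (2, 2)
# 100 call-server (1, 99)
```

In a full mesh, each client’s upload cost grows with the group size, while with a Call Server each client uploads a single stream regardless of how many people are in the call.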

5. Microservices used in Zoom System Design

Zoom’s structure has user management service, meeting scheduler service, video streaming services, chat services, record management services, notification services and so on. Some of them are mentioned below:

  • User Management Service: Manages user authentication, registration, and profiles.
  • Meeting Scheduler Service: Facilitates meeting scheduling and organization.
  • Video Streaming Service: Enables real-time audio and video communication during meetings.
  • Chat Service: Supports real-time messaging and collaboration among participants.
  • Recording Management Service: Manages recording functionalities for meetings and webinars.
  • Notification Service: Sends out notifications and reminders for meetings and updates.

6. API Design of Zoom System Design

API-Design-of-Zoom-System-Design

The Zoom API provides the following sets of endpoints:

  • Authentication: endpoints that generate access tokens so that authenticated users can access Zoom API resources securely.
  • Meetings: endpoints for creating, updating, and deleting meetings and for retrieving detailed meeting information. These let users manage schedules and let third-party applications display meeting details and manage meeting-related tasks.
  • Users: endpoints for retrieving user details, updating user profiles, managing user roles and permissions, listing users, and creating new user accounts.
  • Recordings: endpoints for listing recordings, retrieving recording details, and managing recording settings and permissions. These let users retrieve recording URLs, download recordings, and share recorded meeting content.
  • Chat: endpoints for sending chat messages, retrieving chat history, and managing chat channels and groups. These enable real-time communication and team collaboration.

7. Database Design of Zoom System Design

  • Zoom’s database design centers on user management and recording functionality.
  • A user table stores the necessary user information, while a separate recording table stores recorded sessions and their associated information.
  • Permissions are stored in a recording-access table that controls which users can access which recordings.

Database-Design-of-Zoom-System-Design

Zoom uses public clouds such as AWS to host meeting metadata, web applications, and other services. It also uses AWS for real-time traffic. For educational users, Zoom expands to Oracle Cloud.

In this Design:

  • AWS hosts the tables for Users, Meetings, and Recordings.
  • Oracle Cloud Database hosts the table for Educational Resources.
  • Users table is shared between AWS and Oracle Cloud Database.
  • Meetings table includes details about scheduled meetings, with a foreign key reference to the Users table for the host.
  • Recordings table stores information about recorded meetings, linked to the Meetings table via a foreign key.
  • Educational Resources table contains details about educational materials, with an uploader_id referencing the Users table.
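The relationships listed above can be written down as a schema. The sketch below uses an in-memory SQLite database purely to show the tables and foreign keys; the column names are illustrative assumptions, and it places all four tables in one database rather than splitting them across AWS and Oracle Cloud as the design describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    email    TEXT NOT NULL UNIQUE,
    name     TEXT NOT NULL
);
CREATE TABLE meetings (
    meeting_id  INTEGER PRIMARY KEY,
    host_id     INTEGER NOT NULL REFERENCES users(user_id),
    topic       TEXT NOT NULL,
    start_time  TEXT NOT NULL
);
CREATE TABLE recordings (
    recording_id  INTEGER PRIMARY KEY,
    meeting_id    INTEGER NOT NULL REFERENCES meetings(meeting_id),
    file_url      TEXT NOT NULL
);
CREATE TABLE educational_resources (
    resource_id  INTEGER PRIMARY KEY,
    uploader_id  INTEGER NOT NULL REFERENCES users(user_id),
    title        TEXT NOT NULL
);
""")

conn.execute("INSERT INTO users VALUES (1, 'host@example.com', 'Host')")
conn.execute("INSERT INTO meetings VALUES (10, 1, 'Standup', '2024-03-18T09:00:00Z')")
conn.execute("INSERT INTO recordings VALUES (100, 10, 'https://example.com/rec/100')")

# A recording that points at a meeting that does not exist is rejected.
try:
    conn.execute("INSERT INTO recordings VALUES (101, 999, 'https://example.com/rec/101')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```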

8. How does Zoom handle Scalability?

  • Zoom’s architecture distributes meetings across its data center network, allowing users to join meetings via the closest data center, ensuring scalability and a reliable video experience for large gatherings.
  • Unlike legacy systems that rely on resource-intensive Multipoint Control Units (MCUs), Zoom’s multimedia routing delivers multiple video streams directly to clients, reducing computing requirements and enabling scalability for meetings with thousands of participants.
  • Each video stream in Zoom can adjust to multiple resolutions, eliminating the need for separate encoding and decoding processes for each endpoint. This optimization enhances performance and scalability while providing varying levels of video quality based on device capabilities and network conditions.
  • Zoom’s quality-of-service application layer optimizes video, audio, and screen-sharing experiences based on each device’s capabilities and available bandwidth. This proactive approach ensures the best possible user experience across diverse network conditions.
  • With support for distributed architecture and multimedia routing, Zoom can accommodate meetings with thousands of participants, ensuring seamless video and audio communication for large-scale events.
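The idea of serving each participant a resolution that matches its network conditions can be sketched as a simple layer picker. The layer names and bitrate thresholds below are illustrative assumptions, not Zoom’s actual values.

```python
# (label, required kbps), highest quality first; values are illustrative only.
LAYERS = [
    ("720p", 1500),
    ("360p", 600),
    ("180p", 200),
]


def pick_layer(available_kbps, layers=LAYERS):
    """Return the highest-quality layer that fits within the receiver's bandwidth."""
    for label, required_kbps in layers:
        if available_kbps >= required_kbps:
            return label
    return "audio-only"


for kbps in (2000, 700, 250, 50):
    print(kbps, pick_layer(kbps))
# 2000 720p
# 700 360p
# 250 180p
# 50 audio-only
```

Because the sender publishes every layer once and the server only forwards the layer each receiver can handle, no per-receiver re-encoding is needed.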

9. Conclusion

Zoom’s design encompasses numerous components and strategies to ensure seamless, reliable, and scalable video communication services for its extensive user base. Its key focus on efficiency, fault tolerance, and adaptability positions Zoom as a leading platform in modern video conferencing, providing an outstanding communication experience.


