
How to design a Live Video Streaming System Like ESPN

Last Updated : 20 Mar, 2023

A Brief Overview

In recent years, demand for digital video processing and communication technology has grown rapidly, especially in the form of live streaming. Video streaming, particularly live streaming, requires a large amount of data to be processed and transferred within the limited bandwidth of communication channels.

However, even with all the recent advances in video streaming technology, there are many challenges that can only be overcome with a well-thought-out and well-executed architecture.

The most common issue while building such an application is making it scalable: not only handling the current user base, but also growing with the application as it reaches more people. Let’s walk through how to mitigate these issues and design a reliable, scalable live video streaming system.

Requirement Gathering of what a system like ESPN must possess

Key Requirements of the System

  • Latency or delay should be as low as possible to keep the stream in sync with the live event (at most about 60 seconds, for a better user experience).
  • Video conversion to different resolutions and codecs (compression and decompression of the media file), so the stream can automatically switch to a higher or lower resolution according to the user’s bandwidth.
  • The system should be scalable to a large number of concurrent users. As the audience expands, it must be able to manage a growing number of viewers.
  • The system should be fault-tolerant, and availability should be high.
  • The system must be safe from unauthorized access and harmful attacks.
  • In order to reach the broadest possible audience, the system must be compatible with a wide range of devices and operating systems.

Architectural Design of ESPN

[Figure: Architectural Design (HLD) of ESPN]

Required Components of the above design

Video Source 

It is the first component in a live-streaming application and refers to the device that captures the content being streamed. The most common video sources are webcams, smartphones, digital cameras, and professional video cameras, along with microphones for capturing audio.

Service for Video Transformation (Video Encoder) 

The transformation service converts the video stream to different codecs and resolutions. The raw video stream comes in as input and is pushed to a message queue; a job scheduler, with the help of several worker nodes, then carries out the conversion. The worker nodes subscribe to this message queue and, once a piece of video is converted, push the result to another message queue. The video encoding process involves compressing the video to reduce its size while maintaining quality. Popular video encoding formats include H.264 and H.265, and some of the popular encoding software includes OBS (Open Broadcaster Software), XSplit, and Wirecast.
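As a rough sketch of this flow (not ESPN’s actual implementation), the worker below pulls a raw segment path from an input queue, transcodes it to one rendition with ffmpeg, and pushes the result to an output queue. The in-process queue objects, file naming, and the single 720p rendition are illustrative assumptions; a real system would use a distributed queue and produce every target resolution and codec.

```python
# Illustrative transcoding worker: pulls raw segment paths from an input queue,
# converts them to a 720p H.264 rendition with ffmpeg, and publishes the result.
# The in-process queues and file naming are stand-ins for a real distributed queue.
import queue
import subprocess

raw_segments = queue.Queue()      # filled by the ingest service (assumed)
encoded_segments = queue.Queue()  # consumed by the packaging/CDN step (assumed)

def transcode_worker():
    while True:
        src = raw_segments.get()  # blocks until a raw segment path arrives
        dst = src.replace(".mp4", "_720p.mp4")
        # Scale to 720p and re-encode the video with H.264; copy the audio as-is.
        subprocess.run(
            ["ffmpeg", "-y", "-i", src,
             "-vf", "scale=-2:720", "-c:v", "libx264", "-c:a", "copy", dst],
            check=True,
        )
        encoded_segments.put(dst)
        raw_segments.task_done()
```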

Database for raw video data

We do not want to lose any video data, so we store this raw footage in a database for fault tolerance.

Distributed File Service

After worker nodes complete processing the video, the result should also be stored in a distributed file service for fault tolerance.

Message Queues

A message queue can help decouple different components of the streaming system, such as the video encoder and the content delivery network. By using a message queue to pass messages between these components, each component can operate independently, reducing the risk of failures or bottlenecks. It can be used as a buffer to handle spikes in traffic. Examples include Kafka and Pub/Sub.
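To make the decoupling concrete, here is a minimal sketch using the kafka-python client (an assumption, as are the broker address, topic name, and message format): the encoder side publishes finished segment locations to a topic, and the delivery side consumes them at its own pace, so neither blocks the other. In a real deployment the producer and consumer would run in separate services.

```python
# Minimal sketch of queue-based decoupling with kafka-python (assumed installed,
# with a broker reachable at localhost:9092). Topic and payload are illustrative.
import json
from kafka import KafkaProducer, KafkaConsumer

# Encoder side: announce each finished segment on a topic and move on.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)
producer.send("encoded-segments",
              {"stream_id": "match-42", "uri": "s3://bucket/seg_0001_720p.ts"})
producer.flush()

# Delivery side (separate service): consume at its own pace;
# the topic acts as a buffer that absorbs traffic spikes.
consumer = KafkaConsumer(
    "encoded-segments",
    bootstrap_servers="localhost:9092",
    group_id="cdn-pushers",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
for record in consumer:
    print("push to CDN:", record.value["uri"])
```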

Streaming Server

The streaming server is the core component of a live-streaming application. It receives the compressed video feed from the encoding software and distributes it to the viewers over the internet through CDNs. A streaming server can be hosted either on-premises or in the cloud, and it needs to be scalable to accommodate large numbers of viewers.
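As a toy illustration of the delivery path, the snippet below serves a directory of pre-packaged HLS files over plain HTTP so a player can fetch them. The directory name and port are placeholders; a production streaming server would be a dedicated media server (for example nginx with an RTMP/HLS module, or a managed cloud service) sitting behind the CDN.

```python
# Toy "streaming server": serve pre-packaged HLS files (playlist.m3u8 + .ts
# segments) over HTTP so an HLS-capable player can fetch them.
# Assumes the encoder has already written its output into ./hls_out.
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler

handler = functools.partial(SimpleHTTPRequestHandler, directory="hls_out")
HTTPServer(("0.0.0.0", 8080), handler).serve_forever()
```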

CDN

Let’s look at why caching with the help of CDN plays such an important role in this application:

A Content Distribution Network (CDN) aims to put content as close as possible to the users. When a user requests a video, the app finds the nearest server that has the video and streams it from there to the device.

The most significant benefits of a CDN are speed and reliability. Moving the video content as close as possible to the people watching it makes the viewing experience much faster and more reliable.

Along with optimizing page content and libraries, a CDN also ensures that the application has a faster load time, since it serves content from the location nearest to the user.

  • Caching can be used to improve the delivery of video content to viewers. By caching popular video content at edge locations (multiple servers) around the world with the help of a CDN, content can be delivered to viewers more quickly and with lower latency, improving the overall quality of the viewing experience.
  • Caching encoding data, such as pre-encoded video segments or keyframes, can reduce the load on the video encoder and improve the system’s ability to handle spikes in traffic.
  • Caching metadata, such as video titles, descriptions, and thumbnails, can help reduce the load on the database and improve the performance of the system.

Popular CDNs for live video streaming include Akamai, Cloudflare, and Fastly. 
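As a small illustration of the metadata-caching point above, a short-lived in-process cache in front of the database can absorb repeated reads for titles and thumbnails. The TTL value and the fetch_metadata_from_db helper are hypothetical placeholders; in practice this role is usually played by the CDN edge or a shared cache such as Redis.

```python
# Hypothetical TTL cache for video metadata (titles, descriptions, thumbnails).
# fetch_metadata_from_db is a placeholder for the real database lookup.
import time

_CACHE = {}          # video_id -> (timestamp, metadata)
TTL_SECONDS = 60     # metadata changes rarely during a live event

def fetch_metadata_from_db(video_id):
    # Placeholder: stands in for a real query against the metadata store.
    return {"id": video_id, "title": "Live: Football", "thumbnail": f"/thumbs/{video_id}.jpg"}

def get_metadata(video_id):
    now = time.time()
    hit = _CACHE.get(video_id)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]                      # served from cache, no DB round trip
    value = fetch_metadata_from_db(video_id)
    _CACHE[video_id] = (now, value)
    return value
```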

Now comes the final part, where we look at ways to transfer the video to the end users:

RTMP vs HLS/DASH for transferring videos to end users?

Let’s look at both of them one by one,

RTMP

It stands for Real-Time Messaging Protocol. RTMP runs on top of TCP, so delivery is reliable and there is no data loss in transit.

RTMP allows low-latency streaming and is commonly used for live-streaming applications, such as gaming, sports, and events. It can also be used for on-demand video delivery, where the video content is pre-recorded and played back to viewers at a later time, e.g., Hotstar.

HLS/DASH

HLS (HTTP Live Streaming) and DASH (Dynamic Adaptive Streaming over HTTP) are two popular streaming protocols used to deliver video content over the internet. Both HLS and DASH enable the adaptive bitrate streaming (ABS) technique, which adjusts the quality of the video based on the viewer’s available network bandwidth, device capabilities, and other factors.

With HTTP Live Streaming, the video is encoded at several resolutions and bitrates. As the connection gets slower, the player adjusts the requested bitrate to the available bandwidth, so the stream works across different network conditions, such as 3G or 4G, and provides the best possible experience for the user’s bandwidth.
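To make adaptive bitrate streaming concrete, the sketch below writes a simple HLS master playlist that advertises the four renditions used in this article so the player can switch between them. The bandwidth figures and variant playlist paths are illustrative; a packager (for example ffmpeg or Shaka Packager) would normally generate this file.

```python
# Build an illustrative HLS master playlist listing several renditions so the
# player can switch bitrates as the viewer's bandwidth changes.
# Bandwidths and file names are example values only.
renditions = [
    ("1080p", 1920, 1080, 5_000_000),
    ("720p",  1280, 720,  2_800_000),
    ("480p",  854,  480,  1_400_000),
    ("360p",  640,  360,    800_000),
]

lines = ["#EXTM3U"]
for name, w, h, bandwidth in renditions:
    lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={w}x{h}")
    lines.append(f"{name}/playlist.m3u8")

with open("master.m3u8", "w") as f:
    f.write("\n".join(lines) + "\n")
```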

Now, to the actual comparison: HLS/DASH generally deliver somewhat lower quality and slightly higher latency than RTMP, so we are making a tradeoff. In return, because they are HTTP-based streaming protocols, they make judicious use of the available bandwidth, adapt quality in near real time, and scale easily through CDNs. A common setup in practice is therefore to ingest the stream from the source over RTMP and deliver it to viewers over HLS/DASH.

Capacity Estimation per live stream

Let’s find out how much data we need to process for each live stream, taking a football match as an example.

Let’s make some assumptions first:

  • Suppose the footage captured for this particular kind of match on ESPN (football) is in 4K.
  • For easier understanding, we serve four resolutions: 1080p, 720p, 480p, and 360p.
  • Number of codecs: 4
  • Duration of the football match: 1 hour and 30 minutes
  • Assume the size of the full-quality footage is 4 GB.

Now, size of the 720p footage: 4/2 = 2 GB
Size of the 480p footage: 4/4 = 1 GB
Size of the 360p footage: 4/8 = 0.5 GB

So, the total storage for all resolutions = 4 + 2 + 1 + 0.5 = 7.5 GB,

and the total storage for all resolutions and codecs = 7.5 * 4 = 30 GB of data to be processed per live stream.
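These storage figures are easy to sanity-check in a few lines of Python; the numbers below simply restate the assumptions above.

```python
# Back-of-envelope check of the per-stream storage estimate above.
# 4 GB is the full-quality footage; each lower rendition is assumed to halve in size.
rendition_sizes_gb = [4, 4 / 2, 4 / 4, 4 / 8]    # full, 720p, 480p, 360p
codecs = 4

per_codec_gb = sum(rendition_sizes_gb)           # 7.5 GB across all resolutions
total_gb = per_codec_gb * codecs                 # 30.0 GB processed per live stream
print(per_codec_gb, total_gb)
```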

Let’s find out how much data will be transferred to the CDN in a single live stream.

Let’s make some assumptions first:

  • Number of users: 10 Million
  • % of users watching in HD resolution: 50%
  • % of the live stream users watch on average: 40%

Size of footage in standard resolution (480p): 4/4 = 1 GB

Therefore, Total Data Transfer to CDN = (Video Size * Number of users for each resolution) * Average Watch % of the live stream

= (4 GB * 50/100 * 10^7 + 1 GB * 50/100 * 10^7) * 40/100 = 10^7 GB = 10 PB of data transfer will take place to the CDN for a single stream.
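The same estimate written out in Python, using the assumptions above (10 million viewers, half in HD, 40% of the stream watched on average):

```python
# Back-of-envelope check of the CDN data-transfer estimate above.
users = 10_000_000               # 10 million viewers
hd_share, sd_share = 0.5, 0.5
hd_size_gb, sd_size_gb = 4, 1    # full-stream sizes per viewer
avg_watch = 0.4                  # 40% of the stream watched on average

total_gb = (hd_size_gb * hd_share * users + sd_size_gb * sd_share * users) * avg_watch
print(total_gb, "GB =", total_gb / 1e6, "PB")   # 10,000,000 GB = 10 PB
```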

Let’s find out how much time it would take to send data from the live stream to the user’s device.

Let’s make some assumptions first:

  • Amount of data consumed by a user per second = 4 GB / 1.5 hours ~ 700 kB/sec
  • Time taken to transfer 700 kB of data to the nearest CDN = 1 sec
  • Time taken to transfer 700 kB of data from the CDN to the user = 250 ms

Total travel time for the stream to reach the user’s device = 1 + 0.25 = 1.25 sec

Processing time, assuming ffmpeg/HandBrake/VideoLAN runs at 2x the video speed (each second of video takes 0.5 seconds to process) = 1 / 2 = 0.5 sec

Total latency = 1.25 + 0.5 = 1.75 sec, i.e., roughly 2 seconds end to end.
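Putting the latency pieces together in code (the hop times and the 2x processing speed are the assumptions listed above):

```python
# Back-of-envelope end-to-end latency estimate from the assumptions above.
to_cdn_s = 1.0        # source/origin -> nearest CDN edge
cdn_to_user_s = 0.25  # CDN edge -> viewer's device
processing_s = 1 / 2  # encoder runs at 2x video speed: 1 s of video -> 0.5 s of work

travel_s = to_cdn_s + cdn_to_user_s   # 1.25 s of network travel
total_s = travel_s + processing_s     # 1.75 s, roughly 2 s end to end
print(travel_s, total_s)
```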


