
Design a system to count likes, dislikes and comments on YouTube videos

YouTube is the world’s largest video-sharing platform, with billions of users uploading and consuming content daily. The success of a YouTube video can be measured in various ways, including the number of likes, dislikes, and comments it receives. Tracking these engagement metrics is essential for content creators and the platform itself.

Requirements

Functional Requirements

Non-Functional Requirements

Capacity Assumptions

To estimate the scale of the system and its storage requirements, we have to make some assumptions about the volume of requests and the average size of the data being stored.

Following are some assumptions for the given design.

Storage Estimates:

Let’s assume on average:



1 comment of 30 characters = 30 bytes.

Total size of comments per day = (30 bytes * 1 billion) = 30 GB.

Total size of comments per month = (30 GB * 30) = 900 GB.

If we store comments for 10 years, size = (900 GB * 12 * 10) = 108,000 GB = 108 TB.
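The estimate can be checked with a short script. This is only a sketch using the figures assumed in this section (30 bytes per comment, 1 billion comments per day, 30-day months, decimal GB and TB); the constant names are illustrative. Note that 900 GB per month over 120 months comes to 108 TB.

```python
# Back-of-the-envelope storage estimate for comments.
# All inputs are the assumptions stated in this section, not measured values.
BYTES_PER_COMMENT = 30             # 30 characters at 1 byte each
COMMENTS_PER_DAY = 1_000_000_000   # 1 billion comments per day
DAYS_PER_MONTH = 30
MONTHS_PER_YEAR = 12
YEARS_RETAINED = 10

GB = 10**9    # decimal gigabyte
TB = 10**12   # decimal terabyte

per_day = BYTES_PER_COMMENT * COMMENTS_PER_DAY
per_month = per_day * DAYS_PER_MONTH
per_decade = per_month * MONTHS_PER_YEAR * YEARS_RETAINED

print(per_day / GB, "GB per day")
print(per_month / GB, "GB per month")
print(per_decade / TB, "TB over 10 years")
```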

High-Level Design

The design is read-intensive, because far more users fetch comments and like/dislike counts than actually comment on or like videos. At a high level, our system will need to handle two core flows:

Read path to serve video metadata and engagement stats:

Write path to track new engagement like likes, comments:

We’ll need the following components:

Client Apps

Web and mobile clients will send requests to the load balancer.

Load Balancer

Load Balancer’s primary role is to evenly distribute incoming network traffic or requests across a group of servers or resources. This distribution helps optimize resource utilization, prevent server overload, and ensure high availability and reliability of services. Load balancers enhance performance, scalability, and fault tolerance, making them essential for web applications, websites, and other services where multiple servers are deployed.
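As an illustration of even distribution, here is a minimal round-robin sketch. Round robin is only one of several common balancing strategies, and the class name and server names below are hypothetical; a real deployment would use a dedicated load balancer rather than application code like this.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order (illustrative only)."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the server list endlessly in order.
        self._rotation = cycle(servers)

    def pick(self):
        """Return the next server that should receive a request."""
        return next(self._rotation)


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["api-1", "api-2", "api-3"])  # hypothetical hosts
    for request_number in range(5):
        print(request_number, "->", balancer.pick())
```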

API Services

The API services handle client requests and core logic, and manage connections to the other services. They retrieve video data from the database and authenticate users, issuing a user ID upon login to ensure secure access to resources.

Comment Post and Count Services

It will store the comment posted by the user for the video into the database. It will also manage the counts for like/dislike and comments for the video.

Database

It will store all the metadata permanently so that it can be retrieved on demand.

What happens when the user clicks the like/dislike button again?

We don’t want a user to like or dislike a video they have already liked or disliked. So what should happen when a user clicks like or dislike again?

We have to keep metadata that records whether a user has already liked or disliked the video.

Database Design

VideoStats

{
– video_id
– view_count
– like_count
– dislike_count
– comment_count
}

The video_id field connects each engagement stat row with the specific video it refers to in the Videos table.

The view_count, like_count, dislike_count and comment_count fields store the latest engagement metrics for each video.

LikedStats

{
– video_id
– user_id
– is_liked
– is_disliked
– time_stamp
}

This metadata records whether and when a user liked or disliked the video, and can be used to check the user’s previous like or dislike action for that video.

We use two boolean values, one for like and one for dislike, because both can be false when the user has neither liked nor disliked the video.
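The LikedStats record and the repeat-click question can be sketched as a small state transition. The policy below is an assumption, not something the design fixes: clicking the same button again removes the reaction, and clicking the opposite button switches it. The function name `apply_reaction` is hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class LikedStats:
    """One row per (video, user), mirroring the fields listed above."""
    video_id: str
    user_id: str
    is_liked: bool = False
    is_disliked: bool = False
    time_stamp: Optional[datetime] = None


def apply_reaction(current: LikedStats, action: str,
                   now: Optional[datetime] = None) -> LikedStats:
    """Return the row after a click on 'like' or 'dislike'.

    Assumed policy: a repeated click undoes the reaction, and a like
    clears any dislike (and vice versa), so both flags are never true.
    """
    if action not in ("like", "dislike"):
        raise ValueError(f"unknown action: {action!r}")
    now = now or datetime.now(timezone.utc)
    if action == "like":
        liked, disliked = (not current.is_liked), False
    else:
        liked, disliked = False, (not current.is_disliked)
    return replace(current, is_liked=liked, is_disliked=disliked, time_stamp=now)
```

With this policy the two flags are never true at the same time, which keeps the like and dislike counts consistent with the per-user rows.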

Which database will we use?

For this workload, a NoSQL database is often the preferred choice, as it handles large data volumes well and scales out easily. Two popular options are MongoDB and Couchbase. NoSQL databases are well suited to this scenario because they store and retrieve unstructured or semi-structured data efficiently, which is common in metadata storage for videos.

Communicating with the servers:

We will use a REST API to request and post all data through the API servers.

Why will we use a REST API to communicate with the servers?

A RESTful API is a suitable choice for implementing a system to count likes, dislikes, and comments on YouTube videos due to its simplicity, scalability, and compatibility with standard HTTP methods. This design promotes clean and intuitive interactions, making it easier for developers to understand and work with the API. It also aligns with the principles of the HTTP protocol, utilizing standard methods like GET for reading data and POST for creating new data, which simplifies the implementation and usage of the API. RESTful APIs are highly scalable and suitable for handling the enormous scale of a platform like YouTube, where millions of users interact with videos daily.

API structure:

For posting a comment:

HTTP Method: POST

Endpoint: ‘/videos/{video_id}/comments’

Request Body:

{
"video_id": "{video_id}",
"user_id": "{user_id}",
"comment_text": "Your comment text here"
}

Response: success message(200).

For liking the video

HTTP Method: POST

Endpoint: ‘/videos/{video_id}/likes’

Request Body:

{
"video_id": "{video_id}",
"user_id": "{user_id}"
}

Response: success message(200).

Disliking a Video:

HTTP Method: POST

Endpoint: ‘/videos/{video_id}/dislikes’

Request Body:

{
"video_id": "{video_id}",
"user_id": "{user_id}"
}

Response: success message(200).

Getting all the stats

HTTP Method: GET

Endpoint: ‘/videos/{video_id}/statistics’

Response: JSON object containing count for comments, likes and dislikes.
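The four endpoints can be sketched as a single dispatch function. This is a minimal in-memory illustration, not the production design: it skips authentication, duplicate-like checks and the queues described later, and the names `handle`, `ROUTE` and `STATS` as well as the error payloads are assumptions made for the example.

```python
import re

# Matches /videos/{video_id}/{resource} for the four resources in this section.
ROUTE = re.compile(
    r"^/videos/(?P<video_id>[^/]+)/(?P<resource>comments|likes|dislikes|statistics)$"
)

# In-memory stand-in for the VideoStats table (hypothetical).
STATS = {}


def handle(method, path, body=None):
    """Return (status_code, response_dict) for one request."""
    match = ROUTE.match(path)
    if match is None:
        return 404, {"error": "not found"}
    video_id, resource = match["video_id"], match["resource"]
    # For simplicity, a video's counters are created on first access.
    stats = STATS.setdefault(
        video_id, {"comment_count": 0, "like_count": 0, "dislike_count": 0}
    )

    if resource == "statistics":
        if method != "GET":
            return 405, {"error": "method not allowed"}
        return 200, {"video_id": video_id, **stats}

    if method != "POST":
        return 405, {"error": "method not allowed"}
    if not body or "user_id" not in body:
        return 400, {"error": "user_id is required"}

    if resource == "comments":
        if not body.get("comment_text"):
            return 400, {"error": "comment_text is required"}
        stats["comment_count"] += 1
    elif resource == "likes":
        stats["like_count"] += 1
    else:  # dislikes
        stats["dislike_count"] += 1
    return 200, {"message": "success"}


if __name__ == "__main__":
    print(handle("POST", "/videos/abc123/likes", {"video_id": "abc123", "user_id": "u1"}))
    print(handle("GET", "/videos/abc123/statistics"))
```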

Microservices Used

Let’s drill deeper into each microservice:

Authentication services

They are responsible for ensuring the security and access control of a platform. When a user is not logged in, they can access fundamental functionalities such as retrieving counts, but more advanced actions like liking or disliking videos and posting comments are restricted. However, upon successful login, these services authenticate the user’s identity and provide them with a user ID, granting full access to the application’s features and personalized content.
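The access rule described above (counts are public, reactions and comments require login) can be sketched as a simple check. The action names and the function `is_allowed` are hypothetical, and a missing user ID stands in for "not logged in".

```python
from typing import Optional

# Actions anyone may perform, logged in or not.
PUBLIC_ACTIONS = {"get_statistics"}
# Actions that require an authenticated user ID.
AUTHENTICATED_ACTIONS = {"like", "dislike", "post_comment"}


def is_allowed(action: str, user_id: Optional[str]) -> bool:
    """Return True if the caller may perform the action.

    user_id is None for visitors who have not logged in.
    """
    if action in PUBLIC_ACTIONS:
        return True
    if action in AUTHENTICATED_ACTIONS:
        return user_id is not None
    # Unknown actions are rejected by default.
    return False
```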

Post Services

It checks whether the user has liked/disliked a video or wants to post a comment. If the user wants to post a comment, the request is pushed into the Comment Queue. The Comment Upload Service is the consumer of this queue and processes the request further. We can use a message queue such as Kafka for this purpose. Kafka has high throughput and is highly scalable, with low latency (as low as 2 ms). If the user has liked/disliked a video, the request is pushed to the Like Queue, of which the Like Finder is the consumer.
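The routing step can be sketched with in-process queues. Here Python's standard `queue.Queue` stands in for Kafka topics purely for illustration; the request shape (a `type` field) and the function names are assumptions, and a real consumer would run as a separate service.

```python
import queue

# In-process stand-ins for the Comment Queue and Like Queue.
comment_queue = queue.Queue()
like_queue = queue.Queue()


def post_service(request):
    """Route a write request to the matching queue and return its name."""
    kind = request.get("type")
    if kind == "comment":
        comment_queue.put(request)
        return "comment_queue"
    if kind in ("like", "dislike"):
        like_queue.put(request)
        return "like_queue"
    raise ValueError(f"unsupported request type: {kind!r}")


def drain(q):
    """Consume everything currently waiting in a queue, in FIFO order."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items
```

Because producers only enqueue and return, a slow database write on the consumer side does not block the user's request.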

Comment Upload Service

It pulls the request from the Comment Queue and stores the comment into the database and calls the comment count services. It also pushes notification to the Notification Queue about the successful post of the comment. The Notification Service pulls the request from the queue and then further notifies the user/client.

Comment count services

It accumulates comment-count increments in a batch and periodically updates the database with the new comment count. Updating counts in batches leads to fewer calls to the database, which reduces the database load significantly.
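Batching can be sketched as an in-memory accumulator that writes once per batch. The class name `BatchedCounter`, the `flush_fn` callback and the size threshold are assumptions for the example; the periodic part would come from a scheduler calling `flush()` on a timer, and this single-threaded sketch omits locking.

```python
from collections import Counter


class BatchedCounter:
    """Accumulates per-video increments and flushes them in one call."""

    def __init__(self, flush_fn, max_pending=1000):
        self._flush_fn = flush_fn        # e.g. a function doing one bulk DB update
        self._max_pending = max_pending  # flush after this many increments
        self._pending = Counter()
        self._pending_events = 0

    def increment(self, video_id, amount=1):
        self._pending[video_id] += amount
        self._pending_events += 1
        if self._pending_events >= self._max_pending:
            self.flush()

    def flush(self):
        """Send the accumulated deltas downstream; no-op if nothing is pending."""
        if not self._pending:
            return
        batch = dict(self._pending)
        self._pending.clear()
        self._pending_events = 0
        self._flush_fn(batch)
```

The trade-off is that counts read from the database can lag by up to one batch, and pending increments are lost if the process crashes before a flush.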

Like finder

It searches the database for any previous like or dislike by the user on the video that the user has liked or disliked again, and provides that data to the Like/Unlike Count Services. We can use Elasticsearch to search the data from our NoSQL database. Elasticsearch is a distributed search and analytics engine; it searches documents that have been indexed into it, so the like/dislike records would need to be indexed there as well.

Like/Unlike Count Services

It increments and decrements the likes and dislikes according to the data by Like Finder and then counts the likes and dislikes in batch and updates the database. It also pushes a notification to the Notification Queue about the change of the like and dislike status.
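The increment/decrement step can be sketched as computing count deltas from the user's previous and new reaction state, then accumulating them for a batched update. The tuple representation `(is_liked, is_disliked)` and the function names are assumptions for illustration.

```python
from collections import Counter


def count_deltas(previous, new):
    """Return (like_delta, dislike_delta) for one state change.

    previous and new are (is_liked, is_disliked) tuples, as found by
    the Like Finder and as decided for the new click respectively.
    """
    like_delta = int(new[0]) - int(previous[0])
    dislike_delta = int(new[1]) - int(previous[1])
    return like_delta, dislike_delta


def accumulate(pending, video_id, previous, new):
    """Add one state change to the pending batch of counter updates."""
    like_delta, dislike_delta = count_deltas(previous, new)
    pending[(video_id, "like_count")] += like_delta
    pending[(video_id, "dislike_count")] += dislike_delta
    return pending


if __name__ == "__main__":
    pending = Counter()
    accumulate(pending, "v1", (False, False), (True, False))  # new like
    accumulate(pending, "v1", (True, False), (False, True))   # like -> dislike
    print(dict(pending))
```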

Notification Service

It pulls the notification data from the Notification Queue and notifies the user about the status of the posted comments, likes and dislikes.

Read Services

It takes read requests for the counts and looks up the VideoStats metadata in the cache. Elasticsearch can be used to search the metadata from the cache and database. If there is a cache miss (data not found in the cache), the data is fetched from the database and then written to the cache for future use. It also keeps track of the most frequent queries and updates the cache accordingly.
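The read path on a cache miss can be sketched as follows. Plain dictionaries stand in for the cache and the database, and the class name `ReadService` with its hit/miss counters is an assumption for the example.

```python
class ReadService:
    """Serves VideoStats from the cache, falling back to the database."""

    def __init__(self, cache, database):
        self.cache = cache        # dict-like stand-in for the cache
        self.database = database  # dict-like stand-in for the VideoStats store
        self.hits = 0
        self.misses = 0

    def get_stats(self, video_id):
        stats = self.cache.get(video_id)
        if stats is not None:
            self.hits += 1
            return stats
        # Cache miss: read from the database and populate the cache.
        self.misses += 1
        stats = self.database.get(video_id)
        if stats is not None:
            self.cache[video_id] = stats
        return stats


if __name__ == "__main__":
    service = ReadService(cache={}, database={"v1": {"like_count": 5}})
    service.get_stats("v1")   # miss, fills the cache
    service.get_stats("v1")   # hit
    print(service.hits, service.misses)
```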

Cache

Caches play a pivotal role in optimizing data access and system performance by acting as high-speed, temporary storage. They store frequently accessed data, allowing for rapid retrieval without the need to access the primary data source, such as a database, each time. In this design, a caching strategy known as write-through is employed: every write updates the cache and the database together, so cached reads do not return stale counts.

This strategy is well suited to systems where data is read far more frequently than it is written. Redis is often chosen to implement it because of its in-memory design and high performance, which help balance fast data access with consistent data updates.
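A write-through update can be sketched as a single operation that writes both stores before returning. Dictionaries stand in for Redis and the database, the class name is hypothetical, and writing the database first is one possible ordering chosen here so that the cache never holds a value the database lacks.

```python
class WriteThroughStore:
    """Keeps the cache and the database in step on every write."""

    def __init__(self, cache, database):
        self.cache = cache        # dict-like stand-in for Redis
        self.database = database  # dict-like stand-in for the primary store

    def write(self, video_id, stats):
        # Write the durable copy first, then the cache, in the same call.
        # Copies are stored so later caller-side mutation cannot leak in.
        self.database[video_id] = dict(stats)
        self.cache[video_id] = dict(stats)

    def read(self, video_id):
        # After a write-through, the cache already holds the latest value.
        return self.cache.get(video_id)
```

The cost is a slower write, since each update touches two stores, which is acceptable when reads dominate.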

Database

It stores all the metadata and statistics of the videos. To handle the demanding read and write workloads, the data is sharded and replicated across multiple nodes or servers. This replication provides data redundancy and fault tolerance, making the system more resilient. Additionally, to enable efficient distributed searches, the data is indexed, allowing for rapid retrieval of specific information.
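If the data is partitioned (sharded) across nodes by video_id, the node for a record can be chosen with a stable hash. This sketch assumes hash-modulo placement and a hypothetical function name `shard_for`; modulo placement moves most keys when the shard count changes, which is why consistent hashing is a common alternative.

```python
import hashlib


def shard_for(video_id: str, shard_count: int) -> int:
    """Map a video_id to a shard index in [0, shard_count).

    SHA-256 is used instead of Python's built-in hash() because the
    built-in string hash is randomized per process, while placement
    must be the same on every server.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(video_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    for vid in ("abc123", "xyz789", "video-42"):  # hypothetical IDs
        print(vid, "-> shard", shard_for(vid, 4))
```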

Workflow

The general workflow will be:

For posting a comment:

When Video is liked or disliked:

When a user likes or dislikes a video, several steps ensure everything works smoothly.

For Pulling the read count

When a user clicks on a video to play it, a chain of actions is set in motion. An API request is triggered to gather metadata about the video. This request travels through the load balancer, which directs it to the API Services. The API Services forward the request to the Read Services, where Elasticsearch is used to search for the requested data within the cache.

Scalability Considerations

Our setup is highly scalable:

We can start small and scale horizontally as the system grows. The asynchronous nature ensures high read and write throughput.

Conclusion

In this article, we have explored a scalable architecture to track YouTube-scale video statistics:

By separating the read and write streams, using message queues, and scaling databases horizontally, we can build a robust architecture that scales well with video demand.

