Design Facebook’s live update of comments on posts | System Design

Last Updated : 04 Oct, 2023

A live comment system, like Facebook’s, works by employing real-time communication technologies, such as WebSockets, to establish a persistent connection between users and the server. When a user posts a comment, it’s immediately sent to the server, which then broadcasts the comment to all users viewing the same post in real-time. This allows users to see new comments as they are posted without the need to manually refresh the page.

How does Facebook’s live update of comments on posts work?

Facebook’s live comment feature operates by leveraging real-time communication technologies. When a user posts a comment on a live post, the system immediately sends this comment to a central server. This server, often using WebSocket technology, establishes a persistent connection with all users who are currently viewing the same post.

Once the server receives the new comment, it broadcasts it to all connected users in real-time. This means that anyone watching the post can see the newly posted comment without having to manually refresh the page. The result is a seamless and engaging user experience that fosters real-time interaction and conversation among users during live events or discussions on the platform.

Requirements

  • Users should see new comments on posts in real time without manually refreshing the page.
  • Updates should be delivered to users with minimal delay.
  • The system should be able to handle a massive number of concurrent users and comments.
  • Ensure that only authorized users can post comments and that the system is protected against spam and abuse.

Capacity estimation

Facebook had about 3 billion monthly active users in 2023, and roughly 510,000 comments are posted every minute.

Comments posted per hour = 510,000 × 60 = 30,600,000
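The estimate above can be reproduced as simple arithmetic, along with the average write rate it implies:

```python
# Back-of-envelope capacity estimate based on the figures above.
COMMENTS_PER_MINUTE = 510_000

comments_per_hour = COMMENTS_PER_MINUTE * 60      # 30,600,000
comments_per_second = COMMENTS_PER_MINUTE / 60    # average write rate

print(comments_per_hour)           # 30600000
print(round(comments_per_second))  # 8500
```

An average of about 8,500 comment writes per second is why the sections below lean on sharding, caching, and horizontal scaling.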

Low-Level Design

In the Low-Level Design (LLD) of Facebook’s live comment system, the technical intricacies come into play to create a seamless real-time commenting experience for users. Key aspects of this design include the implementation of WebSocket technology for real-time communication, robust client-side handling through JavaScript, and a scalable server-side infrastructure capable of handling millions of concurrent users.

1. WebSocket Implementation

Facebook uses WebSocket technology to establish persistent connections between users and the server. This allows for real-time communication without continuous polling. WebSocket libraries such as Socket.IO, or the WebSocket implementations available in languages like Python and Node.js, are commonly used to handle these connections efficiently.

2. Client-Side Handling

JavaScript plays a crucial role in managing WebSocket connections on the client-side. It establishes a connection with the server when a user accesses a live post. The client-side JavaScript listens for incoming comments through the WebSocket connection and updates the comment section in real time. User interface components may also include features like comment moderation, reactions, and notifications.

3. Server-Side Handling

The server-side infrastructure should be robust enough to handle a massive number of concurrent WebSocket connections. Load balancers distribute incoming WebSocket connections across multiple servers to ensure even load distribution. Each live post typically has a dedicated channel or room on the server to manage comments for that specific post efficiently.
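The per-post "room" idea can be sketched as a small registry. This is an illustrative in-memory version only; at Facebook's scale this state would live in a shared pub/sub layer rather than a single process, and `connection.send` stands in for a real WebSocket send:

```python
from collections import defaultdict

class RoomRegistry:
    """Maps each live post to the set of connections currently viewing it.
    In-memory sketch; production would back this with a pub/sub system."""

    def __init__(self):
        self._rooms = defaultdict(set)  # post_id -> set of connections

    def join(self, post_id, connection):
        self._rooms[post_id].add(connection)

    def leave(self, post_id, connection):
        self._rooms[post_id].discard(connection)
        if not self._rooms[post_id]:
            del self._rooms[post_id]  # drop empty rooms to free memory

    def broadcast(self, post_id, message):
        # Deliver the message to every viewer of this post.
        for connection in self._rooms.get(post_id, ()):
            connection.send(message)

# Stub connection object that records messages, for demonstration:
class StubConnection:
    def __init__(self):
        self.received = []
    def send(self, message):
        self.received.append(message)
```

Keeping one room per post means a new comment is fanned out only to viewers of that post, not to every connected user.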

4. Authentication and Authorization

To maintain security and prevent spam or abuse, WebSocket connections must be authenticated. Facebook’s server verifies the user’s identity before allowing them to participate in the live comment stream. Access controls ensure that only authorized users can post comments.

5. Comment Posting

When a user submits a comment, it’s sent to the server via WebSocket, which immediately validates and stores it in the database. The server then broadcasts the new comment to all connected WebSocket clients within the corresponding channel. Facebook’s live comment system employs real-time push mechanisms to distribute comments to users as they are posted.
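The validate-persist-broadcast pipeline described above can be sketched as one function. The `store` and `broadcast` callables are hypothetical stand-ins for the database layer and the WebSocket fan-out, and the length limit is purely illustrative:

```python
def post_comment(user_id, post_id, content, store, broadcast):
    """Sketch of the comment-posting pipeline: validate the comment,
    persist it, then push it to all viewers of the post."""
    content = content.strip()
    if not content or len(content) > 8000:   # illustrative validation rule
        raise ValueError("invalid comment")
    comment = {"user_id": user_id, "post_id": post_id, "content": content}
    store(comment)               # persist first so the comment survives a crash
    broadcast(post_id, comment)  # then push to connected clients in real time
    return comment
```

Persisting before broadcasting is a deliberate ordering: a comment that was seen live but lost from storage would disappear on refresh.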

6. Data Storage

Comments are stored in a database for retrieval and historical reference. Facebook likely uses distributed databases or sharding to handle the high volume of comments efficiently. The database schema includes fields for comments, timestamps, user information, and associations with specific posts.

7. Comment Retrieval

When a user joins a live post discussion, the server retrieves existing comments from the database and sends them to the user’s WebSocket connection. Pagination and efficient querying are essential for managing large comment datasets while ensuring low latency.
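Pagination over a large comment set is often done with a cursor rather than page offsets, so a page boundary stays stable while new comments arrive. A minimal sketch, assuming comment IDs increase over time:

```python
def fetch_comments(comments, cursor=None, limit=20):
    """Cursor-based pagination sketch: return comments older than `cursor`
    (a comment_id), newest first, plus the cursor for the next page."""
    ordered = sorted(comments, key=lambda c: c["comment_id"], reverse=True)
    if cursor is not None:
        ordered = [c for c in ordered if c["comment_id"] < cursor]
    page = ordered[:limit]
    # A full page implies there may be more; an underfull page ends the scan.
    next_cursor = page[-1]["comment_id"] if len(page) == limit else None
    return page, next_cursor
```

In a real system the filter and sort would be a database query against the `(post_id, created_at)` index rather than an in-memory list.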

Database Design

Designing the database for Facebook’s live comment system requires careful consideration of data structure, performance, and scalability. Here’s a detailed explanation of the database design for this feature :

[Diagram: database design for the live comment system]

1. Database Selection

Facebook likely uses a distributed database system capable of handling high write and read loads efficiently. Options may include NoSQL databases like Cassandra or document databases like MongoDB for scalability. Relational databases like MySQL or PostgreSQL could also be used for structured data storage.

2. Indexing

Efficient indexing is crucial for rapid comment retrieval. The primary key should typically be based on the Comment ID, ensuring uniqueness and fast access. Secondary indexes on fields like User ID and Post ID support efficient querying.

3. Sharding

To handle the immense volume of comments, Facebook likely employs sharding techniques. Comments can be divided into shards, each managed by a separate database server. Sharding keys may include User ID or Post ID, ensuring even distribution of data across shards.
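Sharding by post ID can be sketched with a stable hash, so every comment on the same post routes to the same shard. The shard count is illustrative, and a stable digest (not Python's randomized `hash()`) is used so routing stays consistent across processes:

```python
import hashlib

NUM_SHARDS = 16  # illustrative shard count

def shard_for(post_id: str) -> int:
    """Route a comment to a shard by hashing its post ID, so all comments
    on one post land on the same shard."""
    digest = hashlib.sha256(post_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS
```

Note that simple modulo sharding reshuffles most keys when `NUM_SHARDS` changes; consistent hashing is the usual refinement when shards are added over time.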

4. Caching

Caching frequently accessed comments can reduce database load and improve response times. Facebook may use in-memory caching systems like Memcached or Redis. Cache expiration policies ensure that comments are up-to-date.
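The cache-with-expiration idea can be sketched as a tiny cache-aside helper. This stands in for Redis or Memcached; the TTL value is illustrative:

```python
import time

class TTLCache:
    """Minimal cache-aside helper: entries expire after `ttl` seconds,
    so readers fall back to the database for fresh comments."""

    def __init__(self, ttl=30.0):
        self.ttl = ttl
        self._data = {}          # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        entry = self._data.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]                  # cache hit, still fresh
        value = loader(key)                  # cache miss: hit the database
        self._data[key] = (time.monotonic() + self.ttl, value)
        return value
```

A short TTL is the simple form of cache invalidation mentioned above: stale comments can be served for at most `ttl` seconds.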

5. Database Schema

Comments_table:

comment_id – to identify the comment

user_id (Foreign key) – publisher of the comment

post_id (Foreign key) – ID of the associated post

content – content of the comment

created_at – timestamp of creation

Post_table:

post_id – identifier of the post

title – title of the post

user_id (Foreign key) – creator of the post

created_at – timestamp of creation

Users_table:

user_id – identifier of the user

name – name of the user

email – email of the user

profile_image – the profile image of user

last_login – last login timestamp of the user
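The three tables above can be rendered as SQL. This sketch uses Python's built-in sqlite3 purely for illustration (the article notes Facebook would use a distributed store in practice); the secondary index matches the hot query of fetching a post's comments by time:

```python
import sqlite3

# SQLite rendering of the schema described above; engine choice is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id       INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    email         TEXT UNIQUE,
    profile_image TEXT,
    last_login    TIMESTAMP
);
CREATE TABLE posts (
    post_id    INTEGER PRIMARY KEY,
    title      TEXT,
    user_id    INTEGER REFERENCES users(user_id),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY,
    user_id    INTEGER REFERENCES users(user_id),
    post_id    INTEGER REFERENCES posts(post_id),
    content    TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Secondary index for the hot query: all comments on a post, by time.
CREATE INDEX idx_comments_post ON comments(post_id, created_at);
""")
```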

High Level Design

[Diagram: high-level design of the live comment system]

In the High-Level Design (HLD) of Facebook’s live update of comments on posts, the architecture revolves around real-time data streaming, enabling users to engage in dynamic conversations seamlessly. This design encompasses elements such as WebSocket communication, scalable server infrastructure, and efficient data distribution to ensure users receive live comment updates with minimal delay.

Facebook’s architecture likely combines elements from both strategies to provide the best possible user experience. This involves optimizing the balance between write and read latency while ensuring the global availability and consistency of live comments on posts. Some approaches to designing Facebook’s live update of comments are described below.

Approaches to implement Facebook’s live update of comments on posts

1. Write Globally and Read Locally

In this scenario, “write globally” involves storing comments on a global scale, often using multi-master database replication to maintain consistency across data centers worldwide. “Read locally” means that when a user views a post and its associated comments, the comments are fetched from a nearby server to reduce read latency. This approach prioritizes low-latency read access for users but may introduce some write latency due to the need to replicate data globally and maintain data consistency.

[Diagram: write globally and read locally]

2. Write Locally and Read Globally

“Write locally” means that when a user posts a comment, it is initially stored on a server that is physically close to the user. This minimizes the time it takes for the comment to be saved and ensures a smooth user experience.

“Read globally” means that when other users access the post to view comments, the system retrieves those comments from a centralized or globally distributed database. This centralization allows for efficient access to comments by users worldwide.

Facebook employs a multi-region data storage strategy, where comments are initially written to a nearby data center but eventually replicated to other regions for durability and global access. This approach balances low write latency with fast global read access.

This approach can be implemented in two ways:

  • Pull-based model
  • Push-based model

Pull-based model

In the pull-based model, a comment is always written to the database server in the local data center, while reads query the servers in data centers around the world; this is the pull-based form of writing locally and reading globally. The timestamp on each live comment determines whether new comments have been published since the last query.

The local data center’s server compiles all of the published live comments and sends the client the result. Because the data is not replicated globally, bandwidth utilization is substantially lower. However, since the server must query every data center in the world, the pull-based architecture suffers from higher latency.

Push-based model

Data is still written to the local data center’s distributed database server, but when a client writes a live comment on a Facebook post, the server broadcasts the write to the other data centers around the world. This is the push-based model of writing locally and reading globally. Because every region then holds a copy, reads are always served locally, which keeps read latency very low; the trade-off is the extra replication bandwidth spent broadcasting each write.

Since low read latency is what makes comments feel live, the push-based model (write locally, broadcast globally, read locally) is the better fit for implementing Facebook’s live update of comments on posts.
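The push-based fan-out can be sketched by simulating regional stores. The region names are hypothetical, and the replication loop is synchronous here purely for clarity; in practice the broadcast would be asynchronous:

```python
class Region:
    """Stand-in for a regional data center's comment store."""
    def __init__(self, name):
        self.name = name
        self.comments = []

REGIONS = [Region("us-east"), Region("eu-west"), Region("ap-south")]

def write_comment_push(local_region, comment):
    """Push-based model sketch: write to the local region, then broadcast
    to every other region so that reads are always served locally."""
    local_region.comments.append(comment)
    for region in REGIONS:
        if region is not local_region:
            region.comments.append(comment)  # replication (async in practice)
```

After a write, any region can answer a read from its own copy, which is the property that makes reads fast at the cost of replication bandwidth.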

How to scale Facebook’s live update of comments System?

To scale Facebook’s live update of comments on posts system effectively, it’s essential to focus on optimizing operational complexity, ensuring data durability, enhancing fault tolerance and high availability, managing concurrency, reducing latency, and planning for scalable growth. By carefully addressing these factors, the system can provide a seamless and responsive user experience while accommodating a growing user base and increasing data volumes.

1. Load balancer:

In order to scale the system and provide fault tolerance, load balancers should be added between the various system layers. For scalability, the services ought to make use of all of the servers’ resources. To meet elastic demand and handle potential traffic surges, autoscaling can be enabled. The trade-off for horizontal scaling is a complex architecture, higher infrastructure expenditures, and ongoing maintenance expenses.

2. Latency:

The live commenting service is set up across a number of data centers to keep users close to the servers and reduce latency. The live comment microservice system’s end-to-end latency can be measured using Apache Samza. The following factors contribute to the live comment service’s relatively low latency:

  • There is just one in-memory lookup in the subscription store.
  • There is one key-value lookup in the (disk-based) endpoint store.
  • There are very few network hops.

Additionally, since frequent disk I/O may create a bottleneck, Redis can be set up using the cache-aside method to improve the speed of endpoint store lookups. For cache invalidation, the Redis cache’s TTL must be set to a period of time that is sufficiently brief.

3. Concurrency:

The number of concurrent connections supported by the gateway servers can be increased by server tuning. The following actions could be taken to enhance the functionality of the gateway servers:

  • Decrease the per-thread stack size to raise the maximum number of threads.
  • Reduce the heap’s RAM allocation to raise the thread-count cap.
  • Raise the maximum number of open connections on both the load balancer and the server.
  • Increase the per-process file descriptor limit.
  • Tune kernel settings to expand the number of TCP connections the server will accept.

4. Dispatcher:

A modern server’s dispatcher can handle up to 5,000 requests per second. A contemporary gateway server that uses the actor model may handle up to 100,000 simultaneous SSE client connections.

The dispatcher and gateway servers work together to multiply data fan-out, which greatly increases the scalability of the live commenting service on Facebook posts. The actor-model-based gateway server is another multiplicative element: the actor model enables a pool of threads to be reused for greater throughput. The live comment service can be horizontally scaled by deploying new servers.

5. High Availability:

The traffic pattern for live comments on exceptionally popular Facebook posts typically starts out steeply and then drops off completely. The following actions can be taken to address the thundering-herd problem caused by live comments on such posts:

Clients can use request coalescing when fetching older comments, so that duplicate requests are served by a single backend fetch before the load reaches service limits. The load balancer directs clients to gateway servers with available capacity, and services implement backpressure and exponential backoff.
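Exponential backoff is usually combined with jitter so that retrying clients do not re-synchronize into another herd. A minimal sketch with illustrative parameters:

```python
import random

def backoff_delays(base=0.5, cap=30.0, attempts=6):
    """Exponential backoff with full jitter: each retry waits a random time
    up to a doubling (and capped) ceiling. Parameters are illustrative."""
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(random.uniform(0, ceiling))  # jitter spreads retries out
    return delays
```

A client would sleep for each delay in turn between failed reconnect attempts, so the retry storm after a failure decays instead of repeating.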

Microservices Used

1. Real-Time Update Service:

This microservice handles the real-time communication between clients (users) and the server. It manages WebSocket connections and ensures that live comments are pushed to connected clients in real time. It uses WebSocket libraries and protocols, or technologies like Server-Sent Events (SSE), to establish and manage real-time connections. This service also handles events like comment submission, reactions, and replies.

2. Comment Storage Service:

This microservice is responsible for storing and retrieving comments, post metadata, and associated user information. It ensures efficient data storage and retrieval for live comments. It employs a distributed database system optimized for read and write operations. Data may be partitioned or sharded based on factors like post ID or geographic location to distribute the load evenly.

3. User Authentication Service:

To maintain the security and privacy of the platform, this microservice verifies user identities and permissions. It ensures that only authorized users can participate in live comment discussions. It uses authentication protocols like OAuth or OpenID Connect to authenticate users. Access controls are enforced to manage user permissions, including posting comments and reacting to posts.

4. Notification Service:

This microservice handles notifications related to live comments and post interactions. It pushes notifications to users when they receive comments, likes, or replies. It uses messaging queues and notification delivery mechanisms to ensure timely and reliable notifications. It can also support mobile push notifications and email alerts.

5. Global Distribution Service:

To reduce latency for global users, this microservice ensures that data is distributed to and served from data centers in multiple geographic regions. It employs geo-routing and content delivery networks (CDNs) to direct users to the nearest data center. Data is replicated across data centers, and intelligent routing ensures users access data locally.

These microservices work together to create a robust and responsive live comment update system on Facebook. Each microservice has a specific role and function, contributing to the overall user experience, scalability, security, and reliability of the platform during live events and discussions.

API Used

1. WebSocket API:

WebSocket is a communication protocol that enables bidirectional, real-time data exchange between clients (users) and the server. It is a fundamental API for implementing real-time updates, including live comments. The WebSocket API allows clients to establish persistent connections to the server, facilitating instant message delivery. It enables the server to push live comments and updates to connected clients without the need for continuous polling.

2. HTTP REST API:

REST (Representational State Transfer) APIs are used for standard HTTP requests and responses. While not real-time, they are essential for various functionalities, such as fetching posts, user profiles, and post metadata. The REST API provides endpoints for clients to retrieve and submit data. For live comments, it supports endpoints to retrieve comment history, post details, and user information. Clients can make HTTP GET and POST requests to interact with these endpoints.

3. Authentication and Authorization API:

Authentication APIs handle user login and authorization processes. They ensure that only authenticated users can participate in live comment discussions. OAuth 2.0 or similar authentication protocols are commonly used. These APIs verify user identities, issue access tokens, and manage user permissions. They integrate with the user authentication service to grant or deny access to live comment features.
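The token-verification idea can be illustrated with a signed token, which lets the server check a user's identity without a database lookup on every WebSocket message. This is a hypothetical sketch of signing in general, not OAuth itself; the secret key and token format are invented for illustration:

```python
import hmac
import hashlib

SECRET = b"server-side-secret"  # hypothetical signing key, never sent to clients

def issue_token(user_id: str) -> str:
    """Sign the user ID so the server can later verify it statelessly."""
    sig = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}.{sig}"

def verify_token(token: str):
    """Return the user ID if the signature is valid, else None."""
    user_id, _, sig = token.partition(".")
    expected = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return user_id if hmac.compare_digest(sig, expected) else None
```

Real access tokens (e.g. OAuth 2.0 bearer tokens or JWTs) additionally carry expiry times and scopes, which this sketch omits.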

4. Notification API:

The Notification API facilitates the delivery of real-time notifications related to live comments, such as new comments, likes, replies, and mentions. It enables clients to subscribe to specific notification channels or topics, ensuring that users receive timely updates. It uses WebSocket or push notification mechanisms to push notifications to viewers.

5. Analytics and Insights API:

The Analytics and Insights API collects data related to user interactions with live comments and posts. It provides data for generating insights and statistics. This API offers endpoints for querying analytics data, such as comment engagement metrics, user activity, and post performance. It integrates with data analytics platforms and tools for data processing and reporting.

These APIs collectively enable the functionality and performance of Facebook’s live comment update system. They play crucial roles in real-time communication, data retrieval, security, content moderation, scalability, analytics, and global distribution, ensuring a seamless and responsive user experience during live events and discussions on the platform.

Components details

1. WebSocket Server:

The WebSocket server manages WebSocket connections from clients and enables bidirectional, real-time communication. It’s the backbone of live comment updates. WebSocket servers use libraries and protocols like WebSocket or Server-Sent Events (SSE) to establish and maintain persistent connections with clients. They handle incoming WebSocket requests and manage message broadcasting to connected clients.

2. Database Servers:

Database servers store and retrieve comment data, post metadata, and user information. They ensure data integrity and efficient data access. Database servers use distributed database systems like MySQL, PostgreSQL, or NoSQL databases such as MongoDB. Data may be partitioned or sharded to distribute the load. Replication mechanisms ensure data durability and availability.

3. Authentication and Authorization Server:

This server manages user authentication and authorization processes. It verifies user identities and grants access permissions to live comment features. It uses authentication protocols like OAuth 2.0, OpenID Connect, or custom authentication mechanisms. It integrates with user databases and generates access tokens for authorized users.

4. Content Moderation Engine:

The content moderation engine analyzes live comments in real time to identify and flag inappropriate or spammy content. It integrates with machine learning models, NLP algorithms, and predefined moderation rules. It scans comment text, images, and user behavior to detect problematic content. Detected content may be flagged for manual review or blocked.

5. Scalability and Load Balancing Controller:

This component manages the scaling of microservices and load balancing to distribute traffic evenly across servers. It uses load balancers, auto-scaling mechanisms, and orchestration tools like Kubernetes or Docker Swarm. It monitors server resource utilization and adjusts the number of microservice instances based on real-time demand.

6. Analytics and Insights Engine:

The analytics and insights engine collects, processes, and analyzes data related to user interactions with live comments and posts. It generates insights and statistics. It integrates with data analytics platforms, databases, and data processing pipelines. Machine learning algorithms may be employed to derive actionable insights from the data.

7. Global Distribution Controller:

The global distribution controller manages geographic routing and ensures data is served from the nearest data center to minimize latency for users. It uses geographic data, load balancing algorithms, and content delivery networks (CDNs) to direct users to the optimal data center. Data replication and synchronization across data centers are coordinated by this component.


