
Design Reddit | System Design

Designing Reddit is not just about sharing posts and comments; it’s about managing a bustling community, ensuring everyone’s voice is heard, and delivering personalized content to keep users engaged. Let’s dive into the system design of Reddit to handle a huge amount of user-generated content while making sure everyone gets their slice of the conversation.



What is Reddit?

Reddit is an American social media platform and online community where registered users can submit content, such as text posts, links, images, and videos. Other users can then vote on and discuss these posts, creating a dynamic and interactive environment. Reddit is a widely used platform that has had a significant impact on online discussions and content sharing.



Requirements for Reddit System Design

Functional Requirements for Reddit System Design:

Non-Functional Requirements for Reddit:

Capacity Estimation for Reddit System Design

To estimate the scale of the system and get a sense of its storage requirements, we need to make some assumptions about the data.

Traffic Estimation of Reddit System Design

Daily Active Users (DAU): 100,000
Average API requests per user: 100 requests/day
Total Daily API Requests: 100,000 * 100 = 10,000,000 requests/day
Daily new posts: 10,000
Daily new comments: 500,000
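To make the daily totals easier to reason about, they can be converted into a per-second request rate. The sketch below uses the figures stated above; the peak factor of 3 is an assumption for illustration and does not come from the estimates themselves.

```python
# Back-of-the-envelope request rate from the stated traffic assumptions.
DAU = 100_000
REQUESTS_PER_USER_PER_DAY = 100
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption (not part of the estimates above): peak load is 3x the average.
PEAK_FACTOR = 3

daily_requests = DAU * REQUESTS_PER_USER_PER_DAY
average_rps = daily_requests / SECONDS_PER_DAY
peak_rps = average_rps * PEAK_FACTOR

print(f"Daily requests: {daily_requests:,}")
print(f"Average requests/second: {average_rps:.1f}")
print(f"Assumed peak requests/second: {peak_rps:.1f}")
```

At this scale the average load is on the order of a hundred requests per second, which a small pool of API servers behind a load balancer could plausibly absorb.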

Storage Estimation of Reddit System Design

Average post size: 500 KB
Average comments per post: 50
Total Daily Storage: (10,000 * 500 KB) + (500,000 * 500 KB) = 5 GB + 250 GB = 255 GB/day
Total Monthly Storage: (255 * 30) GB = 7,650 GB (approximately 7.65 TB)
Assuming we store data for 5 years:
Total Storage for 5 Years: 255 GB/day * (365 * 5) = 465,375 GB (approximately 465 TB)
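Storage arithmetic of this kind is easy to get wrong by a factor of ten, so it can help to recompute it directly from the stated inputs. The sketch below assumes decimal units (1 GB = 1,000,000 KB) and, like the estimate, applies the 500 KB average size to both posts and comments.

```python
# Recompute the storage estimate from the stated assumptions.
KB_PER_GB = 1_000_000  # decimal units: 1 GB = 10^6 KB
GB_PER_TB = 1_000

daily_posts = 10_000
daily_comments = 500_000
avg_item_size_kb = 500  # applied to posts and comments alike, as in the estimate

daily_kb = (daily_posts + daily_comments) * avg_item_size_kb
daily_gb = daily_kb / KB_PER_GB
monthly_tb = daily_gb * 30 / GB_PER_TB
five_year_tb = daily_gb * 365 * 5 / GB_PER_TB

print(f"Daily storage: {daily_gb:,.0f} GB")
print(f"Monthly storage: {monthly_tb:,.2f} TB")
print(f"Five-year storage: {five_year_tb:,.1f} TB")
```

Changing any single assumption, for example a smaller average comment size, only requires editing one constant.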

Bandwidth Estimation of Reddit System Design

Bandwidth for API Requests

Average request size: 5 KB (considering headers and payload)
Total Daily Bandwidth for API Requests: 10,000,000 * 5 KB = 50 GB/day
Total Bandwidth for 5 Years: (50 GB/day )* (365 * 5 ) = 91.25 TB

Bandwidth for Content Delivery

Average video size: 20 MB
Daily video views: 50,000
Total Daily Bandwidth for Video Streaming: 20 MB * 50,000 = 1 TB/day
Total Bandwidth for 5 Years: 1 TB/day * 365 * 5 = 1.825 PB
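Daily transfer volumes say little about link sizing until they are expressed as sustained throughput. The sketch below converts the two daily figures above into average megabits per second, assuming decimal units and traffic spread evenly over the day (real traffic will be burstier).

```python
SECONDS_PER_DAY = 86_400


def average_mbps(bytes_per_day: float) -> float:
    """Convert a daily byte volume into average megabits per second."""
    bits_per_second = bytes_per_day * 8 / SECONDS_PER_DAY
    return bits_per_second / 1_000_000


api_bytes_per_day = 10_000_000 * 5 * 1_000      # 10M requests x 5 KB
video_bytes_per_day = 50_000 * 20 * 1_000_000   # 50K views x 20 MB

print(f"API traffic: {average_mbps(api_bytes_per_day):.2f} Mbps on average")
print(f"Video traffic: {average_mbps(video_bytes_per_day):.2f} Mbps on average")
```

The comparison makes it clear that video delivery dominates bandwidth, which is the main motivation for serving media through a CDN.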

These estimates give an overview of the capacity required in terms of traffic, storage, and bandwidth for a Reddit-like platform that retains data for 5 years.

Use Case Diagram for Reddit System Design

Below is the explanation of the components of the diagram above:

Getting Started and Sharing Content

Engagement and Interactions

User Involvement and Community Connections

Joining Communities and Messaging

Staying Updated with Notifications

Low Level Design (LLD) in Reddit System Design

The low level components are:

High Level Design (HLD) of Reddit System Design

The design is read-intensive, since far more users fetch content than upload it. At a high level, our system needs to handle two core flows:

Uploading the contents

Streaming the contents

Client Interaction

Users access the platform via various clients, including web browsers, mobile apps, and desktop applications. These clients communicate with the backend services through APIs to perform actions like posting content, interacting with posts, and accessing user-specific feeds.

Load Balancer

Incoming user requests are distributed across multiple backend servers using a load balancer. This ensures even distribution of traffic and prevents any single server from becoming overwhelmed.
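The article does not say which balancing algorithm is used. As one simple possibility, the sketch below shows round-robin selection, where each request goes to the next server in a fixed rotation. The class name and server labels are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend servers in a fixed rotation (illustrative only)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._rotation)


balancer = RoundRobinBalancer(["api-1", "api-2", "api-3"])
for _ in range(4):
    print(balancer.pick())
```

Production load balancers typically also account for server health and current load, which this sketch leaves out.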

API Servers

API servers receive requests from clients and route them to the appropriate microservices or backend components. They handle authentication, manage user sessions, and direct requests to services like post creation, comment handling, or user profile management.

Post Services

Responsible for creating, editing, and managing posts. This includes uploading text, images, and videos, as well as adding comments, voting, and content moderation.

Authentication Services

Manages user accounts, authentication, and profile settings.

Feed Services

Provides personalized feeds based on user preferences and interactions.

CDN (Content Delivery Network)

Stores and delivers static content like images, videos, and other media to users globally, ensuring faster load times and reduced server load.

Microservices Used in Reddit System Design

Load Balancer

It is responsible for distributing incoming traffic efficiently across multiple servers or resources. It acts as a traffic manager, ensuring that no single server gets overwhelmed by handling all user requests, thereby optimizing the platform’s performance, reliability, and responsiveness.

Post Services

The post services manage user requests to upload diverse content types such as images, text, or links. Upon receiving a user’s submission, they forward the content to the moderation services for assessment. Upon receiving positive feedback from moderation, the post services proceed to publish the content.
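The submit, moderate, publish flow described above can be sketched as a small state transition. Everything here is a stand-in: the banned-word check substitutes for whatever the real moderation service does, and the names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class PostStatus(Enum):
    PENDING = "pending"
    PUBLISHED = "published"
    REJECTED = "rejected"


@dataclass
class Post:
    title: str
    content: str
    status: PostStatus = PostStatus.PENDING


# Hypothetical moderation rule; a real service would be far more involved.
BANNED_WORDS = {"spam"}


def moderate(post: Post) -> bool:
    """Return True when the post passes the (toy) moderation check."""
    text = f"{post.title} {post.content}".lower()
    return not any(word in text for word in BANNED_WORDS)


def submit_post(post: Post) -> Post:
    """Forward the post to moderation and publish it only on approval."""
    post.status = PostStatus.PUBLISHED if moderate(post) else PostStatus.REJECTED
    return post


print(submit_post(Post("Hello", "My first post")).status)
print(submit_post(Post("Buy now", "Totally not spam")).status)
```

In a real deployment the moderation call would likely be asynchronous, with the post remaining in the pending state until a verdict arrives.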

Subreddit Services

The Subreddit services oversee the creation and administration of subreddits, holding authority over their data. Users interact with these services to subscribe or unsubscribe from subreddits and set varying levels of access. Additionally, they facilitate user notifications regarding subreddit activities, such as new post uploads, by leveraging requests sent to the fanout services.

Fanout Services

Fanout Services primarily handle the distribution of new posts to users’ feeds based on their subscriptions or follows. Two models govern their operation: push (fanout on write) and pull (fanout on read).

Let us explain this service using an example:

Celebrity Problem: The “celebrity problem” arises when a user amasses a significant following, leading to scalability and performance challenges within the platform. Addressing this involves employing a hybrid approach that combines the push and pull models.
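One common way to build such a hybrid is to push posts into follower feeds for ordinary sources and to pull posts at read time for sources with very large followings. The sketch below is an in-memory illustration under that assumption; the threshold value, class name, and method names are hypothetical, and ordering by time is omitted for brevity.

```python
from collections import defaultdict

# Hypothetical cutoff, kept tiny so the demo is easy to follow.
CELEBRITY_THRESHOLD = 3


class FanoutService:
    def __init__(self):
        self.followers = defaultdict(set)      # source -> follower ids
        self.following = defaultdict(set)      # user -> followed sources
        self.feeds = defaultdict(list)         # user -> pushed post ids
        self.source_posts = defaultdict(list)  # source -> its own post ids

    def follow(self, user, source):
        self.followers[source].add(user)
        self.following[user].add(source)

    def is_celebrity(self, source):
        return len(self.followers[source]) >= CELEBRITY_THRESHOLD

    def publish(self, source, post_id):
        """Push to follower feeds unless the source is a celebrity."""
        self.source_posts[source].append(post_id)
        if not self.is_celebrity(source):
            for user in self.followers[source]:
                self.feeds[user].append(post_id)

    def read_feed(self, user):
        """Merge pushed posts with posts pulled from followed celebrities."""
        pulled = [
            post_id
            for source in self.following[user]
            if self.is_celebrity(source)
            for post_id in self.source_posts[source]
        ]
        # dict.fromkeys removes duplicates while preserving order.
        return list(dict.fromkeys(self.feeds[user] + pulled))


service = FanoutService()
service.follow("alice", "small_sub")
for user in ("alice", "bob", "carol"):
    service.follow(user, "huge_sub")
service.publish("small_sub", "post-1")  # pushed to alice's feed
service.publish("huge_sub", "post-2")   # stored once, pulled on read
print(service.read_feed("alice"))
```

The write cost for a celebrity post stays constant regardless of follower count, at the price of extra work on each feed read.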

Upvote/Downvote Services

When a user submits an upvote or downvote on a post or comment, this service handles the request. It retrieves the current upvote and downvote counts for that post or comment from the database and updates them according to the user’s action.
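A minimal sketch of the vote handling follows. It assumes each user may hold at most one vote per target, so that repeating or switching a vote does not inflate the counts; that rule and all names here are illustrative assumptions rather than details taken from the design above.

```python
from collections import defaultdict


class VoteService:
    """Tracks one vote per user per target (post or comment)."""

    def __init__(self):
        # target_id -> {user_id: +1 or -1}
        self._votes = defaultdict(dict)

    def vote(self, target_id, user_id, value):
        """Record +1 (upvote), -1 (downvote), or 0 (clear the vote)."""
        if value not in (1, -1, 0):
            raise ValueError("value must be 1, -1, or 0")
        if value == 0:
            self._votes[target_id].pop(user_id, None)
        else:
            self._votes[target_id][user_id] = value

    def counts(self, target_id):
        """Return (upvotes, downvotes) for the target."""
        values = list(self._votes[target_id].values())
        return values.count(1), values.count(-1)


votes = VoteService()
votes.vote("post-1", "alice", 1)
votes.vote("post-1", "alice", 1)   # repeated upvote is not double counted
votes.vote("post-1", "bob", -1)
print(votes.counts("post-1"))
```

In a database-backed version, the update would need to be atomic (for example a single conditional write) so that concurrent votes do not overwrite each other.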

Recommendation Services

The Recommendation Services access all user metadata from the database. Using machine learning models, they predict the types of posts users might prefer and then push them to users’ feeds. The model must adhere to specific criteria: fairness—ensuring no post is favored without reason, scalability to handle a large number of posts, and low latency in predicting user interests.

We can update our algorithm through two methods.

Messaging Services

Messaging Services facilitate user connections and message exchanges. The users will be connected through WebSocket. We opt for WebSocket connections due to several advantages:

Notification Services

These services handle the delivery of real-time notifications to users, alerting them about various activities within the platform. They encompass a wide range of notifications, including new post alerts, comments on subscribed threads, direct messages, mentions, or interactions such as likes or shares on their content.

Function of Notification System:

Comment Services

The comment services within the platform facilitate user engagement by allowing users to engage in discussions, provide feedback, and interact with posts. These services handle the creation, editing, and deletion of comments associated with posts. They ensure that comments are linked to the appropriate posts and manage the threading or hierarchical structure of discussions.

Database Design in Reddit Design

The database design shown in the diagram above consists of the following tables:

Users




{
userID (Primary Key)
username
email
password(Hash)
other user-related fields (e.g., Profile Info, Preferences)
}

Posts




{
postID (Primary Key)
userID (Foreign Key)
title
content (Text, Links, Media)
type (Text, Link, Image, Video)
timestamp
upvotes
downvotes
other post-related fields
}

Comments




{
commentID (Primary Key)
postID (Foreign Key)
userID (Foreign Key)
parentCommentID (For nested comments)
content
timestamp
upvotes
downvotes
other comment-related fields
}
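The parentCommentID field is what makes nested discussions possible: top-level comments have no parent, and each reply points at the comment it answers. The sketch below rebuilds a thread from flat rows, assuming rows arrive as dictionaries and that a top-level comment has parentCommentID set to None.

```python
from collections import defaultdict


def build_comment_tree(comments):
    """Turn flat comment rows into a nested list of threads."""
    children = defaultdict(list)
    for comment in comments:
        children[comment["parentCommentID"]].append(comment)

    def attach(parent_id):
        # Recursively collect the replies under a given parent.
        return [
            {
                "commentID": c["commentID"],
                "content": c["content"],
                "replies": attach(c["commentID"]),
            }
            for c in children[parent_id]
        ]

    return attach(None)


rows = [
    {"commentID": 1, "parentCommentID": None, "content": "Great post"},
    {"commentID": 2, "parentCommentID": 1, "content": "Agreed"},
    {"commentID": 3, "parentCommentID": None, "content": "Source?"},
    {"commentID": 4, "parentCommentID": 2, "content": "Same here"},
]
for thread in build_comment_tree(rows):
    print(thread)
```

For very deep threads an iterative traversal would avoid Python's recursion limit, but the adjacency-list idea stays the same.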

Subreddits




{
subredditID (Primary Key)
name
description
createdAt
other community-related fields
}

User_Subscriptions




{
subscriptionID (Primary Key)
userID (Foreign Key)
subredditID (Foreign Key)
createdAt
}

User_Interactions




{
interactionID (Primary Key)
userID (Foreign Key)
targetID (PostID/CommentID)
interactionType (Upvote/Downvote/Comment)
timestamp
other interaction-related fields
}

Which Database Should We Use for Reddit?

The database serves as the repository for user-generated content, encompassing posts, videos, images, comments, upvotes, and downvotes. This data undergoes replication and sharding across multiple databases to ensure redundancy and reliability.
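As a rough illustration of sharding, the sketch below maps a record key to one of a fixed number of shards using a stable hash. The modulo scheme, the function name, and the shard count are assumptions; the article does not specify a sharding strategy.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key (for example a postID) to a shard index in [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # A cryptographic hash gives a stable, evenly spread value across runs,
    # unlike Python's built-in hash(), which is randomized per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


for post_id in ("post-1", "post-2", "post-3"):
    print(post_id, "-> shard", shard_for(post_id, 4))
```

Plain modulo sharding moves most keys when the shard count changes; consistent hashing is a common alternative when shards are added or removed frequently.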

API used for communicating with the servers in Reddit

RESTful APIs (Representational State Transfer) are an ideal choice for the Reddit system design due to their simplicity, flexibility, and compatibility with various client applications. Reddit, being a large-scale platform, benefits from RESTful APIs’ statelessness, allowing for scalability and reduced server load. These APIs enable straightforward communication between clients and servers, offering a uniform interface for accessing and manipulating resources like posts, comments, and user profiles.

User Registration




Endpoint: 'POST /api/users/register'




{
  "username": "example_user",
  "email": "user@example.com",
  "password": "examplePassword123"
}

User Login




Endpoint: 'POST /api/users/login'




{
  "username": "example_user",
  "password": "examplePassword123"
}

User Profile




Endpoint: 'GET /api/users/{userID}/profile'

Returns user profile information.

Update User Profile




Endpoint: 'PUT /api/users/{userID}/profile'




{
  "bio": "New bio description",
  "preferences": {
    "theme": "dark",
    "notifications": true
  }
}

Create Post




Endpoint: 'POST /api/posts/create'




{
  "title": "Title of the post",
  "content": "Text, link, or media content",
  "type": "text/link/media"
}
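Before a Create Post request reaches the post services, the API server would normally check the payload. The sketch below validates the three fields shown above; the allowed type values are read from the "text/link/media" placeholder, while the title length limit and function name are hypothetical.

```python
ALLOWED_TYPES = {"text", "link", "media"}
MAX_TITLE_LENGTH = 300  # hypothetical limit


def validate_create_post(payload: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the payload is valid."""
    errors = []

    title = payload.get("title")
    if not isinstance(title, str) or not title.strip():
        errors.append("title is required")
    elif len(title) > MAX_TITLE_LENGTH:
        errors.append(f"title must be at most {MAX_TITLE_LENGTH} characters")

    content = payload.get("content")
    if not isinstance(content, str) or not content:
        errors.append("content is required")

    post_type = payload.get("type")
    if not isinstance(post_type, str) or post_type not in ALLOWED_TYPES:
        errors.append("type must be one of: link, media, text")

    return errors


print(validate_create_post({"title": "Hello", "content": "First post", "type": "text"}))
print(validate_create_post({"title": "", "type": "poll"}))
```

Returning all errors at once lets the client fix every problem in a single round trip.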

Add Comment to Post




Endpoint: 'POST /api/posts/{postID}/comment'




{
  "content": "Comment text"
}

Upvote Post




Endpoint: 'POST /api/posts/{postID}/upvote'

Downvote Post




Endpoint: 'POST /api/posts/{postID}/downvote'

Subscriptions & Feeds:

Follow Subreddit




Endpoint: 'POST /api/subreddits/follow'




{
  "subreddit": "subreddit_name"
}

User Feed




Endpoint: 'GET /api/users/{userID}/feed'

Retrieves personalized feed based on subscriptions and user interactions.

Further Optimizations in Reddit Design

The system can undergo additional optimization to enhance its performance and scalability.

Conclusion

In conclusion, this system design for our platform caters to a diverse range of user interactions, content sharing, and community engagement. By implementing robust authentication processes, content moderation, and efficient workflows, we ensure a secure and enriching user experience. Incorporating scalable solutions like database sharding, indexing, and caching allows us to manage increasing volumes of data effectively, maintaining performance and responsiveness as our user base grows. The hybrid push-pull model for fanout services mitigates the celebrity problem, ensuring optimal content delivery without overloading the system. Continuous improvement in recommendation algorithms, real-time messaging, and proactive notifications enhances user engagement. With a strong emphasis on data security, compliance, and user experience, our design lays the foundation for a thriving and sustainable social platform.

