
Design a webpage that can show the status of 10M+ users including: name, photo, badge and points | System Design

Last Updated : 04 Jan, 2024

Imagine a community of more than 10 million users and a webpage where every member's details (name, photo, earned badges, and points) can be viewed. That is a large volume of data to serve. Doing so requires an efficient, scalable system architecture that can handle this data volume without compromising performance or user experience.


1. Requirements of the Webpage System Design

1.1. Functional Requirements of the Webpage System Design

  • Users should be able to log in securely to access the data.
  • The system should display user names, profile pictures, earned badges, and points of 10 million users.
  • Users should be able to search for specific profiles by name or filter based on badges, points, or other criteria.
  • Enable CRUD (Create, Read, Update, Delete) operations for user profiles with appropriate access controls.
  • The system should efficiently handle a user base exceeding 10 million, ensuring quick access to user profiles without performance degradation.

1.2. Non-Functional Requirements of the Webpage System Design

  • The web page should load quickly, displaying user profiles within milliseconds of a request.
  • Response times for search queries or profile retrieval should be optimized, even with concurrent user access.
  • Ensure robust measures for user data security, including encryption of sensitive information and secure authentication protocols.
  • The system should maintain high availability, minimizing downtime and ensuring users can access their profiles consistently.
  • The system should scale horizontally to accommodate increasing user loads without compromising performance.
  • An intuitive and user-friendly interface allowing easy navigation and information retrieval.
  • Implement redundant systems or failover mechanisms to prevent data loss or service interruptions.

2. Capacity Estimation of the Webpage System Design

Here are some capacity assumptions we can make for this system:

2.1. Traffic Estimates

Active Users = 10 million
Daily Active Users (DAU) = 5 million
Profile requests per user per day = 10
Total requests per day = 50 million

Avg QPS = 50 million / (24 hrs x 3600 sec/hr) = ~580 requests/sec
Peak QPS = 2 x 580 requests/sec = 1160 requests/sec

2.2. Storage Estimates

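200 bytes for name x 10 million users = 2 GB
500 bytes for photo URL x 10 million users = 5 GB
10 bytes for badge ID x 10 million users = 100 MB
4 bytes for points x 10 million users = 40 MB
Total storage per user = 714 bytes

For 10 million users: Storage needed = 10 million x 714 bytes = ~7.14 GB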

2.3. Bandwidth Estimates

Profile data fetched per request

200 bytes (name)
500 bytes (photo)
10 bytes (badge)
4 bytes (points)
Total = 714 bytes
Daily active users = 5 million
Profile requests per user per day = 10
Requests per day = 50 million

Daily Bandwidth: 50 million x 714 bytes = ~35.7 GB/day
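
The arithmetic behind these estimates can be reproduced with a short script. This is only a sketch built on the assumptions stated in this section (5 million daily active users, 10 requests each, 714 bytes per profile, and a peak of twice the average load). The constant names are illustrative. Note that 714 bytes across 10 million users comes to roughly 7.14 GB of profile metadata.

```python
# Back-of-the-envelope capacity estimates using the figures from this section.
TOTAL_USERS = 10_000_000
DAILY_ACTIVE_USERS = 5_000_000
REQUESTS_PER_USER_PER_DAY = 10
BYTES_PER_PROFILE = 200 + 500 + 10 + 4  # name + photo URL + badge ID + points

requests_per_day = DAILY_ACTIVE_USERS * REQUESTS_PER_USER_PER_DAY
avg_qps = requests_per_day / (24 * 3600)
peak_qps = 2 * avg_qps  # assumes peak traffic is twice the average

storage_bytes = TOTAL_USERS * BYTES_PER_PROFILE
daily_bandwidth_bytes = requests_per_day * BYTES_PER_PROFILE

print(f"Average QPS: {avg_qps:.0f}")
print(f"Peak QPS: {peak_qps:.0f}")
print(f"Profile storage: {storage_bytes / 1e9:.2f} GB")
print(f"Daily bandwidth: {daily_bandwidth_bytes / 1e9:.1f} GB")
```

The exact average is just under 579 requests/sec, which the estimate above rounds to ~580.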

3. High Level Design of the Webpage System Design

At a high level, the system manages two primary functionalities:

[Figure: high-level design of the webpage system]

3.1. Updating User Status

Once authenticated, users have the capability to modify their profile information, including their name and profile image. Additionally, any changes in points or badges earned should be reflected and updated accordingly.

3.2. Reading User Status

The system design encompasses the retrieval of comprehensive user details, including names, profile points, and badge information, ensuring access to this data for the vast community of over 10 million users.

3.3. System Components of the Webpage System Design

  • Load Balancer:
    • The load balancer acts as the pivotal gateway, efficiently distributing incoming requests among multiple web servers.
    • Employing sophisticated algorithms, it ensures an equitable allocation of traffic, thereby optimizing system performance by preventing overload on individual servers.
  • Web Servers:
    • Web servers are the primary interface between users and the system: they handle HTTP requests and dynamically generate user interface pages.
    • They collaborate with application servers to retrieve the necessary data and present it in a user-friendly format.
  • Authentication Services:
    • Authentication services authenticate user credentials during login attempts, generate tokens or session identifiers upon successful validation, and manage the security context throughout the user’s session.
    • Handles user authentication processes to ensure secure access to the system.
  • Application Servers:
    • Application servers play a crucial role in system design by providing a runtime environment for applications, facilitating communication between the application and various components, and managing application-related tasks.
    • Implements business logic, interacts with database servers, and manages caching layers.
  • Write Services:
    • It serves as the core component managing all write operations pertaining to user profile information, encompassing functionalities such as creating, updating, or deleting data.
    • Its primary aim revolves around segregating write functionalities from read services, ensuring independent scalability for different facets of the system.
    • These functions facilitate user detail updates, encompassing modifications to user names, profiles, points, and badges.
  • Read Services:
    • Responsible for retrieving and presenting details for all 10 million users, showcasing this information in a comprehensive dashboard.
  • Database:
    • Stores user profile data and utilizes sharding for scalability.
    • It stores user profile details like names, points, and badges and relies on MySQL due to its relational structure.
  • Cache:
    • Stores frequently accessed data in memory, reducing the database load.
    • This approach significantly reduces the need to constantly fetch data from disk storage, streamlining the user experience.
    • To make the most of our memory resources, we’ll implement cache compression, allowing us to store more data efficiently.
    • When the server requests this information, it’ll be instantly available to users, ensuring quick access.
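
As a rough illustration of the cache behaviour described above, the sketch below combines a cache-aside read path with compressed values. It uses a plain dictionary as a stand-in for the real cache, and the names `CompressedCache`, `get_profile`, and `load_from_db` are hypothetical, not part of the original design.

```python
import json
import zlib


class CompressedCache:
    """In-memory stand-in for a cache server; values are stored zlib-compressed."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        raw = json.dumps(value).encode("utf-8")
        self._store[key] = zlib.compress(raw)

    def get(self, key):
        blob = self._store.get(key)
        if blob is None:
            return None
        return json.loads(zlib.decompress(blob).decode("utf-8"))


def get_profile(user_id, cache, load_from_db):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = load_from_db(user_id)
        cache.set(key, profile)
    return profile


# Usage: a fake database loader that counts how often it is called.
db_calls = []

def load_from_db(user_id):
    db_calls.append(user_id)
    return {"userID": user_id, "name": "Salik Alim", "points": 500, "badge": "Gold"}

cache = CompressedCache()
first = get_profile(123456, cache, load_from_db)   # miss: reads the database
second = get_profile(123456, cache, load_from_db)  # hit: served from the cache
print(len(db_calls))
```

Compression trades some CPU time for memory, so it tends to pay off for larger values such as whole batches rather than single small profiles.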

4. Database Design of the Webpage System Design

4.1. Users Table:

  • UserID (Primary Key): Unique identifier for each user.
  • UserName: Field to store user names.
  • PhotoURL: URL to user profile photos.
  • Points: Field to store user points.
  • BadgeID: Foreign key to the Badges table.

{
-UserID
-UserName
-PhotoURL
-Points
-BadgeID
}

4.2. Badges Table:

  • BadgeID (Primary Key): Unique identifier for each badge.
  • BadgeName: Name of the Badge
  • BadgeDescription: Description of the badge.

{
-BadgeID
-BadgeName
-BadgeDescription
}
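
The two tables can be sketched as runnable DDL. The example below uses Python's built-in SQLite module purely so that it runs without a server; the design itself targets MySQL, and the column types and sample rows are assumptions made for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Badges (
    BadgeID          INTEGER PRIMARY KEY,
    BadgeName        TEXT NOT NULL,
    BadgeDescription TEXT
);
CREATE TABLE Users (
    UserID   INTEGER PRIMARY KEY,
    UserName TEXT NOT NULL,
    PhotoURL TEXT,
    Points   INTEGER NOT NULL DEFAULT 0,
    BadgeID  INTEGER REFERENCES Badges(BadgeID)
);
""")

# Sample rows (illustrative data only).
conn.execute("INSERT INTO Badges VALUES (1, 'Gold', 'Sample badge description')")
conn.execute(
    "INSERT INTO Users VALUES (?, ?, ?, ?, ?)",
    (123456, "Salik Alim", "https://example.com/profile.jpg", 500, 1),
)

# Reading a profile joins the user row with its badge.
row = conn.execute(
    """
    SELECT u.UserID, u.UserName, u.PhotoURL, u.Points, b.BadgeName
    FROM Users u JOIN Badges b ON b.BadgeID = u.BadgeID
    WHERE u.UserID = ?
    """,
    (123456,),
).fetchone()
print(row)
```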

5. How to show the status of 10 million users

To efficiently handle the display of 10 million user records, we’re organizing the data into manageable batches and showing only what fits on the screen at a time.

  • This approach involves storing the current, previous, and upcoming batches for accessibility purposes.
  • Our system will load three batches, anticipating user scrolling in both directions.
  • This setup ensures smooth transitions as it updates the current batch and keeps nearby batches ready in the cache.
  • Whenever users scroll, the system seamlessly introduces new batches on the screen while updating the cache to reflect these changes.
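
The three-batch window described above might look like the following sketch. The class name `BatchWindow`, the `fetch_batch` callback, and the batch size of 50 are assumptions for illustration. In a real client the fetch would call the server rather than generate names locally.

```python
class BatchWindow:
    """Keeps the previous, current, and next batch of user rows in memory."""

    def __init__(self, fetch_batch, total_batches):
        self._fetch = fetch_batch
        self._total = total_batches
        self._cache = {}
        self.current = 0
        self._refresh()

    def _refresh(self):
        wanted = [i for i in (self.current - 1, self.current, self.current + 1)
                  if 0 <= i < self._total]
        # Keep batches still inside the window, fetch the missing ones,
        # and drop everything that has scrolled out of range.
        self._cache = {i: self._cache[i] if i in self._cache else self._fetch(i)
                       for i in wanted}

    def visible(self):
        return self._cache[self.current]

    def scroll(self, direction):
        """direction is +1 (down) or -1 (up); the index is clamped to valid batches."""
        self.current = max(0, min(self._total - 1, self.current + direction))
        self._refresh()
        return self.visible()


# Usage: record which batches are actually fetched.
fetched = []

def fetch_batch(index, size=50):
    fetched.append(index)
    start = index * size + 1
    return [f"User{n}" for n in range(start, start + size)]

window = BatchWindow(fetch_batch, total_batches=200_000)  # 10M users / 50 per batch
window.scroll(+1)
print(fetched, window.visible()[0])
```

Scrolling by one batch triggers at most one new fetch, because the other two batches in the window are already cached.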

6. Communicating with the servers in Webpage System Design

6.1. User Authentication

Description: Allows users to securely log in to access the system.
Endpoint: `/auth/login`
Method: POST

Request:

POST /auth/login

{
  "username": "user123",
  "password": "securepassword123"
}

Response:

{
  "accessToken": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9…",
  "expiresIn": 3600
}
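
The visible prefix of the sample `accessToken` is the base64url encoding of the header `{"alg":"HS256","typ":"JWT"}`, which suggests an HMAC-signed JSON Web Token. The sketch below shows how such a token could be issued and checked using only the standard library. The secret, the claim names, and the function names are assumptions, and credential verification against stored password hashes is deliberately left out. A production system would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret-change-me"  # hypothetical signing key


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _sign(signing_input: str) -> str:
    digest = hmac.new(SECRET, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return _b64url(digest)


def issue_token(username: str, expires_in: int = 3600, now=None) -> dict:
    """Build the login response body for an already-authenticated user."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": username, "iat": now, "exp": now + expires_in}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return {"accessToken": f"{signing_input}.{_sign(signing_input)}",
            "expiresIn": expires_in}


def verify_token(token: str, now=None) -> bool:
    """Check the signature first, then the expiry claim."""
    now = int(time.time()) if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        return False
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(f"{header_b64}.{payload_b64}")
    if not hmac.compare_digest(expected.encode("utf-8"), signature_b64.encode("utf-8")):
        return False
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    return payload["exp"] > now


response = issue_token("user123", now=1_700_000_000)
print(response["expiresIn"], verify_token(response["accessToken"], now=1_700_000_100))
```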

6.2. Retrieve User Profile

Description: Retrieves detailed information about a specific user.
Endpoint: `/users/{userID}`
Method: GET

Request:

GET /users/123456

Response:

{
  "userID": "123456",
  "name": "Salik Alim",
  "photoURL": "https://example.com/profile.jpg",
  "points": 500,
  "badge": "Gold"
}

6.3. Search Users

Description: Allows searching for users based on specified criteria.
Endpoint: `/users/search`
Method: GET

Request:

GET /users/search?query=Salik&filters={"badge":"gold"}&page=1&limit=10

Response:

{
  "results": [
    {
      "userID": "123456",
      "name": "Salik Alim",
      "photoURL": "https://example.com/profile.jpg",
      "points": 500,
      "badge": "Gold"
    },
    // …more user data
  ],
  "totalResults": 25
}

6.4. Update User Profile

Description: Allows users to update their profile information.
Endpoint: `/users/{userID}`
Method: PUT

Request:

PUT /users/123456

{
  "name": "Khabib Nurmagomedov",
  "photoURL": "https://example.com/newprofile.jpg",
  "points": 600,
  "badge": "Diamond"
}

6.5. Request for Next Batch

Description: Requests for next batch of data to be displayed on the screen.
Endpoint: `/users/nextBatch`
Method: GET

Request:

GET /users/nextBatch?lastUserID=100&pageSize=50

Response:

{
  "users": [
    {
      "userID": 101,
      "name": "User101",
      "photoURL": "https://example.com/user101.jpg",
      "points": 250,
      "badge": "Silver"
    },
    {
      "userID": 102,
      "name": "User102",
      "photoURL": "https://example.com/user102.jpg",
      "points": 180,
      "badge": "Gold"
    },
    // … more user data
  ],
  "totalUsers": 1000
}
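
Because the request carries `lastUserID` rather than a page offset, the endpoint can use keyset pagination: return the rows whose IDs come immediately after the last one seen. The sketch below shows that idea over an in-memory list; the function and variable names are hypothetical. Against a SQL store, the equivalent query would filter on `UserID > lastUserID`, order by `UserID`, and limit the result to `pageSize` rows.

```python
from bisect import bisect_right


def next_batch(users_by_id, sorted_ids, last_user_id, page_size):
    """Return the page of users whose IDs come right after last_user_id."""
    start = bisect_right(sorted_ids, last_user_id)
    page_ids = sorted_ids[start:start + page_size]
    return {
        "users": [users_by_id[uid] for uid in page_ids],
        "totalUsers": len(sorted_ids),
    }


# Usage with 1,000 generated sample users.
users_by_id = {
    n: {"userID": n, "name": f"User{n}", "photoURL": f"https://example.com/user{n}.jpg"}
    for n in range(1, 1001)
}
sorted_ids = sorted(users_by_id)

page = next_batch(users_by_id, sorted_ids, last_user_id=100, page_size=50)
print(page["users"][0]["userID"], page["users"][-1]["userID"], page["totalUsers"])
```

Unlike offset-based paging, this approach does not need to skip over earlier rows, so the cost of fetching a batch stays roughly constant however deep the user scrolls.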

7. Database Architecture of the Webpage System Design

Database

The database architecture is based on MySQL, a reliable and robust relational database management system.

Sharding

To efficiently manage the substantial data load and facilitate scalability, the database employs a sharding technique using the userID hash.

  • Sharding divides the database into multiple shards, each functioning as a master node responsible for handling writes for specific user subsets.
  • This division allows for a more balanced distribution of write operations across different shards, optimizing the system’s capacity to manage a vast user base.
  • Additionally, read replicas, associated with each master shard, enhance read performance by efficiently handling read traffic.

Master-Slave Architecture

Within this architecture, the master-slave replication mechanism plays a pivotal role. Data changes originating from the master node are asynchronously replicated to corresponding slave databases. These changes are captured as events on the master and then transmitted to the slave databases, ensuring that they maintain up-to-date copies of the data.

  • The primary role of the slave databases is to service read traffic, leveraging the replicated data from the master for efficient and responsive query processing.
  • However, it’s essential to note that all write operations must initially occur on the master node to maintain the integrity and consistency of the data within the master-slave architecture.
  • In terms of connectivity, the application servers directly interact with the master shards for executing write operations. This direct connection ensures immediate data consistency and reliability.
  • Moreover, queries from application servers are load balanced across the read replica slaves, optimizing the distribution of read traffic and enhancing overall system responsiveness.
  • Scaling the data store is efficiently managed within this architecture. By adding more read replica slaves, the system horizontally scales for read operations, allowing it to efficiently handle increased read traffic.

Simultaneously, sharding partitions and scales write operations across multiple master nodes, distributing the write workload evenly and facilitating efficient management of expanding user data.

Overall, this database architecture, employing sharding, master-slave replication, and strategic connectivity and scaling mechanisms, provides a resilient, scalable, and responsive framework to effectively manage a large and dynamic user base while maintaining data integrity, availability, and performance.
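
To make the routing rules concrete, here is a minimal sketch of how an application server might pick a database node: hash the userID to choose a shard, send writes to that shard's master, and rotate reads across its replicas. The class names, node names, replica count, and the use of SHA-256 with a simple modulo are all assumptions. A real deployment might prefer consistent hashing so that adding shards moves less data.

```python
import hashlib
import itertools


class Shard:
    def __init__(self, name, replica_count=2):
        self.master = f"{name}-master"
        self.replicas = [f"{name}-replica-{i}" for i in range(replica_count)]
        self._next_replica = itertools.cycle(self.replicas)

    def node_for(self, operation):
        # Writes always go to the master; reads rotate across the replicas.
        if operation == "write":
            return self.master
        return next(self._next_replica)


class ShardRouter:
    def __init__(self, shard_count):
        self.shards = [Shard(f"shard-{i}") for i in range(shard_count)]

    def shard_for(self, user_id):
        # A stable hash (rather than Python's built-in hash()) keeps routing
        # identical across processes and restarts.
        digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def route(self, user_id, operation):
        return self.shard_for(user_id).node_for(operation)


router = ShardRouter(shard_count=4)
write_node = router.route(123456, "write")
read_nodes = [router.route(123456, "read") for _ in range(2)]
print(write_node, read_nodes)
```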

8. Low Level Design of the Webpage System Design

[Figure: low-level design of the webpage system]

8.1. Client

Users access the system from a desktop, mobile, or web client, either to update their own data or to check the status of other users.

8.2. Load Balancer

The load balancer acts as the pivotal gateway, efficiently distributing incoming requests among multiple web servers. Employing sophisticated algorithms, it ensures an equitable allocation of traffic, thereby optimizing system performance by preventing overload on individual servers.

8.3. Web Servers

Web servers are the primary interface between users and the system: they handle HTTP requests and dynamically generate user interface pages. They collaborate with application servers to retrieve the necessary data and present it in a user-friendly format.

8.4. Authentication Services:

Authentication services authenticate user credentials during login attempts, generate tokens or session identifiers upon successful validation, and manage the security context throughout the user’s session.

8.5. Write services

It serves as the core component managing all write operations pertaining to user profile information, encompassing functionalities such as creating, updating, or deleting data. Its primary aim revolves around segregating write functionalities from read services, ensuring independent scalability for different facets of the system.

Functioning as a pivotal component, it comprises several key elements:

  • HTTP Request Handling:
    • This microservice manages the HTTP request types used for writes, including POST, PUT, PATCH, and DELETE.
    • For instance, a POST request will create a new user profile, while a PUT or PATCH request will update existing profile information.
  • Input Validation and Sanitization:
    • Before processing incoming requests, this microservice diligently validates and sanitizes input parameters.
    • For instance, it ensures that user-provided data meets specified criteria and doesn’t contain malicious content that could compromise system integrity.
  • Database Interaction with MySQL Shard:
    • Interfacing directly with the MySQL shard, it executes commands to modify data within the database.
    • For instance, a PUT request may update a user’s profile details, while a DELETE request may remove obsolete profile information.
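
The validation and sanitization step could be sketched as below. The specific rules are assumptions chosen for illustration: a 200-byte name limit taken from the storage estimate, http(s)-only photo URLs, non-negative integer points, and HTML escaping of names. They are not requirements stated by the design.

```python
import html
from urllib.parse import urlparse

ALLOWED_FIELDS = {"name", "photoURL", "points", "badge"}
MAX_NAME_BYTES = 200  # matches the 200-byte name budget in the storage estimate


def validate_profile_update(payload):
    """Return (clean, errors): sanitized fields plus a list of problems found."""
    clean, errors = {}, []
    for field in sorted(set(payload) - ALLOWED_FIELDS):
        errors.append(f"unknown field: {field}")

    if "name" in payload:
        name = payload["name"]
        if not isinstance(name, str) or not name.strip():
            errors.append("name must be a non-empty string")
        elif len(name.encode("utf-8")) > MAX_NAME_BYTES:
            errors.append("name is too long")
        else:
            clean["name"] = html.escape(name.strip())  # neutralize HTML markup

    if "photoURL" in payload:
        url = payload["photoURL"]
        try:
            parsed = urlparse(url) if isinstance(url, str) else None
        except ValueError:
            parsed = None
        if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("photoURL must be an http(s) URL")
        else:
            clean["photoURL"] = url

    if "points" in payload:
        points = payload["points"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(points, bool) or not isinstance(points, int) or points < 0:
            errors.append("points must be a non-negative integer")
        else:
            clean["points"] = points

    if "badge" in payload:
        badge = payload["badge"]
        if not isinstance(badge, str) or not badge.strip():
            errors.append("badge must be a non-empty string")
        else:
            clean["badge"] = badge.strip()

    return clean, errors


good = {"name": "Khabib Nurmagomedov", "photoURL": "https://example.com/newprofile.jpg",
        "points": 600, "badge": "Diamond"}
bad = {"name": "<script>", "points": -5, "role": "admin"}
print(validate_profile_update(good))
print(validate_profile_update(bad))
```

The write service would reject the request when `errors` is non-empty and pass only `clean` on to the database layer.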

8.6. Database

It stores user profile details like names, points, and badges and relies on MySQL due to its relational structure and adherence to maintaining high data consistency following ACID properties.

  • Ensuring uniform data distribution, the UserID-based sharding splits writes across various database servers.
  • Each shard runs in a master-slave replication setup, so read-only slave copies can serve reads in parallel; because replication is asynchronous, a replica may briefly lag behind its master.
  • Additional slaves are introduced to enhance read efficiency, while indexing strategies optimize query speed and reduce system response times.

8.6.1 Master-Slave Setup

  • Master Database:
    • This primary database will handle all write operations (inserts, updates, deletes).
    • It will serve as the authoritative source for data modifications.
  • Slave Databases:
    • Multiple slave databases replicate data from the master.
    • They will handle read operations, offloading read-heavy tasks from the master.
    • Replication will occur asynchronously, so the slaves are eventually consistent with the master and may briefly serve slightly stale data.

8.6.2 Sharding Implementation

  • Sharding involves dividing the database into smaller, more manageable subsets known as shards.
  • We will be sharding our data horizontally where each shard will contain a distinct portion of the dataset.
  • A shard key determines how data is distributed across shards. We will be using UserID as the shard key.

8.7. How the Database will work in Webpage System Design?

8.7.1 Write Operations (Master)

  • All write operations, such as inserting new user data or updating existing records, are directed to the master database.
  • The master database processes these write requests and ensures data integrity and consistency across the system.
  • After a successful write, the changes are replicated asynchronously to the slave databases to maintain data redundancy.

8.7.2 Read Operations (Slaves)

  • Read-intensive operations, like retrieving user profiles or non-critical data, are directed to the slave databases.
  • The sharding mechanism intelligently routes read queries based on the shard key to the relevant shard.
  • Each shard manages its subset of data, allowing parallel processing of read queries across multiple shards, thereby improving read performance significantly.

8.8. Caching Servers

We use a Redis cache to hold batches of user names, badges, and points, specifically for displaying information on screen.

  • This approach significantly reduces the need to constantly fetch data from disk storage, streamlining the user experience.
  • To make the most of our memory resources, we’ll implement cache compression, allowing us to store more data efficiently.
  • When the server requests this information, it’ll be instantly available to users, ensuring quick access.
  • Simultaneously, we’ll keep the cache updated by loading another batch of data.
  • This way, users will consistently have swift access to the required information.
  • Additionally, Redis acts as a repository for frequently searched data. By storing commonly accessed information, it helps optimize the system’s performance and response times.

8.9. Search Servers

We will use Elasticsearch for advanced searches, allowing users to find data using various criteria.

  • First, dedicated indexers will pull the necessary information out of our MySQL databases.
  • Then Logstash, a data pipeline most often used with Elasticsearch (an open-source analytics and search engine), will clean up and transform this data.
  • Once formatted into JSON documents, the data is stored in Elasticsearch, where indices are created to speed up search tasks.
  • When users seek information, the frontend connects directly with Elasticsearch to begin the search.
  • Elasticsearch then matches the query against the stored documents and retrieves the requested user records.
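
A search such as the one in section 6.3 could be translated into an Elasticsearch query body along the following lines. The sketch only builds the request as a dictionary and does not contact a cluster. The field names `name`, `badge`, and `points` are assumed index fields, with `badge` assumed to be mapped as a keyword so that an exact `term` filter applies.

```python
def build_search_request(query, badge=None, min_points=None, page=1, limit=10):
    """Build an Elasticsearch query body: full-text match on name plus exact filters."""
    filters = []
    if badge is not None:
        filters.append({"term": {"badge": badge}})
    if min_points is not None:
        filters.append({"range": {"points": {"gte": min_points}}})
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": query}}],
                "filter": filters,
            }
        },
        "from": (page - 1) * limit,  # offset of the first hit on this page
        "size": limit,
    }


body = build_search_request("Salik", badge="gold", page=1, limit=10)
print(body)
```

Putting the badge and points conditions under `filter` rather than `must` keeps them out of relevance scoring and lets Elasticsearch cache them.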

8.10. Image Servers(CDN)

The Image Servers function as a cloud storage system housing all the images.

  • We can use Amazon S3 for this purpose. Additionally, a CDN, such as CloudFront, is employed to swiftly deliver these images to users.
  • Whenever updated images are available, the Write Services are responsible for pushing them into the cloud storage. Moreover, it is also responsible for caching frequently accessed images to ensure quick retrieval.

9. Work Flow of Webpage System Design

There are three workflows that we have to manage:

9.1. Changing the status of the users

  • To update record statuses, users first authenticate through our authentication services.
  • The data change request is then routed to the write services.
  • The request is then pushed to the write queue, which accesses the master database to make the necessary modifications.
  • These updates are subsequently mirrored in the slave database for consistency.
  • The photos are then pulled from the write queue and stored in the CDN for quick delivery to the users.
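
The queue-based write path can be illustrated with a small in-process simulation. Dictionaries stand in for the master and slave databases and `queue.Queue` stands in for the write queue. All names are hypothetical, and the photo upload to the CDN is omitted.

```python
import queue
import threading

master = {}   # stand-in for the master database
replica = {}  # stand-in for a slave database
write_queue = queue.Queue()


def submit_update(user_id, changes):
    """Write service: enqueue the change and return without waiting for the database."""
    write_queue.put((user_id, changes))
    return {"status": "accepted"}


def worker():
    """Queue consumer: apply each change to the master, then copy it to the replica."""
    while True:
        item = write_queue.get()
        if item is None:  # sentinel value used to stop the worker
            write_queue.task_done()
            break
        user_id, changes = item
        master.setdefault(user_id, {}).update(changes)
        replica[user_id] = dict(master[user_id])  # replication step
        write_queue.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

ack = submit_update(123456, {"name": "Khabib Nurmagomedov", "points": 600})
submit_update(123456, {"badge": "Diamond"})

write_queue.join()     # wait until every queued write has been applied
write_queue.put(None)  # stop the worker
thread.join()
print(ack, replica[123456])
```

The caller receives its acknowledgement before the database is touched, which is why readers may briefly see the old values until the queue drains.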

9.2. Accessing the status of 10 million users

  • For accessing the vast user database of 10 million records, users authenticate via our authentication services.
  • Their request to view user statuses goes through the read services, which fetch a batch of data from the cache for immediate display.
  • As users reach the end of a batch, new requests are sent to the servers for the next batch, ensuring seamless data presentation.
  • Meanwhile, subsequent data batches are stored in the cache for quick retrieval.
  • The photos of all the users are stored in the CDN, which is then accessed according to the batch.

9.3. Searching for any specific user

  • When searching for any record, the search request is directed to our Search Services.
  • It uses Elasticsearch to run the search queries.
  • Initial search attempts occur in the cache, which stores frequently accessed data.
  • If the data is found there, it’s promptly returned to the user.
  • If data is not found in the cache, the database is queried, and the results are provided to the user, with updates made to the cache for future searches.

10. How to make the system Scalable?

The system employs several techniques to enable scalability:

  • Database sharding and master-slave replication:
    • The database uses sharding to partition data across multiple database servers based on the UserID.
    • This distributes writes more evenly. Master-slave replication also creates read replicas that can handle read traffic, preventing overload on the master.
    • Adding more slaves scales reads.
  • Caching:
    • Frequently accessed user name, badge, and point data is cached, reducing trips to the database.
    • Compression allows more data to fit in the cache.
    • As load increases, the cache can be scaled horizontally by adding more cache servers.
  • Load balancing and horizontal scaling:
    • A load balancer evenly distributes requests across multiple web servers.
    • More web and application servers can be added easily to handle more traffic.
  • Search engine:
    • Elasticsearch is used for search to allow efficient querying.
    • It scales well horizontally by distributing data and queries across shards. More shards can be added to scale search capacity.
  • CDN for images:
    • User images are stored on a CDN which is designed to scale dynamically to handle large traffic volumes spread across the globe.
  • Asynchronous processing:
    • Write requests are handled asynchronously for faster responses.
    • The data is eventually consistent across the system.
  • Microservices:
    • Different components like authentication, read/write services are separated into distinct services that can scale independently as per demand.

11. Conclusion

In summary, the system handles immense data volumes and high traffic demands through a combination of database sharding, master-slave replication, caching, load balancing, microservices, and horizontal scaling, which together provide responsive user experiences with high availability and low latency. With additional features such as advanced search and CDN-based image delivery, the system can keep scaling while maintaining performance amid substantial user growth.


