
Design Dropbox – A System Design Interview Question

You have probably used Dropbox, a file hosting service, to upload and share files or images. Designing Dropbox is a common question in system design interview rounds. In this article, we will discuss how to design a service like Dropbox.



1. Requirements Gathering for Dropbox System Design

Functional Requirements:

Non Functional Requirements:

2. Capacity Estimation for Dropbox System Design

Storage Estimations:

Assumptions:



The total number of users = 500 million.
Total number of daily active users = 100 million
The average number of files stored by each user = 200
The average size of each file = 100 KB
Total number of active connections per minute = 1 million

Storage Estimations:

Total number of files = 500 million * 200 = 100 billion
Total storage required = 100 billion * 100 KB = 10 PB
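The arithmetic above can be checked with a few lines of Python. This is only a sketch of the estimate; it assumes decimal units (1 KB = 10^3 bytes, 1 PB = 10^15 bytes), which is the convention under which the 10 PB figure comes out exactly.

```python
# Back-of-the-envelope storage estimate from the stated assumptions.
# Assumption: decimal units (1 KB = 10**3 bytes, 1 PB = 10**15 bytes).
TOTAL_USERS = 500_000_000
FILES_PER_USER = 200
AVG_FILE_SIZE_BYTES = 100 * 10**3  # 100 KB

total_files = TOTAL_USERS * FILES_PER_USER
total_storage_bytes = total_files * AVG_FILE_SIZE_BYTES
total_storage_pb = total_storage_bytes / 10**15

print(f"Total files: {total_files:,}")
print(f"Total storage: {total_storage_pb:.0f} PB")
```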

3. High-Level Design(HLD) of Dropbox System Design

3.1. User Uploading:

Users interact with the client application or web interface to initiate file uploads. The client application communicates with the Upload Service on the server side. Large files may be broken into smaller chunks for efficient transfer.

3.2. Upload Service:

Receives file upload requests from clients and generates Presigned URLs for S3 so that clients can upload directly. It coordinates the upload process, breaking large files into manageable chunks if necessary and ensuring data integrity and completeness. After a successful upload, it updates the Metadata Database with file details.

3.3. Getting Presigned URL:

The client application requests a Presigned URL from the Upload Service. The server generates the Presigned URL with its S3 credentials, producing a signed, time-limited token for the specific upload operation. These URLs grant temporary, secure access to upload a specific file to a designated S3 bucket. This allows clients to bypass the server and communicate directly with the storage layer.
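The idea behind a presigned URL can be illustrated with a small standard-library sketch. This is not the actual S3 signing algorithm; it is a simplified HMAC scheme that shows how a server can issue a time-limited, tamper-evident URL for one object. The names `SECRET_KEY`, `make_signed_url`, and `verify_signed_url`, and the `storage.example.com` host, are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical secret known only to the server and the storage layer.
SECRET_KEY = b"server-side-secret"


def _sign(object_key: str, expires: int) -> str:
    # The signature covers the HTTP method, the target object, and the expiry.
    message = f"PUT\n{object_key}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_signed_url(base_url: str, object_key: str, ttl_seconds: int,
                    now: Optional[float] = None) -> str:
    """Upload Service side: issue a URL valid for ttl_seconds."""
    issued_at = time.time() if now is None else now
    expires = int(issued_at + ttl_seconds)
    query = urlencode({"expires": expires, "signature": _sign(object_key, expires)})
    return f"{base_url}/{object_key}?{query}"


def verify_signed_url(url: str, now: Optional[float] = None) -> bool:
    """Storage side: accept only unexpired URLs with a matching signature."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    object_key = parsed.path.lstrip("/")
    current = time.time() if now is None else now
    return hmac.compare_digest(signature, _sign(object_key, expires)) and current < expires


url = make_signed_url("https://storage.example.com", "user42/report.pdf", ttl_seconds=300)
print(url)
print(verify_signed_url(url))
```

Because the signature covers the object key and the expiry time, a client cannot reuse the URL for a different file or after the deadline. In a real deployment the cloud provider's SDK would generate and validate these URLs.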

3.4. S3 Bucket:

S3 serves as the scalable and durable storage backend. Presigned URLs allow clients to upload directly to S3, minimizing server involvement in the actual file transfer. The bucket structure may organize files based on user accounts and metadata.

3.5. Metadata Database:

Stores metadata associated with each file, including details like name, size, owner, access permissions, and timestamps. This enables quick retrieval of file details without accessing S3. The system must keep file metadata consistent with the actual content in S3.

3.6. Uploading to S3 using Presigned URL and Metadata:

The client uses the Presigned URL to upload the file directly to the designated S3 bucket. Metadata associated with the file, such as file name and owner, is included in the upload process. This ensures that the file’s metadata is synchronized with its corresponding data in S3.

3.7. Role of Task Runner:

After the file is successfully uploaded to S3, a task runner process is triggered. The task runner communicates with the Metadata Database to update or perform additional tasks related to the uploaded file. This may include updating file status, triggering indexing for search functionality, or sending notifications.

3.8. Downloading Services:

Clients initiate file download requests through the client application. The server's Download Service queries the Metadata Database for the file's details, which include information such as file name, size, owner, and access permissions.

4. Low-Level Design(LLD) of Dropbox System Design

Many people assume that designing Dropbox only requires using a cloud service to upload a file and download it whenever needed, but that's not how it works. The core problem is “Where and how do we save the files?”. Suppose you want to share a file that can be of any size (small or big) and you upload it to the cloud.

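Everything is fine up to this point, but if you later have to update the file, editing it and uploading the whole file to the cloud again and again is not a good idea. The reason is that every small change would transfer and store the entire file again, which consumes bandwidth and slows down synchronization.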

Let’s discuss how we can solve this problem:

To overcome the problem discussed above, we can break files into multiple chunks. After a change, only the modified chunks need to be uploaded or downloaded rather than the whole file.
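The chunking idea can be sketched as follows. This is a minimal illustration that assumes fixed-size chunks identified by their SHA-256 hash; the function names and the default 4 MB chunk size are assumptions made for the example, not a description of Dropbox's real client.

```python
import hashlib
from typing import List

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size of 4 MB


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Cut the file content into consecutive fixed-size pieces."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Fingerprint every chunk so that versions can be compared cheaply."""
    return [hashlib.sha256(chunk).hexdigest()
            for chunk in split_into_chunks(data, chunk_size)]


def changed_chunk_indexes(old_hashes: List[str], new_hashes: List[str]) -> List[int]:
    """Indexes of chunks in the new version that differ and must be uploaded."""
    return [i for i, digest in enumerate(new_hashes)
            if i >= len(old_hashes) or old_hashes[i] != digest]


# Tiny chunk size so the example is easy to follow.
old_version = b"a" * 10 + b"b" * 10 + b"c" * 10
new_version = b"a" * 10 + b"X" * 10 + b"c" * 10
to_upload = changed_chunk_indexes(chunk_hashes(old_version, 10),
                                  chunk_hashes(new_version, 10))
print(to_upload)
```

Only the middle chunk differs between the two versions, so only that chunk would be sent. One limitation of fixed-size chunking is that inserting bytes near the start of a file shifts every later chunk boundary, so all following chunks appear changed.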

Now let’s talk about the various components for the complete low level design solution of the Dropbox.

Let’s assume we have a client (an app installed on your computer) with 4 basic components: Watcher, Chunker, Indexer, and Internal DB. We have considered only one client, but there can be multiple clients belonging to the same user, each with the same basic components.

4.1. Client Components

4.2. Metadata Database

The metadata database maintains the indexes of the various chunks. It stores file and chunk names and their different versions, along with information about users and workspaces.

Let’s understand how we can efficiently scale a relational database.

4.2.1 Relational Database Scaling:

Relational databases like MySQL may face scalability challenges as the data and traffic grow.

4.2.2 Database Sharding:

Database sharding is a horizontal partitioning technique where a large database is divided into smaller, more manageable parts called shards.

4.2.3 Challenges with Database Sharding:

Managing multiple shards can become complex, especially when updates or new information needs to be added. Coordinating transactions across shards can be challenging. Maintenance, backup, and recovery operations become more intricate.

4.2.4 Edge Wrapper:

An edge wrapper is an abstraction layer that sits between the application and the sharded databases.

4.2.5 Object-Relational Mapping (ORM):

ORM is a programming technique that allows data to be seamlessly converted between the relational database format and the application’s object-oriented format.

4.2.6 Edge Wrapper and ORM:

The edge wrapper integrates ORM functionality to provide a convenient interface for the application to interact with sharded databases.
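A rough sketch of how such an edge wrapper might look is given below. It assumes sharding by a hash of the user id and uses in-memory dictionaries in place of real database shards; the class name `ShardedMetadataStore` and its methods are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional


class ShardedMetadataStore:
    """Hypothetical edge wrapper: callers work with objects, not with shards."""

    def __init__(self, shard_count: int) -> None:
        # Each dict stands in for one database shard.
        self._shards: List[Dict[str, dict]] = [{} for _ in range(shard_count)]

    def _shard_for(self, user_id: str) -> Dict[str, dict]:
        # A stable hash keeps all rows of one user on the same shard.
        digest = hashlib.sha256(user_id.encode()).hexdigest()
        return self._shards[int(digest, 16) % len(self._shards)]

    def save_file(self, user_id: str, file_id: str, metadata: dict) -> None:
        self._shard_for(user_id)[f"{user_id}:{file_id}"] = dict(metadata)

    def get_file(self, user_id: str, file_id: str) -> Optional[dict]:
        return self._shard_for(user_id).get(f"{user_id}:{file_id}")


store = ShardedMetadataStore(shard_count=4)
store.save_file("user42", "file1", {"name": "report.pdf", "size": 102400})
print(store.get_file("user42", "file1"))
```

The application code never chooses a shard itself, so the number of shards or the routing rule can change behind the wrapper. Simple modulo routing moves most keys when the shard count changes, so schemes such as consistent hashing are often preferred for that case.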

4.3. Message Queuing Service

The message queuing service will be responsible for asynchronous communication between the clients and the synchronization service.

Below are the main requirements of the Message Queuing Service.

There will be two types of messaging queues in the service.
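The asynchronous hand-off between a client and the synchronization service can be sketched with Python's standard `queue` and `threading` modules. This is only an in-process illustration that models a single queue carrying client updates; a real system would use a distributed message broker. The message fields and function names are hypothetical.

```python
import queue
import threading
from typing import List


def run_sync_worker(requests: "queue.Queue", applied: List[dict]) -> None:
    """Stand-in for the synchronization service: consume updates in order."""
    while True:
        message = requests.get()
        if message is None:  # sentinel value: no more messages
            break
        applied.append(message)


request_queue: "queue.Queue" = queue.Queue()
applied_updates: List[dict] = []

worker = threading.Thread(target=run_sync_worker,
                          args=(request_queue, applied_updates))
worker.start()

# The client publishes updates and continues without waiting for processing.
client_updates = [
    {"file": "notes.txt", "chunk": 0, "action": "modified"},
    {"file": "notes.txt", "chunk": 3, "action": "added"},
]
for update in client_updates:
    request_queue.put(update)

request_queue.put(None)  # tell the worker to stop
worker.join()
print(applied_updates)
```

The client only enqueues messages, so it is not blocked while the synchronization service is busy or temporarily slow.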

4.4. Synchronization Service

The client communicates with the synchronization service either to receive the latest updates from the cloud storage or to send its latest requests/updates to the cloud storage.

4.5. Cloud Storage

You can use any cloud storage service like Amazon S3 to store the chunks of the files uploaded by the user. The client communicates with the cloud storage for any action performed in the files/folders using the API provided by the cloud provider.

5. Database Design for Dropbox System Design

To understand Database design one should understand

We need the following tables to store our data:

5.1 Users




{
  user_id(PK)
  name
  email
  password
  last_login_at
  created_at
  updated_at
}

5.2 Devices




{
  device_id(PK)
  user_id(FK)
  created_at
  updated_at
}

5.3 Objects




{
    object_id(PK)
    device_id(PK,FK)
    object_type
    parent_object_id
    name
    created_at
    updated_at
}

5.4 Chunks




{
    chunk_id(PK)
    object_id(PK,FK)
    url
    created_at
    updated_at
}

5.5 AccessControlList




{
    user_id(PK,FK1)
    object_id(PK,FK2)
    created_at
    updated_at
}
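The tables above could be created as in the following sketch, which uses SQLite through Python's built-in `sqlite3` module. The column types, the `NOT NULL` and `UNIQUE` constraints, and the default timestamps are assumptions; the source lists only column names and keys. The chunk key is named `chunk_id` to match the API section.

```python
import sqlite3

# Assumption: column types and constraints are illustrative only.
SCHEMA = """
CREATE TABLE users (
    user_id       INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    email         TEXT NOT NULL UNIQUE,
    password      TEXT NOT NULL,  -- should hold a password hash
    last_login_at TEXT,
    created_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE devices (
    device_id  INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE objects (
    object_id        INTEGER NOT NULL,
    device_id        INTEGER NOT NULL REFERENCES devices(device_id),
    object_type      TEXT NOT NULL,
    parent_object_id INTEGER,
    name             TEXT NOT NULL,
    created_at       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (object_id, device_id)
);
CREATE TABLE chunks (
    chunk_id   INTEGER NOT NULL,
    object_id  INTEGER NOT NULL,  -- logical reference to objects.object_id
    url        TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (chunk_id, object_id)
);
CREATE TABLE access_control_list (
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    object_id  INTEGER NOT NULL,  -- logical reference to objects.object_id
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, object_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The references to `object_id` are left as plain columns here because `object_id` alone is only part of the composite key of `objects`; a production schema would need to decide how to enforce that relationship.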

6. API Design for Dropbox System Design

6.1 Download Chunk

This API would be used to download the chunk of a file.




GET /api/v1/chunks/:chunk_id
X-API-Key: api_key
Authorization: auth_token




200 OK
Content-Disposition: attachment; filename="<chunk_id>"
Content-Length: 4194304

The response will contain the Content-Disposition header set to attachment, which instructs the client to download the chunk. Note that Content-Length is 4194304 because each chunk is 4 MB (4 × 1024 × 1024 bytes); the final chunk of a file may be smaller.

6.2 Upload Chunk

This API would be used to upload the chunk of a file.




POST /api/v1/chunks/:chunk_id
X-API-Key: api_key
Authorization: auth_token
Content-Type: application/octet-stream
/path/to/chunk




200 OK

6.3 Get Objects

Clients would use this API to query the Meta Service for new files/folders when they come online. The client passes the maximum object id present locally and its unique device id.




GET /api/v1/objects?local_object_id=<Max object_id present locally>&device_id=<Unique Device Id>
X-API-Key: api_key
Authorization: auth_token




200 OK
{
  new_objects: [
    {
      object_id:
      object_type:
      name:
      chunk_ids: [
        chunk1,
        chunk2,
        chunk3
      ]
    }
  ]
}

The Meta Service will check the database and return an array of objects, each containing the object's name, object id, object type, and an array of chunk_ids. The client calls the Download Chunk API with these chunk_ids to download the chunks and reconstruct the file.
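The client-side reconstruction step can be sketched as follows. The example assumes that `chunk_ids` are returned in file order and replaces the real Download Chunk call with a lookup in a dictionary; `reconstruct_file`, `fake_storage`, and the sample response values are hypothetical.

```python
from typing import Callable, Dict, List


def reconstruct_file(chunk_ids: List[str],
                     download_chunk: Callable[[str], bytes]) -> bytes:
    """Fetch each chunk in the listed order and concatenate the contents."""
    return b"".join(download_chunk(chunk_id) for chunk_id in chunk_ids)


# Stand-in for GET /api/v1/chunks/:chunk_id, keyed by chunk id.
fake_storage: Dict[str, bytes] = {
    "chunk1": b"Hello, ",
    "chunk2": b"Drop",
    "chunk3": b"box",
}

# Shape follows the Get Objects response; the values are made up.
response = {
    "new_objects": [
        {
            "object_id": 1,
            "object_type": "file",
            "name": "greeting.txt",
            "chunk_ids": ["chunk1", "chunk2", "chunk3"],
        }
    ]
}

files = {
    obj["name"]: reconstruct_file(obj["chunk_ids"], fake_storage.__getitem__)
    for obj in response["new_objects"]
}
print(files)
```

Passing the download function as a parameter keeps the reassembly logic independent of the transport, so it can be exercised without a network connection.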

7. Scalability for Dropbox System Design

8. Conclusion

In conclusion, the design of the Dropbox system incorporates a well-thought-out architecture that seamlessly handles user file uploads, downloads, metadata management, and storage using a set of key components.

