Design a City Guide System like Foursquare
PURPOSE OF CITY GUIDE SERVICE
The city guide is a service that lets users search for and find places near their location. You can think of it as similar to Foursquare. Before starting to design the system, it helps to define its purpose, which means the requirements should be clear and well defined before the design process begins. The purpose of the city guide service is to present places to users based on their location. Users can find any kind of place, such as restaurants, theaters, cinemas and more. Suggestions are ordered from the nearest places to the farthest, but the system also supports searching for places around a specific location.
REQUIREMENTS AND SYSTEM BOUNDARIES
If you want to design a system, you must first define the requirements and system boundaries. Before starting the design, you will write a Service Design Document and define all the requirements from scratch. Since the goal is a well-architected city guide service, it must be able to recommend places based on the user’s search query and location. Our system will support these features:
– Users must be able to create an account.
– Users must be able to log in to the system.
– Users must be able to log out of the system.
– Users must be able to search for places such as a restaurant, theatre, cafe, cinema and more, whether or not they are logged in.
– Users must be able to add comments to places if they are logged in.
– Users must be able to like or dislike places.
– Users must be able to add a new place, which a system admin should review.
– Users must be able to add pictures to places.
– Users must be able to update or delete places if they have permission.
– System must be able to suggest places from most relevant to least relevant.
– System must be able to suggest places based on user’s query and location.
– System must be able to suggest places based on popularity and user reviews.
– System must support monitoring.
– System must be able to recommend new places by sending push notifications.
– System should be highly available.
– System should be highly reliable.
– System should be durable.
– System should be resilient.
– System should be cost-efficient and performant.
In short, the system should support 5 important pillars:
– Availability
– Reliability
– Resiliency
– Durability
– Cost Performance
We should consider these pillars together since they are coupled to each other. In brief, availability means that the system should always be reachable. Reliability means that the system should work as expected. Resiliency describes how, and how quickly, the system recovers when there is a problem. Durability means that every piece of data in the system should exist until we delete it. Cost performance is about using services in a cost-efficient way. For example, if the system is built on AWS and t2.micro EC2 instances are enough, there is no reason to use larger EC2 instances and pay extra money.
Notice that every system grows over time, so it will be more complex in the future (for example, after 5 years). The important thing is that with well-defined requirements and structure, adding new components to your system will not take much work.
Once the system boundaries and functional requirements are defined, you need to think about cloud or on-premises options. Cloud services have many advantages, such as cost efficiency, high performance, back-up solutions, managed maintenance, practically unlimited storage capacity and more. Your system can be:
– 100% on-premises (your own data center/server)
– 100% cloud (AWS, Google Cloud, Azure)
– A mix of on-premises and cloud (you can have both during the migration process)
Capacity estimation is important from the start (especially for on-premises services), because it lets you estimate how many servers you need, how to separate your services and which database to use. Also notice that the system will be read-heavy: read traffic will be higher than write traffic.
Let’s assume there will be 10 million places in the city guide system in 5 years and the daily query count will reach 10,000. You can assume that the system grows each year and add a buffer on top. This means that you will store information for 10 million places, together with their metadata and pictures. If we assume an average photo size of 25 KB and an average of 5 pictures per place:
The capacity will be approximately 10 million * 5 * 25 KB (about 1.25 TB of photos) + (user metadata) + (location metadata). It is recommended to have replication and back-up solutions to prevent data loss, so the total will be approximately 3 times bigger than this estimate.
This calculation is just a brief example of how to define system capacity. We will not calculate daily download/upload capacity or metadata capacity here, but you should do these calculations (including daily read/write capacity estimation) when planning service and database scaling.
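The photo-storage part of the estimate can be worked through in a few lines. This is a minimal sketch using only the numbers assumed in the text (10 million places, 5 photos per place, 25 KB per photo, a factor of 3 for replication and back-up); metadata is left out because the text does not size it, and decimal units (1 TB = 10^9 KB) are an assumption.

```python
# Assumptions taken from the text: 10 million places, 5 photos per place,
# 25 KB per photo, and a 3x factor for replication and back-ups.
# Metadata size is not estimated in the text, so it is not included here.
PLACES = 10_000_000
PHOTOS_PER_PLACE = 5
PHOTO_SIZE_KB = 25
REPLICATION_FACTOR = 3

KB_PER_TB = 1_000_000_000  # decimal units: 1 TB = 10^9 KB (an assumption)


def photo_storage_tb(places: int, photos_per_place: int, photo_kb: int) -> float:
    """Raw photo storage in terabytes (decimal units)."""
    return places * photos_per_place * photo_kb / KB_PER_TB


raw_tb = photo_storage_tb(PLACES, PHOTOS_PER_PLACE, PHOTO_SIZE_KB)
total_tb = raw_tb * REPLICATION_FACTOR

print(f"raw photo storage: {raw_tb:.2f} TB")                # 1.25 TB
print(f"with replication and back-ups: {total_tb:.2f} TB")  # 3.75 TB
```

Changing any of the constants shows how sensitive the total is to the photo size and the number of photos per place.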
The city guide service can expose its API using either REST or SOAP. You can also consider gRPC, an RPC framework from Google whose biggest advantage is that it uses less bandwidth and fewer resources. If we continue with REST, there will be four main APIs in the system.
– AddPlace(api_dev_key, name, description, longitude, latitude, category, pictures): Returns an HTTP response containing the newly added place’s information (202 Accepted on success).
– DeletePlace(api_dev_key, placeID): Returns HTTP 200 (OK) if the response includes a body, or 204 (No Content) if it does not.
– UpdatePlace(api_dev_key, updatePlaceRequest): updatePlaceRequest contains a placeID that identifies which place is being updated.
– GetPlace(search_query, userlocation, categoryfilter = null, sortby = ‘distance || popularity’, page = 1, distanceRadius = 5, maximum_return_count = 20): Returns JSON with the metadata of the matching places. Results are ordered based on the user’s query and location.
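As an illustration, the GetPlace parameters above could be modelled as a small request object that validates its input before any search runs. This is only a sketch: the class name `GetPlaceRequest`, the snake_case field names, the validation rules and the `offset` helper are assumptions made for the example, and the unit of `distance_radius` is not stated in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ALLOWED_SORT_KEYS = ("distance", "popularity")


@dataclass(frozen=True)
class GetPlaceRequest:
    """Hypothetical request object mirroring the GetPlace parameters."""

    search_query: str
    user_location: Tuple[float, float]  # (latitude, longitude) in degrees
    category_filter: Optional[str] = None
    sort_by: str = "distance"
    page: int = 1
    distance_radius: float = 5.0  # unit is not specified in the text
    maximum_return_count: int = 20

    def __post_init__(self) -> None:
        lat, lng = self.user_location
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0):
            raise ValueError("user_location is outside valid lat/lng ranges")
        if self.sort_by not in ALLOWED_SORT_KEYS:
            raise ValueError(f"sort_by must be one of {ALLOWED_SORT_KEYS}")
        if self.page < 1 or self.maximum_return_count < 1:
            raise ValueError("page and maximum_return_count must be positive")
        if self.distance_radius <= 0:
            raise ValueError("distance_radius must be positive")

    def offset(self) -> int:
        """Index of the first result on the requested page."""
        return (self.page - 1) * self.maximum_return_count


request = GetPlaceRequest("coffee", (41.0082, 28.9784), page=3)
print(request.offset())  # 40: page 3 starts after two pages of 20 results
```

Validating at the API boundary keeps malformed coordinates or unknown sort keys from reaching the database layer.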
DESIGN AND DATABASE SCHEMA
Data storage is split into 2 parts. The first part holds static content such as place pictures. If we build the service on AWS, we can keep static content in S3 and put CloudFront in front of S3 as a CDN. CloudFront acts as a cache and helps the system respond quickly. S3 is an AWS object storage service that keeps static media content securely. If you build the system on-premises, you need your own image storage. The second part is metadata (places, pictures, users), for which we can use SQL or NoSQL databases. NoSQL databases are generally easier to scale horizontally. SQL databases, with their relations and constraints, are usually harder to scale out.
– Place: ID, AddedBy, Name, Lat, Lng, Category, Description, AddedDate, LikeCount, DisLikeCount
– User: ID, UserName, Password, Email, RegisterDate, LastLoginDate
– Comment: ID, Like, DisLike, Comment, CommentDate
– UserPlaceComment: ID, CommentID, UserID, PlaceID
8 bytes will be enough for each placeID and userID. We will also store longitude and latitude with 8 bytes each. Data will be indexed on placeID, lat and lng. We will query the place table based on the place location and the user location, so the query will look like this:
(Places where placeLongitude between userLongitude – Radius and userLongitude + Radius and placeLatitude between userLatitude – Radius and userLatitude + Radius)
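The bounding-box query above can be tried out against an in-memory SQLite database. This is a minimal sketch: the table name `place`, the sample rows, and the choice of expressing the radius in degrees are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT, lat REAL, lng REAL)"
)
# Separate indexes on lat and lng, as described in the text.
conn.execute("CREATE INDEX idx_place_lat ON place (lat)")
conn.execute("CREATE INDEX idx_place_lng ON place (lng)")
conn.executemany(
    "INSERT INTO place (id, name, lat, lng) VALUES (?, ?, ?, ?)",
    [
        (1, "Cafe A", 41.010, 28.980),     # sample data, not from the article
        (2, "Cinema B", 41.030, 28.990),
        (3, "Theatre C", 41.500, 29.500),
    ],
)


def places_in_box(user_lat: float, user_lng: float, radius_deg: float):
    """Return (id, name) of places inside the square around the user."""
    rows = conn.execute(
        "SELECT id, name FROM place "
        "WHERE lng BETWEEN ? AND ? AND lat BETWEEN ? AND ? "
        "ORDER BY id",
        (
            user_lng - radius_deg,
            user_lng + radius_deg,
            user_lat - radius_deg,
            user_lat + radius_deg,
        ),
    )
    return rows.fetchall()


nearby = places_in_box(41.015, 28.985, 0.05)
print(nearby)  # [(1, 'Cafe A'), (2, 'Cinema B')]
```

Note that the box is a square in degrees, not a circle in kilometres, so a real service would still need to filter or rank the candidates by actual distance.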
This approach is not optimal: the query has to evaluate range conditions on both latitude and longitude, and each condition on its own can match a very large number of rows that must then be intersected. We can use a grid to solve this performance problem. We divide the whole world into smaller grid cells, so that a search only has to look at the cell containing the user’s location and its neighbouring cells. A hash map is a natural structure for the grid: the key is the gridID and the value is the list of all places in that grid cell. The query then becomes:
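(Places where placeLatitude between userLatitude – Radius and userLatitude + Radius and placeLongitude between userLongitude – Radius and userLongitude + Radius and GridID in (user’s grid and its neighbour grids))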
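The grid idea can be sketched with a dictionary keyed by gridID. In this example the cell size of 0.1 degrees, the `(row, column)` form of the gridID and the class name `GridIndex` are all assumptions; the sketch also ignores longitude wrap-around at ±180 degrees, and checking only the neighbouring cells is complete only while the search radius does not exceed the cell size.

```python
from collections import defaultdict
from math import floor

CELL_SIZE_DEG = 0.1  # assumed cell size; the text does not specify one


def grid_id(lat: float, lng: float) -> tuple:
    """Map a coordinate to the (row, column) of its grid cell."""
    return (floor(lat / CELL_SIZE_DEG), floor(lng / CELL_SIZE_DEG))


class GridIndex:
    """Hash map from gridID to the places located in that cell."""

    def __init__(self) -> None:
        self._cells = defaultdict(list)

    def add(self, place_id: int, lat: float, lng: float) -> None:
        self._cells[grid_id(lat, lng)].append((place_id, lat, lng))

    def nearby(self, lat: float, lng: float, radius_deg: float) -> list:
        """Search the user's cell and its 8 neighbours only."""
        row, col = grid_id(lat, lng)
        found = []
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                cell = self._cells.get((row + d_row, col + d_col), ())
                for place_id, p_lat, p_lng in cell:
                    if (abs(p_lat - lat) <= radius_deg
                            and abs(p_lng - lng) <= radius_deg):
                        found.append(place_id)
        return sorted(found)


index = GridIndex()
index.add(1, 41.01, 28.98)   # same cell as the user below
index.add(2, 41.12, 28.99)   # neighbouring cell
index.add(3, 45.00, 10.00)   # far away, never inspected
print(index.nearby(41.05, 28.97, 0.1))  # [1, 2]
```

Only 9 cells are scanned per query regardless of how many places exist worldwide, which is the point of the grid.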
Note that a real system will have more database tables; these are just samples. It is a good idea to follow normalization rules when designing the database.
We can also partition the data. Partitioning by userID or regionID alone can create a distribution problem: some regions have many more places than others, and in that case the data will not be uniformly distributed across partitions. Instead, we can partition based on both locationID and regionID.
SYSTEM DESIGN CONSIDERATION
When designing a system, the basic components we need are:
– Web server
– Application server
– Media file Storage
– Load balancing
As mentioned above, replication is valuable for providing high availability, high reliability and low response times. Replication and back-up are two important concepts for delivering the pillars described earlier. Replication is how the system handles the failure of a service or server, and it can be applied to database servers, web servers, application servers, media storage and so on. In fact, we can replicate every part of the system. (Some AWS services, such as Route 53 and managed load balancers, are highly available by design, so you do not need to replicate them yourself.) Replication also helps to decrease response time: if incoming requests are spread across several resources rather than one, the system can serve them more easily. A common rule of thumb is to keep 3 or more replicas of each resource. On AWS, you can provide redundancy by keeping data in different Availability Zones or different regions. If we keep the same data on three different resources and one server dies, the system automatically continues to work from the remaining replicas. Another advantage of replication is that the system can keep running while it is being updated. In a replicated setup, the master server handles both write and read operations, and the slaves handle read operations only.
Load balancing refers to distributing requests across a group of services and servers. With replication and sharding in place, each incoming request must be directed to a suitable server, and this is done by a load balancer. We can use the Round-Robin method to distribute incoming requests, but it has a limitation. A Round-Robin load balancer stops sending requests to dead servers, but it does not take load into account, so it keeps sending requests to a server that is already under heavy traffic. A smarter Round-Robin variant that checks each server’s load and adjusts traffic accordingly removes this problem. Additionally, we can use consistent hashing to route incoming requests. Consistent hashing spreads keys fairly evenly across servers and remaps only a small share of them when a server is added or removed.
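Consistent hashing can be sketched with a sorted ring of hashed virtual nodes. This is an illustrative example, not a production load balancer: the class name `ConsistentHashRing`, the use of SHA-256, the 100 virtual nodes per server and the server names are all assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable integer hash of a string key."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Routes keys to servers using a ring of virtual nodes."""

    def __init__(self, servers, virtual_nodes: int = 100) -> None:
        self._virtual_nodes = virtual_nodes
        self._ring = []  # sorted list of (hash, server) entries
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        for i in range(self._virtual_nodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove_server(self, server: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != server]

    def server_for(self, key: str) -> str:
        """Pick the first virtual node clockwise from the key's hash."""
        if not self._ring:
            raise LookupError("no servers available")
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["s1", "s2", "s3"])
keys = [f"place:{i}" for i in range(200)]
before = {key: ring.server_for(key) for key in keys}

ring.remove_server("s2")  # simulate a server failure
after = {key: ring.server_for(key) for key in keys}

# Only keys that lived on s2 are remapped; all others stay where they were.
moved = [key for key in keys if before[key] != after[key]]
print(all(before[key] == "s2" for key in moved))  # True
```

The design choice here is the virtual nodes: placing each server at many points on the ring evens out the share of keys each server receives.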
One possible problem with sharding behind a load balancer is how to rebuild a server when it dies. A brute-force approach is possible, but it is slow because we need to rebuild the whole system from the beginning. We can avoid this by using a reverse index. We can have a separate index server that holds the reverse index, which maps every server to its places: the key is the serverID and the value is the set of all places stored on that server.
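The reverse index described above is essentially a map from serverID to the places that server holds. A minimal sketch follows; the class name `ReverseIndex`, its method names and the sample IDs are assumptions made for the example.

```python
from collections import defaultdict


class ReverseIndex:
    """Maps each serverID to the set of placeIDs stored on that server."""

    def __init__(self) -> None:
        self._places_by_server = defaultdict(set)

    def record(self, server_id: str, place_id: int) -> None:
        """Remember that a place was stored on the given server."""
        self._places_by_server[server_id].add(place_id)

    def places_on(self, server_id: str) -> list:
        """Places that must be reloaded if this server has to be rebuilt."""
        return sorted(self._places_by_server.get(server_id, ()))


reverse_index = ReverseIndex()
reverse_index.record("server-1", 101)
reverse_index.record("server-1", 102)
reverse_index.record("server-2", 201)

# If server-1 dies, only its own places need to be restored.
print(reverse_index.places_on("server-1"))  # [101, 102]
```

With this lookup, rebuilding a failed server touches only the places it owned instead of scanning the whole data set.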
For caching, we can use a global caching mechanism built on dedicated cache servers. We can use Redis or Memcached, but the most important part of the caching strategy is the cache eviction policy. With global cache servers, every user sees the same cached data, but each cache lookup adds some network latency. As the eviction policy, we can use the LRU (Least Recently Used) algorithm.
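LRU eviction can be illustrated with an in-process cache built on `collections.OrderedDict`. This is a sketch of the eviction policy only, not of a Redis or Memcached deployment; the class name `LRUCache` and the sample keys are assumptions.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("place:1", {"name": "Cafe A"})
cache.put("place:2", {"name": "Cinema B"})
cache.get("place:1")                        # place:1 is now the most recent
cache.put("place:3", {"name": "Theatre C"})  # evicts place:2

print(cache.get("place:2"))  # None
print(cache.get("place:1"))  # {'name': 'Cafe A'}
```

Reading an entry counts as a use, which is why `place:2` rather than `place:1` is evicted in the example.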
For caching media files, as mentioned before, we will use a CDN. A CDN serves content from edge locations close to users, so the response time is lower than fetching media content directly from AWS S3.
We can have various ranking mechanisms, such as most popular, most relevant, newest and so on. Because the data is spread across many servers, we need to collect results from all of them. For this reason, we need an aggregation function that combines the results from each server to produce the final ranked list.
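The aggregation step can be sketched as a top-k merge: each server returns its own best results and an aggregator keeps the k highest-scoring places overall. The function name `aggregate_top_k`, the `(place_id, score)` result shape and the sample scores are assumptions made for the example.

```python
import heapq
from itertools import chain


def aggregate_top_k(per_server_results, k: int) -> list:
    """Merge (place_id, score) lists from all servers into a global top k."""
    all_results = chain.from_iterable(per_server_results)
    return heapq.nlargest(k, all_results, key=lambda item: item[1])


# Hypothetical partial results, one list per server.
results_from_servers = [
    [("p1", 0.90), ("p2", 0.40)],
    [("p3", 0.70)],
    [("p4", 0.95), ("p5", 0.10)],
]

top_places = aggregate_top_k(results_from_servers, k=3)
print(top_places)  # [('p4', 0.95), ('p1', 0.9), ('p3', 0.7)]
```

Each server only needs to send its local top k, because no place outside a server's local top k can appear in the global top k.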
BASIC CODING OF SYSTEM