System Scaling – Optimization Trade-offs

Last Updated : 19 Sep, 2023

A Network Load Balancer (NLB) can absorb traffic spikes and handle millions of requests per second. However, its built-in access logging is limited (access logs are produced only for TLS listeners), so use CloudWatch to keep logs and metrics in a centralized place.

Things to consider before horizontal scaling :

Ensure the code is thread safe.
Use connection pooling; do not open a direct connection every time an object is instantiated.
Decide how data will be managed across instances.
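The connection pooling point can be illustrated with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the real database, and the `ConnectionPool` class and its `connection()` method are hypothetical names chosen for illustration, not part of any library.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """A tiny fixed-size pool: connections are created once and reused."""

    def __init__(self, database, size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a pooled connection be handed to
            # whichever thread borrows it; the pool guarantees one user at a time.
            self._pool.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout=5.0):
        conn = self._pool.get(timeout=timeout)  # blocks while all are in use
        try:
            yield conn
        finally:
            self._pool.put(conn)  # always hand the connection back


pool = ConnectionPool(":memory:", size=2)
with pool.connection() as conn:
    value = conn.execute("SELECT 1 + 1").fetchone()[0]
```

Because callers borrow a connection only for the duration of the `with` block, the number of open connections stays fixed no matter how many objects or threads the application creates.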

Various horizontal scaling options :

Heroku – configure number of dynos, increase power of individual dynos.
HirePower – for fine grained control, power and flexibility.
Azure
Google Cloud
AWS
And many more vendors

Web server concurrency setup :

Puma (for Ruby/Rack) – number of workers via the WEB_CONCURRENCY environment variable, and number of threads per worker.
Nginx – for high performance, stability, a rich feature set, simple configuration and low resource consumption.
Apache Tomcat – for Java web applications.

Vertical Scaling :
Scaling up a single system, for example by adding CPU or memory to one server, is called vertical scaling.

Pros –

Increases the power of each server to achieve the desired performance.

Cons –

It is expensive, and there is a practical limit to how far a single instance can be scaled vertically.

The right scaling configuration depends on balancing cost, performance and resource usage at acceptable levels.

Use cache for transactional queries :
Before caching, answer one question: is eventual consistency acceptable, or is strong consistency of data required? For eventual consistency, a NoSQL DB is a common choice. For strong consistency and critical transactions such as payments, use an RDBMS.

Cache – Redis / Memcached :
Redis –
Supports automatic replication and snapshots.

Memcached –
Multithreaded architecture.

Type of cache :

Global cache –
Simple, but effective only up to a certain scale.
Distributed cache –
Examples: API Gateway caching, CloudFront with Lambda@Edge.

How to choose correct cache :

Is the system write heavy with infrequent reads? (time-based logs)
Is the data written once and read many times? (user profile)
Is the data returned always unique? (search engine)

Cache aside :

General purpose and best for read heavy workloads (Memcached, Redis).
Systems are resilient to cache failures, because the application can fall back to the DB.
The data model in the cache can differ from the data model in the database (for example, the combined result of multiple queries stored against a request id).
The common write strategy is to write to the DB directly. Cached data may then be stale until its TTL expires. If data freshness is required, either invalidate the cache entry on write or use the write through method.
Data is loaded lazily.
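The cache aside flow described above can be sketched as follows. This is a minimal illustration under stated assumptions: a plain dictionary stands in for the database, `TTLCache` is a hypothetical in-process substitute for Redis or Memcached, and `UserStore` is an invented name for the application code that owns the caching logic.

```python
import time


class TTLCache:
    """In-process stand-in for Redis/Memcached with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (self._clock() + self._ttl, value)

    def delete(self, key):
        self._data.pop(key, None)


class UserStore:
    """Application code: it talks to both the cache and the database."""

    def __init__(self, database, cache):
        self.database = database
        self.cache = cache
        self.db_reads = 0

    def get(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value  # cache hit
        self.db_reads += 1  # cache miss: the application reads the DB
        value = self.database.get(key)
        if value is not None:
            self.cache.set(key, value)  # lazy load into the cache
        return value

    def update(self, key, value):
        self.database[key] = value  # write to the DB directly
        self.cache.delete(key)  # invalidate so the next read is fresh


store = UserStore({"user:1": "Ada"}, TTLCache(ttl_seconds=60))
first = store.get("user:1")   # miss, loaded from the DB
second = store.get("user:1")  # hit, served from the cache
```

Dropping the `cache.delete` call in `update` gives the TTL-only variant, in which readers may see the old value until the entry expires.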

Read Through Cache :

Sits in line with the DB.
On a cache miss, the cache itself loads the data from the DB and stores it.
Data is loaded lazily.
Best for read heavy workloads (news stories).

Cons –
The first request for any item is always a cache miss; this can be mitigated by warming (pre-heating) the cache with queries issued manually. Data might also become inconsistent with the DB; the write strategy determines how this is handled.

Difference between cache aside and read through cache :
Cache aside –

The application is responsible for fetching data from the DB and populating the cache.
The data model can be different.
Read through cache –
The logic is provided by a library or a stand-alone cache provider.
The data model is the same as that of the DB.
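To make the read through side of this comparison concrete, here is a small sketch in which the cache, not the application, owns the loading logic. `ReadThroughCache`, its `loader` parameter and the `warm` helper are hypothetical names; a dictionary of news stories stands in for the database.

```python
class ReadThroughCache:
    """The cache sits in front of the DB and loads missing entries itself."""

    def __init__(self, loader):
        self._loader = loader  # called only on a cache miss
        self._entries = {}
        self.misses = 0

    def get(self, key):
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._loader(key)  # populate from the DB
        return self._entries[key]

    def warm(self, keys):
        # Pre-heat the cache so the first real request is not a miss.
        for key in keys:
            self.get(key)


stories = {"story:1": "Headline A", "story:2": "Headline B"}
story_cache = ReadThroughCache(loader=stories.__getitem__)
story_cache.warm(["story:1"])
first = story_cache.get("story:1")   # already cached by warm()
second = story_cache.get("story:2")  # miss: the cache calls the loader
```

The application only ever calls `get`; it never touches the database directly, which is why the cached data model mirrors the one in the DB.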

Cache write policies :
Write around –
Data is written directly to the DB, bypassing the cache. The cache is populated only when the first read request for that data arrives.

It can be combined with read-through as well as with cache aside.
Good performance when data is written once and read infrequently or never (real-time logs, chat room messages).

Write through –
Data is written to the cache first, then to the DB, as part of the same operation. Can be combined with read-through.

Adds extra write latency.
Guarantees consistency between cache and DB, so no cache invalidation technique is needed (example: DAX – DynamoDB Accelerator).
DAX can also be used as a write around cache: applications write to DynamoDB directly and read through DAX. A possible issue is the negative cache entry: when DAX cannot find a requested item in the underlying DynamoDB table, it returns an empty result to the user instead of an error, and it caches that empty result, so an item written directly to DynamoDB afterwards may appear missing until the negative entry expires.
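A write through cache can be sketched in a few lines. This is an illustration only: `WriteThroughCache` is a hypothetical class, a dictionary stands in for the database, and error handling (for example rolling back the cache entry if the DB write fails) is deliberately left out.

```python
class WriteThroughCache:
    """Every write goes to the cache and then, synchronously, to the DB."""

    def __init__(self, database):
        self._database = database
        self._entries = {}

    def put(self, key, value):
        self._entries[key] = value
        # The caller waits for this second write too, which is the source
        # of the extra write latency.
        self._database[key] = value

    def get(self, key):
        # Combined with read-through: load from the DB on a miss.
        if key not in self._entries and key in self._database:
            self._entries[key] = self._database[key]
        return self._entries.get(key)


db = {"existing": "from-db"}
wt_cache = WriteThroughCache(db)
wt_cache.put("order:1", "paid")
```

After `put` returns, the cache and the database hold the same value, so no separate invalidation step is required.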

Write back –
Data is written to the cache, which acknowledges immediately; after some delay, the cache writes the data back to the database.

Also called write behind.
Improves write performance and is good for write heavy workloads.
When combined with read-through, it works well for mixed workloads, where the most recently updated and accessed data is always available in the cache.
Resilient to DB failures; it can tolerate some DB downtime.
If batching or coalescing is supported, it can reduce the overall number of writes to the DB, which decreases load and reduces cost for databases that charge per request (DynamoDB).
DAX is write through, so DynamoDB cost will not be reduced for write heavy applications.
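The coalescing benefit of write back can be shown with a short sketch. It assumes a dictionary as the database and a manual `flush()` call in place of the timer or background thread a real cache would use; `WriteBackCache` and `db_writes` are hypothetical names.

```python
class WriteBackCache:
    """Writes are acknowledged from the cache and flushed to the DB later."""

    def __init__(self, database):
        self._database = database
        self._entries = {}
        self._dirty = {}  # pending writes, coalesced per key
        self.db_writes = 0

    def put(self, key, value):
        self._entries[key] = value
        self._dirty[key] = value  # a later write replaces an earlier one

    def get(self, key):
        return self._entries.get(key)

    def flush(self):
        # In a real system this runs after a delay or on a schedule.
        pending, self._dirty = self._dirty, {}
        for key, value in pending.items():
            self._database[key] = value
            self.db_writes += 1
        return len(pending)


db = {}
wb_cache = WriteBackCache(db)
for count in (1, 2, 3):
    wb_cache.put("page:views", count)  # three writes to the same key
wb_cache.put("page:likes", 1)
flushed = wb_cache.flush()  # only two DB writes are issued
```

Anything still in `_dirty` when the cache process dies is lost, which is the durability risk this policy trades for speed.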

Note –
If Redis is used for both cache aside and write back to better absorb spikes during peak load, data that has not yet been written to the DB may be permanently lost if the cache fails.

Cache eviction policy :

LRU – Least recently used. The most widely used policy, including in search engines.
Random replacement, FIFO – used less frequently.
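As an illustration of LRU eviction, the sketch below keeps entries in an `OrderedDict` and discards the least recently used one when capacity is exceeded. The `LRUCache` class is a hypothetical minimal version, not the implementation used by any particular cache product.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._entries = OrderedDict()  # oldest first, newest last

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)  # a read counts as a use
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # drop the least recently used


lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")     # "a" is now more recently used than "b"
lru.put("c", 3)  # evicts "b"
```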

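N+1 Queries :
Queries that require additional queries to get the complete picture of the data: one query fetches N parent records, then one more query per record fetches its related data. Root cause – an architectural issue or inattention to data retrieval considerations. Solution – eager load the related records so that they are fetched along with the initial query.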
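The difference between the N+1 pattern and eager loading can be seen in a small sketch. It assumes an in-memory SQLite database with hypothetical `authors` and `posts` tables; the query counts are tracked by hand for illustration.

```python
import sqlite3

blog_db = sqlite3.connect(":memory:")
blog_db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (1, 1, 'Engines'), (2, 1, 'Notes'), (3, 2, 'Compilers');
""")


def titles_n_plus_one(conn):
    """One query for the authors, then one more query per author."""
    queries = 1
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_eager(conn):
    """A single JOIN fetches the related records with the initial query."""
    rows = conn.execute(
        "SELECT a.name, p.title FROM authors a "
        "LEFT JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    ).fetchall()
    result = {}
    for name, title in rows:
        titles = result.setdefault(name, [])
        if title is not None:  # authors without posts still appear
            titles.append(title)
    return result, 1


slow_result, slow_queries = titles_n_plus_one(blog_db)
fast_result, fast_queries = titles_eager(blog_db)
```

Both functions return the same data, but the first issues one query per author on top of the initial one, so its cost grows with the number of rows.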

Inefficient code :

Avoid resource intensive operations.
Move to faster libraries.
Streaming – use streaming to upload Excel files or perform other data intensive load tasks, to minimize the memory and CPU footprint.
Move collection traversal to the database, e.g.
- Calculate the sum of records in the RDBMS instead of in code, using a single aggregate database query.
- Avoid loading an entire document when only a few fields will be accessed.
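The last two points can be illustrated with a short sketch that compares summing in application code with a single aggregate query, and that selects only the needed column. The `orders` table and the function names are hypothetical, and an in-memory SQLite database stands in for the RDBMS.

```python
import sqlite3

orders_db = sqlite3.connect(":memory:")
orders_db.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER, notes TEXT
    );
    INSERT INTO orders (customer, amount, notes)
    VALUES ('a', 10, 'x'), ('b', 25, 'y'), ('a', 7, 'z');
""")


def total_in_code(conn):
    # Pulls every column of every row into the application just to add
    # up one of them.
    return sum(row[2] for row in conn.execute("SELECT * FROM orders"))


def total_in_database(conn):
    # One aggregate query; only a single number crosses the wire.
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders"
    ).fetchone()
    return total


def customer_names(conn):
    # Fetch only the field that is needed instead of whole records.
    rows = conn.execute("SELECT DISTINCT customer FROM orders ORDER BY customer")
    return [name for (name,) in rows]
```

Both totals agree; the difference is how much data leaves the database and how much work the application does.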

Backgrounding :
Identify and separate tasks that can be delayed for a few seconds or handled by another system. Use queues (for example SQS), and separate jobs by type: transactional jobs, user-triggered bulk jobs, and critical jobs in their own queue.

Use read replicas for performing non-critical tasks like –

Sending Email
Generate reports
Upload configuration / documents
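The idea of moving such tasks off the request path can be sketched with an in-process queue and a worker thread. This is only a stand-in for a real queue service such as SQS; `BackgroundWorker`, the job types and the payloads are hypothetical.

```python
import queue
import threading


class BackgroundWorker:
    """Runs queued jobs on a separate thread so callers return immediately."""

    def __init__(self):
        self._jobs = queue.Queue()
        self.completed = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def enqueue(self, job_type, payload):
        self._jobs.put((job_type, payload))  # returns without doing the work

    def _run(self):
        while True:
            job_type, payload = self._jobs.get()
            try:
                # Stand-in for the real work (send the email, build the report).
                self.completed.append(f"{job_type}:{payload}")
            finally:
                self._jobs.task_done()

    def wait(self):
        self._jobs.join()  # block until every queued job has finished


worker = BackgroundWorker()
worker.enqueue("email", "welcome@example.com")
worker.enqueue("report", "monthly")
worker.wait()
```

In production, separate queues per job type would keep a flood of bulk jobs from delaying critical ones.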

Asset modification :
Ensure that all assets are gzipped or otherwise optimized before they are loaded; this reduces load time significantly. Push compressed front end assets (webpack bundles) to S3 using a deployment script. S3 serves the gzipped files correctly when the Content-Encoding and Content-Type are set on each object. Automate deployment as much as possible, for example with AWS CloudFormation or Terraform.
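The compression step of such a deployment script might look like the sketch below. It only prepares the gzipped body and the two headers; the upload itself is omitted, and `prepare_asset` is a hypothetical helper name.

```python
import gzip
import mimetypes


def prepare_asset(filename, raw_bytes):
    """Return the gzipped body and the headers to store with the object."""
    content_type, _ = mimetypes.guess_type(filename)
    body = gzip.compress(raw_bytes)
    headers = {
        "Content-Type": content_type or "application/octet-stream",
        "Content-Encoding": "gzip",  # tells the browser to decompress
    }
    return body, headers


original = b"body { margin: 0; padding: 0; }\n" * 200
body, headers = prepare_asset("styles.css", original)
```

With boto3, these two values would typically be passed as the `ContentType` and `ContentEncoding` arguments of `put_object`.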

Memory Leaks :
Avoid memory leaks; a leaking process can exhaust real memory and start hitting swap. (Swapping is the exchange of data between real memory and disk-backed virtual memory; the area on disk used for this is the swap space.) Restarting the server is only a band-aid solution; try to find the actual cause.
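One way to look for the actual cause in a Python service is to compare memory snapshots with the standard `tracemalloc` module. The sketch below is an illustration under stated assumptions: `LeakyHandler` is a deliberately leaky hypothetical class, and `top_memory_growth` is an invented helper.

```python
import tracemalloc


class LeakyHandler:
    """Keeps every payload it has ever seen, so memory grows without bound."""

    def __init__(self):
        self.seen = []

    def handle(self, payload):
        self.seen.append(payload * 1000)  # the leak: never cleared
        return len(payload)


def run_requests(handler, count):
    for _ in range(count):
        handler.handle("x")


def top_memory_growth(work, *args, limit=3):
    """Run work(*args) and return the source lines whose memory grew most."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    work(*args)
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    return after.compare_to(before, "lineno")[:limit]


handler = LeakyHandler()
growth = top_memory_growth(run_requests, handler, 1000)
```

Each entry in `growth` carries a traceback and a `size_diff`, so printing the first few usually points at the line that keeps allocating.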

Co-location :
Ensure all required microservices are located in one region to get low latency and speed up queries and operations.
