What is Lambda architecture | System Design

Last Updated : 17 Jan, 2024

Lambda architecture is widely used at large technology companies because it combines real-time and batch processing: fresh data can be queried through the real-time path, and historical data through the batch path.

[Figure: Lambda architecture]

What is Lambda Architecture?

Lambda architecture is a data-processing design for handling massive volumes of real-time data while keeping the system fault-tolerant and scalable.

Lambda (λ) architecture is one of three big data architecture patterns. In addition to batch and stream processing, it includes a serving layer that responds to user queries.

Different Ways to Approach Lambda Architecture

There are two approaches to Lambda Architecture:

  • Hybrid approach:
    • It is designed to harness enormous volumes of rapidly created data, enabling businesses to make use of data more quickly.
  • Specific approach:
    • It balances latency, throughput, and fault tolerance by using batch processing to provide accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data. The outputs of the batch and speed layers can be merged before presentation.

Layers in Lambda Architecture

Lambda architecture has three main layers for processing big data:

  • Batch Layer (Cold process)
  • Stream Layer (Hot process or Speed Layer)
  • Serving layer

[Figure: Layers of Lambda architecture]

1. Batch layer

The batch layer operates on the complete dataset, which lets the system produce its most accurate results. The cost is high latency, because computing over all of the data takes a long time.

The batch layer stores the raw data as it arrives and computes the batch views for consumption. Batch jobs run at intervals and are long-running, and the data they cover can span anywhere from minutes to years.
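As a minimal sketch of this idea, the following Python snippet keeps an append-only list of raw events and recomputes a batch view from the complete dataset. The `Event` type, the page-view counting use case, and the function name `compute_batch_view` are assumptions made for illustration; a production batch layer would typically run on a distributed processing framework rather than in a single process.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical raw event: one page view at a Unix timestamp."""
    timestamp: int
    page: str


def compute_batch_view(master_dataset: Iterable[Event]) -> dict[str, int]:
    """Recompute page-view counts from scratch over the complete dataset."""
    return dict(Counter(event.page for event in master_dataset))


# Raw events are only appended as they arrive; they are never updated in place.
master_dataset = [
    Event(100, "/home"),
    Event(105, "/docs"),
    Event(110, "/home"),
]

# Each batch run reads everything, which is accurate but slow on large data.
batch_view = compute_batch_view(master_dataset)
```

Because the view is always rebuilt from the raw data, a bug in the counting logic can be fixed and the view regenerated without losing information.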

2. Stream layer

The stream layer operates on real-time data to complement the batch views. It receives data as it arrives from various clients, incrementally updates the batch layer's results, and stores them in the processed-data database.

This layer generates results in a low-latency, near-real-time fashion. Implementing incremental algorithms (like insertion sort) at the stream layer can significantly reduce the computation cost. Batch views may be computed with more complex or expensive rules; they take longer to produce but offer better data quality and less skew. Real-time views, computed simply from incoming traffic, give access to the latest available data.
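The incremental style can be sketched as follows. This is an illustrative example only, assuming a page-view counting use case; the class name `RealtimeView` and its methods are hypothetical and do not correspond to any particular streaming framework.

```python
from collections import defaultdict


class RealtimeView:
    """Hypothetical speed-layer view that is updated one event at a time."""

    def __init__(self) -> None:
        self.counts = defaultdict(int)

    def update(self, page: str) -> None:
        # Constant work per event: no rescan of earlier data is needed.
        self.counts[page] += 1

    def reset(self) -> None:
        # Assumed policy: clear the view once a batch run has absorbed
        # these events into the batch view.
        self.counts.clear()


realtime_view = RealtimeView()
for page in ["/home", "/pricing", "/home"]:
    realtime_view.update(page)

latest_counts = dict(realtime_view.counts)
```

Each update touches only the affected key, which is what keeps the latency low compared with recomputing over the full dataset.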

3. Serving Layer

The serving layer is a server or set of servers that answers queries from different modules (such as an analytics module or a notification module) using the results produced by the batch and speed layers.

The batch layer's output (batch views) and the speed layer's output (near-real-time views) are stored in the processed-data database and also sent to the serving layer. The serving layer uses the output it receives directly to answer ad-hoc queries, and uses the database to answer predefined queries.
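One way to picture how a query combines both outputs is the sketch below. It assumes that both views are simple dictionaries of page-view counts and that the real-time view holds only events not yet included in the batch view; the function name `query_page_views` and the sample numbers are hypothetical.

```python
from __future__ import annotations


def query_page_views(
    page: str,
    batch_view: dict[str, int],
    realtime_view: dict[str, int],
) -> int:
    """Answer a query by adding the batch count and the recent real-time count."""
    return batch_view.get(page, 0) + realtime_view.get(page, 0)


# Illustrative data: the batch view covers history, the real-time view
# covers only what has arrived since the last batch run.
batch_view = {"/home": 120, "/docs": 45}
realtime_view = {"/home": 3, "/pricing": 1}

total_home = query_page_views("/home", batch_view, realtime_view)
total_pricing = query_page_views("/pricing", batch_view, realtime_view)
total_missing = query_page_views("/about", batch_view, realtime_view)
```

The sum is correct only under the stated assumption that the two views do not overlap; if an event were counted in both, the serving layer would double-count it.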

List of tools used in the Lambda Architecture

Advantages of Lambda Architecture

  • It strikes a good balance between speed, reliability, and scalability.
  • The batch layer manages historical data in fault-tolerant, distributed storage, which keeps the chance of errors low even if the system crashes.
  • The stream layer handles real-time data with an immediate response, at the cost of somewhat lower precision.
  • Access to both real-time and offline data covers many data analysis scenarios well.

Disadvantages of Lambda Architecture

  • Lambda architecture is a complex infrastructure because it involves many layers.
  • Although the offline layer and the real-time stream handle different scenarios, their internal processing logic is the same, so many modules are duplicated across separate codebases.
  • The separate codebases must be maintained and kept in sync so that both paths produce the same results from the same data.
  • Data is recomputed in every batch cycle, so the same work is done more than once, which reduces system performance and requires more resources.
  • A data set modeled with Lambda architecture is difficult to migrate or reorganize.
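The duplication and synchronization problem can be illustrated with a small sketch. The two functions below are hypothetical stand-ins for the batch and speed codebases: they implement the same counting logic in two different ways, and a check confirms that they agree on the same input. In a real system the two paths usually run on different frameworks, which is why keeping them equivalent takes ongoing effort.

```python
from collections import Counter

# Illustrative input: the same events are fed to both paths.
events = ["/home", "/docs", "/home", "/pricing", "/docs", "/home"]


def batch_count(all_events):
    """Batch-style logic: count everything in one pass over the full dataset."""
    return dict(Counter(all_events))


def incremental_count(stream):
    """Speed-style logic: update a running count as each event arrives."""
    counts = {}
    for page in stream:
        counts[page] = counts.get(page, 0) + 1
    return counts


batch_result = batch_count(events)
speed_result = incremental_count(iter(events))

# A consistency check like this has to be maintained alongside both codebases.
paths_agree = batch_result == speed_result
```

If the logic in one function changes (for example, to ignore certain pages) and the other is not updated, `paths_agree` becomes false, which is the kind of drift the points above describe.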

Conclusion

Lambda architecture is a flexible and powerful architecture. It is used by many tech companies to process the data they need to drive their most critical decisions and initiatives.

  • The batch layer provides accurate data at higher latency, which suits scenarios that require precision over speed (e.g. financial reporting, banking logs).
  • The real-time layer provides approximate data at low latency, which suits scenarios where quick responses and handling continual updates from new data streams matter more than precision (e.g. operational real-time decision-making).
  • The serving layer enables stakeholders to easily access this data.

Although this post described the architecture and its tradeoffs at a high level, a real production environment involves many more considerations, such as scalability, consistency, fault tolerance, and operational requirements.
