Introduction to Log Structured Merge (LSM) Tree

B+ Trees and LSM Trees are two fundamental data structures used as building blocks of databases. B+ Trees are a good fit when both lookups and insertions need to be fast, whereas LSM Trees are a good fit for write-intensive workloads in which reads are comparatively infrequent.

This article introduces the Log Structured Merge Tree, also known as the LSM Tree. LSM Trees are the data structure underlying many highly scalable NoSQL distributed key-value databases, such as Amazon’s DynamoDB, Cassandra, and ScyllaDB.



LSM Trees

A simple version of an LSM Tree comprises two levels of tree-like data structures: an in-memory component (the memtable, T0) and an on-disk component (T1).

Simple LSM Tree

New records are inserted into the memtable T0 component. If the insertion causes the T0 component to exceed a certain size threshold, a contiguous segment of entries is removed from T0 and merged into T1 on disk.
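The insert-and-flush behaviour described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular database implements it: the class name `SimpleLSM`, the parameters `threshold` and `flush_count`, and the use of an in-memory sorted list to stand in for the on-disk T1 component are all assumptions made for the example.

```python
import bisect


class SimpleLSM:
    """Toy two-level LSM tree: T0 is the memtable, T1 stands in for disk."""

    def __init__(self, threshold=4, flush_count=2):
        self.threshold = threshold      # max entries T0 may hold (assumed policy)
        self.flush_count = flush_count  # entries moved to T1 per flush (assumed)
        self.t0 = {}                    # memtable: key -> value
        self.t1 = []                    # sorted list of (key, value) pairs

    def put(self, key, value):
        # New records always go to the memtable first.
        self.t0[key] = value
        if len(self.t0) > self.threshold:
            self._flush()

    def _flush(self):
        # Remove a contiguous segment (here: the smallest keys) from T0 ...
        segment = sorted(self.t0)[: self.flush_count]
        merged = dict(self.t1)
        for key in segment:
            # ... and merge it into T1; values from T0 are newer, so they win.
            merged[key] = self.t0.pop(key)
        self.t1 = sorted(merged.items())

    def get(self, key):
        # The memtable holds the most recent value, so check it first.
        if key in self.t0:
            return self.t0[key]
        # (key,) sorts just before any (key, value) pair with the same key.
        i = bisect.bisect_left(self.t1, (key,))
        if i < len(self.t1) and self.t1[i][0] == key:
            return self.t1[i][1]
        return None


lsm = SimpleLSM(threshold=3, flush_count=2)
for k, v in [("d", 4), ("b", 2), ("a", 1), ("c", 3)]:
    lsm.put(k, v)
```

After the fourth insert, T0 exceeds the threshold, so the two smallest keys are moved to T1 and the other two stay in the memtable. A real system would write T1 to disk as a sorted file rather than keep it in a Python list.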



LSM Workflow

LSM primarily uses 3 concepts to optimize read and write operations:

Memtable representation

3.1. SSTables flushed to Disk

3.6. Compactor compacted 2 SSTables to 1 SSTable
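As a rough sketch of what compacting two SSTables into one involves, the Python function below merges two key-sorted tables and keeps the newer value when a key appears in both. The function name `compact` and the representation of an SSTable as a sorted list of `(key, value)` pairs are assumptions for illustration. Real compactors also handle deletions and on-disk file formats, which are left out here.

```python
def compact(older, newer):
    """Merge two key-sorted SSTables into one; on duplicate keys, newer wins."""
    merged = []
    i = j = 0
    while i < len(older) and j < len(newer):
        old_key, new_key = older[i][0], newer[j][0]
        if old_key < new_key:
            merged.append(older[i])
            i += 1
        elif new_key < old_key:
            merged.append(newer[j])
            j += 1
        else:
            # Same key in both tables: keep only the more recent value.
            merged.append(newer[j])
            i += 1
            j += 1
    # At most one of the two tables still has entries left.
    merged.extend(older[i:])
    merged.extend(newer[j:])
    return merged


older = [("a", 1), ("c", 3), ("e", 5)]
newer = [("b", 20), ("c", 30)]
compacted = compact(older, newer)
```

Because both inputs are already sorted, the merge is a single linear pass and the output is sorted as well, so it can serve as the input to a later compaction.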

Conclusion:
