SSTable in Apache Cassandra

In this article, we discuss the SSTable, the on-disk file format used by the storage engine in Cassandra. We also cover the SSTable components and the type of information kept in each of the files that make up an SSTable. Let's discuss them one by one.

SSTable :
An SSTable (Sorted Strings Table) is the immutable data file that the Apache Cassandra storage engine uses for row storage. In Cassandra, SSTables are used for persisting data on disk, and once an SSTable has been written it is never modified.


Figure – SSTable in Apache Cassandra

Key points :

  • In Apache Cassandra, data on disk is stored in SSTables. SSTables are either flushed to disk from Memtables or streamed from other nodes.
  • In Cassandra, every write includes a timestamp that records when the data was written.
  • In Cassandra, compaction combines multiple SSTables into one bigger SSTable. Once the new SSTable has been written, the old SSTables can be removed. When the same data appears in several SSTables, only the version with the latest timestamp is kept.
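
The compaction rule described above (merge several SSTables, keep only the value with the latest timestamp) can be illustrated with a small Python sketch. This is a toy model, not Cassandra's implementation: each SSTable is assumed to be a plain dictionary mapping a key to a (timestamp, value) pair, and the function name compact is hypothetical.

```python
def compact(sstables):
    """Merge several toy SSTables into one, keeping the newest value per key.

    Assumption: each SSTable is a dict of key -> (timestamp, value).
    Real SSTables are on-disk files, not dictionaries.
    """
    merged = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            # Keep this entry only if it is newer than what we have so far.
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    # Emit the result in sorted key order. Real SSTables order partitions
    # by token; plain key order is used here for simplicity.
    return dict(sorted(merged.items()))


old_a = {"user:1": (100, "alice"), "user:2": (105, "bob")}
old_b = {"user:1": (200, "alice-updated"), "user:3": (150, "carol")}

# After compaction, old_a and old_b could be discarded.
new_table = compact([old_a, old_b])
print(new_table)
```

In this sketch, "user:1" exists in both inputs, so only the entry written at timestamp 200 survives in the merged table.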

Components of SSTable :
In Cassandra, an SSTable has multiple components, which are stored in separate files as follows.



  • Data.db –
    In SSTable, Data.db stores the actual data, i.e. the contents of rows.

  • Index.db –
    It is the SSTable component that holds an index from partition keys to positions in the Data.db file. It may also include an index to rows within a partition.

  • Summary.db –
    In Cassandra, the Summary.db component holds a sampling of (by default) every 128th entry in the Index.db file.

  • Filter.db –
    In SSTable, it is a Bloom filter of the partition keys.

  • CompressionInfo.db –
    In SSTable, it is the component that keeps metadata about the offsets and lengths of compression chunks in the Data.db file.

  • Statistics.db –
    It is one of the important components of an SSTable. It stores metadata about the SSTable, including information about timestamps, tombstones, clustering keys, compaction, repair, compression, Time to Live (TTL) values, and more.

  • Digest.crc32 –
    In Cassandra, this SSTable component holds a CRC-32 digest of the Data.db file.

  • TOC.txt –
    In Cassandra, this SSTable component holds a plain text list of the component files for the SSTable.

Within the Data.db file, rows are organized by partition. These partitions are stored in token order, i.e. by a hash of the partition key when the Murmur3Partitioner is chosen. Within a partition, rows are stored in the order of their clustering keys.
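
To see how the Index.db and Summary.db components listed above can work together, here is a minimal Python sketch of a sampled lookup. It only models the idea: the function names, the in-memory lists, and the made-up offsets are assumptions for illustration, and keys are kept in plain sorted order rather than token order.

```python
import bisect

SAMPLE_INTERVAL = 128  # default sampling interval mentioned in the list above


def build_index(keys):
    """Toy Index.db: sorted list of (partition_key, data_offset)."""
    # Offsets are made up: pretend every partition occupies 100 bytes.
    return [(key, i * 100) for i, key in enumerate(sorted(keys))]


def build_summary(index, interval=SAMPLE_INTERVAL):
    """Toy Summary.db: every interval-th index entry as (key, index_position)."""
    return [(index[i][0], i) for i in range(0, len(index), interval)]


def lookup(key, summary, index, interval=SAMPLE_INTERVAL):
    """Return the toy Data.db offset for key, or None if the key is absent."""
    summary_keys = [k for k, _ in summary]
    # Find the last sampled key that is <= the requested key.
    slot = bisect.bisect_right(summary_keys, key) - 1
    if slot < 0:
        return None  # key sorts before every key in this SSTable
    start = summary[slot][1]
    # Scan at most `interval` index entries from the sampled position.
    for candidate, offset in index[start:start + interval]:
        if candidate == key:
            return offset
    return None


keys = [f"key{n:05d}" for n in range(1000)]
index = build_index(keys)
summary = build_summary(index)
print(len(summary), lookup("key00300", summary, index))
```

The point of the sampling is that the summary stays small enough to search quickly, while any lookup still touches only a short, bounded slice of the full index.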

Note –
In Apache Cassandra, SSTables can be optionally compressed using block-based compression.
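
As a rough illustration of block-based compression, and of why a component such as CompressionInfo.db needs to record chunk offsets and lengths, the following Python sketch compresses data in fixed-size chunks and then reads a single byte by decompressing only one chunk. The chunk size, the function names, and the use of zlib are assumptions made for this example; they do not describe Cassandra's actual file layout or compressor.

```python
import zlib

CHUNK_SIZE = 16  # tiny chunk size, for illustration only


def compress_chunks(data, chunk_size=CHUNK_SIZE):
    """Compress data chunk by chunk; return (compressed_bytes, chunk_table)."""
    compressed = bytearray()
    chunk_table = []  # (offset, length) of each compressed chunk
    for start in range(0, len(data), chunk_size):
        block = zlib.compress(data[start:start + chunk_size])
        chunk_table.append((len(compressed), len(block)))
        compressed += block
    return bytes(compressed), chunk_table


def read_at(position, compressed, chunk_table, chunk_size=CHUNK_SIZE):
    """Return the uncompressed byte at position, decompressing one chunk only."""
    offset, length = chunk_table[position // chunk_size]
    block = zlib.decompress(compressed[offset:offset + length])
    return block[position % chunk_size]


data = b"partition-a|row1|row2|partition-b|row1"
compressed, chunk_table = compress_chunks(data)
print(chunk_table)
print(chr(read_at(20, compressed, chunk_table)))
```

Because each chunk is compressed independently, a reader that knows the chunk table can jump straight to the chunk it needs instead of decompressing the whole file.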
