SSTable in Apache Cassandra
In this article, we are going to discuss SSTables, the on-disk data files used by Cassandra's storage engine. We will look at the components of an SSTable and at what type of information is kept in each of the database files that make it up. Let's discuss them one by one.
SSTable (Sorted String Table) is the immutable data file that Apache Cassandra's storage engine uses for row storage. In Cassandra, SSTables are used to persist data on disk.
Key points:
- In Apache Cassandra, data is stored in SSTables. SSTables are either flushed to disk from memtables or streamed from other nodes.
- In Cassandra, every write includes a timestamp recording when it was written.
- In Cassandra, compaction combines multiple SSTables into one larger SSTable. Once the new SSTable has been written, the old SSTables can be removed. When the same data appears in several SSTables, only the version with the latest timestamp is kept.
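The flush-then-compact cycle described above can be sketched in a few lines of Python. This is a toy in-memory model, not Cassandra's code: each hypothetical SSTable is a key-sorted, immutable tuple of (partition_key, timestamp, value) cells, and `flush` and `compact` are illustrative names. Real compaction also has to handle tombstones, TTLs and clustering rows, which this sketch ignores.

```python
import heapq
from typing import Iterable

# Hypothetical toy model: a "cell" is (partition_key, timestamp, value).
Cell = tuple[str, int, str]


def flush(memtable: dict[str, tuple[int, str]]) -> tuple[Cell, ...]:
    """Write a memtable out as an immutable, key-sorted table (a tuple)."""
    return tuple((key, ts, value) for key, (ts, value) in sorted(memtable.items()))


def compact(sstables: Iterable[tuple[Cell, ...]]) -> tuple[Cell, ...]:
    """Merge key-sorted tables into one, keeping only the newest cell per key."""
    merged: list[Cell] = []
    # heapq.merge streams the already-sorted inputs in key order
    # without concatenating and re-sorting them.
    for key, ts, value in heapq.merge(*sstables, key=lambda cell: cell[0]):
        if merged and merged[-1][0] == key:
            if ts > merged[-1][1]:
                merged[-1] = (key, ts, value)  # the write with the later timestamp wins
        else:
            merged.append((key, ts, value))
    return tuple(merged)


old = flush({"alice": (100, "NY"), "bob": (100, "LA")})
new = flush({"alice": (200, "SF")})
print(compact([old, new]))
```

Here the two flushed tables both contain "alice", and compaction keeps only the write with timestamp 200, while "bob" is carried over unchanged. Because the inputs are never modified, the old tables can simply be discarded once the merged one exists.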
Components of SSTable:
In Cassandra, each SSTable consists of multiple components, each stored in its own file, as follows.
- Data.db –
Stores the actual data, i.e. the contents of rows.
- Index.db –
An index from partition keys to positions in the Data.db file. It may also include an index to rows within a partition.
- Summary.db –
A sample of (by default) every 128th entry in the Index.db file, which lets a read jump close to the right place in Index.db instead of scanning the whole index.
- Filter.db –
A Bloom filter of the partition keys. It lets a read quickly skip SSTables that definitely do not contain the requested partition.
- CompressionInfo.db –
Holds metadata about the offsets and lengths of the compression chunks in the Data.db file.
- Statistics.db –
Stores metadata about the SSTable, including information about timestamps, tombstones, clustering keys, compaction, repair, compression, Time to Live (TTL) values, and more.
- Digest.crc32 –
Contains a CRC-32 digest of the Data.db file, which can be used to verify its integrity.
- TOC.txt –
A plain text list of the component files for the SSTable.

Within the Data.db file, rows are organized by partition. Partitions are stored in token order; with the default Murmur3Partitioner, the token is a hash of the partition key. Within each partition, rows are stored in the order of their clustering keys.
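To see how Filter.db, Summary.db and Index.db cooperate on a read, here is a toy Python model of a single SSTable. It is a sketch under simplifying assumptions, not Cassandra's implementation: `ToySSTable` and `BloomFilter` are hypothetical names, keys are sorted as plain strings rather than by Murmur3 token, the Bloom filter derives its bit positions from SHA-256, and list positions stand in for byte offsets in Data.db.

```python
import bisect
import hashlib

SUMMARY_INTERVAL = 128  # Summary.db samples every 128th Index.db entry by default


class BloomFilter:
    """Toy Bloom filter (hypothetical; not Cassandra's Filter.db hashing scheme)."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        # Derive num_hashes bit positions from SHA-256 of a salted key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToySSTable:
    """Hypothetical in-memory model of Data.db, Index.db, Summary.db and Filter.db."""

    def __init__(self, rows: dict[str, str]) -> None:
        keys = sorted(rows)  # plain string order stands in for token order
        # "Data.db": list positions stand in for byte offsets
        self.data = [rows[k] for k in keys]
        # "Index.db": partition key -> position in Data.db
        self.index = [(k, pos) for pos, k in enumerate(keys)]
        # "Summary.db": every SUMMARY_INTERVAL-th Index.db key and its index slot
        self.summary = [(self.index[i][0], i)
                        for i in range(0, len(self.index), SUMMARY_INTERVAL)]
        self.summary_keys = [k for k, _ in self.summary]
        self.bloom = BloomFilter()  # "Filter.db"
        for k in keys:
            self.bloom.add(k)

    def get(self, key: str):
        if not self.bloom.might_contain(key):
            return None  # skip this SSTable without touching the index
        # Binary-search the summary for the last sampled key <= key.
        i = bisect.bisect_right(self.summary_keys, key) - 1
        if i < 0:
            return None
        start = self.summary[i][1]
        # Scan at most SUMMARY_INTERVAL Index.db entries from that slot.
        for k, pos in self.index[start:start + SUMMARY_INTERVAL]:
            if k == key:
                return self.data[pos]
        return None


table = ToySSTable({f"key{i:04d}": f"value{i}" for i in range(1000)})
print(table.get("key0500"))
```

Looking up "key0500" lands on the summary sample for "key0384" and scans forward through Index.db from there. With 1,000 keys the summary holds only 8 samples, which is why it is cheap to keep in memory and why only a small slice of the index ever has to be scanned.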
In Apache Cassandra, SSTables can optionally be compressed using block-based compression.
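Block-based compression is what makes CompressionInfo.db necessary: Data.db is compressed in fixed-size chunks, and the recorded chunk offsets let a read decompress only the chunks it needs. The sketch below illustrates the idea with Python's `zlib`. The 16 KiB chunk size, the function names `compress_chunks` and `read_range`, and the in-memory offset list are assumptions for illustration, and zlib stands in for Cassandra's configurable compressors (LZ4 by default).

```python
import zlib

CHUNK_LENGTH = 16 * 1024  # assumed chunk size for this sketch


def compress_chunks(data: bytes, chunk_length: int = CHUNK_LENGTH):
    """Compress data chunk by chunk; return the compressed bytes and chunk offsets."""
    out = bytearray()
    offsets = []  # compressed start offset of each chunk, like CompressionInfo.db
    for start in range(0, len(data), chunk_length):
        offsets.append(len(out))
        out += zlib.compress(data[start:start + chunk_length])
    return bytes(out), offsets


def read_range(compressed: bytes, offsets: list, position: int, length: int,
               chunk_length: int = CHUNK_LENGTH) -> bytes:
    """Read `length` uncompressed bytes at `position` (assumed to lie within the data),
    decompressing only the chunks that overlap the requested range."""
    first = position // chunk_length
    last = (position + length - 1) // chunk_length
    result = bytearray()
    for i in range(first, last + 1):
        end = offsets[i + 1] if i + 1 < len(offsets) else len(compressed)
        result += zlib.decompress(compressed[offsets[i]:end])
    skip = position - first * chunk_length  # where the range starts inside the first chunk
    return bytes(result[skip:skip + length])


data = bytes(range(256)) * 200  # 51,200 bytes of sample data
compressed, offsets = compress_chunks(data)
print(len(offsets), read_range(compressed, offsets, 20000, 8) == data[20000:20008])
```

Reading bytes 20,000 to 20,007 touches only the second chunk, so three of the four chunks stay compressed. Smaller chunks mean less wasted decompression per read, at the cost of a larger offset table and usually a worse compression ratio.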