In this article, we are going to discuss the SSTable, the on-disk storage format used by Apache Cassandra. We will cover the components of an SSTable and the type of information kept in each component file. Let's discuss them one by one.
An SSTable is the immutable data file that Apache Cassandra's storage engine uses for row storage. In Cassandra, SSTables are used for persisting data on disk.
Key points :
- In Apache Cassandra, data is stored in SSTables. SSTables are flushed to disk from Memtables or are streamed from other nodes.
- In Cassandra, every write includes a timestamp that records when the data was written.
- In Cassandra, compaction combines multiple SSTables into one bigger SSTable. Once the new SSTable has been written, the old SSTables can be removed. When the same data appears in more than one SSTable, only the version with the latest timestamp is kept.
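The compaction rule in the last key point can be illustrated with a small sketch. This is only a simplified model, not Cassandra's actual implementation: each SSTable is assumed to be a Python dict mapping a key to a `(timestamp, value)` pair, and the function name `compact` is hypothetical. Tombstones and per-cell timestamps are ignored here.

```python
def compact(sstables):
    """Merge several SSTables into one, keeping the newest value per key.

    Each SSTable is modelled (as an assumption) as a dict:
    key -> (timestamp, value).
    """
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            # Keep only the entry with the latest write timestamp.
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # SSTables are sorted, so return the entries in key order.
    return dict(sorted(merged.items()))


old = {"a": (1, "x"), "b": (2, "y")}
new = {"a": (5, "z"), "c": (3, "w")}
result = compact([old, new])
# result == {"a": (5, "z"), "b": (2, "y"), "c": (3, "w")}
```

The input dicts are never modified, which mirrors the fact that SSTables are immutable: compaction writes a new table and the old ones are removed afterwards.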
Components of SSTable :
In Cassandra, an SSTable has multiple components that are stored in separate files, as follows.
- Data.db –
In SSTable, Data.db stores the actual data, i.e. the contents of rows.
- Index.db –
It is the component of the SSTable that holds an index from partition keys to positions in the Data.db file. It may also include an index to rows within a partition.
- Summary.db –
In Cassandra, the Summary.db component holds a sample of (by default) every 128th entry in the Index.db file.
- Filter.db –
In an SSTable, it is a Bloom filter of the partition keys.
- CompressionInfo.db –
In an SSTable, it is the component that keeps metadata about the offsets and lengths of the compression chunks in the Data.db file.
- Statistics.db –
It is one of the important components of an SSTable. It stores metadata about the SSTable, including information about timestamps, tombstones, clustering keys, compaction, repair, compression, Time to Live (TTL) values, and more.
- Digest.crc32 –
In Cassandra, this SSTable component holds a CRC-32 digest of the Data.db file.
- TOC.txt –
In Cassandra, this SSTable component holds a plain text list of the component files for the SSTable.
Within the Data.db file, rows are organized by partition. The partitions are stored in token order, i.e. by a hash of the partition key when the default partitioner, Murmur3Partitioner, is used. Within a partition, rows are stored in the order of their clustering keys.
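The ordering inside Data.db can be sketched as a two-level sort. The example below is only an illustration under stated assumptions: `token` and `layout` are hypothetical names, rows are modelled as `(partition_key, clustering_key, value)` tuples, and an MD5-based hash from the Python standard library stands in for Murmur3, so the token values will not match what Cassandra computes.

```python
import hashlib


def token(partition_key: str) -> int:
    # Stand-in hash (assumption): NOT Murmur3, only a stable 64-bit hash
    # used to illustrate ordering partitions by token.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def layout(rows):
    """Order rows the way the text describes.

    rows: iterable of (partition_key, clustering_key, value) tuples.
    """
    # Partitions sort by token (the partition key breaks ties),
    # then rows inside a partition sort by clustering key.
    return sorted(rows, key=lambda r: (token(r[0]), r[0], r[1]))


rows = [
    ("user2", 3, "c"),
    ("user1", 2, "b"),
    ("user2", 1, "a"),
    ("user1", 1, "d"),
]
ordered = layout(rows)
```

After sorting, all rows of one partition sit next to each other, and the partitions follow the order of their hashed tokens rather than the alphabetical order of their keys.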
In Apache Cassandra, SSTables can be optionally compressed using block-based compression.
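Block-based compression, together with the offset metadata kept in CompressionInfo.db, can be sketched as follows. This is a minimal model and not Cassandra's file format: `compress_chunks` and `read_at` are hypothetical names, the chunk size is deliberately tiny, and `zlib` from the Python standard library stands in for whichever compressor the table is configured to use.

```python
import zlib

CHUNK_SIZE = 16  # tiny value for illustration only (assumption)


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Compress data chunk by chunk and return (blob, offsets).

    offsets[i] is the byte position of compressed chunk i inside blob,
    playing the role of the offset metadata kept next to the data file.
    """
    blob = bytearray()
    offsets = []
    for start in range(0, len(data), chunk_size):
        offsets.append(len(blob))
        blob += zlib.compress(data[start:start + chunk_size])
    return bytes(blob), offsets


def read_at(blob, offsets, position, chunk_size=CHUNK_SIZE):
    """Return the uncompressed byte at `position`, decompressing one chunk."""
    i = position // chunk_size
    # The chunk ends where the next one starts, or at the end of the blob.
    end = offsets[i + 1] if i + 1 < len(offsets) else len(blob)
    chunk = zlib.decompress(blob[offsets[i]:end])
    return chunk[position % chunk_size]


data = bytes(range(100))
blob, offsets = compress_chunks(data)
value = read_at(blob, offsets, 42)
```

Because each chunk is compressed independently and its offset is recorded, a read only needs to decompress the single chunk that contains the requested position instead of the whole file.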