This article covers the SSTable, the on-disk file format used by the storage engine in Apache Cassandra. It walks through the components of an SSTable and the kind of information each component file holds.
An SSTable (Sorted Strings Table) is an immutable data file that Cassandra uses to persist rows on disk. Once an SSTable has been written, it is never modified in place.
Key points:
- Cassandra stores data on disk in SSTables. SSTables are flushed to disk from Memtables or streamed from other nodes.
- Every write carries a timestamp recording when it was written.
- Compaction combines multiple SSTables into one larger SSTable. Once the new SSTable has been written, the old SSTables can be removed. When the same data appears in several SSTables, only the version with the latest timestamp is kept.
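The compaction rule above can be sketched in a few lines of Python. This is a simplified illustration, not Cassandra's implementation: each toy SSTable is modeled as a dict mapping a key to a `(timestamp, value)` pair, and the function name `compact` is hypothetical. Real compaction reconciles data at a finer granularity and also handles tombstones and TTLs.

```python
def compact(sstables):
    """Merge toy SSTables, keeping only the latest write for each key.

    Each toy SSTable is a dict: key -> (timestamp, value).
    This is an assumption made for illustration, not Cassandra's file format.
    """
    merged = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            # Keep the entry only if it is newer than what we have seen so far.
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    # The new SSTable is written in sorted key order.
    return dict(sorted(merged.items()))


older = {"a": (1, "x"), "b": (5, "y")}
newer = {"a": (3, "z"), "c": (2, "w")}
print(compact([older, newer]))
```

After the merged table has been written, the two input tables are no longer needed, which mirrors how old SSTables can be removed after compaction.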
Components of an SSTable:
An SSTable consists of multiple components, each stored in a separate file, as follows.
- Data.db –
Data.db stores the actual data, i.e. the contents of the rows.
- Index.db –
Index.db holds an index from partition keys to positions in the Data.db file. For wide partitions, it may also include an index to rows within a partition.
- Summary.db –
Summary.db holds a sampling of the entries in the Index.db file (by default, every 128th entry).
- Filter.db –
Filter.db holds a Bloom filter of the partition keys in the SSTable.
- CompressionInfo.db –
CompressionInfo.db holds metadata about the offsets and lengths of the compression chunks in the Data.db file.
- Statistics.db –
Statistics.db stores metadata about the SSTable, including information about timestamps, tombstones, clustering keys, compaction, repair, compression, Time to Live (TTL) values, and more.
- Digest.crc32 –
Digest.crc32 holds a CRC-32 digest of the Data.db file.
- TOC.txt –
TOC.txt is a plain-text list of the component files for the SSTable.
Within the Data.db file, rows are organized by partition. Partitions are stored in token order, that is, by a hash of the partition key when the default partitioner, Murmur3Partitioner, is used. Within a partition, rows are stored in the order of their clustering keys.
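To make the relationship between Data.db, Index.db, and Summary.db more concrete, here is a minimal Python sketch of a lookup that uses a sampled summary to narrow the search in a token-ordered index. All names (`toy_token`, `build_index`, `build_summary`, `lookup`) are hypothetical, the fixed 100-byte partition size is invented, and `toy_token` is a stand-in hash based on MD5, not the Murmur3 hash that Murmur3Partitioner uses.

```python
import bisect
import hashlib

SAMPLING_INTERVAL = 128  # default sampling of Index.db entries described above


def toy_token(partition_key: str) -> int:
    """Stand-in for the partitioner hash (an assumption, NOT Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def build_index(partition_keys):
    """Toy Index.db: (token, key, data_offset) entries sorted by token."""
    ordered = sorted(partition_keys, key=lambda k: (toy_token(k), k))
    # Pretend every partition occupies 100 bytes in Data.db.
    return [(toy_token(k), k, i * 100) for i, k in enumerate(ordered)]


def build_summary(index, interval=SAMPLING_INTERVAL):
    """Toy Summary.db: every `interval`-th index entry as (token, index_position)."""
    return [(index[i][0], i) for i in range(0, len(index), interval)]


def lookup(key, index, summary, interval=SAMPLING_INTERVAL):
    """Return the toy Data.db offset for `key`, or None if it is absent."""
    token = toy_token(key)
    summary_tokens = [t for t, _ in summary]
    # Find the last sampled entry whose token is <= the token we want.
    slot = bisect.bisect_right(summary_tokens, token) - 1
    if slot < 0:
        return None
    start = summary[slot][1]
    # Scan only the index entries covered by that one sample.
    for _, candidate, offset in index[start:start + interval]:
        if candidate == key:
            return offset
    return None


keys = [f"user{i}" for i in range(1000)]
index = build_index(keys)
summary = build_summary(index)
print(len(summary), lookup("user42", index, summary))
```

The point of the sampling is that the summary is small enough to keep in memory, so a read inspects at most one interval of index entries rather than the whole index.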
SSTables can optionally be compressed using block-based compression.
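Block-based compression, together with the chunk offsets that CompressionInfo.db records, can be illustrated with a short Python sketch. It is an illustration under stated assumptions only: the 16 KiB chunk size, the function names `compress_chunks` and `read_at`, and the use of `zlib` as the compressor are choices made for this example and do not reproduce Cassandra's on-disk format.

```python
import zlib

CHUNK_SIZE = 16 * 1024  # hypothetical uncompressed chunk length


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Compress `data` chunk by chunk and record where each chunk starts.

    Returns (compressed_bytes, offsets), where offsets[i] is the position of
    compressed chunk i. The offsets play the role of a toy CompressionInfo.db.
    """
    compressed = bytearray()
    offsets = []
    for start in range(0, len(data), chunk_size):
        offsets.append(len(compressed))
        compressed += zlib.compress(data[start:start + chunk_size])
    return bytes(compressed), offsets


def read_at(compressed, offsets, position, length, chunk_size=CHUNK_SIZE):
    """Read `length` uncompressed bytes at `position`, decompressing only the chunks needed."""
    if length <= 0:
        return b""
    first = position // chunk_size
    last = min((position + length - 1) // chunk_size, len(offsets) - 1)
    out = bytearray()
    for i in range(first, last + 1):
        end = offsets[i + 1] if i + 1 < len(offsets) else len(compressed)
        out += zlib.decompress(compressed[offsets[i]:end])
    skip = position - first * chunk_size
    return bytes(out[skip:skip + length])


data = bytes(range(256)) * 300
compressed, offsets = compress_chunks(data)
print(len(offsets), read_at(compressed, offsets, 20000, 4) == data[20000:20004])
```

Because each chunk is compressed independently, a read at an arbitrary position only has to decompress the chunks that cover it, which is why the offsets and lengths need to be stored alongside the data.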