Concept of indexing in Apache Cassandra

Last Updated : 11 Oct, 2019

Prerequisite – Introduction to Apache Cassandra
Index:
As we can access data using attributes which having the partition key. For Example, if Emp_id is a column name for Employee table and if it is partition key of that table then we can filter or search data with the help of partition key. In this case we can used WHERE clause to define condition over attribute and to search data.
But suppose if there exists a column which is not a partition key of that table and we want to filter or to search or to access data using WHERE clause then the query will not be executed and will give an error.

So, to access data in that case using attributes other than the partition key for fast and efficient lookup of data matching a given condition then we need to define index. It can be used for various purpose like for collections, static columns, collection columns, and any other columns except counter columns.

When to use an Index:

Built-in indexes are the best option on a table which having many rows and that rows contain the indexed value.
In a particular column which column having more unique values in that case we can used indexing.
Table which more overhead due to several reason like column having more entries then in that case we can used indexing.
To query and maintain the index we can used the indexing which is always a good option in that case.

Example:
Suppose you had a cricket match entry table with a million entries for player’s in hundreds of matches and wanted to look up player’s rank by the number of match played. Many player’s ranks will share the same column value for match year. The match_year column is a good option for an index.

Syntax to create an Index:

CREATE INDEX [ IF NOT EXISTS ] index_name
  ON [keyspace_name.]table_name
  ([ ( KEYS | FULL ) ] column_name) 
  (ENTRIES column_name);

Example:
To create table used keyspace1 as a keyspace and Task as a table name. Lets have a look.

CREATE TABLE keyspace1.Task 
(
   Task_id text,
   Task_name text,
   Task_time timestamp,
   T_location text,
   PRIMARY KEY (Task_id, Task_name)
);

As Cassandra is a distributed and decentralized database with the data organized by partition key, In general case, WHERE clause queries need to include a partition key.

Example:

SELECT * 
FROM Task 
WHERE Task_id = ‘T210’;

This query would work fine.

SELECT * 
FROM Task 
WHERE Task_id = ‘T210’ AND Task_name; ‘set alarm’;

This query would work fine.

Note:
In above table Task_id and Task_name columns are the part of primary key.

SELECT * FROM Task WHERE Task_time= ‘2019-09-30 15:02:56’;

This query would not work, because as we can see that here Task_time is not a part of partition key.

Error: Bad Request : no indexed columns present in by columns clause with Equal operator.

To resolve such type of errors by Creating an index on a clustering column. Define a table having a composite partition key, and then create an index on a clustering column.

CREATE TABLE keyspace1.Task (
   Task_id text,
   Task_name text,
   Task_time timestamp,
   T_location text,
  PRIMARY KEY ((Task_id, Task_name), Task_time) 
);
      
CREATE INDEX ON keyspace1.Task(Task_time);

SELECT * 
FROM Task 
WHERE Task_time= ‘2019-09-30 15:02:56’;

Now this query would work:

Note:
By creating an index such that creating secondary indexes does not mean that it will increase the speed of queries in Cassandra.

One of the important advantage of Secondary indexes helps in accessing data which can simply make it so that WHERE clauses that references values in column beyond the primary and clustering columns can run.

We have better option to increase the speed of queries in Cassandra that by creating a table specifically for the query.
Lets have a look on another example. In this example Student_record is a table name and keyspace1 is a keyspace name.

CREATE TABLE Student_record 
(
  Stu_state text,
  Stu_zip text,
  Stu_address text,
  PRIMARY KEY(Stu_state, Stu_zip)
 );

Here in this table Stu_state and Stu_zip may be same so, to define a unique record in a table we can add Stu_id as primary key which uniquely defined the record.

Now we can do modification in existing table by using ALTER command in CQL.

ALTER TABLE Student_record ADD Stu_id int PRIMARY KEY;

Output:

Table : Student_record

To check the description of a table in Cassandra used the following CQL query given below.
Describe table Student_record;

SELECT * 
FROM Student_record 
WHERE Stu_id = '107';

Output:

Table : Output

To create Composite Partition key in Cassandra:
A composite partition key is defined as the key where a partition key is having more than one column then it is called composite partition key.

Example:

CREATE TABLE Registration (
  Name text,
  Date timestamp,
  Email text, 
  Query text,
  PRIMARY KEY((Name, Date), Email) WITH CLUSTERING ORDER BY(Date DESC);

SELECT * 
FROM Registration LIMIT 2;

Note:
There is an row key for each Row in Cassandra when we create an index. It is common to have denormalized data in Cassandra. It is false that secondary indexes make queries run faster in Cassandra. Parenthesis is used to specify a composite partition key.

Suggest improvement

User Defined Function in Cassandra

High Availability Mechanism in Cassandra

Share your thoughts in the comments

Concept of indexing in Apache Cassandra

Please Login to comment...

Similar Reads

What kind of Experience do you want to share?