
Secondary Indexes on MAP Collection in Cassandra


This article gives an overview of secondary indexes on a MAP collection in Cassandra, walks through a worked example to show how they behave, and closes with a summary of when such indexes are useful and where their limits lie.

Overview :
A MAP collection stores key-value pairs. Creating a secondary index on the keys of a MAP column lets you select rows by what the map contains, which is the typical use case for an index on a collection. Once the index exists, you can use the CONTAINS KEY keywords in the WHERE clause to find every row whose map holds a specific key. The example below walks through the whole process.

Syntax :
The following command creates an index on the keys of a MAP column.

CREATE INDEX ON <table_name> (KEYS(<map_column>));

In the WHERE clause, use the CONTAINS KEY keywords to search for a specific key in the MAP collection.

WHERE <map_column> CONTAINS KEY <key>

Example :
Assume you have an existing keyspace named cluster1. First, create a user_data table with the following CQL commands.

Step-1 : Creating table user_data –

use cluster1;
create table user_data
(
user_id varchar,
user_first_name varchar,
user_last_name varchar,
company varchar,
user_tags map<varchar,varchar>,
primary key(user_id)
);

Step-2 : Creating Index on user_tags –

CREATE INDEX ON user_data (KEYS(user_tags));
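
The statement above leaves the index name up to Cassandra. As an alternative to it (to be run instead of Step-2, not in addition to it), the index can be given an explicit name so that it is easier to refer to later, for example when dropping it. This is only a sketch: the name user_tags_keys_idx is hypothetical and does not appear elsewhere in this article.

```cql
-- Hypothetical index name; choose any name that is unique within the keyspace.
-- IF NOT EXISTS makes the statement a no-op when an index with this name already exists.
CREATE INDEX IF NOT EXISTS user_tags_keys_idx
ON user_data (KEYS(user_tags));

-- A named index can later be removed by name.
-- DROP INDEX IF EXISTS user_tags_keys_idx;
```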

Step-3 : Inserting data –

insert into user_data(user_id, user_first_name, user_last_name, company, user_tags)
values('Ashish01','Ashish','Rana','abc',{'GFG':'Geeks for Geeks','HTML':'HyperText Markup Language'});

insert into user_data(user_id, user_first_name, user_last_name, company, user_tags)
values('Ashish02','Ayush','NA','abc',{'GFG':'Geeks for Geeks','IDK':'I Do not Know'});

insert into user_data(user_id, user_first_name, user_last_name, company, user_tags)
values('Ashish03','Ayushi','NA','abc',{'FYI':'For Your Information','IDK':'I Do not Know'});

Step-4 : Verifying and Reading data – 

select * from user_data;

Step-5 : Output – 

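 user_id  | company | user_first_name | user_last_name | user_tags
----------+---------+-----------------+----------------+-----------------------------------------------------------------
 Ashish03 | abc     | Ayushi          | NA             | {'FYI': 'For Your Information', 'IDK': 'I Do not Know'}
 Ashish02 | abc     | Ayush           | NA             | {'GFG': 'Geeks for Geeks', 'IDK': 'I Do not Know'}
 Ashish01 | abc     | Ashish          | Rana           | {'GFG': 'Geeks for Geeks', 'HTML': 'HyperText Markup Language'}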

Step-6 : Searching for rows that contain a specific key –

select * from user_data where user_tags CONTAINS KEY 'GFG';

Step-7 : Output –

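 user_id  | company | user_first_name | user_last_name | user_tags
----------+---------+-----------------+----------------+-----------------------------------------------------------------
 Ashish02 | abc     | Ayush           | NA             | {'GFG': 'Geeks for Geeks', 'IDK': 'I Do not Know'}
 Ashish01 | abc     | Ashish          | Rana           | {'GFG': 'Geeks for Geeks', 'HTML': 'HyperText Markup Language'}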

In Cassandra, a secondary index is a way to query on columns that are not part of the primary key. Once a secondary index exists on a column, that column can be used in the WHERE clause of a query. However, secondary indexes can lead to performance issues, especially when querying large datasets.

  1. In the case of a MAP collection in Cassandra, a secondary index can be created on the keys, the values, or the entries (key-value pairs) of the map. An index on the keys lets us query with CONTAINS KEY, an index on the values lets us query with CONTAINS, and an index on the entries lets us query for a specific key-value pair.
  2. However, there are some limitations when using secondary indexes on MAP collections in Cassandra. Each kind of index answers only its own kind of query. For example, an index on the values of a MAP collection lets us find rows that contain a specific value, but not rows that contain a specific key-value pair; that requires an index on the entries. Additionally, querying on secondary indexes can be slow and may not scale well for large datasets.
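
The different kinds of MAP index can be sketched against the user_data table from the example. The index names below are hypothetical, and the sketch assumes a Cassandra version that accepts more than one index on the same map column (older releases may reject the second and third CREATE INDEX statements).

```cql
-- Index on the map values (hypothetical index name).
CREATE INDEX IF NOT EXISTS user_tags_values_idx
ON user_data (VALUES(user_tags));

-- Index on the map entries, i.e. key-value pairs (hypothetical index name).
CREATE INDEX IF NOT EXISTS user_tags_entries_idx
ON user_data (ENTRIES(user_tags));

-- Values index: rows whose map contains this value under any key.
SELECT * FROM user_data WHERE user_tags CONTAINS 'I Do not Know';

-- Entries index: rows whose map contains this exact key-value pair.
SELECT * FROM user_data WHERE user_tags['GFG'] = 'Geeks for Geeks';
```

With the three rows inserted in Step-3, the first query should match Ashish02 and Ashish03, and the second should match Ashish01 and Ashish02.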

In summary, secondary indexes on MAP collections in Cassandra let you query data by the keys or values of the map. They come with limitations and a performance cost, especially on large datasets, so they are best used on small datasets or in combination with other techniques that narrow the query.
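
One possible way to narrow an indexed query, offered here as an assumption rather than something the walkthrough above ran, is to restrict the partition key in the same WHERE clause. The sketch below reuses the user_data table and the keys index from Step-2, so that the lookup is limited to a single partition.

```cql
-- Assumes the KEYS index on user_tags from Step-2 already exists.
-- The partition key restriction limits the lookup to one partition;
-- CONTAINS KEY then checks whether that row's map holds the key.
SELECT user_id, user_tags
FROM user_data
WHERE user_id = 'Ashish01'
  AND user_tags CONTAINS KEY 'GFG';
```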



Last Updated : 02 Nov, 2023