Prerequisite – Introduction to Cassandra, Overview of Data Modelling
In this article, we discuss how to design a data model in Cassandra. Data model design is a key part of any application, so we will work through an example to see how to build a good model.
As you will see, Cassandra approaches data modelling differently from an RDBMS. In an RDBMS, we can combine tables with JOINs when querying, and we can avoid duplicates by normalizing the data and linking related tables with foreign keys.
Cassandra does not work this way. It is a distributed system that does not support JOINs, so we denormalize the data where needed. In an RDBMS we can fetch related data through JOINs, whereas in Cassandra the data is spread across the nodes of the cluster and is retrieved by partition key, so combining data from many partitions is expensive.
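The contrast can be pictured outside any database. The following Python sketch uses plain dictionaries and lists as stand-ins for tables; the names `domains`, `employees`, and `employees_by_domain` are hypothetical and are not Cassandra APIs. The join-style read has to consult two structures, while the denormalized structure duplicates the domain name into every row and is keyed by the value we query on.

```python
# Normalized layout, as in an RDBMS: employees reference a domain by id.
domains = {1: "Database", 2: "Networking"}
employees = [
    {"username": "Alice007", "name": "Alice", "domain_id": 1},
    {"username": "Bob007", "name": "Bob", "domain_id": 2},
]


def employees_in_domain_join(domain_name):
    """Join-style read: scan employees and resolve each domain id."""
    return [e["name"] for e in employees
            if domains[e["domain_id"]] == domain_name]


def build_employees_by_domain():
    """Denormalized layout: copy the domain name into each row and
    group the rows under the value we will query by."""
    table = {}
    for e in employees:
        domain_name = domains[e["domain_id"]]
        table.setdefault(domain_name, []).append(
            {"username": e["username"], "name": e["name"],
             "domain": domain_name})
    return table


employees_by_domain = build_employees_by_domain()


def employees_in_domain_lookup(domain_name):
    """Single lookup by key, with no second structure to consult."""
    return [row["name"] for row in employees_by_domain.get(domain_name, [])]


print(employees_in_domain_join("Database"))
print(employees_in_domain_lookup("Database"))
```

Both functions return the same names; the difference is that the second one pays for the duplication at write time instead of at read time, which is the trade-off denormalization makes.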
Goals to keep in mind while designing the model –
- Even spread of data:
Spreading data evenly across the cluster is a key goal. Rows are distributed by the partition key: for a single-column primary key, that column is the partition key; for a composite primary key, the first part is the partition key and the remaining columns are clustering keys. We should choose the Primary Key (PK) on the basis of uniqueness, for example an ID, email, or username, so that rows spread across all nodes and the cluster is fully utilized.
- Minimizing the number of reads:
In Cassandra, we must know in advance which queries the system will need and then design the model around them. If a single query has to fetch data from multiple partitions, it hurts performance and makes the system slow. In an RDBMS, we have the liberty to design the schema first and write the queries afterwards, which shows how different the relational approach is from the non-relational one.
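The effect of key choice on data spread can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cassandra's actual placement logic: `NODE_COUNT` is a hypothetical cluster size, and MD5 with a modulo stands in for the real partitioner and its token ranges. It compares a unique key per row with a key that is the same for every row.

```python
import hashlib
from collections import Counter

NODE_COUNT = 4  # hypothetical cluster size


def node_for(partition_key):
    """Map a partition key to a node index. MD5 plus modulo is only a
    stand-in for Cassandra's partitioner and token ranges."""
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODE_COUNT


def rows_per_node(partition_keys):
    """Count how many rows land on each node for the given keys."""
    counts = Counter(node_for(k) for k in partition_keys)
    return [counts.get(n, 0) for n in range(NODE_COUNT)]


usernames = [f"user{i}" for i in range(1000)]  # unique key per row
ages = [23] * 1000                             # same key for every row

by_username = rows_per_node(usernames)
by_age = rows_per_node(ages)
print("partitioned by username:", by_username)
print("partitioned by age:", by_age)
```

With unique usernames every node receives a share of the rows, while the constant key sends all 1000 rows to one node and leaves the others idle.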
Suppose a number of people visit the website and management wants the details listed below. Let’s have a look.
- List of all employees
- List of all domains
- List of employees by domain
Now, let’s create the tables one by one. First, we are going to create the Domain table.
CREATE TABLE Domain ( D_id int, D_name text, D_info text, PRIMARY KEY(D_id) );
Now, we are going to create the Employee table.
CREATE TABLE Employee ( username Varchar, E_name text, E_age int, PRIMARY KEY(username) );
Now, we are going to insert some data into the Domain table.
INSERT INTO Domain(D_id, D_name, D_info) VALUES (1, 'database', '50 member');
INSERT INTO Domain(D_id, D_name, D_info) VALUES (2, 'Management', '10 member');
INSERT INTO Domain(D_id, D_name, D_info) VALUES (3, 'Networking', '15 member');
INSERT INTO Domain(D_id, D_name, D_info) VALUES (4, 'software', '50 member');
Now, we are going to insert some data into the Employee table.
INSERT INTO Employee(username, E_name, E_age) VALUES ('Alpha007', 'Ashish', 23);
INSERT INTO Employee(username, E_name, E_age) VALUES ('Alice007', 'Alice', 23);
INSERT INTO Employee(username, E_name, E_age) VALUES ('Bob007', 'Bob', 23);
In an RDBMS, we could use Domain_id as a foreign key in the Employee table and get the data with a JOIN, but we are designing a model for Cassandra. So, in Cassandra we have to create another table that serves this query directly.
CREATE TABLE Employees_by_Domains ( username varchar, E_name text, D_name text, E_age int, PRIMARY KEY(D_name, E_age) );
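In the Employees_by_Domains table the primary key has two parts: the first, D_name, is the partition key, and the second, E_age, is the clustering key, so the rows within each domain are ordered by E_age. Note that two employees with the same D_name and E_age share the same primary key, so the later insert overwrites the earlier one; if that can happen, a unique column such as username is needed as an additional clustering column.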
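A rough way to picture this layout is a dictionary of partitions, each holding rows ordered by the clustering key. The Python class below is a toy model under that assumption, not how Cassandra stores data internally; the rows for Carol and Dave are hypothetical and do not appear in the article's data. It also makes visible that two rows sharing D_name and E_age collide, which mirrors the way a CQL INSERT on an existing primary key replaces the row.

```python
from collections import defaultdict


class EmployeesByDomains:
    """Toy model of a table with PRIMARY KEY(D_name, E_age):
    D_name selects the partition, E_age orders the rows inside it."""

    def __init__(self):
        # partition key -> {clustering key -> row}
        self.partitions = defaultdict(dict)

    def insert(self, username, e_name, d_name, e_age):
        # Writing the same (D_name, E_age) twice replaces the earlier row.
        self.partitions[d_name][e_age] = {
            "username": username, "E_name": e_name,
            "D_name": d_name, "E_age": e_age,
        }

    def select_by_domain(self, d_name):
        """Read one partition, returning rows in clustering-key order."""
        partition = self.partitions.get(d_name, {})
        return [partition[age] for age in sorted(partition)]


table = EmployeesByDomains()
table.insert("Alice007", "Alice", "Database", 25)
table.insert("Carol007", "Carol", "Database", 22)  # hypothetical row
table.insert("Dave007", "Dave", "Database", 25)    # same domain and age as Alice

rows = table.select_by_domain("Database")
print([r["username"] for r in rows])
```

Only two rows come back for the Database domain: Carol's, and Dave's in place of Alice's. Reading one domain touches a single partition, which is what this table was designed for.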
INSERT INTO Employees_by_Domains(username, E_name, D_name, E_age) VALUES ('Ashish001', 'Rana', 'Software Er.', 23);
INSERT INTO Employees_by_Domains(username, E_name, D_name, E_age) VALUES ('Alice007', 'Alice', 'Database', 25);
INSERT INTO Employees_by_Domains(username, E_name, D_name, E_age) VALUES ('Bob007', 'Bob', 'Networking', 26);
Now, we will look at the contents of every table and retrieve the data for each use case.
Let’s have a look.
To see the contents of the Domain table, use the following CQL query.
SELECT * FROM Domain;
To see the contents of the Employee table, use the following CQL query.
SELECT * FROM Employee;
To see the contents of the Employees_by_Domains table, use the following CQL query.
SELECT * FROM Employees_by_Domains;
To get the token value of each partition key, use the following CQL query.
SELECT token(username) FROM Employee;
Each username is hashed to a token value, and the tokens spread the rows across the nodes. When we execute the following query:
SELECT * FROM Employee WHERE username = 'Alpha007';
Cassandra computes the token of 'Alpha007' (-9203506337422437113 in this example) and uses it to pick the node that holds the row.
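The lookup from token to node can be sketched as a search on a sorted ring. The Python below is a minimal illustration under stated assumptions: the four-node `RING` and its token boundaries are hypothetical, and MD5 is used as a standard-library stand-in for Cassandra's default Murmur3 partitioner, so the tokens it produces will not match the value shown above.

```python
import bisect
import hashlib


def token(partition_key):
    """Signed 64-bit token for a key. MD5 is only a stand-in for
    Murmur3, so values differ from real Cassandra tokens."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


# Hypothetical 4-node ring: each node owns the range ending at its token.
RING = [
    (-4611686018427387904, "node1"),
    (0, "node2"),
    (4611686018427387904, "node3"),
    (9223372036854775807, "node4"),
]
RING_TOKENS = [t for t, _ in RING]


def owner_of(partition_key):
    """Return the first node whose token is >= the key's token."""
    index = bisect.bisect_left(RING_TOKENS, token(partition_key))
    return RING[index % len(RING)][1]


for name in ("Alpha007", "Alice007", "Bob007"):
    print(name, token(name), owner_of(name))
```

Because the token depends only on the key, the same username always resolves to the same node, which is why a query restricted to one partition key can be routed without scanning the cluster.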