Introduction to Apache Cassandra
Apache Cassandra is an open-source, distributed, wide-column NoSQL database management system. It is designed to handle large amounts of data across many commodity servers while providing high availability with no single point of failure. It is written in Java and developed under the Apache Software Foundation.
Avinash Lakshman and Prashant Malik initially developed Cassandra at Facebook to power the Facebook inbox search feature. Facebook released Cassandra as an open-source project on Google Code in July 2008. In March 2009 it became an Apache Incubator project, and in February 2010 it became a top-level project. Cassandra has since become popular because of its technical features.
Apache Cassandra is used to manage very large amounts of structured data spread out across the world. It provides a highly available service with no single point of failure. Listed below are some key points about Apache Cassandra:
- It is scalable and fault-tolerant, and it offers tunable consistency.
- It is a column-oriented (wide-column) database.
- Its distributed design is based on Amazon’s Dynamo and its data model on Google’s Bigtable.
- It was created at Facebook, and it differs sharply from relational database management systems.
Cassandra implements a Dynamo-style replication model with no single point of failure, but it adds a more powerful “column family” data model. Cassandra is used by some of the biggest companies, such as Facebook, Twitter, Cisco, Rackspace, eBay, and Netflix.
The design goal of Cassandra is to handle big data workloads across multiple nodes without any single point of failure. Cassandra uses a peer-to-peer distributed architecture across its nodes, and data is distributed among all the nodes of the cluster.
All the nodes in a Cassandra cluster play the same role. Each node is independent and, at the same time, interconnected with the other nodes. Each node in a cluster can accept read and write requests, regardless of where the data is actually located in the cluster. When a node goes down, read and write requests can be served by other nodes in the network.
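The behaviour described above (any node accepts a request, and other nodes serve it when one is down) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how Cassandra is implemented: the `ToyCluster` class, its method names, the replication factor of 3, and the MD5-based placement are all hypothetical choices made for illustration. Real Cassandra additionally uses consistency levels, hinted handoff, and repair, which are omitted here.

```python
import hashlib


class ToyCluster:
    """Toy peer-to-peer cluster: every node can coordinate any request."""

    def __init__(self, node_names, replication_factor=3):
        self.nodes = list(node_names)
        self.rf = replication_factor          # assumed number of copies per key
        self.storage = {name: {} for name in self.nodes}
        self.down = set()                     # names of nodes currently down

    def replicas_for(self, key):
        """Pick `rf` consecutive nodes, starting from a hash of the key."""
        digest = hashlib.md5(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.rf)]

    def _check_coordinator(self, coordinator):
        if coordinator not in self.storage or coordinator in self.down:
            raise RuntimeError(f"{coordinator} cannot coordinate requests")

    def write(self, coordinator, key, value):
        """Any live node may coordinate; data goes to the live replicas."""
        self._check_coordinator(coordinator)
        live = [n for n in self.replicas_for(key) if n not in self.down]
        if not live:
            raise RuntimeError("no live replica for this key")
        for name in live:
            self.storage[name][key] = value
        return live

    def read(self, coordinator, key):
        """Serve the read from the first live replica that holds the key."""
        self._check_coordinator(coordinator)
        for name in self.replicas_for(key):
            if name not in self.down and key in self.storage[name]:
                return self.storage[name][key]
        raise KeyError(key)


if __name__ == "__main__":
    cluster = ToyCluster(["N1", "N2", "N3", "N4", "N5"])
    cluster.write("N1", "user:42", "alice")
    # Take one replica down; the read is still served by another replica.
    cluster.down.add(cluster.replicas_for("user:42")[0])
    coordinator = next(n for n in cluster.nodes if n not in cluster.down)
    print(cluster.read(coordinator, "user:42"))
```

Because every node uses the same placement rule, the client does not need to know where a key lives: whichever live node receives the request can route it to the right replicas.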
Features of Cassandra:
Cassandra has become popular because of its technical features. These are some of the features of Cassandra:
- Easy data distribution –
It provides the flexibility to distribute data where you need it by replicating data across multiple data centers.
Suppose there are 5 nodes, say N1, N2, N3, N4, and N5. A partitioning algorithm decides the token ranges and distributes data accordingly: each node owns a specific token range, and data whose token falls in that range is stored on that node. Let’s have a look at the diagram for a better understanding.
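The token-range idea can also be sketched in a few lines of Python. This is a minimal illustration under assumptions, not Cassandra's actual partitioner: the token space is shrunk to 0..99 for readability, the token boundaries for N1 to N5 are made up, and MD5 from `hashlib` stands in for the hash function (Cassandra's default partitioner uses Murmur3 over a much larger token range). The names `token_for`, `owner_of`, and `node_for` are hypothetical.

```python
import hashlib
from bisect import bisect_left

TOKEN_SPACE = 100  # toy token space 0..99 (assumption, chosen for readability)

# Each node owns the range that ends at its token (inclusive) and starts
# just after the previous node's token. These boundaries are made up.
NODE_TOKENS = {"N1": 19, "N2": 39, "N3": 59, "N4": 79, "N5": 99}


def token_for(partition_key: str) -> int:
    """Hash a partition key to a token in the toy token space."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % TOKEN_SPACE


def owner_of(token: int) -> str:
    """Return the node whose token range contains the given token."""
    ring = sorted((tok, name) for name, tok in NODE_TOKENS.items())
    tokens = [tok for tok, _ in ring]
    # First node whose token is >= the key's token; wrap around the ring.
    index = bisect_left(tokens, token) % len(ring)
    return ring[index][1]


def node_for(partition_key: str) -> str:
    """Map a partition key to the node that stores it."""
    return owner_of(token_for(partition_key))


if __name__ == "__main__":
    for key in ["user:1", "user:2", "order:77"]:
        print(key, "-> token", token_for(key), "-> node", node_for(key))
```

With these boundaries, tokens 0 to 19 belong to N1, 20 to 39 to N2, and so on up to N5. The same key always hashes to the same token, so every node can compute where a piece of data lives without consulting a central directory.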
- Flexible data storage –
Cassandra accommodates all possible data formats, including structured, semi-structured, and unstructured. It can dynamically accommodate changes to your data structures according to your needs.
- Elastic scalability –
Cassandra is highly scalable and allows you to add more hardware to accommodate more customers and more data as required.
- Fast writes –
Cassandra was designed to run on cheap commodity hardware. It performs very fast writes and can store hundreds of terabytes of data without sacrificing read efficiency.
- Always-on architecture –
Cassandra has no single point of failure, and it is continuously available for business-critical applications that can’t afford a failure.
- Fast linear-scale performance –
Cassandra is linearly scalable: throughput increases as you increase the number of nodes in the cluster, while response times stay quick.
- Transaction support –
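Cassandra does not provide full ACID (Atomicity, Consistency, Isolation, Durability) transactions with rollback or locking in the way relational databases do. Instead, it offers atomic, isolated, and durable writes together with tunable consistency, and it supports lightweight transactions for conditional updates.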