
Why Is Apache Kafka So Fast?

Apache Kafka is a well-known open-source stream-processing platform that aims to provide high-throughput, low-latency, fault-tolerant handling of real-time data feeds.



So what makes Apache Kafka the go-to platform for real-time data processing? Apart from all its other strengths, speed is one of the most important. Let us see how Kafka is built to be so fast.

1. Low-Latency I/O: There are two possible places for storing and caching data: Random Access Memory (RAM) and disk. RAM is fast but expensive and limited in capacity, while disk is cheap and plentiful but traditionally considered slow.

Kafka therefore relies on the filesystem for the storage and caching of messages, leaning on the operating system's page cache to keep recently written data in memory. Although it uses the disk approach and not the RAM approach, it still manages to achieve low latency! You might wonder how this is possible, given the high seek time of disks. Let's find out.

2. Kafka Avoids Seek Time: Yes! Kafka smartly avoids seek time by using sequential I/O: messages are only ever appended to the end of a log file and read back in order, so the drive almost never has to reposition between operations. Sequential disk access is dramatically faster than random access for exactly this reason, which is what makes the filesystem approach viable. A minimal sketch of an append-only write follows.
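To make the idea concrete, here is a minimal Java sketch of an append-only write, the access pattern Kafka's log segments rely on. This is an illustration of sequential I/O, not Kafka's actual storage code; the file name is a placeholder.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal sketch of an append-only log: every write goes to the end of the
// file, so the write path moves strictly forward and no seeks are needed.
public class AppendOnlyLog {
    public static void main(String[] args) throws IOException {
        Path logFile = Path.of("segment.log"); // hypothetical segment file

        try (FileChannel channel = FileChannel.open(logFile,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            for (int i = 0; i < 3; i++) {
                ByteBuffer record = ByteBuffer.wrap(
                        ("message-" + i + "\n").getBytes(StandardCharsets.UTF_8));
                channel.write(record); // always appends: sequential I/O
            }
        }
    }
}
```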

3. Zero-Copy Principle: The most common way to send data over a network requires multiple context switches between kernel mode and user mode, plus redundant copies of the data, which consume memory bandwidth and CPU cycles. The zero-copy principle reduces this by asking the kernel to move the data directly from the file to the response socket rather than routing it through the application. Kafka's speed is tremendously improved by the implementation of the zero-copy principle.
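On the JVM, zero copy is exposed through FileChannel.transferTo(), which on Linux maps to the sendfile system call; Kafka's design documentation points to exactly this mechanism. The sketch below shows the pattern with a placeholder file and peer address:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of zero-copy transfer: transferTo() asks the kernel to move file
// bytes straight to the socket, skipping the user-space copy an ordinary
// read()/write() loop would make.
public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        Path logFile = Path.of("segment.log");                              // hypothetical file
        InetSocketAddress peer = new InetSocketAddress("localhost", 9999);  // hypothetical consumer

        try (FileChannel file = FileChannel.open(logFile, StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(peer)) {
            long position = 0;
            long remaining = file.size();
            while (remaining > 0) {
                // Data flows kernel -> socket without entering this process's heap.
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```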

4. Optimal Data Structure: Tree vs. Queue: The tree seems to be the data structure of choice when it comes to data storage; most modern databases use some form of tree, e.g., MongoDB uses a B-tree. But trees pay O(log n) for reads and writes and scatter data across the structure, while Kafka's workload is much simpler: append new messages at the end and read them back in order.

Thus, Kafka uses a queue (an append-only log): all data is appended at the end, and reads amount to advancing a pointer (the offset). Both operations are O(1), confirming the efficiency of the queue data structure for Kafka's access pattern. The toy model below makes this concrete.
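The following toy Java model (not Kafka code; the class and method names are invented for illustration) shows why a log-as-queue gives O(1) appends and O(1) offset-based reads:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a per-partition log: appends go at the tail and reads are a
// direct lookup by offset, both O(1). A real partition is a file on disk,
// but the access pattern is the same.
public class PartitionLog {
    private final List<String> records = new ArrayList<>();

    public long append(String record) {   // O(1) amortized: add at the tail
        records.add(record);
        return records.size() - 1;        // the record's offset
    }

    public String read(long offset) {     // O(1): index straight to the record
        return records.get((int) offset);
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        long offset = log.append("hello");
        System.out.println(log.read(offset)); // prints "hello"
    }
}
```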

5. Horizontal Scaling: Kafka can split a single topic into multiple partitions and spread them across thousands of machines. Producers and consumers then work on many partitions in parallel, which lets Kafka maintain high throughput and provide low latency.
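As a sketch, here is how a topic with many partitions might be created with Kafka's Java AdminClient; the broker address, topic name, and counts are placeholder values:

```java
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

// Sketch: create a topic with many partitions so its load can be spread
// across brokers and handled in parallel.
public class CreatePartitionedTopic {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 12 partitions, replication factor 3: each partition can live on
            // a different broker and be read and written independently.
            NewTopic topic = new NewTopic("events", 12, (short) 3);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```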

6. Compression & Batching of Data: Kafka batches messages into chunks, which reduces network calls and turns most random writes into sequential ones. It is also more efficient to compress a batch of data than to compress each message individually.

Hence, Kafka compresses a batch of messages, sends it to the broker, and writes it to disk in compressed form; it is decompressed only when the consumer reads it. Kafka supports the GZIP, Snappy, LZ4, and Zstandard compression codecs.
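A minimal producer sketch with batching and compression enabled might look like this; the broker address and topic name are placeholders, while the config keys are standard Kafka producer settings:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch of a producer tuned for batching and compression.
public class BatchingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy"); // whole batch compressed at once
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);      // up to 64 KB per batch
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);              // wait up to 10 ms to fill a batch

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                producer.send(new ProducerRecord<>("events", "key-" + i, "value-" + i));
            }
        } // close() flushes any remaining batched records
    }
}
```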
