
Topics, Partitions, and Offsets in Apache Kafka

Apache Kafka is a publish-subscribe messaging system. A messaging system lets you send messages between processes, applications, and servers. Broadly speaking, Apache Kafka is software in which topics (a topic can be thought of as a category of messages) can be defined and further processed. In this article, we are going to discuss three of the most important components of Apache Kafka:

  1. Topics
  2. Partitions
  3. Offsets

Topics, Partitions, and Offsets

In Kafka, a Topic represents a particular stream of data. A Kafka Topic is similar to a table in a database, but without all the constraints: just as a database can hold many tables, Apache Kafka can hold many Topics. You can have as many Topics as you want, and a Topic is identified by its name, so each Topic needs a unique name. Topics are split into Partitions. When you create a Kafka Topic, you have to specify how many Partitions it should have. Each Partition is itself a stream of data in which the messages are ordered. Each message within a Partition gets an incremental ID, which is the position of the message in the Partition. That ID is called an Offset.
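
The relationship between a Topic, its Partitions, and Offsets can be sketched with a small in-memory model. This is only an illustration in plain Python, not the Kafka client API: the class name Topic, its methods, and the topic name example_topic are all made up for this sketch.

```python
class Topic:
    """In-memory stand-in for a Kafka topic (illustration only, not the Kafka API)."""

    def __init__(self, name: str, num_partitions: int):
        if num_partitions < 1:
            raise ValueError("a topic needs at least one partition")
        self.name = name
        # One append-only list per partition.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition: int, message: str) -> int:
        """Append a message to a partition and return the offset it received."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1  # offsets start at 0 and grow by 1 per message

    def read(self, partition: int, offset: int) -> str:
        """Return the message stored at the given offset of a partition."""
        return self.partitions[partition][offset]


# Usage: a hypothetical topic with 3 partitions.
topic = Topic("example_topic", num_partitions=3)
first = topic.append(0, "hello")      # first message in partition 0
second = topic.append(0, "world")     # next message in the same partition
other = topic.append(1, "elsewhere")  # first message in partition 1
print(first, second, other)  # 0 1 0
```

Each partition keeps its own counter, which is why the first message in partition 1 gets Offset 0 even though partition 0 already holds two messages.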

Take the example of a Kafka Topic with 3 Partitions. Partition 0 holds the message with Offset 0, then the messages with Offsets 1, 2, 3, and so on, maybe all the way up to 11. The next message written to it will get Offset 12. Partition 1 is also part of the Topic and has Offsets going from 0 to 7, so its next message will get Offset 8. Partition 2 has Offsets going from 0 to 9, and its next message will get Offset 10. As this example shows, the Partitions are independent: each one is written to at its own speed, so the Offsets in each Partition are independent too. A message is therefore identified by a coordinate made of a Topic name, a Partition ID, and an Offset.
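
The numbers in this example can be reproduced with a few lines of Python. The sketch below is an assumption-laden illustration, not Kafka code: the topic name example_topic, the payload strings, and the helper names Coordinate, next_offset, and lookup are invented here. Only the message counts (12, 8, and 10) come from the example.

```python
from typing import NamedTuple


class Coordinate(NamedTuple):
    """Identifies one message: topic name, partition ID, offset."""
    topic: str
    partition: int
    offset: int


# Hypothetical topic name and payloads; the message counts match the example.
topics = {
    "example_topic": {
        0: [f"p0-msg-{i}" for i in range(12)],  # offsets 0..11
        1: [f"p1-msg-{i}" for i in range(8)],   # offsets 0..7
        2: [f"p2-msg-{i}" for i in range(10)],  # offsets 0..9
    }
}


def next_offset(topic: str, partition: int) -> int:
    """The next offset equals the number of messages already in the partition."""
    return len(topics[topic][partition])


def lookup(coord: Coordinate) -> str:
    """Fetch the message stored at a full (topic, partition, offset) coordinate."""
    return topics[coord.topic][coord.partition][coord.offset]


for partition in sorted(topics["example_topic"]):
    print(partition, next_offset("example_topic", partition))
# 0 12
# 1 8
# 2 10
```

Looking up Offset 3 in Partition 0 and Offset 3 in Partition 1 returns two different messages, which is why an Offset only makes sense together with its Partition.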

Topic Example

Let’s go through an example where we have cars on the road. We are a car company with a fleet of cars, and we want to have the car positions in Kafka. Why? Because we may have many applications that need that stream of car positions, for example a dashboard or an alerting service. So we create a Topic in Kafka named cars_gps that will contain the position of all the cars in real time. Each car sends its position to Kafka, maybe every 20 seconds. Each message contains the carID, so we know which car the position belongs to, as well as the car position itself, for example the latitude and longitude.

We could choose to add more data to that message: the speed, the weight of the car, how many hours the car has been running, and so on. Say we choose to create the Topic with 10 Partitions. In Kafka, more Partitions generally allow more throughput through a Topic, so the right number is something you determine through testing and capacity planning. From there, the consumer applications might be a location dashboard for a mobile application or a notification service. For example, if a car hasn’t been moving for more than 10 minutes, maybe it’s broken down or maybe it has arrived at its destination, and we want to send a notification to its destination.
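
To make the message and the notification rule more concrete, here is a minimal sketch in plain Python. It does not talk to Kafka at all. The field names (car_id, latitude, longitude, timestamp), the choice of JSON as the message format, and the StallDetector class are assumptions made for illustration; the article only says that a message carries the carID and the position.

```python
import json
from dataclasses import dataclass, asdict

STALL_SECONDS = 10 * 60  # "more than 10 minutes" from the example


@dataclass
class CarPosition:
    car_id: str       # hypothetical field names, not a Kafka schema
    latitude: float
    longitude: float
    timestamp: int    # seconds since the epoch


def encode(position: CarPosition) -> bytes:
    """Serialize a position as a message value (JSON is an assumption)."""
    return json.dumps(asdict(position)).encode("utf-8")


def decode(raw: bytes) -> CarPosition:
    """Rebuild a position from a message value."""
    return CarPosition(**json.loads(raw.decode("utf-8")))


class StallDetector:
    """Tracks, per car, where it was and when its position last changed."""

    def __init__(self):
        self._last_move = {}  # car_id -> (latitude, longitude, time of last change)

    def observe(self, position: CarPosition) -> bool:
        """Return True if the car has stayed in place for more than 10 minutes."""
        previous = self._last_move.get(position.car_id)
        here = (position.latitude, position.longitude)
        if previous is None or previous[:2] != here:
            # First sighting, or the car moved: remember the new spot and time.
            self._last_move[position.car_id] = (*here, position.timestamp)
            return False
        return position.timestamp - previous[2] > STALL_SECONDS
```

A notification service could feed every decoded message from cars_gps into observe and send an alert when it returns True. A real consumer would likely compare positions with some tolerance rather than exact equality, since GPS readings jitter.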

Some Major Points to Remember in Topics, Partitions, and Offsets

Please refer to the same example.

Topic Naming Convention

Naming a topic is a “free-for-all”: you can do whatever you want. But once you go into production with Kafka, you need to enforce guidelines internally to ease the management of your cluster. You are free to come up with your own guidelines. If you want, you can also use the following guideline for naming a topic:

<message type>.<dataset name>.<data name>
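
A guideline like this is easier to enforce when it is checked automatically. The sketch below tests whether a name has three dot-separated parts. The function name follows_convention, the set of characters allowed in each part, and the sample names are assumptions for illustration and would need adapting to your own guidelines.

```python
import re

# Three dot-separated parts: <message type>.<dataset name>.<data name>.
# The characters allowed inside each part are this sketch's assumption.
TOPIC_NAME = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")


def follows_convention(name: str) -> bool:
    """Return True if the whole name matches the three-part pattern."""
    return TOPIC_NAME.fullmatch(name) is not None


# Hypothetical names, used only to exercise the check.
print(follows_convention("tracking.fleet.cars_gps"))  # True
print(follows_convention("cars_gps"))                 # False
```

A check like this could run in whatever tooling creates topics, so that names outside the guideline are rejected before they reach the cluster.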
