Read Data From the Beginning Using Kafka Consumer API
Last Updated: 29 Feb, 2024
Apache Kafka is a distributed, fault-tolerant stream-processing system. The kafka-console-consumer CLI reads data from Kafka and writes it to standard output. Under the hood, a Kafka consumer sends fetch requests to the brokers that lead the partitions it wants to consume, specifying its current offset in each request. The broker returns a chunk of the log starting at that offset. The offset is controlled by the consumer, which can even rewind it to re-consume data if it wants to.
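The offset mechanics can be sketched with a toy in-memory model (plain Java, not the Kafka API; the class and method names below are made up purely for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a partition log and a consumer offset (illustration only).
public class OffsetDemo {

    static class PartitionLog {
        private final List<String> records = new ArrayList<>();

        void append(String record) { records.add(record); }

        // A fetch returns everything from the given offset onward,
        // the way a broker returns a chunk of the log to a consumer.
        List<String> fetchFrom(int offset) {
            return records.subList(offset, records.size());
        }
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        log.append("a");
        log.append("b");
        log.append("c");

        int offset = 0;
        List<String> batch = log.fetchFrom(offset); // reads a, b, c
        offset += batch.size();                     // offset is now 3

        offset = 0;                                 // "rewind" to re-consume
        System.out.println(log.fetchFrom(offset)); // prints [a, b, c]
    }
}
```

Real Kafka consumers work the same way conceptually: the position in the log belongs to the consumer, so moving it back replays already-delivered records.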
Step-By-Step Implementation of Read Data from the Beginning Using Kafka Consumer API
Below are the steps to implement reading data from the beginning using Kafka Consumer API.
Step 1: Maven dependency
First, let’s add the Maven dependency for the Kafka Clients Java library to the pom.xml file in our project:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>3.5.1</version>
</dependency>
Step 2: Create a KafkaProducer
To publish messages, let’s first establish a KafkaProducer instance with a minimal setup specified by a Properties instance:
Java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class KafkaProducerFactory {

    public static Producer<String, String> createProducer(String bootstrapServers) {
        Properties producerProperties = new Properties();
        // Cluster to connect to, e.g. "localhost:9092"
        producerProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        // Serialize both keys and values as strings
        producerProperties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producerProperties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        return new KafkaProducer<>(producerProperties);
    }
}
Step 3: Use KafkaProducer.send method
To publish messages to the Kafka topic, we use the KafkaProducer.send(ProducerRecord) method.
Java
Producer<String, String> producer = KafkaProducerFactory.createProducer("localhost:9092");
// Publish ten messages to the "geeksforgeeks" topic
for (int i = 1; i <= 10; i++) {
    ProducerRecord<String, String> record = new ProducerRecord<>("geeksforgeeks", String.valueOf(i));
    producer.send(record);
}
producer.close(); // flushes pending records and releases resources
Step 4: Consume Messages from the Beginning Using the Kafka Consumer API
To consume messages from the start of a Kafka topic, we build the consumer configuration with a freshly generated consumer group id: we assign a randomly generated UUID to the consumer's "group.id" property:
Java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Properties;
import java.util.UUID;

public class KafkaConsumerProperties {

    public static Properties createConsumerProperties(String bootstrapServers) {
        Properties properties = new Properties();
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // A fresh group id means the group has no committed offsets yet
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, generateRandomGroupId());
        // With no committed offset, start reading from the beginning of the topic
        properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        return properties;
    }

    private static String generateRandomGroupId() {
        return UUID.randomUUID().toString();
    }
}
- Whenever we generate a new consumer group id, the consumer joins a brand-new consumer group, identified by the group.id property.
- A newly created consumer group has no committed offsets.
- For this situation, Kafka provides the auto.offset.reset setting, which controls what happens when there is no initial offset (or the committed offset no longer exists on the server): "earliest" starts from the beginning of the partition, while the default, "latest", starts from the end.
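The effect of these settings can be seen by building the configuration with plain string keys. The string literals below are the property names behind the ConsumerConfig constants; only java.util.Properties is used, so this sketch runs without a Kafka dependency:

```java
import java.util.Properties;
import java.util.UUID;

public class FreshGroupConfig {

    // Builds the offset-related part of a consumer configuration for a
    // brand-new group. "bootstrap.servers", "group.id" and
    // "auto.offset.reset" are the string keys behind ConsumerConfig's
    // BOOTSTRAP_SERVERS_CONFIG, GROUP_ID_CONFIG and AUTO_OFFSET_RESET_CONFIG.
    public static Properties freshGroup(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // A random UUID guarantees a group with no committed offsets
        props.put("group.id", UUID.randomUUID().toString());
        // With no committed offset, start from the beginning of the topic
        props.put("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        Properties first = freshGroup("localhost:9092");
        Properties second = freshGroup("localhost:9092");
        // Two calls produce two distinct consumer groups
        System.out.println(first.getProperty("group.id")
                .equals(second.getProperty("group.id"))); // prints false
    }
}
```

Because every run creates a never-before-seen group, the "earliest" reset policy fires every time and the consumer always starts at offset 0.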
Step 5: Use the KafkaConsumer.poll(Duration duration) method
The KafkaConsumer.poll(Duration duration) method is then used to poll for new messages.
Java
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class KafkaConsumerExample {

    private static final String BOOTSTRAP_SERVERS = "localhost:9092";

    public static void main(String[] args) {
        Properties consumerProperties = KafkaConsumerProperties.createConsumerProperties(BOOTSTRAP_SERVERS);
        try (Consumer<String, String> consumer = new KafkaConsumer<>(consumerProperties)) {
            // Subscribe to the topic we published to earlier
            consumer.subscribe(Collections.singletonList("geeksforgeeks"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
            processRecords(records);
        }
    }

    private static void processRecords(ConsumerRecords<String, String> records) {
        for (ConsumerRecord<String, String> record : records) {
            System.out.println(record.value());
        }
    }
}
Step 6: Reset the Existing Consumer to Read from the Beginning
Using KafkaConsumer.seekToBeginning(Collection&lt;TopicPartition&gt; partitions), we can force an existing consumer to start reading from the beginning of the topic. This method takes a collection of TopicPartition and moves the consumer's offset for each of those partitions back to the first available offset. The seek is lazy: it takes effect on the next poll() or position() call.
consumer.seekToBeginning(consumer.assignment());
- We pass the value of KafkaConsumer.assignment() to the seekToBeginning() method.
- KafkaConsumer.assignment() returns the set of partitions currently assigned to the consumer; with subscribe(), this set is only populated after the consumer has joined the group, i.e. after a successful poll().
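The interplay of assignment() and seekToBeginning() can be mimicked with a toy class that tracks one offset per assigned partition (plain Java with invented names for illustration, not the Kafka API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy consumer tracking one offset per assigned partition (illustration only).
public class SeekDemo {

    static class ToyConsumer {
        private final Map<Integer, Long> offsets = new HashMap<>();

        void assign(Set<Integer> partitions) {
            for (Integer p : partitions) {
                offsets.put(p, 5L); // pretend we had already read up to offset 5
            }
        }

        Set<Integer> assignment() { return offsets.keySet(); }

        // Mirrors consumer.seekToBeginning(consumer.assignment()):
        // every assigned partition's offset is moved back to 0.
        void seekToBeginning(Set<Integer> partitions) {
            for (Integer p : partitions) {
                offsets.put(p, 0L);
            }
        }

        long position(int partition) { return offsets.get(partition); }
    }

    public static void main(String[] args) {
        ToyConsumer consumer = new ToyConsumer();
        consumer.assign(Set.of(0, 1));
        consumer.seekToBeginning(consumer.assignment());
        System.out.println(consumer.position(0)); // prints 0
    }
}
```

Passing assignment() into seekToBeginning() is what makes the reset cover every partition the consumer currently owns, rather than a hand-picked list.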
Step 7: Read all the messages from the beginning
When the same consumer is polled again, it reads all of the messages from the beginning of the partition.
Java
ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
for (ConsumerRecord<String, String> record : records) {
    System.out.println(record.value());
}