In Spring Batch, the processor sits in the middle of a chunk-oriented step: it receives each item from the reader, applies some business logic to it, and passes the result on to the writer. A processor is represented by the ItemProcessor interface, which has the following form:
public interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}
where:
- I is the type of the input item read by the reader.
- O is the type of the output item passed to the writer.
When configuring a Spring Batch job, you define a processor by implementing the ItemProcessor interface. For example:
public class MyItemProcessor implements ItemProcessor<InputType, OutputType> {

    @Override
    public OutputType process(InputType item) throws Exception {
        // Business logic goes here: transform or modify the input item.
        // Returning null filters the item out, so it never reaches the writer.
        // processedItem stands for the result of your own transformation.
        return processedItem;
    }
}
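Filtering is the least obvious of the three options (transform, filter, modify), so here is a minimal sketch of it. In Spring Batch, returning null from process() tells the framework to drop the item instead of passing it to the writer. The example uses a local stand-in for the ItemProcessor interface so that it runs without Spring Batch on the classpath; the class name FilterSketch, the blank-title rule, and the small driver loop are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilterSketch {

    // Local stand-in that mirrors the shape of Spring Batch's ItemProcessor.
    interface ItemProcessor<I, O> {
        O process(I item) throws Exception;
    }

    // Hypothetical processor: trims each title and filters out blank ones.
    static class TitleProcessor implements ItemProcessor<String, String> {
        @Override
        public String process(String title) {
            String trimmed = title.trim();
            // Returning null signals "skip this item".
            return trimmed.isEmpty() ? null : trimmed;
        }
    }

    // Simplified driver that imitates how filtered items are dropped.
    static List<String> processAll(List<String> titles) {
        TitleProcessor processor = new TitleProcessor();
        List<String> kept = new ArrayList<>();
        for (String title : titles) {
            String result = processor.process(title);
            if (result != null) {
                kept.add(result);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [Java Streams, Python Basics]
        System.out.println(processAll(Arrays.asList(" Java Streams ", "   ", "Python Basics")));
    }
}
```

The blank entry never appears in the result, which is the behaviour a writer would observe in a real step.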
In the Spring Batch job configuration, you can then wire this processor into your step:
Java
@Bean
public Step myStep(ItemReader<InputType> reader,
                   ItemProcessor<InputType, OutputType> processor,
                   ItemWriter<OutputType> writer) {
    return stepBuilderFactory.get("myStep")
            .<InputType, OutputType>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
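Here, processor is an instance of your custom ItemProcessor implementation, and chunk(10) means that items are read and processed in groups of ten before each group is handed to the writer.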
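To make the interplay of the chunk size, the reader, the processor, and the writer more concrete, here is a simplified plain-Java model of chunk-oriented processing. It is an illustration under stated assumptions, not Spring Batch's actual implementation: ChunkLoopSketch and its run method are hypothetical, the reader is modelled as an Iterator, the writer is modelled by collecting each chunk into a list, and transactions, restart, and skip/retry handling are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class ChunkLoopSketch {

    // Reads up to chunkSize items, processes each of them, and then
    // "writes" the processed chunk in a single call. Every written chunk
    // is returned so the grouping is easy to inspect.
    static <I, O> List<List<O>> run(Iterator<I> reader,
                                    Function<I, O> processor,
                                    int chunkSize) {
        List<List<O>> written = new ArrayList<>();
        while (reader.hasNext()) {
            // 1. Read phase: collect at most chunkSize items.
            List<I> items = new ArrayList<>();
            while (items.size() < chunkSize && reader.hasNext()) {
                items.add(reader.next());
            }
            // 2. Process phase: transform each item; null means "filtered".
            List<O> outputs = new ArrayList<>();
            for (I item : items) {
                O output = processor.apply(item);
                if (output != null) {
                    outputs.add(output);
                }
            }
            // 3. Write phase: one writer call per chunk.
            written.add(outputs);
        }
        return written;
    }

    public static void main(String[] args) {
        Iterator<Integer> reader = Arrays.asList(1, 2, 3, 4, 5).iterator();
        // prints [[10, 20], [30, 40], [50]]
        System.out.println(ChunkLoopSketch.<Integer, Integer>run(reader, n -> n * 10, 2));
    }
}
```

With five items and a chunk size of two, the writer is called three times, the last time with a partial chunk.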
Example:
Let’s simplify the concept of processors in Spring Batch with a real-life analogy: a content processing system for a website like GeeksforGeeks. Imagine GeeksforGeeks needs to update and process the content on its website regularly. It has a massive database of programming tutorials written in different formats, and it wants to standardize the content before publishing it on the website. This is where the content processing system comes into play.
- Reader (Content Retriever): The system retrieves content from the GeeksforGeeks database. Each piece of content represents a programming tutorial in various languages (Python, Java, Ruby, C, C++, etc.).
- Processor (Content Processor): The processor is like the team of editors and reviewers who ensure that the content follows a standardized format and meets certain quality criteria before it goes live on the website.
In the context of Spring Batch, the ItemProcessor is similar to the content processing logic. For example, it could check for consistency in code formatting, add standardized headers, or perform language-specific adjustments.
- Real-life analogy: If the tutorial content has code snippets, the processor might ensure that all code follows a consistent style guide and includes necessary comments.
public class CodeFormattingProcessor implements ItemProcessor<Tutorial, Tutorial> {

    @Override
    public Tutorial process(Tutorial tutorial) throws Exception {
        // Check and standardize code formatting for the tutorial
        tutorial.setCode(CodeFormatter.format(tutorial.getCode()));
        return tutorial;
    }
}
- Writer (Content Publisher): The writer is responsible for publishing the processed content to the website. In our analogy, this corresponds to updating the GeeksforGeeks database with the standardized content.
Real-life analogy: After the content processor has ensured consistent code formatting, the writer updates the database with the processed tutorial.
public class DatabaseWriter implements ItemWriter<Tutorial> {

    @Override
    public void write(List<? extends Tutorial> tutorials) throws Exception {
        // Update the GeeksforGeeks database with the processed tutorials
        tutorialDatabaseService.updateTutorials(tutorials);
    }
}
By using Spring Batch, GeeksforGeeks can efficiently automate this content processing system, ensuring that all programming tutorials meet certain quality standards before being published on their website. The processor component, represented by the ItemProcessor, allows for the customization and standardization of content processing logic.
Advantages of Data Transformation with ItemProcessors in Spring Batch
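- Modularity: Keeps transformation logic in its own class, separate from reading and writing, so each part of the step has one clear responsibility.
- Reusability: The same processor can be wired into several steps or jobs, saving time.
- Scalability: Because a processor handles one item at a time, the work can be spread across threads or partitions when the step is configured for parallel execution.
- Error Handling: Exceptions thrown from process() surface at a well-defined point, where a fault-tolerant step can apply skip or retry rules.
- Complex Transformations: Centralizes intricate changes in one place instead of scattering them across readers and writers.
- Integration: A processor can call other tools or services to validate or enrich an item.
- Testing and Debugging: A processor is a plain class with a single method, so it can be checked and fixed independently of the rest of the job.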
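The modularity and reusability points can be sketched in plain Java: two small, independently testable processors are chained into one. Spring Batch provides CompositeItemProcessor for this purpose; the sketch below does not use it and relies on a local stand-in interface instead, so the names ComposeSketch, TRIM, ADD_HEADER, chain, and format are all hypothetical.

```java
class ComposeSketch {

    // Local stand-in that mirrors the shape of Spring Batch's ItemProcessor.
    interface ItemProcessor<I, O> {
        O process(I item) throws Exception;
    }

    // Two small processors, each with a single responsibility.
    static final ItemProcessor<String, String> TRIM = String::trim;
    static final ItemProcessor<String, String> ADD_HEADER =
            body -> "// Standard header\n" + body;

    // Chains two processors; a null from the first one short-circuits,
    // matching the "null means filtered" convention.
    static <A, B, C> ItemProcessor<A, C> chain(ItemProcessor<A, B> first,
                                               ItemProcessor<B, C> second) {
        return item -> {
            B intermediate = first.process(item);
            return intermediate == null ? null : second.process(intermediate);
        };
    }

    // Convenience wrapper so callers do not have to handle checked exceptions.
    static String format(String code) {
        try {
            return chain(TRIM, ADD_HEADER).process(code);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(format("   int x = 1;   "));
    }
}
```

Each piece can be reused on its own (TRIM in another job, for instance) or tested without the other.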
Data Transformation with ItemProcessors in Spring Batch
In Spring Batch, the ItemProcessor plays a crucial role in transforming data during batch processing. It allows you to apply custom logic to modify or enrich the data read by the ItemReader before it is written by the ItemWriter. Let’s extend the example of a content processing system for GeeksforGeeks with additional attributes and walk through data transformation with an ItemProcessor step by step. We start from the beginning by creating a Spring Boot project and adding the necessary dependencies. This example uses Maven as the build tool.
Step 1: Create a Spring Boot Project
- Go to Spring Initializr (https://start.spring.io).
- Set the following configurations:
  - Project: Maven Project
  - Language: Java
  - Spring Boot: a 2.x release (the pom.xml in Step 3 uses 2.6.1, which matches the JobBuilderFactory and StepBuilderFactory APIs used in this example)
  - Group: Your desired group name, e.g. com.geeksforgeeks
  - Artifact: Your desired artifact name, e.g. content-processor
  - Dependencies:
    - Spring Batch
    - Spring Web
    - Lombok
- Click on the “Generate” button to download the project zip file.
Step 2: Extract and Import into IDE
Extract the downloaded zip file and import the project into your preferred IDE (Eclipse, IntelliJ, etc.).
Step 3: Add Additional Dependencies
Open the pom.xml file in your project and add the necessary dependencies. For this example, we’ll use H2 database for simplicity. If you are using a different database, adjust the dependencies accordingly. Below is the full pom.xml file configuration.
XML
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.6.1</version>
        <relativePath/>
    </parent>
    <groupId>com.geeksforgeeks</groupId>
    <artifactId>GeeksforGeeksContentProcessor</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>GeeksforGeeksContentProcessor</name>
    <description>RESTful API for GeeksforGeeks Content Processor Spring Boot App</description>
    <properties>
        <java.version>8</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-batch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>com.h2database</groupId>
            <artifactId>h2</artifactId>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <scope>runtime</scope>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
Step 4: Create a ProgrammingTutorial Class
Create the ProgrammingTutorial class with the additional attributes as per the requirement.
Java
package com.geeksforgeeks.model;

import java.util.Date;

// Spring Boot 2.6.x (see pom.xml) ships JPA under the javax.persistence
// namespace; the jakarta.persistence namespace applies from Spring Boot 3.
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import org.hibernate.annotations.CreationTimestamp;
import org.hibernate.annotations.UpdateTimestamp;

/**
 * Represents a programming tutorial entity.
 *
 * This class is annotated with JPA annotations for entity mapping. Lombok
 * annotations are used to generate getters, setters, and constructors.
 * Hibernate annotations are used to handle timestamp creation and updates.
 *
 * @author rahul.chauhan
 */
@Entity
@Data
@NoArgsConstructor
@AllArgsConstructor
public class ProgrammingTutorial {

    /**
     * Unique identifier for the tutorial.
     */
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column
    private String title;

    /**
     * Programming language covered in the tutorial.
     */
    @Column
    private String language;

    /**
     * Content of the programming tutorial.
     */
    @Column
    private String content;

    @Column
    private String author;

    /**
     * Timestamp representing the creation time of the tutorial. Automatically
     * populated by Hibernate.
     */
    @CreationTimestamp
    private Date createTime;

    /**
     * Timestamp representing the last update time of the tutorial. Automatically
     * updated by Hibernate.
     */
    @UpdateTimestamp
    private Date lastUpdateTime;
}
Step 5: Create a TutorialRepository
Create a simple repository interface for accessing the database.
Java
package com.geeksforgeeks.repository;

import org.springframework.data.jpa.repository.JpaRepository;

import com.geeksforgeeks.model.ProgrammingTutorial;

public interface TutorialRepository extends JpaRepository<ProgrammingTutorial, Long> {
}
Step 6: Configure Application Properties
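In your application.properties file, configure the H2 database, the H2 console, and the JPA/Hibernate settings. Note that Spring Boot runs every Job bean at startup by default; add spring.batch.job.enabled=false if the job should run only when it is triggered through the REST endpoint created in Step 8.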
# DataSource settings
spring.datasource.url=jdbc:h2:mem:testdb
spring.datasource.driverClassName=org.h2.Driver
spring.datasource.username=testUser
spring.datasource.password=password
spring.jpa.database-platform=org.hibernate.dialect.H2Dialect
# H2 Console settings
spring.h2.console.enabled=true
spring.h2.console.path=/h2-console
# Hibernate settings
spring.jpa.hibernate.ddl-auto=update
spring.jpa.show-sql=true
spring.jpa.properties.hibernate.format_sql=true
# Server port
server.port=8080
Step 7: Implement the ItemReader, ItemWriter, ItemProcessor, and Batch Configuration Classes
Java
package com.geeksforgeeks.batch;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;

import com.geeksforgeeks.model.ProgrammingTutorial;

@Configuration
@EnableBatchProcessing
@EnableScheduling
public class BatchConfiguration {

    private final JobBuilderFactory jobBuilderFactory;
    private final StepBuilderFactory stepBuilderFactory;
    private final JobLauncher jobLauncher;
    private final JobCompletionNotificationListener notificationListener;
    private final TutorialItemReader tutorialItemReader;
    private final TutorialItemProcessor tutorialItemProcessor;
    private final TutorialItemWriter tutorialItemWriter;

    @Autowired
    public BatchConfiguration(
            JobBuilderFactory jobBuilderFactory,
            StepBuilderFactory stepBuilderFactory,
            JobLauncher jobLauncher,
            JobCompletionNotificationListener notificationListener,
            TutorialItemReader tutorialItemReader,
            TutorialItemProcessor tutorialItemProcessor,
            TutorialItemWriter tutorialItemWriter
    ) {
        this.jobBuilderFactory = jobBuilderFactory;
        this.stepBuilderFactory = stepBuilderFactory;
        this.jobLauncher = jobLauncher;
        this.notificationListener = notificationListener;
        this.tutorialItemReader = tutorialItemReader;
        this.tutorialItemProcessor = tutorialItemProcessor;
        this.tutorialItemWriter = tutorialItemWriter;
    }

    // The job consists of a single step and reports completion to the listener.
    @Bean
    public Job processContentJob() {
        return jobBuilderFactory.get("processContentJob")
                .incrementer(new RunIdIncrementer())
                .listener(notificationListener)
                .flow(processContentStep())
                .end()
                .build();
    }

    // Chunk-oriented step: read, process, and write tutorials ten at a time.
    @Bean
    public Step processContentStep() {
        return stepBuilderFactory.get("processContentStep")
                .<ProgrammingTutorial, ProgrammingTutorial>chunk(10)
                .reader(tutorialItemReader)
                .processor(tutorialItemProcessor)
                .writer(tutorialItemWriter)
                .build();
    }
}
Java
package com.geeksforgeeks.batch;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.stereotype.Component;

@Component
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {

    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            System.out.println("Batch Job Completed Successfully! Time to verify the results.");
        }
    }
}
Java
package com.geeksforgeeks.batch;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.stereotype.Component;

import com.geeksforgeeks.model.ProgrammingTutorial;

@Component
public class TutorialItemProcessor implements ItemProcessor<ProgrammingTutorial, ProgrammingTutorial> {

    @Override
    public ProgrammingTutorial process(ProgrammingTutorial tutorial) throws Exception {
        // Prefix the title and upper-case the content of every tutorial.
        tutorial.setTitle("Transformed: " + tutorial.getTitle());
        tutorial.setContent(transformContent(tutorial.getContent()));
        return tutorial;
    }

    private String transformContent(String content) {
        // Guard against tutorials stored without content.
        return content == null ? null : content.toUpperCase();
    }
}
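Because the processor is a plain class with a single method, its transformation can be checked without starting a job or a Spring context. The sketch below assumes a simplified stand-in for the entity (a Tutorial class with public fields, no JPA and no Lombok) and applies the same title prefix and upper-casing as TutorialItemProcessor; ProcessorCheckSketch and Tutorial are hypothetical names used only for this illustration.

```java
import java.util.Locale;

class ProcessorCheckSketch {

    // Simplified stand-in for the ProgrammingTutorial entity.
    static class Tutorial {
        String title;
        String content;

        Tutorial(String title, String content) {
            this.title = title;
            this.content = content;
        }
    }

    // Mirrors the transformation of TutorialItemProcessor: prefix the title
    // and upper-case the content. Locale.ROOT keeps the result independent
    // of the JVM's default locale; null content is passed through unchanged.
    static Tutorial process(Tutorial tutorial) {
        tutorial.title = "Transformed: " + tutorial.title;
        tutorial.content = tutorial.content == null
                ? null
                : tutorial.content.toUpperCase(Locale.ROOT);
        return tutorial;
    }

    public static void main(String[] args) {
        Tutorial result = process(new Tutorial("Intro to Java", "public class Hello {}"));
        System.out.println(result.title);   // Transformed: Intro to Java
        System.out.println(result.content); // PUBLIC CLASS HELLO {}
    }
}
```

In a real project the same checks would normally live in a unit test that instantiates TutorialItemProcessor directly.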
Java
package com.geeksforgeeks.batch;

import java.util.Iterator;

import org.springframework.batch.item.ItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import com.geeksforgeeks.model.ProgrammingTutorial;
import com.geeksforgeeks.repository.TutorialRepository;

@Component
public class TutorialItemReader implements ItemReader<ProgrammingTutorial> {

    @Autowired
    private TutorialRepository tutorialRepository;

    private Iterator<ProgrammingTutorial> tutorialIterator;

    @Override
    public ProgrammingTutorial read() throws Exception {
        // Load the rows once per run. Re-querying whenever the iterator is
        // exhausted would start again at the first row and the step would
        // never see the null that marks the end of the input.
        if (tutorialIterator == null) {
            tutorialIterator = tutorialRepository.findAll().iterator();
        }
        if (tutorialIterator.hasNext()) {
            return tutorialIterator.next();
        }
        // Reset so the next job execution reads the table again, then
        // return null to signal that there is no more input.
        tutorialIterator = null;
        return null;
    }
}
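The ItemReader contract is that read() returns the next item, or null once the input is exhausted; a step keeps calling read() until it receives null. The plain-Java sketch below illustrates that contract with an in-memory list. It uses a local stand-in for the ItemReader interface, and the names ReaderContractSketch, ListBackedReader, and drain are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ReaderContractSketch {

    // Local stand-in that mirrors the shape of Spring Batch's ItemReader.
    interface ItemReader<T> {
        T read() throws Exception;
    }

    static class ListBackedReader<T> implements ItemReader<T> {
        private final List<T> source;
        private Iterator<T> iterator;

        ListBackedReader(List<T> source) {
            this.source = source;
        }

        @Override
        public T read() {
            if (iterator == null) {
                iterator = source.iterator();
            }
            if (iterator.hasNext()) {
                return iterator.next();
            }
            iterator = null; // allow a later run to start from the beginning
            return null;     // end of input
        }
    }

    // Imitates a step: keeps reading until the reader returns null.
    static <T> int drain(ListBackedReader<T> reader) {
        int count = 0;
        while (reader.read() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        ListBackedReader<String> reader = new ListBackedReader<>(Arrays.asList("a", "b", "c"));
        System.out.println(drain(reader)); // 3
        System.out.println(drain(reader)); // 3 again, because the reader reset itself
    }
}
```

A reader that never returns null keeps the step running indefinitely, which is why the end-of-input case deserves explicit handling.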
Java
package com.geeksforgeeks.batch;

import java.util.List;

import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import com.geeksforgeeks.model.ProgrammingTutorial;
import com.geeksforgeeks.repository.TutorialRepository;

@Component
public class TutorialItemWriter implements ItemWriter<ProgrammingTutorial> {

    @Autowired
    private TutorialRepository tutorialRepository;

    @Override
    public void write(List<? extends ProgrammingTutorial> tutorials) throws Exception {
        // Persist the whole processed chunk in one repository call.
        tutorialRepository.saveAll(tutorials);
    }
}
Step 8: Create ContentProcessingController
Java
package com.geeksforgeeks.controller;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/content")
public class ContentProcessingController {

    private final JobLauncher jobLauncher;
    private final Job processContentJob;

    public ContentProcessingController(JobLauncher jobLauncher, Job processContentJob) {
        this.processContentJob = processContentJob;
        this.jobLauncher = jobLauncher;
    }

    @PostMapping("/process")
    public ResponseEntity<String> processContent() {
        try {
            // A timestamp parameter makes every launch a new job instance.
            JobParameters jobParameters = new JobParametersBuilder()
                    .addString("jobParam1", String.valueOf(System.currentTimeMillis()))
                    .toJobParameters();
            JobExecution jobExecution = jobLauncher.run(processContentJob, jobParameters);
            return ResponseEntity.ok("Content processing job initiated successfully. Job ID: " + jobExecution.getId());
        } catch (Exception e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                    .body("Error initiating content processing job: " + e.getMessage());
        }
    }
}
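The controller passes the current time as a job parameter because Spring Batch identifies a job instance by the job name plus its identifying parameters, and it refuses to rerun an instance that has already completed (JobInstanceAlreadyCompleteException). The plain-Java model below illustrates that rule only; JobInstanceSketch and its launch method are hypothetical and do not use the Spring Batch API.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class JobInstanceSketch {

    // Keys of job instances that have already completed.
    private final Set<String> completed = new HashSet<>();

    // Returns true if the launch is accepted, false if an instance with the
    // same job name and parameters has already completed.
    boolean launch(String jobName, Map<String, String> parameters) {
        // A TreeMap gives a stable, order-independent key for the parameters.
        String instanceKey = jobName + new TreeMap<>(parameters);
        return completed.add(instanceKey);
    }

    public static void main(String[] args) {
        JobInstanceSketch launcher = new JobInstanceSketch();
        Map<String, String> fixed = Collections.singletonMap("jobParam1", "same-value");
        System.out.println(launcher.launch("processContentJob", fixed)); // true
        System.out.println(launcher.launch("processContentJob", fixed)); // false

        // A fresh timestamp yields a new instance, so the launch is accepted.
        Map<String, String> fresh = Collections.singletonMap(
                "jobParam1", String.valueOf(System.currentTimeMillis()));
        System.out.println(launcher.launch("processContentJob", fresh)); // true
    }
}
```

With a constant parameter value, a second call to the endpoint would be rejected once the first run had completed; the timestamp avoids that.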
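Project structure of the Spring Boot application created in the steps above:
- com.geeksforgeeks.model: ProgrammingTutorial
- com.geeksforgeeks.repository: TutorialRepository
- com.geeksforgeeks.batch: BatchConfiguration, JobCompletionNotificationListener, TutorialItemProcessor, TutorialItemReader, TutorialItemWriter
- com.geeksforgeeks.controller: ContentProcessingController
- src/main/resources: application.properties
Run the Spring Boot Application
Start the application from your IDE by running the main class generated by Spring Initializr, or from the project root with mvn spring-boot:run. The server listens on port 8080, as configured in application.properties.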
Testing Spring Boot Application
Use Postman to Trigger the Batch Job: Open Postman and create a new request:
- Method: POST
- URL: http://localhost:8080/api/content/process
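Alternatively, send the same request with curl (the endpoint does not read a request body):
curl -X POST http://localhost:8080/api/content/process
The job only transforms rows that already exist in the ProgrammingTutorial table. This article does not define an endpoint for creating tutorials (there is no controller for http://localhost:8080/api/tutorials), so insert a sample row first, for example through the H2 console at /h2-console, with title "Sample Tutorial", language "Java", content "This is a sample tutorial content.", and author "Rahul Dravid".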
Send the request, and you should receive a response indicating that the content processing job has been initiated, for example “Content processing job initiated successfully. Job ID: 3”.
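The same request can also be sent from Java code instead of Postman. The sketch below assumes Java 11 or later for the java.net.http client (the pom.xml in this article targets Java 8, so this code would live in a separate client or test project) and assumes that the application is running on localhost:8080. The class name TriggerJobSketch and the buildRequest helper are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class TriggerJobSketch {

    // Builds a body-less POST request for the job trigger endpoint.
    static HttpRequest buildRequest(String baseUrl) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/content/process"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Requires the Spring Boot application to be running locally.
        HttpResponse<String> response = client.send(
                buildRequest("http://localhost:8080"),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Keeping the request construction in its own method makes it possible to check the URL and HTTP method without a running server.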
Conclusion
In conclusion, creating a Spring Batch application for processing programming tutorials involves configuring entity classes, implementing ItemProcessor, ItemReader, and ItemWriter components, setting up batch jobs, and creating endpoints for job initiation. The application can be tested using Postman, and the entire process is designed to streamline the batch processing of data, ensuring consistency and efficiency in handling large datasets. Monitoring and debugging tools, along with additional enhancements, can be employed to refine and optimize the application for specific use cases. Overall, Spring Batch simplifies the development of robust and scalable batch processing systems within a Spring Boot application.