Spring Batch – Data Transformation with ItemProcessors

Last Updated : 05 Feb, 2024

In Spring Batch, the processor handles the processing phase of a chunk-oriented step. It sits between the reader and the writer: it receives each item from the reader, applies some logic to it, and passes the result on to the writer. A processor is represented by the ItemProcessor interface, which has the following form:

public interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

where,

  • “I”: It is the type of input item read by the reader.
  • “O”: It is the type of output item that will be passed to the writer.

When configuring a Spring Batch job, you can define a processor by implementing the ItemProcessor interface. For example:

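import org.springframework.batch.item.ItemProcessor;

// InputType and OutputType are placeholders for your own domain classes
public class MyItemProcessor implements ItemProcessor<InputType, OutputType> {

    @Override
    public OutputType process(InputType item) throws Exception {
        // Business logic goes here: transform, enrich, or validate the input item.
        // Returning null filters the item out, so it is not passed to the writer.
        OutputType processedItem = new OutputType();
        // ... copy or convert the fields of item into processedItem ...
        return processedItem;
    }
}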
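
As a rough illustration of the two type parameters and of filtering, the sketch below converts raw String lines into Integer scores and drops blank lines by returning null. In Spring Batch, an item for which process returns null is filtered out and never reaches the writer. To keep the sketch runnable without Spring on the classpath, it declares a local interface with the same shape as ItemProcessor; the names SimpleProcessor, ScoreParsingProcessor, ScoreParsingDemo, and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local stand-in with the same shape as Spring Batch's ItemProcessor<I, O>,
// declared here only so the sketch compiles without Spring on the classpath.
interface SimpleProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical processor: input type String, output type Integer.
class ScoreParsingProcessor implements SimpleProcessor<String, Integer> {
    @Override
    public Integer process(String line) {
        if (line == null || line.trim().isEmpty()) {
            return null; // null means "filter this item out"
        }
        return Integer.parseInt(line.trim());
    }
}

class ScoreParsingDemo {
    // Mimics what a step does with the processor's results:
    // items that come back as null are skipped, the rest are collected.
    static List<Integer> processAll(List<String> lines) {
        ScoreParsingProcessor processor = new ScoreParsingProcessor();
        List<Integer> output = new ArrayList<>();
        for (String line : lines) {
            Integer result = processor.process(line);
            if (result != null) {
                output.add(result);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // prints [10, 25, 7]
        System.out.println(processAll(Arrays.asList("10", " 25 ", "", "7")));
    }
}
```

Here the input type (String) and the output type (Integer) differ, which is why the interface takes two type parameters.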

In the Spring Batch job configuration, you can then wire this processor into your step:

Java




@Bean
public Step myStep(ItemReader<InputType> reader,
                   ItemProcessor<InputType, OutputType> processor,
                   ItemWriter<OutputType> writer) {
    return stepBuilderFactory.get("myStep")
            .<InputType, OutputType>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}


Here, processor is an instance of your custom ItemProcessor implementation, and chunk(10) sets how many items are processed before each write.
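
To make chunk(10) more concrete: in a chunk-oriented step, items are read one at a time until the chunk size is reached (or the input runs out), each item is passed through the processor, and the resulting list is handed to the writer in a single call. The plain-Java sketch below imitates that loop with a chunk size of 3. It is a simplified model, not Spring Batch's actual implementation (there are no transactions, retries, or skips), and ChunkLoopSketch and its parameters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class ChunkLoopSketch {

    // Runs a simplified read-process-write loop and returns every chunk
    // that would have been passed to the writer.
    static <I, O> List<List<O>> run(Iterator<I> reader,
                                    Function<I, O> processor,
                                    int chunkSize) {
        List<List<O>> writtenChunks = new ArrayList<>();
        while (reader.hasNext()) {
            // 1. Read up to chunkSize items.
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize && reader.hasNext()) {
                inputs.add(reader.next());
            }
            // 2. Process the items one by one.
            List<O> outputs = new ArrayList<>();
            for (I input : inputs) {
                outputs.add(processor.apply(input));
            }
            // 3. "Write" the whole chunk at once.
            writtenChunks.add(outputs);
        }
        return writtenChunks;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        List<List<String>> chunks = run(titles.iterator(), t -> t.toUpperCase(), 3);
        System.out.println(chunks); // prints [[A, B, C], [D, E, F], [G]]
    }
}
```

With seven items and a chunk size of 3, the writer is called three times; the last chunk is smaller because the input ran out.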

Example:

Let’s illustrate processors in Spring Batch with a real-life analogy: a content processing system for a website like GeeksforGeeks.

Imagine GeeksforGeeks needs to update and process the content on its website regularly. It has a massive database of programming tutorials written in different formats and wants to standardize the content before publishing it on the website. This is where the content processing system comes into play.

  • Reader (Content Retriever): The system retrieves content from the GeeksforGeeks database. Each piece of content represents a programming tutorial in various languages (Python, Java, Ruby, C, C++, etc.).
  • Processor (Content Processor): The processor is like the team of editors and reviewers who ensure that the content follows a standardized format and meets certain quality criteria before it goes live on the website.
    In the context of Spring Batch, the ItemProcessor holds this content processing logic. For example, it could check for consistency in code formatting, add standardized headers, or perform language-specific adjustments.
    Real-life analogy: If the tutorial content has code snippets, the processor might ensure that all code follows a consistent style guide and includes necessary comments.
// Tutorial and CodeFormatter are illustrative classes, not part of Spring Batch
public class CodeFormattingProcessor implements ItemProcessor<Tutorial, Tutorial> {
    @Override
    public Tutorial process(Tutorial tutorial) throws Exception {
        // Check and standardize code formatting for the tutorial
        tutorial.setCode(CodeFormatter.format(tutorial.getCode()));
        return tutorial;
    }
}
  • Writer (Content Publisher): The writer is responsible for publishing the processed content to the website. In our analogy, this corresponds to updating the GeeksforGeeks database with the standardized content.
    Real-life analogy: After the content processor has ensured consistent code formatting, the writer updates the database with the processed tutorial.
// tutorialDatabaseService stands for an injected, illustrative service (not part of Spring Batch)
public class DatabaseWriter implements ItemWriter<Tutorial> {
    @Override
    public void write(List<? extends Tutorial> tutorials) throws Exception {
        // Update the GeeksforGeeks database with the processed tutorials
        tutorialDatabaseService.updateTutorials(tutorials);
    }
}

By using Spring Batch, GeeksforGeeks can efficiently automate this content processing system, ensuring that all programming tutorials meet certain quality standards before being published on their website. The processor component, represented by the ItemProcessor, allows for the customization and standardization of content processing logic.

Advantages of Data Transformation with ItemProcessors in Spring Batch

  • Modularity: Keeps transformation logic separate from reading and writing, so each part of the step has one clear responsibility.
  • Reusability: A processor can be reused in other steps and jobs that handle the same item types.
  • Scalability: Items are processed chunk by chunk, and Spring Batch can spread that work across threads or partitions.
  • Error Handling: Exceptions thrown from process can be handled by the step's skip and retry settings instead of ad hoc code.
  • Complex Transformations: Centralizes intricate changes in one place, and several processors can be chained for multi-stage transformations.
  • Integration: A processor is ordinary Java code, so it can call other services or libraries to enrich an item.
  • Testing and Debugging: A processor has a single method, so it can be unit tested without running a job.
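
The modularity and reusability points can be illustrated by chaining small, single-purpose processors. Spring Batch provides CompositeItemProcessor for this purpose; the sketch below shows the same idea in plain Java with a local stand-in interface so that it runs without Spring. The names ChainableProcessor, ProcessorChainSketch, chain, runChain, and the three small processors are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Local stand-in with the same shape as Spring Batch's ItemProcessor<I, O>.
interface ChainableProcessor<I, O> {
    O process(I item) throws Exception;
}

class ProcessorChainSketch {

    // Applies each processor in order; stops early once an item has been
    // filtered (that is, once a processor has returned null).
    @SafeVarargs
    static <T> ChainableProcessor<T, T> chain(ChainableProcessor<T, T>... delegates) {
        List<ChainableProcessor<T, T>> steps = Arrays.asList(delegates);
        return item -> {
            T current = item;
            for (ChainableProcessor<T, T> step : steps) {
                if (current == null) {
                    return null;
                }
                current = step.process(current);
            }
            return current;
        };
    }

    // Demo helper: builds a three-stage chain and converts the checked exception.
    static String runChain(String title) {
        ChainableProcessor<String, String> trimTitle = t -> t.trim();
        ChainableProcessor<String, String> dropEmpty = t -> t.isEmpty() ? null : t;
        ChainableProcessor<String, String> addPrefix = t -> "Transformed: " + t;
        try {
            return chain(trimTitle, dropEmpty, addPrefix).process(title);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runChain("  Java Streams  ")); // prints Transformed: Java Streams
        System.out.println(runChain("   "));              // prints null (item filtered)
    }
}
```

Each stage can be tested on its own, and the chain can be rearranged without touching the reader or the writer.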

Data Transformation with ItemProcessors in Spring Batch

In Spring Batch, the ItemProcessor plays a crucial role in transforming data during batch processing. It allows you to apply custom logic to modify or enrich the data read by the ItemReader before it is written by the ItemWriter. Let’s extend the GeeksforGeeks content processing example with additional attributes and walk through data transformation with an ItemProcessor step by step. We start by creating a Spring Boot project and adding the necessary dependencies, using Maven as the build tool.

Step 1: Create a Spring Boot Project

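  • Go to the Spring Initializr website (https://start.spring.io).
  • Set the following configurations:
    • Project: Maven Project
    • Language: Java
    • Spring Boot: The code in this article targets Spring Boot 2.x (the pom.xml in Step 3 uses 2.6.1); if Spring Initializr only offers newer versions, adjust the parent version in pom.xml afterwards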
    • Group: Your desired group name, e.g. com.geeksforgeeks
    • Artifact: Your desired artifact name, e.g. content-processor
  • Dependencies:
    • Spring Batch
    • Spring Web
    • Lombok
  • Click on the “Generate” button to download the project zip file.

Step 2: Extract and Import into IDE

Extract the downloaded zip file and import the project into your preferred IDE (Eclipse, IntelliJ, etc.).

Step 3: Add Additional Dependencies

Open the pom.xml file in your project and add the necessary dependencies. For this example, we’ll use H2 database for simplicity. If you are using a different database, adjust the dependencies accordingly. Below is the full pom.xml file configuration.

XML




<?xml version="1.0" encoding="UTF-8"?>
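<!-- Maven Project Object Model (POM) file for the GeeksforGeeksContentProcessor Spring Boot App -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">

    <!-- POM model version -->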
    <modelVersion>4.0.0</modelVersion>
  
    <!-- Parent POM for Spring Boot projects -->
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.6.1</version>
        <relativePath /> <!-- lookup parent from repository -->
    </parent>
  
    <!-- Project information -->
    <groupId>com.geeksforgeeks</groupId>
    <artifactId>GeeksforGeeksContentProcessor</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>GeeksforGeeksContentProcessor</name>
    <description>RESTful API for GeeksforGeeks Content Processor Spring Boot App</description>
  
    <!-- Project properties -->
    <properties>
        <java.version>8</java.version> <!-- Java version for the project -->
    </properties>
  
    <!-- Project dependencies -->
    <dependencies>
        <!-- Spring Boot starter for Spring Data JPA -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>
  
        <!-- Spring Boot starter for Spring Batch -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-batch</artifactId>
        </dependency>
  
        <!-- Spring Boot starter for building web applications -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
  
        <!-- H2 Database as a runtime dependency -->
        <dependency>
            <groupId>com.h2database</groupId>
            <artifactId>h2</artifactId>
            <scope>runtime</scope>
        </dependency>
  
        <!-- Spring Boot devtools for development -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <scope>runtime</scope>
            <optional>true</optional>
        </dependency>
  
        <!-- Lombok for simplified Java code -->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
  
        <!-- Spring Boot starter for testing -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
  
        <!-- Spring Batch testing support -->
        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
  
    <!-- Maven Build Configuration -->
    <build>
        <plugins>
            <!-- Spring Boot Maven Plugin -->
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <!-- Exclude Lombok from Spring Boot plugin -->
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>
  
</project>


Step 4: Create a ProgrammingTutorial Class

Create the ProgrammingTutorial class with the additional attributes as per the requirement.

Java




package com.geeksforgeeks.model;
  
import java.util.Date;
  
// Spring Boot 2.6.x (see pom.xml) uses the javax.persistence namespace
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
  
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.hibernate.annotations.CreationTimestamp;
import org.hibernate.annotations.UpdateTimestamp;
  
/**
 * Represents a programming tutorial entity.
 *
 * This class is annotated with JPA annotations for entity mapping. Lombok
 * annotations are used to generate getters, setters, and constructors.
 * Hibernate annotations are used to handle timestamp creation and updates.
 *
 * @author rahul.chauhan
 */
@Entity
@Data
@NoArgsConstructor
@AllArgsConstructor
public class ProgrammingTutorial {
    /**
     * Unique identifier for the tutorial.
     */
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
  
    /**
     * Title of the programming tutorial.
     */
    @Column
    private String title;
  
    /**
     * Programming language covered in the tutorial.
     */
    @Column
    private String language;
  
    /**
     * Content of the programming tutorial.
     */
    @Column
    private String content;
  
    /**
     * Author of the programming tutorial.
     */
    @Column
    private String author;
  
    /**
     * Timestamp representing the creation time of the tutorial. Automatically
     * populated by Hibernate.
     */
    @CreationTimestamp
    private Date createTime;
  
    /**
     * Timestamp representing the last update time of the tutorial. Automatically
     * updated by Hibernate.
     */
    @UpdateTimestamp
    private Date lastUpdateTime;
}


Step 5: Create a TutorialRepository

Create a simple repository interface for accessing the database.

Java




package com.geeksforgeeks.repository;
  
import org.springframework.data.jpa.repository.JpaRepository;
import com.geeksforgeeks.model.ProgrammingTutorial;
  
public interface TutorialRepository extends JpaRepository<ProgrammingTutorial, Long> {
}


Step 6: Configure Application Properties

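In your application.properties file, configure the H2 data source, the H2 console, Hibernate, and the server port. By default, Spring Boot also runs every Job bean at startup; add spring.batch.job.enabled=false if the job should run only when it is triggered through the REST endpoint created in Step 8.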

# DataSource settings
spring.datasource.url=jdbc:h2:mem:testdb
spring.datasource.driverClassName=org.h2.Driver
spring.datasource.username=testUser
spring.datasource.password=password
spring.jpa.database-platform=org.hibernate.dialect.H2Dialect

# H2 Console settings
spring.h2.console.enabled=true
spring.h2.console.path=/h2-console

# Hibernate settings
spring.jpa.hibernate.ddl-auto=update
spring.jpa.show-sql=true
spring.jpa.properties.hibernate.format_sql=true

# Server port
server.port=8080

Step 7: Implement the ItemReader, ItemWriter, ItemProcessor, and Batch Configuration Classes

Java




/*
BatchConfiguration.java
*/
package com.geeksforgeeks.batch;
  
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
  
import com.geeksforgeeks.model.ProgrammingTutorial;
  
@Configuration
@EnableBatchProcessing
@EnableScheduling
public class BatchConfiguration {
  
    private final JobBuilderFactory jobBuilderFactory;
    private final StepBuilderFactory stepBuilderFactory;
    private final JobLauncher jobLauncher;
    private final JobCompletionNotificationListener notificationListener;
    private final TutorialItemReader tutorialItemReader;
    private final TutorialItemProcessor tutorialItemProcessor;
    private final TutorialItemWriter tutorialItemWriter;
  
    @Autowired
    public BatchConfiguration(
            JobBuilderFactory jobBuilderFactory,
            StepBuilderFactory stepBuilderFactory,
            JobLauncher jobLauncher,
            JobCompletionNotificationListener notificationListener,
            TutorialItemReader tutorialItemReader,
            TutorialItemProcessor tutorialItemProcessor,
            TutorialItemWriter tutorialItemWriter
    ) {
        this.jobBuilderFactory = jobBuilderFactory;
        this.stepBuilderFactory = stepBuilderFactory;
        this.jobLauncher = jobLauncher;
        this.notificationListener = notificationListener;
        this.tutorialItemReader = tutorialItemReader;
        this.tutorialItemProcessor = tutorialItemProcessor;
        this.tutorialItemWriter = tutorialItemWriter;
    }
  
    @Bean
    public Job processContentJob() {
        return jobBuilderFactory.get("processContentJob")
                .incrementer(new RunIdIncrementer())
                .listener(notificationListener)
                .flow(processContentStep())
                .end()
                .build();
    }
  
    @Bean
    public Step processContentStep() {
        return stepBuilderFactory.get("processContentStep")
                .<ProgrammingTutorial, ProgrammingTutorial>chunk(10)
                .reader(tutorialItemReader)
                .processor(tutorialItemProcessor)
                .writer(tutorialItemWriter)
                .build();
    }
}


Java




/*
JobCompletionNotificationListener.java
*/
package com.geeksforgeeks.batch;
  
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.stereotype.Component;
  
@Component
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {
  
    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            System.out.println("Batch Job Completed Successfully! Time to verify the results.");
        }
    }
}


Java




/*
TutorialItemProcessor.java
*/
package com.geeksforgeeks.batch;
  
import org.springframework.batch.item.ItemProcessor;
import org.springframework.stereotype.Component;
  
import com.geeksforgeeks.model.ProgrammingTutorial;
  
@Component
public class TutorialItemProcessor implements ItemProcessor<ProgrammingTutorial, ProgrammingTutorial> {
  
    @Override
    public ProgrammingTutorial process(ProgrammingTutorial tutorial) throws Exception {
        // Your transformation logic here
        tutorial.setTitle("Transformed: " + tutorial.getTitle());
        tutorial.setContent(transformContent(tutorial.getContent()));
        return tutorial;
    }
  
    private String transformContent(String content) {
        // Your content transformation logic here
        // For example, perform language-specific adjustments
        return content.toUpperCase();
    }
}
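
One thing to keep in mind with the processor above: because the writer saves the transformed rows back to the same table they were read from, running the job a second time would prepend "Transformed: " again, and a row whose content is null would cause a NullPointerException in toUpperCase(). The sketch below shows one possible way to guard against both. It uses a tiny stand-in class instead of the JPA entity so that it runs on its own; TutorialStub and SafeTitleTransformer are hypothetical names, not part of the project.

```java
import java.util.Locale;

// Minimal stand-in for the ProgrammingTutorial entity (title and content only).
class TutorialStub {
    String title;
    String content;

    TutorialStub(String title, String content) {
        this.title = title;
        this.content = content;
    }
}

class SafeTitleTransformer {
    static final String PREFIX = "Transformed: ";

    // Same kind of transformation as TutorialItemProcessor, but safe to apply
    // repeatedly and tolerant of missing values.
    static TutorialStub process(TutorialStub tutorial) {
        if (tutorial.title != null && !tutorial.title.startsWith(PREFIX)) {
            tutorial.title = PREFIX + tutorial.title;
        }
        if (tutorial.content != null) {
            tutorial.content = tutorial.content.toUpperCase(Locale.ROOT);
        }
        return tutorial;
    }

    public static void main(String[] args) {
        TutorialStub tutorial = new TutorialStub("Sample Tutorial", "sample content");
        process(process(tutorial)); // applying it twice does not double the prefix
        System.out.println(tutorial.title + " / " + tutorial.content);
        // prints Transformed: Sample Tutorial / SAMPLE CONTENT
    }
}
```

Whether such guards are needed depends on the job; writing results to a separate table or marking rows as processed are other ways to avoid transforming the same row twice.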


Java




/*
TutorialItemReader.java
*/
package com.geeksforgeeks.batch;
  
import java.util.Iterator;
import java.util.List;
  
import org.springframework.batch.item.ItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
  
import com.geeksforgeeks.model.ProgrammingTutorial;
import com.geeksforgeeks.repository.TutorialRepository;
  
@Component
public class TutorialItemReader implements ItemReader<ProgrammingTutorial> {
  
    @Autowired
    private TutorialRepository tutorialRepository;
  
    private Iterator<ProgrammingTutorial> tutorialIterator;
  
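    @Override
    public ProgrammingTutorial read() throws Exception {
        // Load the tutorials once, on the first call of a job run
        if (tutorialIterator == null) {
            initializeIterator();
        }
        if (tutorialIterator.hasNext()) {
            return tutorialIterator.next();
        }
        // Input exhausted: reset for the next job run and return null to end the step.
        // Reloading here instead would restart the data and the step would never finish.
        tutorialIterator = null;
        return null;
    }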
  
    private void initializeIterator() {
        List<ProgrammingTutorial> tutorials = tutorialRepository.findAll();
        tutorialIterator = tutorials.iterator();
    }
}
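
The key contract of ItemReader.read() is that it returns the next item on each call and null once the input is exhausted; the null is what tells the step to stop. A reader that reloads its data instead of returning null keeps the step running indefinitely. The following sketch demonstrates the contract with an in-memory list and no Spring dependency; ListBackedReader is a hypothetical name. (Spring Batch also ships a ListItemReader in org.springframework.batch.item.support that works along similar lines.)

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ListBackedReader<T> {
    private final Iterator<T> iterator;

    ListBackedReader(List<T> items) {
        // Copy defensively so later changes to the caller's list do not affect reading.
        this.iterator = new ArrayList<>(items).iterator();
    }

    // Returns the next item, or null when there is nothing left to read.
    T read() {
        return iterator.hasNext() ? iterator.next() : null;
    }

    // Drains a reader the way a step would: keep calling read() until it returns null.
    static <T> List<T> readAll(ListBackedReader<T> reader) {
        List<T> result = new ArrayList<>();
        for (T item = reader.read(); item != null; item = reader.read()) {
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        ListBackedReader<String> reader =
                new ListBackedReader<>(Arrays.asList("Java", "Python"));
        System.out.println(readAll(reader)); // prints [Java, Python]
        System.out.println(reader.read());   // prints null (still exhausted)
    }
}
```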


Java




/*
TutorialItemWriter.java
*/
package com.geeksforgeeks.batch;
  
import java.util.List;
  
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
  
import com.geeksforgeeks.model.ProgrammingTutorial;
import com.geeksforgeeks.repository.TutorialRepository;
  
@Component
public class TutorialItemWriter implements ItemWriter<ProgrammingTutorial> {
  
    @Autowired
    private TutorialRepository tutorialRepository;
  
    @Override
    public void write(List<? extends ProgrammingTutorial> tutorials) throws Exception {
        tutorialRepository.saveAll(tutorials);
    }
}


Step 8: Create ContentProcessingController

Java




package com.geeksforgeeks.controller;
  
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
  
@RestController
@RequestMapping("/api/content")
public class ContentProcessingController {
  
    private final JobLauncher jobLauncher;
    private final Job processContentJob;
  
    public ContentProcessingController(JobLauncher jobLauncher, Job processContentJob) {
        this.processContentJob = processContentJob;
        this.jobLauncher = jobLauncher;
    }
  
    @PostMapping("/process")
    public ResponseEntity<String> processContent() {
        try {
            JobParameters jobParameters = new JobParametersBuilder()
                    .addString("jobParam1", String.valueOf(System.currentTimeMillis())).toJobParameters();
  
            JobExecution jobExecution = jobLauncher.run(processContentJob, jobParameters);
  
            return ResponseEntity.ok("Content processing job initiated successfully. Job ID: " + jobExecution.getId());
        } catch (Exception e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                    .body("Error initiating content processing job: " + e.getMessage());
        }
    }
}


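Below is the project structure of the created Spring Boot application:

[Image: Project Structure]

Run the Spring Boot Application

Start the application from your IDE, or run mvn spring-boot:run from the project root.

Testing Spring Boot Application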

Use Postman to Trigger the Batch Job: Open Postman and create a new request:

  • Method: POST
  • URL: http://localhost:8080/api/content/process

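Alternatively, send the same request with curl (the endpoint takes no request body):

curl -X POST http://localhost:8080/api/content/process

The job only transforms rows that already exist in the table backing the ProgrammingTutorial entity, so insert some test data first (for example, through the H2 console at /h2-console). A sample row could use the title “Sample Tutorial”, the language “Java”, the content “This is a sample tutorial content.”, and the author “Rahul Dravid”.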

Send the request, and you should receive a response indicating that the content processing job has been initiated, for example “Content processing job initiated successfully. Job ID: 3”.

[Image: Testing in Postman]
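
The same request can also be built from Java code. The sketch below uses the JDK's java.net.http client, which assumes Java 11 or later (newer than the Java 8 set in the pom.xml above), and it only constructs the request; sending it requires the application to be running on localhost:8080. TriggerJobRequest is a hypothetical class name.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class TriggerJobRequest {

    // Builds a body-less POST to the endpoint exposed by ContentProcessingController.
    static HttpRequest build() {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/api/content/process"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build();
        System.out.println(request.method() + " " + request.uri());
        // To actually send it while the application is running:
        // java.net.http.HttpClient.newHttpClient()
        //         .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```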

Conclusion

In conclusion, building a Spring Batch application for processing programming tutorials involves defining an entity class, implementing the ItemReader, ItemProcessor, and ItemWriter components, configuring the job and its step, and exposing an endpoint to launch the job. The ItemProcessor is where the transformation logic lives, which keeps it separate from reading and writing and makes it easy to change or test on its own. The application can be tested using Postman, and the same structure extends to larger datasets and more complex transformations. Overall, Spring Batch simplifies the development of robust and scalable batch processing systems within a Spring Boot application.


