
Batch Processing – MongoDB to CSV Export using Spring Batch

Last Updated : 07 Mar, 2024

Batch processing is a common requirement when dealing with large volumes of data, since it allows large datasets to be handled efficiently in chunks or batches. The Spring framework provides Spring Batch, a flexible framework for building batch-processing applications in Java.

Steps to Export MongoDB to CSV File

Below are the steps to read data from MongoDB and export it to a CSV file using Spring Batch.

Step 1: Create Spring Boot Project

Create a Spring Boot starter project and add the necessary dependencies. The screenshot below shows how to create the project and add the dependencies.

Spring Boot Project Creation

Create the project, click Next, and in the next step add the required dependencies as shown below.

Importing required Dependencies

For batch processing, the required dependencies are Spring Batch and Spring Data MongoDB.
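Spring Data MongoDB also needs connection details for the database. Below is a minimal sketch of src/main/resources/application.properties, assuming a local MongoDB instance and a database named employeedb (both the host and the database name are assumptions; adjust them to your environment):

# Assumed local MongoDB instance and database; change as needed
spring.data.mongodb.uri=mongodb://localhost:27017/employeedb
# Optional: stop Spring Boot from also launching the job at startup,
# since the scheduler in Step 4 will trigger it
spring.batch.job.enabled=false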

Step 2: Creating the Entity

Let us consider Employee as an entity class for better understanding.

Define the Employee entity class with fields representing the employee attributes, and annotate the class with the @Document annotation to specify the MongoDB collection name.

Java




package com.app.entity;
  
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;
  
/**
 * Represents an Employee entity.
 */
@Document(collection = "Employee")
public class Employee {
    @Id
    private int id;
    private String firstName;
    private String lastName;
    private String department;
    private int salary;
  
    public Employee() {
        super();
    }
    //Constructor
    public Employee(int id, String firstName, String lastName, String department, int salary) {
        super();
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
        this.department = department;
        this.salary = salary;
    }
    // getter and setters
    public int getId() {
        return id;
    }
  
    public void setId(int id) {
        this.id = id;
    }
  
    public String getFirstName() {
        return firstName;
    }
  
    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }
  
    public String getLastName() {
        return lastName;
    }
  
    public void setLastName(String lastName) {
        this.lastName = lastName;
    }
  
    public String getDepartment() {
        return department;
    }
  
    public void setDepartment(String department) {
        this.department = department;
    }
  
    public int getSalary() {
        return salary;
    }
  
    public void setSalary(int salary) {
        this.salary = salary;
    }
}


In the above Java class, we have defined the required employee attributes along with a no-argument constructor, a parameterized constructor, and getters and setters.
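For the export to produce anything, the Employee collection needs some documents. One way to seed a few sample records at startup is a CommandLineRunner bean using MongoTemplate. This is a hypothetical helper, not part of the original application; the class name, package, and sample values are illustrative assumptions.

Java

package com.app.config;

import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.core.MongoTemplate;

import com.app.entity.Employee;

// Hypothetical seeding helper (illustrative, not from the original article)
@Configuration
public class SampleDataLoader {

    @Bean
    public CommandLineRunner seedEmployees(MongoTemplate mongoTemplate) {
        return args -> {
            // save() inserts or replaces by _id, so re-running the app is safe
            mongoTemplate.save(new Employee(1, "John", "Doe", "Engineering", 50000));
            mongoTemplate.save(new Employee(2, "Jane", "Smith", "Marketing", 45000));
        };
    }
}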

Step 3: Configuring Spring Batch

Create the configuration class for batch processing and annotate it with @Configuration and @EnableBatchProcessing. In this class, define a job that reads the data from MongoDB and writes it to a CSV file using Spring Batch.

Here is the MongoDbReader class.

Java




package com.app.config;
  
import java.util.HashMap;
  
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.data.MongoItemReader;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor;
import org.springframework.batch.item.file.transform.DelimitedLineAggregator;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.data.domain.Sort;
import org.springframework.data.domain.Sort.Direction;
import org.springframework.data.mongodb.core.MongoTemplate;
  
import com.app.entity.Employee;
  
// Configuration class for reading data from MongoDB
@Configuration
@EnableBatchProcessing
public class MongoDbReader {
  
    // Autowiring JobBuilderFactory to create jobs
    @Autowired
    private JobBuilderFactory jobBuilderFactory;
  
    // Autowiring StepBuilderFactory to create steps
    @Autowired
    private StepBuilderFactory stepBuilderFactory;
  
    // Autowiring MongoTemplate for MongoDB operations
    @Autowired
    private MongoTemplate mongoTemplate;
  
    // Method to create a job for reading employee data
    @Bean
    public Job readEmployee() throws Exception {
        return jobBuilderFactory.get("readEmployee").flow(step1()).end().build();
    }
  
    // Method to create a step for processing employee data
    @Bean
    public Step step1() throws Exception {
        return stepBuilderFactory.get("step1").<Employee, Employee>chunk(5).reader(reader())
                .writer(writer()).build();
    }
  
    // Method to create a reader for reading employee data from MongoDB
    @Bean
    public MongoItemReader<Employee> reader() {
        MongoItemReader<Employee> reader = new MongoItemReader<>();
        reader.setTemplate(mongoTemplate);
        reader.setSort(new HashMap<String, Sort.Direction>() {{
            put("_id", Direction.DESC);
        }});
  
        reader.setTargetType(Employee.class);
        reader.setQuery("{}");
        return reader;
    }
  
    // Method to create a writer for writing employee data to a CSV file
    @Bean
    public FlatFileItemWriter<Employee> writer() {
        FlatFileItemWriter<Employee> writer = new FlatFileItemWriter<>();
        writer.setResource(new FileSystemResource("src/main/resources/employee.csv"));
        // Use append mode
        writer.setAppendAllowed(true);
        DelimitedLineAggregator<Employee> dl = new DelimitedLineAggregator<>();
        BeanWrapperFieldExtractor<Employee> bf = new BeanWrapperFieldExtractor<>();
        bf.setNames(new String[]{"id", "firstName", "lastName", "department", "salary"});
        dl.setFieldExtractor(bf);
        writer.setLineAggregator(dl);
        return writer;
    }
  
}


In the above class, we have done the following:

  • Implemented the ItemReader: the reader() method reads the data from MongoDB using MongoItemReader, configured with the MongoTemplate, a descending sort on _id, and an empty query that matches all documents in the collection.
  • Implemented the ItemWriter: the writer() method writes the data to a CSV file using FlatFileItemWriter, configured with the output file location and the Employee fields to extract from each item (a header row can also be added; see the sketch after this list).
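The writer above emits only data rows. If a header row is needed, FlatFileItemWriter supports a header callback; below is a minimal sketch of a line that could be added inside writer() before returning. Note that Spring Batch writes the header only when it creates a new or empty file, so in append mode an existing file keeps its contents unchanged.

Java

// Writes a header line when the output file is newly created
writer.setHeaderCallback(w -> w.write("id,firstName,lastName,department,salary"));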

Step 4: Scheduling the Batch Job

Create the configuration class with the annotations @Configuration and @EnableScheduling, and implement a scheduled task that triggers the Spring Batch job at regular intervals using the @Scheduled annotation. The schedule can be adjusted based on the requirement.

Java




package com.app.config;
  
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
  
@Configuration
@EnableScheduling
public class SchedulerConfig {
  
    // Inject JobLauncher and Job beans from Spring context
    @Autowired
    private JobLauncher jobLauncher;
    @Autowired
    private Job job;
  
    // Define a scheduled task using @Scheduled annotation
    // The task will be executed every 50 seconds with an initial delay of 5 seconds
    @Scheduled(fixedDelay = 50000, initialDelay = 5000)
    public void scheduleByFixedRate() throws Exception {
        System.out.println("Batch job starting");
  
        try {
            // Create JobParameters with a unique identifier
            JobParameters jobParameters = new JobParametersBuilder()
                    .addLong("startAt", System.currentTimeMillis()).toJobParameters();
              
            // Run the job using JobLauncher and JobParameters
            jobLauncher.run(job, jobParameters);
  
            System.out.println("Batch job executed successfully\n");
        } catch (Exception e) {
            // Handle any exceptions that occur during job execution
            System.err.println("Error executing batch job: " + e.getMessage());
        }
    }
}
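Instead of a fixed delay, the job can also be triggered on a calendar schedule with a cron expression. Below is an illustrative variant that could be added to SchedulerConfig; the expression, which runs the job daily at 2:00 AM, is an assumption to adapt to your needs.

Java

// Alternative trigger (illustrative): run the job every day at 2:00 AM
@Scheduled(cron = "0 0 2 * * *")
public void scheduleByCron() throws Exception {
    scheduleByFixedRate(); // reuse the same launch logic
}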


Step 5: Running the Application

After completing all the setup, run the Spring Boot application; the CSV file will be generated in the project location shown below:

CSV file Generated

As we can see in the above attachment, after running the Spring Boot application, a CSV file named employee.csv is generated in src/main/resources, and the data in the file is shown below.
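Each line of the file is produced by the DelimitedLineAggregator with its default comma delimiter, with the fields in the configured order: id, firstName, lastName, department, salary. With illustrative values, the output looks like:

1,John,Doe,Engineering,50000
2,Jane,Smith,Marketing,45000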

CSV File Data

So, this is the generated CSV data. Below are the logs produced during the execution of the job.

Job Executed

And here is the database collection image:

MongoDB Database


