Java program to delete duplicate lines in text file

Prerequisite: PrintWriter, BufferedReader

Given a file input.txt, the task is to remove duplicate lines from it and save the result in another file, say output.txt.

Naive Algorithm :

1. Create PrintWriter object for output.txt
2. Open BufferedReader for input.txt
3. Run a loop for each line of input.txt
   3.1 flag = false
   3.2 Open BufferedReader for output.txt
   3.3 Run a loop for each line of output.txt
      ->  If line of output.txt is equal to current line of input.txt
            -> flag = true
            -> break loop
   3.4 Close BufferedReader for output.txt
   3.5 Check flag, if false
      -> write current line of input.txt to output.txt
      -> flush PrintWriter stream

4. Close resources.

To run the program below successfully, input.txt must exist in the same folder; otherwise, provide its full path.

// Java program to remove
// duplicates from input.txt and 
// save output to output.txt

import java.io.*;

public class FileOperation
{
    public static void main(String[] args) throws IOException 
    {
        // PrintWriter object for output.txt
        PrintWriter pw = new PrintWriter("output.txt");
        
        // BufferedReader object for input.txt
        BufferedReader br1 = new BufferedReader(new FileReader("input.txt"));
        
        String line1 = br1.readLine();
        
        // loop for each line of input.txt
        while(line1 != null)
        {
            boolean flag = false;
            
            // BufferedReader object for output.txt
            BufferedReader br2 = new BufferedReader(new FileReader("output.txt"));
            
            String line2 = br2.readLine();
            
            // loop for each line of output.txt
            while(line2 != null)
            {
                
                if(line1.equals(line2))
                {
                    flag = true;
                    break;
                }
                
                line2 = br2.readLine();

            }

            // close the inner reader so a file handle
            // is not leaked on every iteration
            br2.close();

            // if flag = false
            // write line of input.txt to output.txt
            if(!flag){
                pw.println(line1);
                
                // flush so the line is on disk before
                // output.txt is read again in the next iteration
                pw.flush();
            }
            
            line1 = br1.readLine();
            
        }
        
        // closing resources
        br1.close();
        pw.close();
        
        System.out.println("File operation performed successfully");
    }
}

Output:

File operation performed successfully

Note: If output.txt already exists in the current working directory (cwd), the above program overwrites it; otherwise, a new file is created.
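
The naive program flushes after every write because it reads output.txt back from disk for each new input line, and a line still sitting in the PrintWriter buffer would not be seen by that read. The following is a minimal sketch of that effect, not part of the original program: the class name FlushDemo, the method sizesAroundFlush, and the temporary file are hypothetical and used only for illustration. It assumes the written text is small enough to fit in the writer's internal buffer.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of the original article
class FlushDemo
{
    // returns the file size in bytes before and after flush()
    static long[] sizesAroundFlush() throws IOException
    {
        // temporary file, so no real output.txt is touched
        Path tmp = Files.createTempFile("flush-demo", ".txt");

        try (PrintWriter pw = new PrintWriter(tmp.toFile()))
        {
            // the line goes into the writer's buffer first
            pw.println("hello");
            long before = Files.size(tmp);

            // flush pushes the buffered text to the file
            pw.flush();
            long after = Files.size(tmp);

            return new long[] { before, after };
        }
        finally
        {
            // runs after the writer is closed
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException
    {
        long[] sizes = sizesAroundFlush();
        System.out.println("Bytes on disk before flush: " + sizes[0]);
        System.out.println("Bytes on disk after flush: " + sizes[1]);
    }
}
```

With a short line like this, the size before the flush is expected to be 0, which is why a duplicate check that reads the file without flushing first could miss lines that were just written.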

A better solution is to use a HashSet to store each line of input.txt. A set does not store duplicate values, so HashSet.add() returns false when the line is already present. Write a line to output.txt only if it was not already in the set. This way input.txt is read once and output.txt is never read back.

To run the program below successfully, input.txt must exist in the same folder; otherwise, provide its full path.

// Efficient Java program to remove
// duplicates from input.txt and 
// save output to output.txt

import java.io.*;
import java.util.HashSet;

public class FileOperation
{
    public static void main(String[] args) throws IOException 
    {
        // PrintWriter object for output.txt
        PrintWriter pw = new PrintWriter("output.txt");
        
        // BufferedReader object for input.txt
        BufferedReader br = new BufferedReader(new FileReader("input.txt"));
        
        String line = br.readLine();
        
        // set to store the unique lines seen so far
        HashSet<String> hs = new HashSet<String>();
        
        // loop for each line of input.txt
        while(line != null)
        {
            // add() returns true only if the line
            // was not already present in the set
            if(hs.add(line))
                pw.println(line);
            
            line = br.readLine();
            
        }
        
        pw.flush();
        
        // closing resources
        br.close();
        pw.close();
        
        System.out.println("File operation performed successfully");
    }
}

Output:

File operation performed successfully

Note: As with the naive program, if output.txt already exists in the current working directory (cwd), it is overwritten; otherwise, a new file is created.
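
The efficient program depends on the return value of HashSet.add(). The sketch below isolates that step on an in-memory list so it can be tried without any files. The class name UniqueLinesSketch and the method uniqueLines are hypothetical names chosen for this illustration, not part of the original program.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not part of the original article
class UniqueLinesSketch
{
    // keeps the first occurrence of each line, in input order
    static List<String> uniqueLines(List<String> lines)
    {
        Set<String> seen = new HashSet<String>();
        List<String> result = new ArrayList<String>();

        for (String line : lines)
        {
            // add() returns false when the line
            // is already in the set
            if (seen.add(line))
                result.add(line);
        }

        return result;
    }

    public static void main(String[] args)
    {
        List<String> input = Arrays.asList(
            "apple", "banana", "apple", "cherry", "banana");

        // prints [apple, banana, cherry]
        System.out.println(uniqueLines(input));
    }
}
```

The output order comes from the result list, not from the HashSet, which is the same reason the file version preserves the order of first occurrences: lines are written as they are read.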

This article is contributed by Gaurav Miglani.


