Eliminating repeated lines from a file using Python
  • Last Updated : 29 Dec, 2020

Let us see how to delete repeated lines from a file using Python’s file handling. If the file is small, repeated lines can be removed by hand, but for large files it is far more practical to let a Python script do the work.

Approach :

  1. Open the input file using the open() function, passing the mode "r" to open it for reading.
  2. Open an output file using the mode "w"; this is where the contents of the file are stored after the repeated lines are removed.
  3. Create a set() to keep track of all the lines seen so far, so that each line being read can be compared against it.
  4. Iterate over each line of the input file and check whether it is among the lines seen so far.
  5. If the current line is already in the set, skip it; otherwise write it to the output file and add it to the set.
  6. Close the files.
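
The steps above can be tried on an in-memory list of lines before touching any files. This is only a minimal sketch of the idea; the function name unique_lines and the sample data are hypothetical and not part of the original script.

```python
def unique_lines(lines):
    """Return the lines in their original order, keeping only the first copy of each."""
    seen = set()      # lines encountered so far
    result = []       # first occurrences, in input order
    for line in lines:
        if line not in seen:
            result.append(line)
            seen.add(line)
    return result


sample = ["alpha\n", "beta\n", "alpha\n", "gamma\n", "beta\n"]
print(unique_lines(sample))  # ['alpha\n', 'beta\n', 'gamma\n']
```

A set is used because membership checks on it take roughly constant time on average, so the whole pass stays close to linear in the number of lines.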

Example: 

For the sake of this example, let’s create a file (Lorem_input.txt) with some lipsum text in it. The line beginning "Lorem ipsum dolor sit amet" appears three times.

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Phasellus est neque, mollis vel massa vel, condimentum facilisis ipsum. Mauris vitae mollis magna.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Aliquam laoreet vitae nisi quis rutrum. Sed ut ligula nec enim consequat egestas vel a sapien. Pellentesque sit amet euismod felis. Pellentesque in nibh ultricies, convallis sapien id, sagittis odio. Vivamus placerat ex sed ligula porttitor dignissim.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Morbi posuere eget odio ut venenatis. Nam lobortis bibendum maximus. Donec venenatis sapien sed varius accumsan.



Now let’s write the deduplicated contents to an output file (Lorem_output.txt). Opening a file in "w" mode creates it if it does not exist and overwrites it if it does.

Python3

# open the input file for reading and the output file for writing;
# the with block closes both files automatically, even on error
with open('C:/Users/user/Desktop/Lorem_input.txt', "r") as input_file, \
        open('C:/Users/user/Desktop/Lorem_output.txt', "w") as output_file:

    # holds lines already seen
    lines_seen_so_far = set()

    # iterating over each line in the file
    for line in input_file:

        # checking if the line has not been seen before
        if line not in lines_seen_so_far:

            # write the first occurrence to the output file
            output_file.write(line)

            # remember the line so that later copies are skipped
            lines_seen_so_far.add(line)



Running the above Python script writes each distinct line of the input file to the output file once, in order of first appearance; the input file itself is not changed. After running this script, the output file (Lorem_output.txt) will look like this:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Phasellus est neque, mollis vel massa vel, condimentum facilisis ipsum. Mauris vitae mollis magna.
Aliquam laoreet vitae nisi quis rutrum. Sed ut ligula nec enim consequat egestas vel a sapien. Pellentesque sit amet euismod felis. Pellentesque in nibh ultricies, convallis sapien id, sagittis odio. Vivamus placerat ex sed ligula porttitor dignissim.
Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Morbi posuere eget odio ut venenatis. Nam lobortis bibendum maximus. Donec venenatis sapien sed varius accumsan.
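
One edge case is worth noting: lines are compared together with their trailing newline, so a final line that has no newline will not match an earlier identical line. Below is a hedged sketch of one way to handle this, wrapped in a reusable function. The name remove_repeated_lines and the temporary file names are assumptions made for illustration; they are not part of the original script.

```python
import os
import tempfile


def remove_repeated_lines(input_path, output_path):
    """Copy input_path to output_path, keeping only the first copy of each line.

    Returns the number of lines that were skipped as repeats.
    """
    seen = set()
    skipped = 0
    with open(input_path, "r") as src, open(output_path, "w") as dst:
        for line in src:
            # compare without the line ending so a final line that lacks
            # a newline still matches an earlier identical line
            key = line.rstrip("\n")
            if key in seen:
                skipped += 1
                continue
            seen.add(key)
            dst.write(key + "\n")
    return skipped


# demo in a throwaway directory (hypothetical file names)
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "in.txt")
    out_path = os.path.join(tmp, "out.txt")
    with open(in_path, "w") as f:
        f.write("a\nb\na\nc\nb")   # note: no newline after the last line
    removed = remove_repeated_lines(in_path, out_path)
    with open(out_path) as f:
        result = f.read()

print(removed)        # 2
print(repr(result))   # 'a\nb\nc\n'
```

Unlike the script above, this variant always ends every output line with a newline, which is a small change in behavior for files whose last line has none.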
