Let us see how to delete repeated lines from a file using Python's file handling. If the file is small, with only a few lines, the repeated lines can be removed manually, but for large files this is where Python comes to your rescue.
Approach:
- Open the input file using the open() function, passing "r" as the mode to open it for reading.
- Open an output file with mode "w", where we will store the contents of the file after deleting all repeated lines from it.
- Use a set to keep track of all the lines seen so far, so that each line being read can be compared against it.
- Iterate over each line of the input file and check whether it is among the lines seen so far.
- If the current line is already among the lines seen so far, skip it. Otherwise, write it to the output file and add it to the lines seen so far.
- Close the files.
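The core of the approach, tracking seen lines in a set, can be sketched on an in-memory list before any files are involved. This is only an illustration: the helper name `unique_lines` and the sample data are made up for this sketch and are not part of the script used later in the article.

```python
def unique_lines(lines):
    """Yield each line the first time it appears, preserving the original order."""
    seen = set()  # lines encountered so far
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


# Hypothetical sample data standing in for the lines of a file.
sample = ["alpha\n", "beta\n", "alpha\n", "gamma\n", "beta\n"]
result = list(unique_lines(sample))
print(result)  # ['alpha\n', 'beta\n', 'gamma\n']
```

A set is used because checking membership in it takes roughly constant time on average, so the cost of each check does not grow with the number of lines already seen.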
Example:
For the sake of this example, let's create a file (Lorem_input.txt) with some lorem ipsum text in it. The line beginning "Lorem ipsum dolor sit amet" appears three times.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Phasellus est neque, mollis vel massa vel, condimentum facilisis ipsum. Mauris vitae mollis magna.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Aliquam laoreet vitae nisi quis rutrum. Sed ut ligula nec enim consequat egestas vel a sapien. Pellentesque sit amet euismod felis. Pellentesque in nibh ultricies, convallis sapien id, sagittis odio. Vivamus placerat ex sed ligula porttitor dignissim.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Morbi posuere eget odio ut venenatis. Nam lobortis bibendum maximus. Donec venenatis sapien sed varius accumsan.
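Now let's create an empty output file (Lorem_output.txt), where we will store the contents of the input file with the repeated lines removed. Opening a file in "w" mode creates it if it does not already exist.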
Python3
# creating the output file
outputFile = open('C:/Users/user/Desktop/Lorem_output.txt', "w")

# reading the input file
inputFile = open('C:/Users/user/Desktop/Lorem_input.txt', "r")

# holds lines already seen
lines_seen_so_far = set()

# iterating each line in the file
for line in inputFile:

    # checking if line is unique
    if line not in lines_seen_so_far:

        # write unique lines in output file
        outputFile.write(line)

        # adds unique lines to lines_seen_so_far
        lines_seen_so_far.add(line)

# closing the files
inputFile.close()
outputFile.close()
Running the above Python script removes all the repeated lines from the input file and writes the result to the output file. After running this script, the output file (Lorem_output.txt) will look like this:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Phasellus est neque, mollis vel massa vel, condimentum facilisis ipsum. Mauris vitae mollis magna.
Aliquam laoreet vitae nisi quis rutrum. Sed ut ligula nec enim consequat egestas vel a sapien. Pellentesque sit amet euismod felis. Pellentesque in nibh ultricies, convallis sapien id, sagittis odio. Vivamus placerat ex sed ligula porttitor dignissim.
Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Morbi posuere eget odio ut venenatis. Nam lobortis bibendum maximus. Donec venenatis sapien sed varius accumsan.
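As a possible variation, the same idea can be wrapped in a function that uses `with` blocks, so both files are closed even if an error occurs partway through. The sketch below also compares lines with the trailing newline stripped, so a repeated last line that lacks a final newline is still recognized as a duplicate. As a side effect, every line written ends with a newline. The function name `remove_repeated_lines`, the UTF-8 encoding, and the temporary demo file names are assumptions made for this illustration.

```python
import os
import tempfile


def remove_repeated_lines(input_path, output_path):
    """Copy input_path to output_path, keeping only the first copy of each line.

    Returns the number of lines written.
    """
    seen = set()
    written = 0
    with open(input_path, "r", encoding="utf-8") as src, \
         open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            key = line.rstrip("\n")  # compare without the trailing newline
            if key not in seen:
                seen.add(key)
                dst.write(key + "\n")
                written += 1
    return written


# Small demonstration using temporary files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "demo_input.txt")
    out_path = os.path.join(tmp, "demo_output.txt")
    with open(in_path, "w", encoding="utf-8") as f:
        # The last line has no trailing newline but repeats "second".
        f.write("first\nsecond\nfirst\nthird\nsecond")
    count = remove_repeated_lines(in_path, out_path)
    with open(out_path, encoding="utf-8") as f:
        demo_output = f.read()

print(count)  # 3
print(demo_output)
```

Like the script above, this keeps every distinct line in memory, so memory use grows with the number of unique lines in the input.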