How to read large text files in Python?
  • Last Updated : 29 Dec, 2020

Python is an open-source, dynamically typed, interpreted programming language. Reading and writing files is an integral part of programming. One common way to read a text file in Python is the readlines() method, which returns a list in which each item is one complete line of the file. This method works well when the file is small. Because readlines() builds a list holding every line before returning it, it becomes slow when the file is extremely large, say several GB. The list also occupies a large amount of memory, and the program can fail with a MemoryError if not enough memory is available. To avoid this problem, we can use the file object as an iterator and process the file one line at a time. Since the iterator only holds the current line (plus a small read buffer) and needs no additional data structure to store the data, it consumes far less memory. It also avoids the cost of building a large list, so it is usually faster as well. File objects are iterable in Python, so iterating over them is the recommended way to read large files.
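The difference between the two styles can be seen in a minimal sketch. It assumes a small throwaway file created with tempfile (a stand-in for a real large file), and the names path, all_lines and line_count are illustrative only.

```python
import os
import tempfile

# Build a small throwaway file so the sketch is self-contained.
# (A real use case would point at an existing large file instead.)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\nthird line\n")
    path = tmp.name

# readlines(): every line of the file is held in one list at the same time.
with open(path) as f:
    all_lines = f.readlines()

# Iteration: only the current line is held; here we simply count the lines.
line_count = 0
with open(path) as f:
    for line in f:
        line_count += 1

print(len(all_lines), line_count)

# Remove the throwaway file again.
os.remove(path)
```

Both styles see the same lines. The difference is that all_lines grows with the size of the file, while the loop only ever holds one line.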

The following two programs demonstrate how large text files can be read using Python.

Method 1:

The first approach uses an iterator to go over the file. In this technique, we use the fileinput module in Python. The input() function of the fileinput module can be used to read files. The advantage of this function over readlines() is that fileinput.input() does not load the entire file into memory, so memory use stays low even for very large files. fileinput.input() takes a single filename or a list of filenames; if no files are passed, it reads the files named in sys.argv[1:] and falls back to stdin when that list is empty. The function returns an iterator that yields individual lines from the files being scanned.

Code Implementation:



Python3




# import modules
import fileinput
import time

# note the time at the start of the program
start = time.time()

# keeps track of the number of lines in the file
count = 0
for line in fileinput.input(['sample.txt']):
    # each line already ends with a newline, so suppress print's own
    print(line, end="")
    count = count + 1

# note the time at the end of program execution
end = time.time()

# total time taken to print the file
print("Execution time in seconds: ", (end - start))
print("No. of lines printed: ", count)

Explanation:

fileinput.input() returns an iterator that scans the file one line at a time; the for loop prints each line and counts it.
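Since fileinput.input() accepts a list of filenames, the same loop can walk several files in sequence. The following is a minimal sketch under stated assumptions: the files a.txt and b.txt are hypothetical and are created in a temporary directory just for the demonstration, and seen is an illustrative name.

```python
import fileinput
import os
import shutil
import tempfile

# Two throwaway files (hypothetical names) keep the sketch self-contained.
tmp_dir = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "alpha\nbeta\n"), ("b.txt", "gamma\n")]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

seen = []
# fileinput.input() chains the files and yields one line at a time.
with fileinput.input(files=paths) as stream:
    for line in stream:
        # filename() and filelineno() describe the line that was just read.
        seen.append((os.path.basename(stream.filename()),
                     stream.filelineno(),
                     line.rstrip("\n")))

print(seen)

# Remove the temporary directory and its files.
shutil.rmtree(tmp_dir)
```

Using fileinput.input() in a with block closes the underlying files when the loop is done.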

Method 2:

The second approach also uses an iterator to read the file. The only difference is that we use the iterator of the file object itself. The open() function returns a file object for the file without reading its contents into memory. Next, we iterate over the file object to get its lines one at a time. We open the file in a 'with' block because it automatically closes the file as soon as the block finishes executing. When the with block completes, even because of an exception, the file object's __exit__() method is called, which closes the file and releases its resources.

Code Implementation:



Python3




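import time

# note the time at the start of the program
start = time.time()
count = 0
with open("sample.txt") as file:
    for line in file:
        # each line already ends with a newline, so suppress print's own
        print(line, end="")
        count = count + 1
# note the time at the end of program execution
end = time.time()
print("Execution time in seconds: ", (end - start))
print("No. of lines printed: ", count)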

Explanation:

This approach typically takes less time than Method 1, because iterating over the file object directly avoids the extra per-line overhead of the fileinput module. The program can also be written without the with block, but in that case we must make sure to close the file explicitly.
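As a rough sketch of the version without a with block, the file can be closed in a finally clause so that it is released even if the loop raises an exception. The throwaway file and the names closed_manually and closed_by_with are assumptions made for the demonstration.

```python
import os
import tempfile

# Throwaway two-line file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\n")
    path = tmp.name

# Without a with block: open, iterate, and close explicitly.
count = 0
file = open(path)
try:
    for line in file:
        count += 1
finally:
    # Runs whether or not the loop raised, so the file is always closed.
    file.close()
closed_manually = file.closed

# With a with block: the file is closed automatically on exit.
with open(path) as file2:
    pass
closed_by_with = file2.closed

print(count, closed_manually, closed_by_with)

# Remove the throwaway file again.
os.remove(path)
```

Both variants leave the file closed. The with block simply does the try/finally bookkeeping for us.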
