How to read large text files in Python?
Python is an open-source, dynamically typed, interpreted programming language, and reading and writing files is an integral part of programming. One common way to read a file in Python is the readlines() method, which returns a list in which each item is one complete line of the file. This works well when the file is small. Because readlines() reads every line into a list before returning it, the call becomes slow when the file is very large (for example, several GB), and the list can consume so much memory that the program fails with a MemoryError if sufficient memory is unavailable.
To avoid this problem, we can use the file object itself as an iterator and process the file one line at a time. Files are iterable in Python, and the iterator does not need an additional data structure to hold the whole file, so memory use stays small and largely independent of file size. It also avoids the cost of building a huge list, so it is usually faster as well. For large files it is therefore advisable to iterate over the file rather than call readlines().
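To make the contrast concrete, here is a minimal sketch that counts the lines of a file both ways. The helper names count_lines_readlines and count_lines_iter and the throwaway sample file are assumptions made for illustration; they are not part of the original article.

```python
import os
import tempfile


def count_lines_readlines(path):
    """Count lines by loading every line into a list first."""
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()  # the whole file is held in memory as a list
    return len(lines)


def count_lines_iter(path):
    """Count lines by iterating over the file object, one line at a time."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for _ in f:  # the file object is its own iterator
            count += 1
    return count


if __name__ == "__main__":
    # Build a small throwaway file; a real use case would point at a large file.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("first line\nsecond line\nthird line\n")
    try:
        print(count_lines_readlines(tmp.name), count_lines_iter(tmp.name))
    finally:
        os.remove(tmp.name)
```

Both functions return the same count; the difference is that the first one needs memory proportional to the file size, while the second keeps only the current line.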
The following two approaches show how large text files can be read in Python.
The first approach uses the fileinput module. Its input() function can be used to read files, and its advantage over readlines() is that fileinput.input() does not load the entire file into memory, so memory use stays low even for very large files. fileinput.input() accepts a single filename or a list of filenames; if no files are given, it reads the files named in sys.argv[1:], falling back to stdin when that list is empty. It returns an iterator that yields the individual lines of the files being scanned.
Iterating over the object returned by fileinput.input() yields the lines one at a time, so a program can print or otherwise process each line as soon as it is read.
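The fileinput approach described above might look like the sketch below. The function name read_with_fileinput and the temporary sample file are illustrative assumptions; only fileinput.input() itself comes from the standard library.

```python
import fileinput
import os
import tempfile


def read_with_fileinput(paths):
    """Yield each line (without its trailing newline) from one or more files."""
    # fileinput.input() accepts a single filename or a sequence of filenames.
    # Lines are produced lazily, so the files are never loaded whole.
    with fileinput.input(files=paths) as stream:
        for line in stream:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Small stand-in for a large file.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("alpha\nbeta\ngamma\n")
    try:
        for text in read_with_fileinput([tmp.name]):
            print(text)
    finally:
        os.remove(tmp.name)
```

Wrapping fileinput.input() in a with block closes the underlying file once iteration is finished, which also allows fileinput.input() to be called again later in the same program.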
The second approach also uses an iterator, but this time it is the iterator of the file object itself. The built-in open() function returns a file object for the file without reading its contents up front; iterating over that object then yields the lines one at a time. We open the file in a ‘with’ block so that it is closed automatically when the block finishes: as the with block exits, whether normally or because of an exception, the file object's __exit__() method is called, which closes the file and releases the underlying resources.
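A minimal sketch of this second approach follows. The function name process_file, its handle_line callback parameter, and the sample file are hypothetical names chosen for illustration.

```python
import os
import tempfile


def process_file(path, handle_line):
    """Call handle_line on each line of the file and return the number of lines read."""
    count = 0
    # open() returns a file object; the with block closes it on exit.
    with open(path, "r", encoding="utf-8") as f:
        for line in f:  # lines are read lazily, one per iteration
            handle_line(line.rstrip("\n"))
            count += 1
    # Here f.closed is True: __exit__() has already run.
    return count


if __name__ == "__main__":
    # Small stand-in for a large file.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("one\ntwo\nthree\n")
    try:
        total = process_file(tmp.name, print)
        print("lines read:", total)
    finally:
        os.remove(tmp.name)
```

Passing the per-line work in as a callback keeps the reading loop separate from the task, so the same function can print, parse, or filter lines without ever holding the whole file in memory.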
This approach typically takes less time than the fileinput approach, since it iterates over the file object directly. The program can also be written without the with block, but in that case we must make sure to close the file explicitly.
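If the with block is not used, one way to guarantee the explicit close is a try/finally pair, as in the sketch below. The function name count_chars_explicit_close and the sample file are assumptions for illustration.

```python
import os
import tempfile


def count_chars_explicit_close(path):
    """Count the characters in a file, closing it by hand instead of using with."""
    f = open(path, "r", encoding="utf-8")
    try:
        total = 0
        for line in f:  # still lazy: one line in memory at a time
            total += len(line)
        return total
    finally:
        # Runs even if an exception is raised while reading.
        f.close()


if __name__ == "__main__":
    # Small stand-in for a large file.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("ab\ncd\n")
    try:
        print(count_chars_explicit_close(tmp.name))
    finally:
        os.remove(tmp.name)
```

The finally clause does by hand what the with block does automatically, which is why the with form is usually preferred.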