In this article, we are going to see how to speed up pandas computations using the Modin library. Modin is a Python library that is very similar to pandas (almost identical in syntax) and is capable of handling huge datasets that cannot fit into RAM in one go. Pandas is fast enough for datasets ranging from a few megabytes to a few gigabytes, but when we are dealing with really large datasets, processing speed becomes the bottleneck.
The pandas library executes most operations on a single CPU core, yet nearly every modern personal laptop now comes with at least 2 cores. Modin exploits this by executing operations on all available cores, which speeds up the whole process.
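To make the multi-core idea concrete, here is a toy sketch of the split-apply-combine pattern that Modin automates: the DataFrame is cut into row partitions, each partition is processed independently, and the pieces are concatenated back together. This is not Modin's implementation. It uses plain pandas and a standard-library thread pool to stay self-contained, whereas Modin uses a more elaborate partitioning scheme and dispatches the work to Ray or Dask workers. The helper names parallel_fillna and fill_partition are hypothetical.

```python
# Toy illustration of partition-level parallelism (NOT Modin's actual code).
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def fill_partition(part: pd.DataFrame) -> pd.DataFrame:
    """Work done independently on one row partition (hypothetical helper)."""
    return part.fillna(0)


def parallel_fillna(df: pd.DataFrame, n_parts: int = 4) -> pd.DataFrame:
    """Split rows into n_parts blocks, fill NaNs in each concurrently, then recombine."""
    # Row boundaries of roughly equal-sized partitions.
    bounds = np.linspace(0, len(df), n_parts + 1, dtype=int)
    parts = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]
    # Apply the same operation to every partition concurrently.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(fill_partition, parts))
    # Stitch the processed partitions back into one DataFrame (original order kept).
    return pd.concat(results)


df = pd.DataFrame({"a": [1.0, np.nan, 3.0, np.nan, 5.0],
                   "b": [np.nan, 2.0, np.nan, 4.0, np.nan]})
out = parallel_fillna(df, n_parts=2)
print(out.equals(df.fillna(0)))  # True
```

Because of Python's global interpreter lock, threads only overlap where pandas and NumPy release it, so treat this as an illustration of the data flow rather than a recipe for speedups.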
To install Modin and all its dependencies, use any of the pip commands below. The extra in brackets selects the execution engine: Ray, Dask, or all supported engines.
pip install modin[ray]
pip install modin[dask]
pip install modin[all]
To limit the number of CPUs Modin uses, add the following 2 lines of code to your script, before importing modin.pandas:
import os
# This specifies the number of CPUs to use.
os.environ["MODIN_CPUS"] = "2"
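A minimal sketch of where that setting sits in a full script: the environment variable is set first, then modin.pandas is imported as a drop-in replacement for pandas. The fallback to plain pandas when Modin is not installed is our addition so the script runs anywhere, and the tiny DataFrame is hypothetical.

```python
import os

# Set MODIN_CPUS before Modin is imported, so the value is in place
# when Modin starts its execution engine.
os.environ["MODIN_CPUS"] = "2"

try:
    import modin.pandas as pd  # same API as pandas, runs on multiple cores
    backend = "modin"
except ImportError:
    import pandas as pd  # fallback (our addition) so the sketch still runs
    backend = "pandas"

# Hypothetical data: existing pandas code works unchanged with either import.
df = pd.DataFrame({"x": range(10)})
print(backend, int(df["x"].sum()))
```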
Example 1: DataFrame Append Operation:
append() operations are very common in pandas. In this example, we run an append 10 times with both pandas and Modin and use the time module to measure how long each takes. Modin clearly beats pandas because it uses all the cores available on my system; in this run it turned out to be about 25x faster than pandas. (Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat() is its replacement.)
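A benchmark sketch in the spirit of this example, not the article's own code. Assumptions: the data is a hypothetical 100,000 x 4 integer frame, concat() stands in for the removed append(), and time.perf_counter() from the time module does the timing. If Modin is not installed, only the pandas timing runs. Your numbers will differ from the author's.

```python
import time

import numpy as np
import pandas

try:
    import modin.pandas as mpd  # drop-in pandas replacement
except ImportError:
    mpd = None  # Modin not installed: only pandas is benchmarked

# Hypothetical sizes; the article does not state its data shape.
N_ROWS, N_REPEATS = 100_000, 10


def time_appends(lib, data):
    """Append `data` to a growing frame N_REPEATS times; return (seconds, result)."""
    df = lib.DataFrame(data)
    result = df
    start = time.perf_counter()
    for _ in range(N_REPEATS):
        # DataFrame.append() no longer exists in pandas 2.x; concat replaces it.
        result = lib.concat([result, df])
    return time.perf_counter() - start, result


data = np.random.randint(0, 100, size=(N_ROWS, 4))
pandas_time, pandas_result = time_appends(pandas, data)
print(f"Pandas Appending Time: {pandas_time:.4f} s")

if mpd is not None:
    modin_time, modin_result = time_appends(mpd, data)
    print(f"Modin Appending Time: {modin_time:.4f} s")
```

Modin may run some work asynchronously, so a single wall-clock measurement can understate its cost; repeating each run a few times gives a fairer comparison.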
Pandas Appending Time: 0.682852745056152
Modin Appending Time: 0.027661800384521484
Example 2: Modin is about 4.4x faster than pandas at fillna():
Here we are using a 602 MB CSV file, which can be downloaded from this link; we renamed it to demo.csv to keep the name short. In this example we use the fillna() method, which goes through the entire DataFrame and replaces every NaN value with the desired value; in my example, it's 0.
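A sketch of the fillna() comparison, assuming demo.csv sits in the working directory. If the file is missing, the hypothetical load() helper builds a small synthetic frame with roughly 20% NaN values instead, so the script still runs. Only the fillna(0) call is timed, not the CSV read, and Modin is skipped if it is not installed.

```python
import os
import time

import numpy as np
import pandas

try:
    import modin.pandas as mpd
except ImportError:
    mpd = None  # fall back to benchmarking pandas only

CSV_PATH = "demo.csv"  # the article's 602 MB file, renamed


def load(lib):
    """Read demo.csv if present, else build hypothetical data containing NaNs."""
    if os.path.exists(CSV_PATH):
        return lib.read_csv(CSV_PATH)
    rng = np.random.default_rng(0)
    values = rng.random((50_000, 5))
    values[values < 0.2] = np.nan  # roughly 20% missing values
    return lib.DataFrame(values, columns=[f"c{i}" for i in range(5)])


def time_fillna(lib):
    """Time only the fillna(0) call, not the file read."""
    df = load(lib)
    start = time.perf_counter()
    filled = df.fillna(0)  # replace every NaN with 0, as in the article
    return time.perf_counter() - start, filled


pandas_time, pandas_filled = time_fillna(pandas)
print(f"Pandas fillna Time: {pandas_time:.2f} sec")

if mpd is not None:
    modin_time, modin_filled = time_fillna(mpd)
    print(f"Modin fillna Time: {modin_time:.2f} sec")
```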
Pandas fillna Time: 1.2 sec
Modin fillna Time: 0.27 sec