Data Ingestion via Excel: Comparing runtimes
Data ingestion is the process of obtaining and importing data for storage in a database. In this article, we explore different techniques for extracting data from an Excel file in Python and compare their runtimes.
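Each example in this article times a single run with a pair of timestamps. One measurement like that is sensitive to file-system caching and background load. It also includes printing, and the xlrd example prints every row while the pandas and Dask examples print only the first five. Below is a minimal sketch of a steadier comparison that uses only the standard library. It times just the load call, repeats it several times, and keeps the best and median times. The function name `benchmark` and the loader labels are hypothetical, and the commented loaders are illustrative rather than exact equivalents of one another.

```python
import statistics
import timeit


def benchmark(loaders, repeat=5):
    """Run each zero-argument loader `repeat` times; return best and median seconds.

    `loaders` maps a label to a callable. timeit uses time.perf_counter by default.
    """
    results = {}
    for label, load in loaders.items():
        runs = timeit.repeat(load, repeat=repeat, number=1)  # one call per timing
        results[label] = {"best": min(runs), "median": statistics.median(runs)}
    return results


# Usage with the article's file (needs xlrd, pandas and dask installed):
#   import xlrd
#   import pandas as pd
#   import dask.dataframe as dd
#   from dask import delayed
#   report = benchmark({
#       "xlrd": lambda: xlrd.open_workbook('excel.xls'),
#       "pandas": lambda: pd.read_excel('excel.xls'),
#       "dask": lambda: dd.from_delayed([delayed(pd.read_excel)('excel.xls')]).compute(),
#   })

if __name__ == "__main__":
    # Stand-in workloads so the sketch runs without any Excel file
    demo = benchmark({
        "sum": lambda: sum(range(100_000)),
        "sorted": lambda: sorted(range(100_000)),
    })
    for label, stats in demo.items():
        print(f"{label}: best {stats['best']:.6f} s, median {stats['median']:.6f} s")
```

Reporting the minimum shows the run least disturbed by other activity. The median indicates what a typical run costs.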
Suppose the Excel file looks like this:
Using xlrd library
Using the xlrd module, one can retrieve information from a spreadsheet in Python. xlrd itself only reads workbooks. Writing or modifying .xls files requires companion packages such as xlwt (for writing) and xlutils (for copying a workbook so it can be changed). In practice, you may need to go through several sheets, retrieve data that meets some criteria, or modify particular rows and columns.
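Going through several sheets and keeping only the rows that meet a criterion can be sketched with the same xlrd calls the basic example uses, plus `Book.sheets()` and `Sheet.name`. The generator `filter_rows` and the example criterion are hypothetical. The function takes any workbook object with that shape, so it can be exercised without a real file.

```python
def filter_rows(workbook, predicate):
    """Scan every sheet of an xlrd-style workbook and yield matching rows.

    Relies only on Book.sheets(), Sheet.name, Sheet.nrows and Sheet.row_values(),
    which xlrd workbooks provide. Yields (sheet name, row index, row values).
    """
    for sheet in workbook.sheets():
        for i in range(sheet.nrows):
            row = sheet.row_values(i)
            if predicate(row):
                yield sheet.name, i, row


# Usage with a real file (requires xlrd and an .xls workbook):
#   import xlrd
#   book = xlrd.open_workbook('excel.xls')
#   # Hypothetical criterion: the second cell is a number greater than 50
#   # (xlrd returns numeric cells as floats, header cells as strings).
#   wanted = lambda r: len(r) > 1 and isinstance(r[1], float) and r[1] > 50
#   for name, i, row in filter_rows(book, wanted):
#       print(name, i, row)
```

Yielding rows lazily avoids building a large list when only a few rows match.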
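```python
import time

import xlrd  # reads the legacy .xls format (xlrd 2.x no longer opens .xlsx)

# perf_counter is a high-resolution clock intended for measuring durations
t1 = time.perf_counter()

# Open the workbook and take the first worksheet
workbook = xlrd.open_workbook('excel.xls')
sheet = workbook.sheet_by_index(0)

# Print every row as a list of cell values
for i in range(sheet.nrows):
    row = sheet.row_values(i)
    print(row)

t2 = time.perf_counter()
print("\nTime taken by xlrd:")
print(t2 - t1)
```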
Output:
Using Pandas
pandas, the Python data analysis library, is a powerful tool used by data scientists. It helps with both data ingestion and data exploration.
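For the exploration side, a few DataFrame attributes and methods give a quick overview once a sheet is loaded: shape, column names, dtypes, and missing-value counts. In the sketch below, `summarize` is a hypothetical helper. The column names and values are an invented stand-in for the contents of excel.xls, used so the example runs without the file.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Gather quick facts about a freshly loaded DataFrame."""
    return {
        "shape": df.shape,                          # (rows, columns)
        "columns": list(df.columns),
        "dtypes": df.dtypes.astype(str).to_dict(),  # column -> dtype name
        # column -> number of missing (NaN/None) cells, as plain ints
        "missing": {col: int(n) for col, n in df.isna().sum().items()},
    }


# Invented stand-in for pd.read_excel('excel.xls')
data = pd.DataFrame({"name": ["Asha", "Ben", None], "score": [90, 85, 70]})

info = summarize(data)
print(info["shape"])    # (3, 2)
print(info["missing"])  # {'name': 1, 'score': 0}
print(data.describe())  # count, mean, std, min, quartiles and max of numeric columns
```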
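```python
import time

import pandas as pd

t1 = time.perf_counter()

# Read the first sheet into a DataFrame (pandas uses xlrd as the engine for .xls files)
data = pd.read_excel('excel.xls')
print(data.head())  # show the first five rows

t2 = time.perf_counter()
print("\nTime taken by pandas:")
print(t2 - t1)
```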
Output:
Using Dask DataFrame
A Dask DataFrame is a large parallel DataFrame composed of many smaller pandas DataFrames, split along the index.
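To make "many smaller pandas DataFrames, split along the index" concrete, here is an illustration in plain pandas, not Dask itself. The hypothetical helper `split_along_index` cuts a frame into contiguous row blocks. Each block is reduced on its own, and the partial results are combined. This roughly mirrors how Dask computes a sum across partitions. The Dask example that follows reads one sheet through a single delayed call, so its Dask DataFrame has only one partition and gains no parallelism from the split.

```python
import pandas as pd


def split_along_index(df: pd.DataFrame, npartitions: int) -> list:
    """Cut df into up to `npartitions` contiguous row blocks (a rough analogue of Dask partitions)."""
    size = max(1, -(-len(df) // npartitions))  # ceiling division, at least one row per block
    return [df.iloc[start:start + size] for start in range(0, len(df), size)]


df = pd.DataFrame({"x": range(10)})
parts = split_along_index(df, 3)

print([len(p) for p in parts])  # [4, 4, 2]

# Reduce each block independently, then combine the partial results
partial_sums = [int(p["x"].sum()) for p in parts]
print(partial_sums, sum(partial_sums))  # [6, 22, 17] 45

# Concatenating the blocks restores the original frame
print(pd.concat(parts).equals(df))  # True
```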
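```python
import time

import dask.dataframe as dd
import pandas as pd
from dask import delayed

t1 = time.perf_counter()

# Wrap pd.read_excel in a lazy (delayed) task that returns a pandas DataFrame
part = delayed(pd.read_excel)('excel.xls', sheet_name=0)

# Build a Dask DataFrame from the delayed pieces (one delayed object -> one partition)
df = dd.from_delayed([part])
print(df.head())  # computes the first partition and returns a pandas DataFrame

t2 = time.perf_counter()
print("\nTime taken by Dask:")
print(t2 - t1)
```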
Output:
Last Updated: 01 Mar, 2020