How to read multiple data files into Pandas?
In this article, we will see how to read multiple data files into pandas. Data files come in several formats, and here are a few ways to read multiple files using the pandas package in Python.
The demonstration files can be downloaded from here
Method 1: Reading CSV files
If the data files are in CSV format, the read_csv() method can be used. read_csv() takes a file path as an argument and reads the contents of the CSV file into a dataframe. To read multiple CSV files, we can use a simple for loop to iterate over all the files.
Example: Reading Multiple CSV files using Pandas
In this example, we make a list of the data file paths and iterate through them using a for loop; a for loop iterates over iterables such as lists, tuples, and strings. Each file is read into a dataframe (read_csv() already returns one, so wrapping the result in pd.DataFrame() is optional). The dataframes are then concatenated into a main dataframe using pd.concat(), and the main dataframe is written to a CSV file using the to_csv() method, which takes the name of the new CSV file as an argument.
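The steps described above can be sketched as follows. This is a minimal illustration, not the article's original listing: the file names, column names, and sample rows are hypothetical, and the files are created in a temporary directory only so that the sketch runs on its own.

```python
import os
import tempfile

import pandas as pd

# Create two small sample CSV files so the sketch is self-contained.
# The file names and contents are hypothetical placeholders.
work_dir = tempfile.mkdtemp()
sample_data = {
    "file1.csv": "name,score\nAsha,90\nBen,85\n",
    "file2.csv": "name,score\nChen,78\nDina,92\n",
}
file_paths = []
for file_name, content in sample_data.items():
    path = os.path.join(work_dir, file_name)
    with open(path, "w") as f:
        f.write(content)
    file_paths.append(path)

# Read each file into its own dataframe using a for loop.
frames = []
for path in file_paths:
    frames.append(pd.read_csv(path))

# Stack the dataframes row-wise into one main dataframe.
main_df = pd.concat(frames, ignore_index=True)

# Write the combined data to a new CSV file, without the index column.
output_path = os.path.join(work_dir, "combined.csv")
main_df.to_csv(output_path, index=False)

print(main_df)
```

Here the dataframes are collected in a list and concatenated once after the loop, which avoids copying the growing main dataframe on every iteration. Concatenating inside the loop would give the same result for small inputs.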
Method 2: Using the glob package
The glob module in Python is used to retrieve files or pathnames matching a specified pattern.
This program is similar to the one above. The only difference is that instead of keeping track of file names in a list, we use the glob package to retrieve the files matching a specified pattern.
Example: Reading multiple CSV files using Pandas and glob.
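A minimal sketch of the glob-based approach is shown here. The "data_*.csv" pattern, the file names, and the columns are assumptions made for illustration, and the sample files are generated in a temporary directory so that the code can run as-is.

```python
import glob
import os
import tempfile

import pandas as pd

# Create three small sample CSV files (hypothetical names and contents).
work_dir = tempfile.mkdtemp()
for i in range(1, 4):
    with open(os.path.join(work_dir, f"data_{i}.csv"), "w") as f:
        f.write(f"id,value\n{i},{i * 10}\n")

# Retrieve every path matching the pattern. glob returns paths in
# arbitrary order, so they are sorted to keep the result reproducible.
csv_paths = sorted(glob.glob(os.path.join(work_dir, "data_*.csv")))

# Read each matching file and combine the dataframes row-wise.
combined = pd.concat([pd.read_csv(p) for p in csv_paths], ignore_index=True)

print(combined)
```

Using a specific pattern such as "data_*.csv" rather than "*.csv" keeps a combined output file saved in the same folder from being picked up on a later run.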
Method 3: Reading text files using Pandas
To read text files, the pandas method read_table() can be used.
Example: Reading text file using pandas and glob.
We use the glob package to retrieve the file paths and then iterate through them using a for loop. Each file is read into a dataframe using the pd.read_table() method, which takes the file path as an argument and treats tabs as the delimiter by default. The dataframes are then concatenated into a main dataframe using pd.concat(), and the main dataframe is written to a CSV file using the to_csv() method, which takes the name of the new CSV file as an argument.
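The following sketch illustrates these steps under a few assumptions: the text files are tab-separated with a header row, and the file names, columns, and values are hypothetical samples written to a temporary directory so the code is runnable on its own.

```python
import glob
import os
import tempfile

import pandas as pd

# Create two tab-separated text files (hypothetical names and contents).
work_dir = tempfile.mkdtemp()
samples = {
    "part_a.txt": "city\ttemp\nPune\t31\nOslo\t12\n",
    "part_b.txt": "city\ttemp\nLima\t22\n",
}
for file_name, content in samples.items():
    with open(os.path.join(work_dir, file_name), "w") as f:
        f.write(content)

# Collect the text file paths; sorted for a reproducible order.
txt_paths = sorted(glob.glob(os.path.join(work_dir, "*.txt")))

# read_table() uses a tab as its default separator.
frames = [pd.read_table(p) for p in txt_paths]

# Combine the dataframes and save the result as a CSV file.
text_df = pd.concat(frames, ignore_index=True)
text_output = os.path.join(work_dir, "combined.csv")
text_df.to_csv(text_output, index=False)

print(text_df)
```

If the text files use a different delimiter, the sep argument of read_table() can be set accordingly, for example sep="|".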