Python | Read csv using pandas.read_csv()
Python is a great language for data analysis, primarily because of its ecosystem of data-centric Python packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.
Most data for analysis is available in a tabular format, such as Excel files and comma-separated values (CSV) files. To access data from a CSV file, we use the function read_csv(), which returns the data as a DataFrame. Before using this function, we must import the pandas library.
Importing Pandas library:
import pandas as pd
The read_csv() function reads data from a CSV file. Its signature, as documented for an older pandas release (newer releases have removed or renamed some of these parameters), is:
pd.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, doublequote=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)
Code #1 : Retrieving data from a CSV file
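A minimal sketch of a basic read follows. The data set used in the article is not reproduced here, so the rows below are made-up sample data held in a string (`csv_text` is a hypothetical name) and passed through `io.StringIO`, which read_csv() accepts in place of a file path.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = """name,team,age,salary
Avery,Boston,25,7730337
Jae,Boston,25,6796117
John,Boston,27,
"""

# read_csv() accepts a path, a URL, or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first rows of the data frame
```

With a real file, the same call would simply take the path or URL, for example pd.read_csv("data.csv"), where "data.csv" is a placeholder name. The empty salary field in the last row is read as a missing value (NaN).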
The signature above lists the parameters with their default values. Not all of them are equally important, but knowing them saves the time of writing the same functionality yourself. In a Jupyter notebook, you can see the parameters of any function by pressing Shift + Tab. The most useful ones are described below:
- filepath_or_buffer: The location of the file to read. It accepts a string path or the URL of the file.
- sep: Stands for separator; the default is ',' as in CSV (comma-separated values).
- header: Row number(s) (int or list of int) to use as the column names; the data starts after that row. With header=None, no row is treated as a header and the columns are labelled 0, 1, 2, and so on.
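- usecols: Reads only the selected columns from the CSV file.
- nrows: The number of rows to read from the start of the file.
- index_col: The column(s) to use as the row index. If None, a default integer index 0, 1, 2, … is used.
- squeeze: If True and only one column is read, returns a pandas Series. This parameter has been removed in newer pandas releases; calling .squeeze("columns") on the result gives the same effect.
- skiprows: The rows to skip while reading: a list of 0-indexed line numbers, or an int giving the number of lines to skip at the start of the file.
- names: A list of column names to use, which lets you read the columns under new names.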
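The effect of header, names, usecols and nrows can be sketched on a small in-memory sample. The data and the variable names (`csv_text`, `subset`, `no_header`, `renamed`) are assumptions made for illustration, not part of the article's data set.

```python
import io

import pandas as pd

# Hypothetical sample data; the first line is the header row.
csv_text = """name,team,age
Avery,Boston,25
Jae,Boston,25
John,Chicago,27
R.J.,Chicago,22
"""

# usecols + nrows: keep two columns and only the first two data rows.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["name", "age"], nrows=2)

# header=None: the first line is treated as data; columns are labelled 0, 1, 2.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# names + header=0: replace the header row in the file with new column names.
renamed = pd.read_csv(
    io.StringIO(csv_text), header=0, names=["player", "club", "years"]
)

print(subset)
print(no_header)
print(renamed)
```

Note that header=0 is passed together with names here; without it, the original header line would be kept as the first data row.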
| Parameter | Use |
|---|---|
| filepath_or_buffer | URL or directory location of the file |
| sep | Stands for separator; the default is ',' as in CSV (comma-separated values) |
| index_col | Makes the passed column the index instead of 0, 1, 2, 3, … |
| header | Makes the passed row(s) [int or list of int] the header |
| usecols | Uses only the passed columns [list of strings] to build the data frame |
| squeeze | If True and only one column is passed, returns a pandas Series |
| skiprows | Skips the passed rows in the new data frame |
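As a rough illustration of index_col, skiprows and the single-column case, the sketch below again uses made-up in-memory data. Because squeeze=True is not accepted by newer pandas releases, the last call uses DataFrame.squeeze("columns") as a stand-in; that swap is a choice made for this example.

```python
import io

import pandas as pd

# Hypothetical sample data; line 0 of the file is the header row.
csv_text = """name,team,age
Avery,Boston,25
Jae,Boston,25
John,Chicago,27
"""

# index_col: use the "name" column as the row labels instead of 0, 1, 2, ...
indexed = pd.read_csv(io.StringIO(csv_text), index_col="name")

# skiprows: skip line 1 of the file (0-indexed), i.e. the first data row;
# line 0, the header, is kept.
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=[1])

# Single column as a Series: read one column, then squeeze the data frame.
ages = pd.read_csv(io.StringIO(csv_text), usecols=["age"]).squeeze("columns")

print(indexed)
print(skipped)
print(type(ages))
```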
Refer to the link for the data set used here.
Code #2 :
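A minimal sketch of the sep parameter, assuming a file whose fields are separated by semicolons rather than commas. The sample text and the names `semicolon_text`, `wrong` and `right` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample data using ';' as the field separator.
semicolon_text = """name;team;age
Avery;Boston;25
Jae;Boston;25
"""

# With the default sep=",", each line is read as one single column.
wrong = pd.read_csv(io.StringIO(semicolon_text))

# Passing sep=";" splits the fields into separate columns.
right = pd.read_csv(io.StringIO(semicolon_text), sep=";")

print(wrong.shape)
print(right.shape)
print(right)
```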