Python is a great language for data analysis, primarily because of its fantastic ecosystem of data-centric packages. Pandas is one of those packages, and it makes importing and analyzing data much easier. In this article, I use Pandas to analyze the Country Data.csv file, taken from the UN public data sets hosted on the popular ‘statweb.stanford.edu’ website.
While analyzing the Indian country data, I introduce the key Pandas concepts below. Before going through this article, you should have a rough idea of the basics of matplotlib and csv.
The easiest way to install pandas is to use pip:
pip install pandas
Creating A DataFrame in Pandas
A DataFrame can be created by passing multiple Series objects, each built with pd.Series, to the DataFrame class. For example, when two Series objects s1 and s2 are passed in as a list, s1 becomes the first row and s2 the second row.
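A minimal sketch of this, assuming two small Series with made-up values and index labels (the article's original data is not shown here):

```python
import pandas as pd

# Two Series sharing the same index labels; values are made up for illustration.
s1 = pd.Series([1, 2], index=["a", "b"])
s2 = pd.Series([3, 4], index=["a", "b"])

# Passing a list of Series makes each Series one row of the DataFrame,
# and the shared index labels become the column names.
df = pd.DataFrame([s1, s2])
print(df)
```

Because the Series are passed as rows, their index labels ("a" and "b") end up as columns. Passing a dictionary of Series instead would make each Series a column.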
Importing Data with Pandas
The first step is to read the data. The data is stored as a comma-separated values (csv) file, where each row is separated by a new line and each column by a comma (,). To work with the data in Python, the csv file needs to be read into a Pandas DataFrame. A DataFrame is a way to represent and work with tabular data. Tabular data has rows and columns, just like this csv file.
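As a rough sketch, reading a csv file into a DataFrame could look like the following. To keep the example runnable without the download, it reads a small in-memory sample whose column names and values are invented for illustration and do not come from the real Country Data.csv file.

```python
import io

import pandas as pd

# Stand-in for the real file: made-up columns and values, for illustration only.
csv_text = """Country,Region,Population
Aland,North,100
Borduria,South,250
Cascadia,North,175
"""

# pd.read_csv accepts a file path or a file-like object. With the real file,
# the path would be passed instead, e.g. pd.read_csv("Country Data.csv").
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())  # first rows of the table
print(df.shape)   # (number of rows, number of columns)
```

The first line of the file is used as the column names by default, and pandas infers a type for each column from its values.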
Indexing DataFrames with Pandas
Rows and columns can be selected by position using the pandas.DataFrame.iloc indexer, which retrieves as many rows and columns as needed.
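A short sketch of position-based selection, using a small DataFrame with made-up values (the name df and its columns are assumptions for illustration):

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "x": [10, 20, 30, 40, 50, 60],
    "y": [1.5, 2.5, 3.5, 4.5, 5.5, 6.5],
})

first_five = df.iloc[0:5, :]   # rows at positions 0 to 4, all columns
first_col = df.iloc[:, 0]      # all rows, first column (returned as a Series)
cell = df.iloc[2, 1]           # single value at row position 2, column position 1
subset = df.iloc[[0, 2], 1:3]  # rows 0 and 2, columns at positions 1 and 2

print(first_five)
print(cell)
```

As with ordinary Python slicing, the end position of an iloc slice is excluded, so 0:5 selects five rows.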
Indexing Using Labels in Pandas
Rows and columns can also be selected by label using the pandas.DataFrame.loc indexer, which indexes by labels instead of positions.
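A small sketch of label-based selection follows, again on made-up data. One detail worth noting is that a loc slice includes its end label, whereas an iloc slice excludes its end position.

```python
import pandas as pd

# Made-up data for illustration only; the default row labels are 0, 1, 2, ...
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "x": [10, 20, 30, 40, 50, 60],
})

by_label = df.loc[0:5, :]      # labels 0 through 5, end label included: 6 rows
by_position = df.iloc[0:5, :]  # positions 0 to 4, end position excluded: 5 rows

# With non-numeric row labels, loc selects by those labels directly.
named = df.set_index("name")
middle = named.loc["b":"d", :]  # rows labelled "b", "c" and "d"

print(by_label)
print(middle)
```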
A label-based selection of the first few rows doesn’t actually look much different from df.iloc[0:5,:]. This is because, while row labels can take on any values, our row labels match the positions exactly. Column labels, however, can make things much easier when working with data.
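To illustrate how column labels help, here is a brief sketch on made-up data, where the column names are assumptions chosen for the example:

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "x": [10, 20, 30, 40],
    "y": [1.5, 2.5, 3.5, 4.5],
})

x_col = df.loc[:, "x"]               # one column, selected by its label
top_rows = df.loc[0:2, ["name", "x"]]  # row labels 0 to 2, two named columns
same_col = df["x"]                   # shorthand for selecting a single column

print(top_rows)
```

Selecting by column name keeps the code readable and keeps working even if the column order in the file changes.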
DataFrame Math with Pandas
Computations on DataFrames can be done using the statistical functions built into pandas.
Plots in these examples are made using the standard convention for referencing the matplotlib API, on which pandas builds to make it easy to create decent-looking plots.
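A few of these statistical functions are sketched below on a small numeric DataFrame with made-up values:

```python
import pandas as pd

# Made-up numeric data for illustration only.
df = pd.DataFrame({"x": [10, 20, 30, 40], "y": [1.0, 2.0, 3.0, 6.0]})

col_means = df.mean()        # mean of each column
row_means = df.mean(axis=1)  # mean across the columns of each row
x_median = df["x"].median()  # median of a single column
corr = df.corr()             # pairwise correlation between columns
summary = df.describe()      # count, mean, std, min, quartiles and max

# Arithmetic operators apply element-wise to the whole DataFrame.
scaled = df / 2

print(col_means)
print(summary)
```

Most of these methods work column by column by default. Passing axis=1 applies them across each row instead.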
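As a hedged sketch of this convention, the snippet below draws a line plot from a small DataFrame. The data, column names, and output file name are all made up for illustration, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003],
    "value": [1.0, 1.5, 1.2, 2.0],
})

# DataFrame.plot draws with matplotlib and returns the Axes object,
# which can then be customised through the usual matplotlib API.
ax = df.plot(x="year", y="value", kind="line", title="value by year")
ax.set_ylabel("value")

plt.savefig("value_by_year.png")  # hypothetical output file name
```

In an interactive session, plt.show() would typically replace the savefig call.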
This article is contributed by Afzal_Saan.