Pandas Introduction

Last Updated : 23 Apr, 2024

Pandas is a powerful, open-source Python library used for data manipulation and analysis. It consists of data structures and functions that perform efficient operations on data.

This free tutorial gives an overview of Pandas and covers the fundamentals of the library.

Table of Content

What is Pandas Library in Python?
What is Python Pandas used for?
Getting Started with Pandas
Data Structures in Pandas Library
Pandas Series
Pandas DataFrame
How to run the Pandas Program in Python?

What is Pandas Library in Python?

Pandas is a powerful and versatile library that simplifies the tasks of data manipulation in Python.

Pandas is well-suited for working with tabular data, such as spreadsheets or SQL tables.

The Pandas library is an essential tool for data analysts, scientists, and engineers working with structured data in Python.

Did you know?

The name Pandas is derived from “panel data” and is also referred to as “Python Data Analysis”.

What is Python Pandas used for?

The Pandas library is generally used for data science, but have you wondered why? This is because the Pandas library is used in conjunction with other libraries that are used for data science.

It is built on top of the NumPy library which means that a lot of the structures of NumPy are used or replicated in Pandas.

The data produced by Pandas is often used as input for plotting functions in Matplotlib, statistical analysis in SciPy, and machine learning algorithms in Scikit-learn.
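To illustrate the NumPy connection, here is a minimal sketch that converts a Series to a NumPy array and applies a NumPy function directly to a Series. The variable names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen only for illustration
prices = pd.Series([10.0, 20.0, 30.0])

# A Series can expose its values as a NumPy array
arr = prices.to_numpy()
print(type(arr))

# NumPy functions accept a Series and return a Series
roots = np.sqrt(prices)
print(type(roots))
```

Because the underlying values are NumPy arrays, libraries that expect arrays can usually consume Pandas data with little or no conversion.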

You may be wondering why you should use the Pandas library. It is one of the most widely used Python tools for analyzing, cleaning, and manipulating data.

Here is a list of things that we can do using Pandas.

Data set cleaning, merging, and joining.
Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data.
Columns can be inserted and deleted from DataFrame and higher-dimensional objects.
Powerful group by functionality for performing split-apply-combine operations on data sets.
Data Visualization.
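Several of the items in the list above can be sketched in a few lines. The example below is only an illustration: the table contents, column names, and the tax factor are invented, and it covers missing data, column insertion and deletion, group by, and merging (visualization is left out because it needs a plotting library).

```python
import numpy as np
import pandas as pd

# Hypothetical sales data with one missing value
sales = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi"],
    "amount": [100.0, np.nan, 300.0, 200.0],
})

# Missing data: replace NaN with 0
sales["amount"] = sales["amount"].fillna(0.0)

# Insert a column, then delete it again
sales["amount_with_tax"] = sales["amount"] * 1.1
sales = sales.drop(columns=["amount_with_tax"])

# Group by: split by city, apply sum, combine into one result
totals = sales.groupby("city")["amount"].sum()
print(totals)

# Merge: join with a second table on the shared "city" column
regions = pd.DataFrame({"city": ["Pune", "Delhi"], "region": ["West", "North"]})
merged = sales.merge(regions, on="city", how="left")
print(merged)
```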

Getting Started with Pandas

Let’s see how to start working with the Python Pandas library:

Installing Pandas

The first step in working with Pandas is to check whether it is installed on the system. If it is not, install it using the pip command.

Follow these steps to install Pandas:

Step 1: Open a command prompt (on Windows, type ‘cmd’ in the search box and open it).
Step 2: If pip is not available from the current folder, use the cd command to move to the folder where pip has been installed.
Step 3: Type the command:

pip install pandas

For more details, refer to the article on installing Pandas.
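One way to confirm that the installation worked is to import the library and print its version. This is a small sketch; the version string printed depends on what is installed on your system.

```python
import pandas

# Prints the installed version string; an ImportError on the line above
# means Pandas is not installed in the current environment
print(pandas.__version__)
```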

Importing Pandas

After Pandas has been installed on the system, you need to import the library. This module is generally imported as follows:

import pandas as pd

Note: Here, pd is an alias for Pandas. It is not necessary to import the library using the alias; it just reduces the amount of code you write every time a method or property is called.
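The following sketch shows that the alias and the full module name refer to the same thing. The sample values are arbitrary.

```python
import pandas
import pandas as pd

# Both names refer to the same module object
print(pd is pandas)

# The alias only shortens what you type
s1 = pandas.Series([1, 2, 3])
s2 = pd.Series([1, 2, 3])
print(s1.equals(s2))
```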

Data Structures in Pandas Library

Pandas provides two main data structures for manipulating data. They are:

Series
DataFrame

Pandas Series

A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index.

A Pandas Series can be thought of as a single column in an Excel sheet. Labels need not be unique but must be of a hashable type.

The object supports both integer and label-based indexing and provides a host of methods for performing operations involving the index.
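As a brief sketch of both indexing styles, the example below uses .loc for label-based access and .iloc for position-based access. The labels and values are invented, and one label is deliberately repeated to show that labels need not be unique.

```python
import pandas as pd

# Hypothetical marks; note the label "a" appears twice
marks = pd.Series([90, 85, 70], index=["a", "b", "a"])

# Label-based access with .loc
print(marks.loc["b"])

# Position-based access with .iloc
print(marks.iloc[0])

# A repeated label returns every matching entry as a Series
print(marks.loc["a"])

# The index can report whether its labels are unique
print(marks.index.is_unique)
```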

Creating a Series

In practice, a Pandas Series is often created by loading data from existing storage (which can be a SQL database, a CSV file, or an Excel file).

Pandas Series can be created from lists, dictionaries, scalar values, etc.

Example: Creating a series using the Pandas Library.

Python3

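import pandas as pd
import numpy as np

# Create an empty Series; dtype is passed explicitly because the
# default dtype of an empty Series differs between Pandas versions
ser = pd.Series(dtype="float64")
print("Pandas Series:", ser)

# Create a Series from a simple NumPy array
data = np.array(['g', 'e', 'e', 'k', 's'])

ser = pd.Series(data)
print("Pandas Series:")
print(ser)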

Output:

Pandas Series: Series([], dtype: float64)
Pandas Series:
0    g
1    e
2    e
3    k
4    s
dtype: object

For more information, refer to Creating a Pandas Series
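Besides a NumPy array, a Series can also be built from a list, a dictionary, or a scalar value. The sketch below shows one possible form of each; the variable names and values are illustrative only.

```python
import pandas as pd

# From a list: a default integer index 0, 1, 2 is assigned
from_list = pd.Series([10, 20, 30])

# From a dictionary: keys become the index labels
from_dict = pd.Series({"x": 1, "y": 2})

# From a scalar: the value is repeated for each index label
from_scalar = pd.Series(5, index=["a", "b", "c"])

print(from_list)
print(from_dict)
print(from_scalar)
```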

Pandas DataFrame

Pandas DataFrame is a two-dimensional data structure with labeled axes (rows and columns).

Creating DataFrame

In practice, a Pandas DataFrame is often created by loading data from existing storage (which can be a SQL database, a CSV file, or an Excel file).
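As a minimal sketch of loading from storage, the example below reads CSV data with pd.read_csv. To keep it self-contained, an in-memory string stands in for a file on disk; the column names and rows are made up.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file; the columns and rows are made up
csv_text = "name,age\nAsha,30\nRavi,25\n"

# read_csv accepts a file path or a file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.shape)
```

With a real file, you would pass its path to pd.read_csv instead. Pandas also provides pd.read_excel and pd.read_sql for the other storage types mentioned here.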

Pandas DataFrame can be created from lists, dictionaries, a list of dictionaries, etc.

Example: Creating a DataFrame Using the Pandas Library

Python3

import pandas as pd

# Calling the DataFrame constructor with no data creates an empty DataFrame
df = pd.DataFrame()
print(df)

# List of strings
lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']

# Calling the DataFrame constructor on a list creates a single column labeled 0
df = pd.DataFrame(lst)
print(df)

Output:

Empty DataFrame
Columns: []
Index: []
        0
0   Geeks
1     For
2   Geeks
3      is
4  portal
5     for
6   Geeks

Note: For more information, refer to Creating a Pandas DataFrame
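A DataFrame can also be built from a dictionary or a list of dictionaries, as mentioned earlier. The following sketch shows both; the names and ages are invented for illustration.

```python
import pandas as pd

# From a dictionary of lists: keys become column names
df_dict = pd.DataFrame({"name": ["Asha", "Ravi"], "age": [30, 25]})

# From a list of dictionaries: each dictionary becomes one row;
# keys missing from a row are filled with NaN
df_rows = pd.DataFrame([{"name": "Asha", "age": 30}, {"name": "Ravi"}])

print(df_dict)
print(df_rows)
```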

How to run the Pandas Program in Python?

Pandas code can be written in any text editor and run with the Python interpreter, but Jupyter Notebook is recommended, as it lets you execute the code in a particular cell rather than the entire file.

Jupyter also provides an easy way to visualize Pandas DataFrame and plots.

Note: For more information on Jupyter Notebook, refer to How To Use Jupyter Notebook – An Ultimate Guide

Conclusion

This tutorial introduced the basics of the Pandas library, including installing it, importing it, and its two core data structures (Series and DataFrame), with examples.

After completing this tutorial, you should have a clear idea of what Python Pandas is, what it is used for, and how to start using it.

As you apply these skills to your projects, you will discover how Pandas enhances your ability to explore, clean, and analyze data, making it an indispensable tool in the data scientist’s toolkit.
