
Pandas Introduction


Pandas is a powerful, open-source Python library for data manipulation and analysis. It provides data structures and functions for working with data efficiently.

What is Pandas?

Pandas is a versatile library that simplifies data manipulation in Python. It is built on top of the NumPy library and is particularly well suited to tabular data, such as spreadsheets or SQL tables. Its versatility and ease of use make it an essential tool for data analysts, scientists, and engineers working with structured data in Python.

What can you do using Pandas?

Pandas is widely used in data science, largely because it works well with the other libraries in that ecosystem. It is built on top of NumPy, so many NumPy structures are used or replicated in Pandas. Data prepared with Pandas is often passed to the plotting functions of Matplotlib, the statistical routines of SciPy, and the machine learning algorithms of Scikit-learn. Here is a list of things that we can do using Pandas:

  • Data set cleaning, merging, and joining.
  • Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data.
  • Columns can be inserted and deleted from DataFrame and higher dimensional objects.
  • Powerful group by functionality for performing split-apply-combine operations on data sets.
  • Data visualization.
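
As a rough illustration of several items in the list above, the sketch below fills a missing value, merges two tables, computes a per-group total, and inserts and deletes a column. The tables `sales` and `regions`, their column names, and all values are invented for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented purely for illustration
sales = pd.DataFrame({
    "store": ["A", "B", "A", "C"],
    "amount": [100.0, np.nan, 250.0, 80.0],
})
regions = pd.DataFrame({
    "store": ["A", "B", "C"],
    "region": ["North", "South", "North"],
})

# Missing data: replace the NaN amount with 0
sales["amount"] = sales["amount"].fillna(0.0)

# Merging/joining: attach each store's region to its sales rows
merged = sales.merge(regions, on="store", how="left")

# Split-apply-combine: total amount per region
totals = merged.groupby("region")["amount"].sum()
print(totals)

# Insert a new column, then delete it again
merged["amount_doubled"] = merged["amount"] * 2
merged = merged.drop(columns="amount_doubled")
print(merged)
```

Here `fillna`, `merge`, `groupby`, and `drop` are standard pandas methods. Filling with 0 is only one possible way to treat missing values and may not suit real data.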

Getting Started with Pandas

Installing Pandas

The first step is to check whether pandas is already installed on the system. If it is not, install it with the pip command. On Windows, type cmd in the search box to open a command prompt. If pip is not on the system path, use the cd command to move to the folder where pip has been installed. Then type the command:

pip install pandas

For more details, refer to the article on installing pandas.

Importing Pandas

After pandas has been installed, you need to import the library. It is generally imported as follows:

import pandas as pd

Here, pd is an alias for pandas. The alias is not required; it simply means less code to write every time a method or property is called.
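
One simple way to confirm that the installation and the import both worked is to print the installed version. This is only a minimal check, and the version string will differ from one environment to another.

```python
import pandas
import pandas as pd

# The alias and the full name refer to the same module object
print(pd is pandas)

# __version__ holds the installed pandas version as a string
print(pd.__version__)
```

If the import fails with a ModuleNotFoundError, pandas is not installed in the Python environment that is running the code.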

Pandas Data Structures

Pandas provides two main data structures for manipulating data:

  • Series
  • DataFrame

Pandas Series

A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index.
A Pandas Series is comparable to a single column in an Excel sheet. Labels need not be unique but must be of a hashable type. The object supports both integer-based and label-based indexing and provides a host of methods for performing operations involving the index.
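
The short sketch below illustrates label-based and integer-based indexing on a Series whose labels are not unique. The labels and values are made up for the example.

```python
import pandas as pd

# Illustrative labels; note that "b" appears twice
ser = pd.Series([10, 20, 30, 40], index=["a", "b", "b", "c"])

# Label-based access with .loc
first = ser.loc["a"]   # a single value, because "a" is unique
dupes = ser.loc["b"]   # a Series, because "b" is repeated

# Integer (position-based) access with .iloc
last = ser.iloc[-1]

print(first, last)
print(dupes)
print(ser.index.is_unique)
```

Using `.loc` for labels and `.iloc` for positions keeps the two kinds of lookup explicit.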


Note: For more information, refer to Python | Pandas Series 

Creating a Series

In practice, a Pandas Series is usually created by loading data from existing storage, such as a SQL database, a CSV file, or an Excel file. A Series can also be created from lists, dictionaries, scalar values, and so on.

Example:

Python3

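import pandas as pd
import numpy as np

# Creating an empty Series; the explicit dtype keeps the
# result the same across pandas versions
ser = pd.Series(dtype="float64")
print("Pandas Series:", ser)

# Creating a Series from a simple NumPy array
data = np.array(['g', 'e', 'e', 'k', 's'])

ser = pd.Series(data)
print("Pandas Series:")
print(ser)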

Output:

Pandas Series: Series([], dtype: float64)
Pandas Series:
0    g
1    e
2    e
3    k
4    s
dtype: object

Note: For more information, refer to Creating a Pandas Series
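
The other construction routes mentioned above (a list, a dictionary, and a scalar value) can be sketched as follows. The values and labels are arbitrary.

```python
import pandas as pd

# From a list: pandas assigns the default integer index 0, 1, 2
from_list = pd.Series([1, 2, 3])

# From a dictionary: the keys become the index labels
from_dict = pd.Series({"x": 1.5, "y": 2.5})

# From a scalar: the value is repeated for every label in the index
from_scalar = pd.Series(7, index=["a", "b", "c"])

print(from_list)
print(from_dict)
print(from_scalar)
```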

DataFrame

Pandas DataFrame is a two-dimensional data structure with labeled axes (rows and columns).
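
To make the idea of two labeled axes concrete, here is a small sketch that inspects the row labels, the column labels, and a single cell. The names, ages, and row labels are invented for the example.

```python
import pandas as pd

# Illustrative data with explicit row labels
df = pd.DataFrame(
    {"name": ["Ann", "Bob"], "age": [31, 45]},
    index=["r1", "r2"],
)

print(df.shape)     # number of rows and columns
print(df.index)     # row labels
print(df.columns)   # column labels

# A cell is addressed by its row label and column label
cell = df.loc["r1", "age"]
print(cell)

# Selecting one column returns a Series
ages = df["age"]
print(type(ages))
```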

Note: For more information, refer to Python | Pandas DataFrame 

Creating a DataFrame

In practice, a Pandas DataFrame is usually created by loading data from existing storage, such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be created from lists, dictionaries, a list of dictionaries, and so on.

Example:

Python3

import pandas as pd 
  
# Calling DataFrame constructor 
df = pd.DataFrame() 
print(df)

# list of strings 
lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks'] 
  
# Calling DataFrame constructor on list 
df = pd.DataFrame(lst) 
print(df) 

Output:

Empty DataFrame
Columns: []
Index: []
        0
0   Geeks
1     For
2   Geeks
3      is
4  portal
5     for
6   Geeks

Note: For more information, refer to Creating a Pandas DataFrame 
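
A few more ways of building a DataFrame are sketched below: from a dictionary, from a list of dictionaries, and from CSV data. To keep the example self-contained, the CSV content is an in-memory string wrapped in `io.StringIO` instead of a file on disk, and all names and values are invented.

```python
import io
import pandas as pd

# From a dictionary of lists: keys become column names
from_dict = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45]})

# From a list of dictionaries: each dictionary is one row;
# a key missing from a row becomes NaN in that column
from_records = pd.DataFrame([
    {"name": "Ann", "age": 31},
    {"name": "Bob"},
])

# From CSV text; with a real file, pass its path to read_csv instead
csv_text = "name,age\nAnn,31\nBob,45\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

print(from_dict)
print(from_records)
print(from_csv)
```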

How to Run a Pandas Program in Python?

A Pandas program can be written in any text editor, but Jupyter Notebook is recommended because it can execute the code in a single cell instead of running the entire file. Jupyter also provides an easy way to visualize Pandas DataFrames and plots.

Note: For more information on Jupyter Notebook, refer to How To Use Jupyter Notebook – An Ultimate Guide 

Conclusion

This tutorial provides a foundation for learning Pandas, covering installation, importing, and the two core data structures, Series and DataFrame. As you apply these skills to your projects, you will see how Pandas enhances your ability to explore, clean, and analyze data, making it an indispensable tool in the data scientist's toolkit. Happy coding!



Last Updated : 24 Nov, 2023