DataFrame vs Series in Pandas

Last Updated : 17 Feb, 2024

Pandas is a widely used Python library for data analysis. It provides two core data structures, Series and DataFrame. Both are used to store and examine labeled data, but they differ in dimensionality, features, and typical use.

In this article, we will explore the differences between Series and DataFrames.

What is Pandas?

Pandas is a popular open-source data manipulation and analysis library for Python. Its main data structures, Series and DataFrame, are designed to make working with structured data fast and expressive. Pandas is widely used in data science, machine learning, and data analysis for tasks such as data cleaning, transformation, and exploration.

What is a Pandas Series?

A Pandas Series is a one-dimensional, array-like object that can hold data of any type (integer, float, string, and so on). It is labeled: each element is associated with a label, and the labels together form the index. Index labels are usually unique, but pandas does not require them to be. You can think of a Series as a single column in a spreadsheet or database table. A Series can be created from a list, an array, a dictionary, or an existing Series. It is also the building block of the DataFrame, a two-dimensional, table-like structure whose columns are Series objects.

Creating a Series from a list, from a dictionary, and with a custom index:

Python3




import pandas as pd
 
# Initializing a Series from a list
data = [1, 2, 3, 4, 5]
series_from_list = pd.Series(data)
print(series_from_list)
 
# Initializing a Series from a dictionary
data = {'a': 1, 'b': 2, 'c': 3}
series_from_dict = pd.Series(data)
print(series_from_dict)
 
# Initializing a Series with custom index
data = [1, 2, 3, 4, 5]
index = ['a', 'b', 'c', 'd', 'e']
series_custom_index = pd.Series(data, index=index)
print(series_custom_index)


Output:

0    1
1    2
2    3
3    4
4    5
dtype: int64
a    1
b    2
c    3
dtype: int64
a    1
b    2
c    3
d    4
e    5
dtype: int64
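The description above also mentions arrays and existing Series objects as sources. A minimal sketch of both cases follows; the variable names and values are illustrative and not from the original article.

```python
import numpy as np
import pandas as pd

# Series from a NumPy array: the dtype is taken from the array
arr = np.array([1.5, 2.5, 3.5])
series_from_array = pd.Series(arr, name="values")

# Series from an existing Series: passing index= selects those labels
base = pd.Series([10, 20, 30], index=["a", "b", "c"])
subset = pd.Series(base, index=["a", "c"])

print(series_from_array.dtype)
print(subset.to_dict())
```

Passing an existing Series together with `index=` reindexes it, so only the requested labels are kept.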

Key Features of the Series data structure:

Indexing:

Each element in a Series has a corresponding index, which can be used to access or manipulate the data.

Python3




print(series_from_list[0])
print(series_from_dict['b'])


Output:

1
2
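Square brackets can be ambiguous when the index itself contains integers, so it may help to be explicit about whether a lookup is by label or by position. The sketch below uses `.loc` and `.iloc` on a small illustrative Series (the data is assumed, not taken from the article).

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["x", "y", "z"])

by_label = s.loc["y"]          # label-based lookup
by_position = s.iloc[0]        # position-based lookup
label_slice = s.loc["x":"y"]   # label slices include both endpoints
pos_slice = s.iloc[0:2]        # positional slices exclude the stop position

print(by_label, by_position)
print(label_slice.tolist(), pos_slice.tolist())
```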

Vectorized Operations:

Series supports vectorized operations, allowing you to perform arithmetic operations on the entire series efficiently.

Python3




series_a = pd.Series([1, 2, 3])
series_b = pd.Series([4, 5, 6])
sum_series = series_a + series_b
print(sum_series)


Output:

0    5
1    7
2    9
dtype: int64
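Vectorized operations are not limited to two Series of the same length. As a small sketch with made-up values, a scalar is broadcast to every element, and comparisons are evaluated element by element as well.

```python
import pandas as pd

prices = pd.Series([10.0, 20.0, 30.0])

doubled = prices * 2          # the scalar is applied to every element
is_expensive = prices > 15    # element-wise comparison gives a boolean Series

print(doubled.tolist())
print(is_expensive.tolist())
```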

Alignment:

When performing operations between two Series objects, Pandas automatically aligns the data based on the index labels.

Python3




series_a = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
series_b = pd.Series([4, 5, 6], index=['b', 'c', 'd'])
sum_series = series_a + series_b
print(sum_series)


Output:

a    NaN
b    6.0
c    8.0
d    NaN
dtype: float64
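If NaN is not the desired result for labels that appear in only one operand, one option is the `add` method with `fill_value`. The sketch below reuses the two Series from the alignment example and treats a missing partner as 0.

```python
import pandas as pd

series_a = pd.Series([1, 2, 3], index=["a", "b", "c"])
series_b = pd.Series([4, 5, 6], index=["b", "c", "d"])

# A label missing from one side is treated as 0 instead of producing NaN
total = series_a.add(series_b, fill_value=0)
print(total)
```

The result is still float64, because alignment happens before the fill value is applied.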

NaN Handling:

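Missing values are represented by NaN (Not a Number). Arithmetic propagates them: a label that is present in only one operand has no partner, so the result for that label is NaN and the result dtype becomes float64.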

Python3




series_a = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
series_b = pd.Series([4, 5], index=['b', 'c'])
sum_series = series_a + series_b
print(sum_series)


Output:

a    NaN
b    6.0
c    8.0
dtype: float64
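Once a Series contains NaN, a few standard methods help detect, replace, or remove the missing entries. This sketch continues from the same two Series; the variable names on the left are illustrative.

```python
import pandas as pd

series_a = pd.Series([1, 2, 3], index=["a", "b", "c"])
series_b = pd.Series([4, 5], index=["b", "c"])
sum_series = series_a + series_b    # 'a' has no partner, so it becomes NaN

missing_mask = sum_series.isna()    # boolean Series marking NaN entries
filled = sum_series.fillna(0)       # new Series with NaN replaced by 0
dropped = sum_series.dropna()       # new Series without the NaN entries
total = sum_series.sum()            # reductions skip NaN by default

print(missing_mask.tolist())
print(filled.tolist())
print(dropped.index.tolist())
print(total)
```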

What is a Pandas DataFrame?

A Pandas DataFrame is a two-dimensional, tabular data structure with rows and columns, similar to a spreadsheet or a table in a relational database. It has three main components: the data (the values themselves), the index (the row labels), and the columns (the column labels).

Creating a DataFrame from a dictionary and from a list of lists:

Python3




import pandas as pd
 
# Initializing a DataFrame from a dictionary
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
print(df)
 
# Initializing a DataFrame from a list of lists
data = [['John', 25, 'New York'],
        ['Alice', 30, 'Los Angeles'],
        ['Bob', 35, 'Chicago']]
columns = ['Name', 'Age', 'City']
df = pd.DataFrame(data, columns=columns)
print(df)


Output:

    Name  Age         City
0   John   25     New York
1  Alice   30  Los Angeles
2    Bob   35      Chicago
    Name  Age         City
0   John   25     New York
1  Alice   30  Los Angeles
2    Bob   35      Chicago
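The three components of a DataFrame described earlier (data, row labels, column labels) can be inspected directly. The following is a small sketch on a two-row DataFrame built only for illustration.

```python
import pandas as pd

df_small = pd.DataFrame({"Name": ["John", "Alice"], "Age": [25, 30]})

print(df_small.index)           # row labels
print(df_small.columns)         # column labels
print(df_small.shape)           # (number of rows, number of columns)
print(type(df_small["Age"]))    # a single column is a Series
```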

Key Features of the DataFrame data structure:

Indexing:

DataFrame provides flexible indexing options, allowing access to rows, columns, or individual elements based on labels or integer positions.

Python3




# Accessing a column
print(df['Name'])
 
# Accessing a row by label
print(df.loc[0])
 
# Accessing a row by integer position
print(df.iloc[0])
 
# Accessing an individual element
print(df.at[0, 'Name'])


Output:

0     John
1    Alice
2      Bob
Name: Name, dtype: object
Name        John
Age           25
City    New York
Name: 0, dtype: object
Name        John
Age           25
City    New York
Name: 0, dtype: object
John

Column Operations:

Columns in a DataFrame are Series objects, enabling various operations such as arithmetic operations, filtering, and sorting.

Python3




# Adding a new column
df['Salary'] = [50000, 60000, 70000]
 
# Filtering rows based on a condition
high_salary_employees = df[df['Salary'] > 60000]
print(high_salary_employees)
 
# Sorting DataFrame by a column
sorted_df = df.sort_values(by='Age', ascending=False)
print(sorted_df)


Output:

  Name  Age     City  Salary
2  Bob   35  Chicago   70000
    Name  Age         City  Salary
2    Bob   35      Chicago   70000
1  Alice   30  Los Angeles   60000
0   John   25     New York   50000
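Because each column is a Series, the vectorized arithmetic shown for Series also works on a column. The sketch below derives a new column from an existing one; the `Raised` column and the 5000 increment are hypothetical and chosen only for illustration.

```python
import pandas as pd

staff = pd.DataFrame({"Name": ["John", "Alice", "Bob"],
                      "Salary": [50000, 60000, 70000]})

# Selecting one column returns a Series
salary = staff["Salary"]

# Arithmetic on the column is element-wise, as with any Series
staff["Raised"] = salary + 5000
print(staff)
```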

Missing Data Handling:

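DataFrames provide methods for handling missing (NaN) values, such as dropna() to drop rows that contain them and fillna() to replace them. Both return a new DataFrame by default and leave the original unchanged. The example DataFrame has no missing values, so both printed results match the original data.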

Python3




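# Dropping rows with missing values
# dropna() returns a new DataFrame, so keep the result
df_dropped = df.dropna()
print(df_dropped)
 
# Filling missing values with a specified value
# fillna() also returns a new DataFrame
df_filled = df.fillna(0)
print(df_filled)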


Output:

    Name  Age         City  Salary
0   John   25     New York   50000
1  Alice   30  Los Angeles   60000
2    Bob   35      Chicago   70000
    Name  Age         City  Salary
0   John   25     New York   50000
1  Alice   30  Los Angeles   60000
2    Bob   35      Chicago   70000
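To see these methods change something, the data needs an actual gap. The following sketch uses a hypothetical DataFrame with one missing age, which is not part of the original example.

```python
import numpy as np
import pandas as pd

df_missing = pd.DataFrame({"Name": ["John", "Alice", "Bob"],
                           "Age": [25, np.nan, 35]})

dropped = df_missing.dropna()                  # keeps only rows without NaN
filled = df_missing.fillna({"Age": 0})         # fills NaN in the 'Age' column only
missing_per_column = df_missing.isna().sum()   # count of NaN per column

print(dropped)
print(filled)
print(missing_per_column)
```

Neither call modifies `df_missing`; each returns a new DataFrame.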

Grouping and Aggregation:

DataFrames support group-by operations for summarizing data and applying aggregation functions.

Python3




# Grouping by a column and calculating mean
avg_age_by_city = df.groupby('City')['Age'].mean()
print(avg_age_by_city)


Output:

City
Chicago        35.0
Los Angeles    30.0
New York       25.0
Name: Age, dtype: float64
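In the example above each city appears once, so every group holds a single row. A sketch with repeated cities (illustrative data, not from the article) shows several aggregations computed at once with `agg`.

```python
import pandas as pd

people = pd.DataFrame({"City": ["Chicago", "Chicago", "New York"],
                       "Age": [35, 45, 25]})

# One row per city, one column per aggregation function
summary = people.groupby("City")["Age"].agg(["mean", "max", "count"])
print(summary)
```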

DataFrame vs Series

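Series                                              DataFrame
--------------------------------------------------  ----------------------------------------------------
One-dimensional.                                    Two-dimensional.
Homogeneous: all elements share one dtype.          Heterogeneous: each column can have its own dtype.
Values are mutable; length is documented as fixed.  Values are mutable; columns can be added or removed.
Element-wise computations.                          Column-wise (or row-wise) computations.
Smaller API for a single labeled array.             Larger API, including merge and pivot.
Aligns on index labels.                             Aligns on both row and column labels.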
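Some of the differences in the comparison can be checked directly in code. This is a minimal sketch on illustrative data; it covers dimensionality, dtypes, and adding a column, and is not an exhaustive comparison.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df = pd.DataFrame({"n": [1, 2, 3], "label": ["a", "b", "c"]})

print(s.ndim, df.ndim)   # number of dimensions of each structure
print(s.dtype)           # a Series has a single dtype
print(df.dtypes)         # a DataFrame has one dtype per column

# A new column can be inserted into an existing DataFrame
df["doubled"] = df["n"] * 2
print(df.shape)
```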

Conclusion

In conclusion, Pandas offers two core data structures, Series and DataFrame, each suited to different tasks. A Series holds one-dimensional labeled data with efficient indexing and vectorized operations, while a DataFrame organizes data in a table with flexible indexing, column operations, and methods for handling missing data. Understanding their differences helps you choose the right structure for a given analysis.


