Data Analysis and Visualization with Python | Set 2

Prerequisites : NumPy in Python, Data Analysis and Visualization with Python | Set 1

1. Storing DataFrame in CSV Format :

Pandas provides the to_csv('filename', index = True/False) method to write a DataFrame into a CSV file. Here filename is the name of the CSV file that you want to create, and index tells whether the index of the DataFrame should be written to the file. If we set index = False then the index is not written. By default index is True and the index is written along with the data.

Example :

import pandas as pd
  
# assigning three series to s1, s2, s3
s1 = pd.Series([0, 4, 8])
s2 = pd.Series([1, 5, 9])
s3 = pd.Series([2, 6, 10])
  
# taking index and column values
dframe = pd.DataFrame([s1, s2, s3])
  
# assign column name
dframe.columns =['Geeks', 'For', 'Geeks']
  
# write data to csv file
dframe.to_csv('geeksforgeeks.csv', index = False)  
dframe.to_csv('geeksforgeeks1.csv', index = True)



Output :

geeksforgeeks.csv (written with index = False) :

Geeks,For,Geeks
0,4,8
1,5,9
2,6,10

geeksforgeeks1.csv (written with index = True) :

,Geeks,For,Geeks
0,0,4,8
1,1,5,9
2,2,6,10
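As a quick check, the two files written above can be read back with pd.read_csv to see the effect of the index argument. This is a minimal sketch, assuming the files were just created by the code above :

import pandas as pd

# File written with index = False :
# only the three data columns come back
print(pd.read_csv('geeksforgeeks.csv'))

# File written with index = True :
# the saved index would otherwise appear as an
# extra unnamed column, so index_col = 0 tells
# read_csv to use it as the index again
print(pd.read_csv('geeksforgeeks1.csv', index_col = 0))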

 

2. Handling Missing Data

The data analysis phase also requires the ability to handle missing data in our dataset, and not so surprisingly Pandas lives up to that expectation as well. This is where the dropna and/or fillna methods come into play. While dealing with missing data, you as a Data Analyst can either drop the rows or columns containing NaN values (dropna method) or fill in the missing data with the mean or mode of the whole column (fillna method). This decision is of great significance and depends upon the data and the effect it would have on our results.

  • Drop the missing data :
    Consider the DataFrame generated by the code below :


    import numpy as np
    import pandas as pd
      
    # Create a DataFrame with missing values
    dframe = pd.DataFrame({'Geeks': [23, 24, 22], 
                           'For': [10, 12, np.nan],
                           'geeks': [0, np.nan, np.nan]},
                           columns =['Geeks', 'For', 'geeks'])
      
    # Drop rows containing NaN values.
    # If axis is not given, dropna works
    # along rows, i.e. axis = 0
    print(dframe.dropna())
      
    # Drop columns containing NaN values
    # by passing axis = 1
    print(dframe.dropna(axis = 1))


    
    

    Output :

    axis = 0 (rows with NaN dropped) :

       Geeks   For  geeks
    0     23  10.0    0.0

    axis = 1 (columns with NaN dropped) :

       Geeks
    0     23
    1     24
    2     22

  • Fill the missing values :
    Now, to replace any NaN value with the mean or mode of the data, fillna is used. It can replace the NaN values in a particular column, or in the whole DataFrame, as the requirement dictates (a small sketch of filling with the mode follows after this list).


    import numpy as np
    import pandas as pd
      
    # Create a DataFrame with missing values
    dframe = pd.DataFrame({'Geeks': [23, 24, 22], 
                            'For': [10, 12, np.nan],
                            'geeks': [0, np.nan, np.nan]},
                            columns = ['Geeks', 'For', 'geeks'])
      
    # Fill the missing values of one column
    # with the mean of that column
    print(dframe['For'].fillna(value = dframe['For'].mean()))
      
    # Use fillna on the complete DataFrame;
    # every column is filled with its own mean
    print(dframe.fillna(value = dframe.mean()))


    
    

    Output :

    Column 'For' filled with its mean :

    0    10.0
    1    12.0
    2    11.0
    Name: For, dtype: float64

    Whole DataFrame filled with column means :

       Geeks   For  geeks
    0     23  10.0    0.0
    1     24  12.0    0.0
    2     22  11.0    0.0
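The paragraph above also mentions filling with the mode. A minimal sketch of that, reusing the same dframe as in the fillna example : mode() returns the most frequent value(s) of every column, so iloc[0] takes one mode per column.

import numpy as np
import pandas as pd

# Same DataFrame as in the fillna example above
dframe = pd.DataFrame({'Geeks': [23, 24, 22],
                       'For': [10, 12, np.nan],
                       'geeks': [0, np.nan, np.nan]},
                       columns = ['Geeks', 'For', 'geeks'])

# mode() skips NaN values and returns the most
# frequent value(s) per column; iloc[0] keeps
# one mode per column to fill with
print(dframe.fillna(value = dframe.mode().iloc[0]))

Filling with the mode is usually preferred for categorical columns, while the mean suits numeric ones.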

3. Groupby Method (Aggregation) :

The groupby method allows us to group the data based on any row or column, so that we can then apply aggregate functions to analyze our data. It groups a Series or DataFrame using a mapper (a dict or key function; the given function is applied to each group and the result is returned as a Series) or by a series of columns.

Consider the DataFrame generated by the code below :


import pandas as pd
import numpy as np
  
# create DataFrame
dframe = pd.DataFrame({'Geeks': [23, 24, 22, 22, 23, 24], 
                        'For': [10, 12, 13, 14, 15, 16],
                        'geeks': [122, 142, 112, 122, 114, 112]},
                        columns = ['Geeks', 'For', 'geeks']) 
  
# Apply groupby and the aggregate function max
# to find the maximum of columns "For" and "geeks"
# for every distinct value of column "Geeks".
print(dframe.groupby(['Geeks']).max())



Output :

       For  geeks
Geeks            
22      14    122
23      15    122
24      16    142
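groupby is not limited to a single aggregate. A minimal sketch, reusing the dframe from the example above : the agg method computes several aggregations in one call.

import pandas as pd

# Same DataFrame as in the groupby example above
dframe = pd.DataFrame({'Geeks': [23, 24, 22, 22, 23, 24],
                        'For': [10, 12, 13, 14, 15, 16],
                        'geeks': [122, 142, 112, 122, 114, 112]},
                        columns = ['Geeks', 'For', 'geeks'])

# Compute the minimum, maximum and mean of the
# remaining columns for each value of "Geeks"
print(dframe.groupby(['Geeks']).agg(['min', 'max', 'mean']))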