Pandas Profiling in Python
  • Last Updated : 22 Jun, 2020

The pandas_profiling library in Python includes a class named ProfileReport, which generates a basic report on the input DataFrame.

The report consists of the following:

  • DataFrame overview,
  • Details of each attribute (column) of the DataFrame,
  • Correlations between attributes (Pearson Correlation and Spearman Correlation), and
  • A sample of DataFrame.
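The sections listed above can be approximated by hand with plain pandas, which helps to see what each part of the report summarises. The following is only a minimal sketch and not how pandas_profiling computes its report; the variable names (overview, numeric_summary and so on) are illustrative, and the data mirrors the example used later in this article.

```python
import pandas as pd

# Same values as the example DataFrame used later in this article
data = pd.DataFrame({
    "ID": [23, 43, 12, 13, 67, 89, 90, 56, 34],
    "Name": ["Ram", "Deep", "Yash", "Aman", "Arjun",
             "Aditya", "Divya", "Chalsea", "Akash"],
    "Marks": [89, 97, 45, 78, 56, 76, 100, 87, 81],
    "Grade": ["B", "A", "F", "C", "E", "C", "A", "B", "B"],
})

# 1. DataFrame overview: size, missing cells and duplicate rows
overview = {
    "variables": data.shape[1],
    "observations": data.shape[0],
    "missing_cells": int(data.isna().sum().sum()),
    "duplicate_rows": int(data.duplicated().sum()),
}

# 2. Per-attribute details: summary statistics and distinct counts
numeric_summary = data.describe()   # numeric columns only by default
distinct_counts = data.nunique()

# 3. Correlations between numeric attributes
numeric = data.select_dtypes(include="number")
pearson = numeric.corr(method="pearson")
spearman = numeric.corr(method="spearman")

# 4. A sample of the DataFrame
sample = data.head()

print(overview)
print(numeric_summary)
print(distinct_counts)
print(pearson)
print(spearman)
print(sample)
```

The generated report presents the same kind of information in an interactive HTML page, so a hand-rolled summary like this is mainly useful as a quick sanity check.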

Syntax :

pandas_profiling.ProfileReport(df, **kwargs)
Arguments              Type       Description
df                     DataFrame  Data to be analyzed.
bins                   int        Number of bins in the histogram. The default is 10.
check_correlation      boolean    Whether or not to check correlation. It’s `True` by default.
correlation_threshold  float      Threshold to determine if the variable pair is correlated. The default is 0.9.
correlation_overrides  list       Variable names not to be rejected because they are correlated. There is no variable in the list (`None`) by default.
check_recoded          boolean    Whether or not to check recoded correlation (memory-heavy feature). Since it’s an expensive computation, it can be activated for small datasets. `check_correlation` must be true to disable this check. It’s `False` by default.
pool_size              int        Number of workers in the thread pool. The default is equal to the number of CPUs.
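To make the correlation_threshold and bins arguments more concrete, the sketch below imitates both ideas with plain pandas and NumPy. It is an illustration under stated assumptions, not the library's implementation: the helper highly_correlated_pairs and the sample columns are hypothetical, and comparing the absolute Pearson coefficient with a strict "greater than" is an assumption about how such a threshold might be applied.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def highly_correlated_pairs(df, threshold=0.9):
    """Return (col_a, col_b, r) for numeric column pairs with |r| > threshold.

    Hypothetical helper: it only mimics the idea behind correlation_threshold.
    """
    corr = df.select_dtypes(include="number").corr(method="pearson")
    pairs = []
    for a, b in combinations(corr.columns, 2):
        r = float(corr.loc[a, b])
        if abs(r) > threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical data: height_in is height_cm converted to inches,
# so the two columns carry almost the same information
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_size": [40, 38, 43, 39, 42],
})

# Pairs above the default threshold of 0.9
pairs = highly_correlated_pairs(df, threshold=0.9)
print(pairs)

# A histogram with 10 bins, the default value of the bins argument
counts, edges = np.histogram(df["shoe_size"], bins=10)
print(counts)
print(edges)
```

With this data only the height_cm/height_in pair crosses the 0.9 threshold, which is the kind of redundant variable pair the threshold is meant to flag.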

Example:

# importing packages
import pandas as pd
import pandas_profiling as pp
  
  
# dictionary of data
dct = {'ID': {0: 23, 1: 43, 2: 12, 3: 13,
              4: 67, 5: 89, 6: 90, 7: 56,
              8: 34}, 
       'Name': {0: 'Ram', 1: 'Deep', 2: 'Yash',
                3: 'Aman', 4: 'Arjun', 5: 'Aditya',
                6: 'Divya', 7: 'Chalsea',
                8: 'Akash' }, 
       'Marks': {0: 89, 1: 97, 2: 45, 3: 78,
                 4: 56, 5: 76, 6: 100, 7: 87,
                 8: 81}, 
       'Grade': {0: 'B', 1: 'A', 2: 'F', 3: 'C',
                 4: 'E', 5: 'C', 6: 'A', 7: 'B',
                 8: 'B'}
      }
  
# forming dataframe and printing
data = pd.DataFrame(dct)
print(data)
  
# building the ProfileReport and saving it
# as the output.html file
profile = pp.ProfileReport(data)
profile.to_file("output.html")

Output:

[Figure: the DataFrame as printed by print(data)]

[Figure: the HTML report saved as output.html]
