Pandas Profiling in Python

  • Last Updated : 22 Jun, 2020

The pandas_profiling library in Python includes a class named ProfileReport, which generates a basic report on the input DataFrame.

The report consists of the following:

  • DataFrame overview,
  • A summary of each attribute (column) of the DataFrame,
  • Correlations between attributes (Pearson Correlation and Spearman Correlation), and
  • A sample of the DataFrame.
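
For intuition, the same kinds of figures can be computed by hand with plain pandas. This is a minimal sketch and not how pandas_profiling builds its report; the toy DataFrame and the names `overview`, `pearson`, `spearman`, and `sample` are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df = pd.DataFrame({
    "ID": [23, 43, 12, 13],
    "Marks": [89, 97, 45, 78],
    "Grade": ["B", "A", "F", "C"],
})

# Overview: size of the DataFrame and number of missing cells
overview = {
    "variables": df.shape[1],
    "observations": df.shape[0],
    "missing_cells": int(df.isna().sum().sum()),
}

# Correlations are computed on the numeric columns only
numeric = df.select_dtypes(include="number")
pearson = numeric.corr(method="pearson")
spearman = numeric.corr(method="spearman")

# A sample of the DataFrame (its first rows)
sample = df.head(3)

print(overview)
print(pearson)
print(spearman)
print(sample)
```

Pearson measures linear association, while Spearman works on ranks, so the two can differ when a relationship is monotonic but not linear.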

Syntax:

pandas_profiling.ProfileReport(df, **kwargs)
Argument               Type       Description
df                     DataFrame  Data to be analyzed.
bins                   int        Number of bins in the histogram. The default is 10.
check_correlation      boolean    Whether or not to check correlation. It is `True` by default.
correlation_threshold  float      Threshold to determine if a variable pair is correlated. The default is 0.9.
correlation_overrides  list       Variable names that must not be rejected because they are correlated. The list is empty (`None`) by default.
check_recoded          boolean    Whether or not to check recoded correlation (a memory-heavy feature). Because the computation is expensive, enable it only for small datasets. `check_correlation` must be `True` for this check to run. It is `False` by default.
pool_size              int        Number of workers in the thread pool. The default is the number of CPUs.
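
To make the roles of correlation_threshold and correlation_overrides concrete, here is a rough sketch in plain pandas. It is an illustration under stated assumptions, not the library's code: the helper `highly_correlated` and the toy columns `a`, `b`, and `c` are hypothetical, and the rule used (flag a column whose absolute Pearson correlation with an earlier column exceeds the threshold) is a simplification.

```python
import pandas as pd

# Hypothetical numeric data: "b" is almost a multiple of "a", "c" is unrelated
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 13],
    "c": [5, 1, 4, 2, 6, 3],
})

def highly_correlated(frame, threshold=0.9, overrides=None):
    """Return (later, earlier, r) for pairs with |Pearson r| above threshold.

    Columns named in `overrides` are never flagged, loosely mirroring the
    correlation_overrides argument. Illustrative helper, not library code.
    """
    overrides = set(overrides or [])
    corr = frame.corr(method="pearson")
    cols = list(corr.columns)
    flagged = []
    for i, later in enumerate(cols):
        if later in overrides:
            continue  # exempt columns are skipped entirely
        for earlier in cols[:i]:
            r = corr.loc[later, earlier]
            if abs(r) > threshold:
                flagged.append((later, earlier, float(r)))
    return flagged

print(highly_correlated(df))                   # default threshold of 0.9
print(highly_correlated(df, overrides=["b"]))  # "b" is exempt from flagging
```

With the default threshold only the pair (`b`, `a`) is flagged; listing `b` in `overrides` leaves nothing flagged.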

Example:

Python3




# importing packages
import pandas as pd
import pandas_profiling as pp


# dictionary of data: each column maps row index -> value
dct = {'ID': {0: 23, 1: 43, 2: 12, 3: 13,
              4: 67, 5: 89, 6: 90, 7: 56,
              8: 34},
       'Name': {0: 'Ram', 1: 'Deep', 2: 'Yash',
                3: 'Aman', 4: 'Arjun', 5: 'Aditya',
                6: 'Divya', 7: 'Chalsea',
                8: 'Akash'},
       'Marks': {0: 89, 1: 97, 2: 45, 3: 78,
                 4: 56, 5: 76, 6: 100, 7: 87,
                 8: 81},
       'Grade': {0: 'B', 1: 'A', 2: 'F', 3: 'C',
                 4: 'E', 5: 'C', 6: 'A', 7: 'B',
                 8: 'B'}
       }

# forming the DataFrame and printing it
data = pd.DataFrame(dct)
print(data)

# forming the ProfileReport and saving it
# as the file output.html
profile = pp.ProfileReport(data)
profile.to_file("output.html")

Output:

The script prints the DataFrame and writes the report to an HTML file named output.html, which can be opened in a browser.
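
To sanity-check the per-attribute section of the report, comparable figures can be computed directly with pandas. The sketch below uses a hypothetical five-row subset of the example data; the names `summary` and `stats` are illustrative and are not part of pandas_profiling.

```python
import pandas as pd

# Hypothetical subset of the example data
df = pd.DataFrame({
    "Marks": [89, 97, 45, 78, 56],
    "Grade": ["B", "A", "F", "C", "E"],
})

# One row per column: dtype, number of distinct values, number of missing values
summary = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "distinct": df.nunique(),
    "missing": df.isna().sum(),
})

# Descriptive statistics (count, mean, std, min, quartiles, max) for numeric columns
stats = df.describe()

print(summary)
print(stats)
```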

