Pandas Profiling in Python

The pandas_profiling library in Python include a method named as ProfileReport() which generate a basic report on the input DataFrame. 

The report consist of the following:

  • DataFrame overview,
  • Each attribute on which DataFrame is defined,
  • Correlations between attributes (Pearson Correlation and Spearman Correlation), and
  • A sample of DataFrame.

Syntax :

pandas_profiling.ProfileReport(df, **kwargs)
Arguments                                                                    Type                                                   Description
df DataFrame Data to be analyzed
bins int Number of bins in histogram. The default is 10.
check_correlation boolean Whether or not to check correlation. It’s `True` by default.
correlation_threshold float Threshold to determine if the variable pair is correlated. The default is 0.9.
correlation_overrides list Variable names not to be rejected because they are correlated. There is no variable in the list (`None`) by default.
check_recoded boolean Whether or not to check recoded correlation (memory heavy feature). Since it’s an expensive computation it can be activated for small datasets. `check_correlation` must be true to disable this check. It’s `False` by default.
pool_size int Number of workers in thread pool. The default is equal to the number of CPU.

Example:

Python3



filter_none

edit
close

play_arrow

link
brightness_4
code

# importing packages
import pandas as pd
import pandas_profiling as pp
  
  
# dictionary of data
dct = {'ID': {0: 23, 1: 43, 2: 12, 3: 13
              4: 67, 5: 89, 6: 90, 7: 56
              8: 34}, 
       'Name': {0: 'Ram', 1: 'Deep', 2: 'Yash',
                3: 'Aman', 4: 'Arjun', 5: 'Aditya',
                6: 'Divya', 7: 'Chalsea',
                8: 'Akash' }, 
       'Marks': {0: 89, 1: 97, 2: 45, 3: 78,
                 4: 56, 5: 76, 6: 100, 7: 87,
                 8: 81}, 
       'Grade': {0: 'B', 1: 'A', 2: 'F', 3: 'C',
                 4: 'E', 5: 'C', 6: 'A', 7: 'B',
                 8: 'B'}
      }
  
# forming dataframe and printing
data = pd.DataFrame(dct)
print(data)
  
# forming ProfileReport and save
# as output.html file
profile = pp.ProfileReport(data)
profile.to_file("output.html")

chevron_right


Output:

DataFrame

The html file named as output.html is as follows :




My Personal Notes arrow_drop_up

Check out this Author's contributed articles.

If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to contribute@geeksforgeeks.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.


Article Tags :

Be the First to upvote.


Please write to us at contribute@geeksforgeeks.org to report any issue with the above content.