The pandas_profiling library in Python includes ProfileReport(), which generates a basic report on the input DataFrame.
The report consists of the following:
- DataFrame overview,
- Details of each attribute (column) of the DataFrame,
- Correlations between attributes (Pearson correlation and Spearman correlation), and
- A sample of the DataFrame.
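To make the two correlation measures in the list above concrete, here is a minimal sketch that computes them directly with pandas rather than through the report. The column names `x`, `y`, and `z` and their values are made up for illustration, and this is not necessarily how pandas_profiling computes them internally.

```python
import pandas as pd

# Toy data (hypothetical): y is linear in x, z is monotonic in x but not linear.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],     # exactly 2 * x
    "z": [1, 8, 27, 64, 125],  # x ** 3
})

# Pearson measures the strength of a linear relationship.
pearson = df.corr(method="pearson")

# Spearman measures the strength of a monotonic relationship (it works on ranks).
spearman = df.corr(method="spearman")

print(pearson)
print(spearman)
```

For `x` and `z`, the Spearman coefficient is 1 because the ranks agree perfectly, while the Pearson coefficient is below 1 because the relationship is not a straight line.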
| Parameter | Type | Description |
|---|---|---|
| df | DataFrame | Data to be analyzed. |
| bins | int | Number of bins in the histogram. The default is 10. |
| check_correlation | boolean | Whether or not to check correlation. The default is `True`. |
| correlation_threshold | float | Threshold used to decide whether a pair of variables is correlated. The default is 0.9. |
| correlation_overrides | list | Names of variables that should not be rejected for being correlated. The default is `None` (no variables). |
| check_recoded | boolean | Whether or not to check recoded correlation (a memory-heavy feature). Because the computation is expensive, it is best enabled only for small datasets. `check_correlation` must be `True` for this check to run. The default is `False`. |
| pool_size | int | Number of workers in the thread pool. The default is the number of CPU cores. |
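The parameters above could be passed along the following lines. This is a sketch under stated assumptions: `build_report` and `REPORT_OPTIONS` are hypothetical names introduced here, the sample data is made up, and the keyword arguments follow the parameter table, which matches older pandas_profiling releases; newer releases may not accept them. The output path is passed to `to_file()` positionally because the keyword name appears to differ between releases.

```python
import pandas as pd

# Options taken from the parameter table (assumed to match an older
# pandas_profiling release; adjust or drop them for newer versions).
REPORT_OPTIONS = {
    "bins": 10,
    "check_correlation": True,
    "correlation_threshold": 0.9,
}


def build_report(df, output_path="output.html", **options):
    """Profile `df` and write the HTML report to `output_path` (hypothetical helper)."""
    # Imported here so the rest of the script loads even if the library is missing.
    from pandas_profiling import ProfileReport

    profile = ProfileReport(df, **options)
    profile.to_file(output_path)  # positional: keyword name varies by release
    return profile


# Small made-up DataFrame to profile.
df = pd.DataFrame({
    "age": [23, 35, 41, 29, 52],
    "income": [32000, 54000, 61000, 45000, 78000],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi"],
})
```

With pandas_profiling installed, calling `build_report(df, "output.html", **REPORT_OPTIONS)` should write the report to output.html in the current directory.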
The report is written to an HTML file named output.html.