Pandas Profiling in Python
The pandas_profiling library in Python provides `ProfileReport()`, which generates a basic report on an input DataFrame.
The report consists of the following:
- An overview of the DataFrame,
- Details of each attribute (column) of the DataFrame,
- Correlations between attributes (Pearson correlation and Spearman correlation), and
- A sample of the DataFrame.
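To make the correlation part of the report more concrete, the sketch below computes the same two coefficients directly with pandas' `DataFrame.corr`. It is only an illustration, not how pandas_profiling works internally: the column names and values are made up, and the rule used to flag a pair (absolute Pearson coefficient above 0.9) is an assumption.

```python
import pandas as pd

# Hypothetical toy data: weight_kg rises with height_cm, age does not.
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [50, 58, 69, 80, 95],
    "age": [30, 22, 45, 19, 38],
})

# Pearson measures linear association; Spearman measures monotonic
# (rank-based) association.
pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")

# Assumed flagging rule for illustration: a pair counts as "highly
# correlated" when its absolute Pearson coefficient exceeds the threshold.
threshold = 0.9
columns = list(pearson.columns)
high_pairs = [
    (a, b)
    for i, a in enumerate(columns)
    for b in columns[i + 1:]
    if abs(pearson.loc[a, b]) > threshold
]

print(pearson.round(2))
print(spearman.round(2))
print(high_pairs)
```

With this toy data only the height/weight pair is flagged, while both pairs involving age stay well below the threshold.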
`ProfileReport()` accepts the following arguments:

| Argument | Type | Description |
|---|---|---|
| df | DataFrame | Data to be analyzed. |
| bins | int | Number of bins in each histogram. The default is 10. |
| check_correlation | boolean | Whether or not to check correlation. It is `True` by default. |
| correlation_threshold | float | Threshold used to decide whether a pair of variables is correlated. The default is 0.9. |
| correlation_overrides | list | Names of variables that should not be rejected for being correlated. The list is empty (`None`) by default. |
| check_recoded | boolean | Whether or not to check recoded correlation (a memory-heavy feature). Because the computation is expensive, it is best enabled only for small datasets. `check_correlation` must be `True` for this check to run. It is `False` by default. |
| pool_size | int | Number of workers in the thread pool. The default is the number of CPUs. |
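A report is typically produced by passing a DataFrame to `ProfileReport()` and saving the result as HTML. The following is a minimal sketch rather than a definitive recipe: `write_profile` is a hypothetical helper name, and it assumes the pandas_profiling package is installed.

```python
def write_profile(df, output_path="output.html"):
    """Profile `df` and save the report as an HTML file.

    `write_profile` is a hypothetical helper, not part of the library.
    """
    # Imported inside the function so that a missing installation is
    # reported only when a report is actually requested.
    import pandas_profiling

    # Arguments such as bins or correlation_threshold are left at their
    # defaults here, because the accepted keywords differ between
    # library versions.
    profile = pandas_profiling.ProfileReport(df)

    # Write the report to disk so it can be opened in a browser.
    profile.to_file(output_path)
    return output_path
```

Calling `write_profile(df)` on any DataFrame should then leave output.html in the working directory.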
The generated HTML file, output.html, looks as follows: