Python | Pandas DataFrame.describe() method
Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.
describe() returns basic summary statistics, such as count, mean, standard deviation, minimum, maximum and percentiles, for a data frame or a series of numeric values. When the method is applied to a series of strings, it returns a different set of statistics, as shown in the examples below.
Syntax: DataFrame.describe(percentiles=None, include=None, exclude=None)
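percentiles: List-like of numbers between 0 and 1 giving the percentiles to include in the output. Default is [.25, .5, .75]
include: List of data types to include when describing the data frame. Default is None, in which case only the numeric columns are described (if any are present)
exclude: List of data types to exclude when describing the data frame. Default is None, in which case nothing is excluded
Return type: Series or DataFrame containing the statistical summary.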
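As a minimal sketch of the default behaviour and of the percentiles parameter, the snippet below uses a small made-up data frame (the column names and values are assumptions for illustration, not the NBA data set used in the article). The string column is created with an explicit object dtype so that it is treated as object data.

```python
import pandas as pd

# Hypothetical stand-in data, not the NBA data set from the article.
demo_df = pd.DataFrame({
    "Name": pd.Series(["Avery", "Jae", "John", "R.J."], dtype=object),
    "Age": [25, 30, 22, 27],
    "Salary": [1000.0, 2000.0, 1500.0, 2500.0],
})

# With the defaults, only the numeric columns (Age, Salary) are summarised.
default_summary = demo_df.describe()
print(default_summary)

# Custom percentiles are passed as fractions between 0 and 1.
custom_summary = demo_df.describe(percentiles=[.10, .90])
print(custom_summary)
```

The row labels of the result are the statistic names (count, mean, std, and so on), and the requested percentiles appear as labels such as 10% and 90%.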
To download the data set used in the following examples, click here.
In the following examples, the data frame contains data on some NBA players. An image of the data frame before any operations is attached below.
Example #1: Describing a data frame with both object and numeric data types
In this example, the data frame is described with 'object' passed in the include parameter, alongside the numeric data types, so that the object columns are described as well. [.20, .40, .60, .80] is passed to the percentiles parameter to view the respective percentiles of the numeric columns.
As shown in the output image, a statistical description of the data frame was returned with the requested percentiles. For the columns containing strings, NaN was returned for the numeric statistics.
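Since the original code for this example is not reproduced here, the following is only a sketch of what such a call might look like. The data frame is made up, and the exact include list (["object", "number"]) is an assumption chosen so that both string and numeric columns are described.

```python
import pandas as pd

# Hypothetical stand-in data, not the NBA data set from the article.
players = pd.DataFrame({
    "Name": pd.Series(["Avery", "Jae", "John", "R.J."], dtype=object),
    "Team": pd.Series(["Celtics", "Celtics", "Nets", "Nets"], dtype=object),
    "Age": [25, 30, 22, 27],
    "Salary": [1000.0, 2000.0, 1500.0, 2500.0],
})

# Percentiles to report for the numeric columns.
perc = [.20, .40, .60, .80]

# Data types to describe; "number" covers both integer and float columns.
# This list is an assumption, not taken from the article.
include = ["object", "number"]

mixed_summary = players.describe(percentiles=perc, include=include)
print(mixed_summary)
```

In the combined result, the object columns have NaN for numeric statistics such as mean and std, and the numeric columns have NaN for unique, top and freq.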
Example #2: Describing a series of strings
In this example, describe() is called on the Name column to see how it behaves with the object data type.
As shown in the output image, describe() behaves differently on a series of strings.
Instead of numeric statistics, it returns the count of values, the number of unique values, the most frequent value (top) and the frequency of that value (freq).
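The behaviour on strings can be sketched with a small made-up series (the names below are illustrative assumptions, not values taken from the data set). A missing value is included to show that it is not counted.

```python
import pandas as pd

# Hypothetical stand-in for the Name column; dtype=object keeps it as object data.
names = pd.Series(["Avery", "Jae", "Avery", "John", None], dtype=object, name="Name")

name_summary = names.describe()
print(name_summary)

# count  -> number of non-missing values
# unique -> number of distinct values
# top    -> most frequent value
# freq   -> how often the most frequent value occurs
```

Here "Avery" appears twice, so it is reported as top with a freq of 2, while the missing entry is left out of count.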