Bin Size in Matplotlib Histogram
A histogram is a graphical representation of the distribution of data given by the user. Its appearance is similar to Bar-Graph except it is continuous.
The towers or bars of a histogram are called bins. The height of each bin shows how many values from that data fall into that range.
Width of each bin is = (max value of data – min value of data) / total number of bins
The default value of the number of bins to be created in a histogram is 10. However, we can change the size of bins using the parameter bins in matplotlib.pyplot.hist().
Method 1 :
We can pass an integer in bins stating how many bins/towers to be created in the histogram and the width of each bin is then changed accordingly.
Example 1 :
Here, the bins = 5, i.e the number of bins to be created is 5. Setting bins to an integer creates bins of equal size or width. As the bin size is changed so then the bin width would be changed accordingly as :
width = (195 – 140 ) / 5 = 11
Example 2 :
In the above graph, the width of each bin is :
width = ( 145 – 51 ) / 7 = 13.4
Method 2 :
We can also pass a sequence of int or float in the parameter bins. In which the elements of the sequence are the edges/boundaries of the bins. In this method, the bin width may vary for each bin.
Suppose a sequence [1,2,3,4,5] is assigned to bins and then the number of bins made will be 4 i.e the first bin will be [1,2) (including 1, but excluding 2) second bin will be [2,3) (including 2, but excluding 3) third bin will be [3,4) (including 3, but excluding 4). However, in the last bin [4,5] both 4 and 5 are included.
Hence, all the bins are half-open [a, b) but the last bin is closed [a, b]. For such cases, the width of each bin is equal.
If the difference between each element of the sequence assigned to bins is not equal then the width of each bin is different, hence the bin width depends on the sequence.
Example 1 : Equal bin width
Example 2 : Unequal bin width
For passing a sequence in the bins parameter, we may also use the range function for equally distributed bins. Within range(), the starting point is the minimum of data, the endpoint is the maximum of data + the bin width mentioned, as in range(), the endpoint is not included and the step is bin width.
As the step is fixed in range(), we get equally sized bins in the histogram.