
# Binning in Data Mining

• Last Updated : 28 Sep, 2020

Data binning, also called bucketing, is a data pre-processing method used to minimize the effects of small observation errors. The original data values are divided into small intervals known as bins, and each value is then replaced by a general value calculated for its bin, such as the bin mean or median. This has a smoothing effect on the input data and may also reduce the chances of overfitting on small datasets.
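As a rough illustration of the replacement step, the sketch below sorts the values, splits them into consecutive bins of a fixed size, and replaces every value with the mean of its bin. The helper name `smooth_by_bin_means`, the fixed `bin_size`, and the choice of the mean as the "general value" are assumptions made for this example only.

```python
from statistics import mean


def smooth_by_bin_means(values, bin_size):
    """Replace each value with the mean of its bin (hypothetical helper)."""
    ordered = sorted(values)
    smoothed = []
    # Walk over the sorted values in consecutive chunks of bin_size.
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = mean(bin_values)
        # Every value in the bin is replaced by the same bin mean.
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
print(smooth_by_bin_means(data, 4))
```

With a bin size of 4, the twelve values collapse to three distinct bin means, which is the smoothing effect described above.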

There are two methods of dividing data into bins:

1. Equal Frequency Binning : each bin holds the same number of values.
2. Equal Width Binning : each bin spans the same width w = (max - min) / (no of bins), so the bin boundaries are min, min + w, min + 2w, ..., min + nw, where n is the number of bins.
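The equal-width formula can be sketched as follows, assuming the data contains at least two distinct values so that w is non-zero. The function names `equal_width_edges` and `bin_index` are hypothetical, and clamping the maximum into the last bin is one common convention rather than the only possible one.

```python
def equal_width_edges(values, n_bins):
    """Return the n_bins + 1 boundaries min, min + w, ..., min + n_bins * w."""
    lo, hi = min(values), max(values)
    w = (hi - lo) / n_bins  # assumes hi > lo
    return [lo + w * i for i in range(n_bins + 1)]


def bin_index(x, lo, w, n_bins):
    """Return the 0-based bin that x falls into."""
    # The maximum value would compute to index n_bins, so clamp it into the last bin.
    return min(int((x - lo) / w), n_bins - 1)


data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
edges = equal_width_edges(data, 3)
width = edges[1] - edges[0]
indices = [bin_index(x, edges[0], width, 3) for x in data]
print(edges)
print(indices)
```

For this data the width is (215 - 5) / 3 = 70, so the boundaries are 5, 75, 145 and 215. Computing the bin index directly avoids comparing every value against every bin.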

Equal frequency

```
Input :[5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
Output :
[5, 10, 11, 13]
[15, 35, 50, 55]
[72, 92, 204, 215]
```

Equal Width

```
Input :[5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
Output :
[5, 10, 11, 13, 15, 35, 50, 55, 72]
[92]
[204, 215]
```

Code : Implementation of Binning Technique

```python
# Equal frequency binning: every bin receives the same number of values.
def equifreq(arr1, m):
    arr1 = sorted(arr1)  # bins are cut from the values in ascending order
    a = len(arr1)
    n = a // m  # values per bin
    for i in range(m):
        # The last bin also takes any leftover values when a is not divisible by m.
        end = a if i == m - 1 else (i + 1) * n
        print(arr1[i * n:end])


# Equal width binning: every bin spans the same range of values.
def equiwidth(arr1, m):
    min1 = min(arr1)
    w = (max(arr1) - min1) / m  # bin width
    # Bin edges: min, min + w, ..., min + m * w
    arr = [min1 + w * i for i in range(m + 1)]
    arri = []
    for i in range(m):
        temp = []
        for j in arr1:
            # Bins are half-open [lower, upper) so that no value is counted twice;
            # the last bin also includes the maximum.
            if j >= arr[i] and (j < arr[i + 1] or i == m - 1):
                temp += [j]
        arri += [temp]
    print(arri)


# Data to be binned
data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
# Number of bins
m = 3

print("equal frequency binning")
equifreq(data, m)

print("\n\nequal width binning")
equiwidth(data, m)
```

Output :

```
equal frequency binning
[5, 10, 11, 13]
[15, 35, 50, 55]
[72, 92, 204, 215]


equal width binning
[[5, 10, 11, 13, 15, 35, 50, 55, 72], [92], [204, 215]]
```

