# Binning in Data Mining

Data binning, also called bucketing, is a data pre-processing method used to reduce the effects of small observation errors. The original data values are divided into small intervals known as bins, and each value is then replaced by a representative value calculated for its bin, such as the bin mean or median. This has a smoothing effect on the input data and may also reduce the chance of overfitting on small datasets.

There are two common methods of dividing data into bins:

1. Equal Frequency Binning: each bin holds the same number of values (or as close to that as the data allows).
2. Equal Width Binning: each bin spans the same width w = (max - min) / (number of bins), so the bin boundaries are min, min + w, min + 2w, …, min + nw = max, where n is the number of bins.
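As a rough illustration of the equal-width formula, the sketch below computes the bin boundaries for the sample data used in the examples that follow. The helper name `bin_edges` is hypothetical and introduced only for this example.

```python
def bin_edges(values, n_bins):
    """Return the n_bins + 1 boundaries min, min + w, ..., min + n*w."""
    lo, hi = min(values), max(values)
    w = (hi - lo) / n_bins  # width of each bin
    return [lo + i * w for i in range(n_bins + 1)]


data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
print(bin_edges(data, 3))  # [5.0, 75.0, 145.0, 215.0]
```

Here w = (215 - 5) / 3 = 70, so the three bins cover 5 to 75, 75 to 145, and 145 to 215.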

Equal frequency:

```
Input: [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]

Output:
[5, 10, 11, 13]
[15, 35, 50, 55]
[72, 92, 204, 215]
```

Equal Width:

```
Input: [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]

Output:
[5, 10, 11, 13, 15, 35, 50, 55, 72]
[92]
[204, 215]
```
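The uneven bin sizes in the equal-width result are easier to see when each value is labelled with its bin index. This is a minimal sketch, assuming half-open bins [lo, hi) with the maximum value assigned to the last bin (a convention chosen here for illustration). `equal_width_labels` is a hypothetical helper name.

```python
def equal_width_labels(values, n_bins):
    """Return the 0-based equal-width bin index of every value."""
    lo, hi = min(values), max(values)
    w = (hi - lo) / n_bins
    if w == 0:  # all values identical: everything falls in bin 0
        return [0] * len(values)
    # int((v - lo) / w) is the bin index; clamp so that max lands in the last bin
    return [min(int((v - lo) / w), n_bins - 1) for v in values]


data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
labels = equal_width_labels(data, 3)
print(labels)                                # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2]
print([labels.count(b) for b in range(3)])   # [9, 1, 2]
```

Nine of the twelve values land in the first bin, which is typical of equal-width binning on skewed data.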

Code: Implementation of the binning technique:

## Python

```python
# equal frequency
def equifreq(arr1, m):
    """Print m equal-frequency bins; arr1 is assumed to be sorted."""
    a = len(arr1)
    # Base bin size n; the first (a % m) bins take one extra value so that
    # no element is dropped when a is not divisible by m.
    n, extra = divmod(a, m)
    start = 0
    for i in range(m):
        end = start + n + (1 if i < extra else 0)
        print(arr1[start:end])
        start = end


# equal width
def equiwidth(arr1, m):
    """Print m equal-width bins covering min(arr1) to max(arr1)."""
    min1 = min(arr1)
    # True division: truncating w to an int can leave max(arr1) outside the last bin.
    w = (max(arr1) - min1) / m
    # Bin boundaries: min, min + w, ..., min + m*w
    arr = [min1 + w * i for i in range(m + 1)]
    arri = []
    for i in range(m):
        last = i == m - 1
        # Half-open bins [lo, hi) so a value on a boundary is counted only once;
        # the last bin has no upper check so the maximum is always included.
        temp = [j for j in arr1 if arr[i] <= j and (last or j < arr[i + 1])]
        arri.append(temp)
    print(arri)


# data to be binned (sorted in ascending order)
data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]

# number of bins
m = 3

print("equal frequency binning")
equifreq(data, m)

print("\nequal width binning")
equiwidth(data, m)
```

Output:

```
equal frequency binning
[5, 10, 11, 13]
[15, 35, 50, 55]
[72, 92, 204, 215]

equal width binning
[[5, 10, 11, 13, 15, 35, 50, 55, 72], [92], [204, 215]]
```
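The introduction mentions replacing each value by a general value calculated for its bin. One common choice is the bin mean. The sketch below applies it to the equal-frequency bins shown in the output above. The function name `smooth_by_bin_means` is hypothetical, and the sketch assumes the input is already sorted and that its length is divisible by the number of bins.

```python
def smooth_by_bin_means(sorted_values, n_bins):
    """Replace every value by the mean of its equal-frequency bin.

    Assumes len(sorted_values) is a positive multiple of n_bins.
    """
    size = len(sorted_values) // n_bins
    smoothed = []
    for start in range(0, size * n_bins, size):
        chunk = sorted_values[start:start + size]
        mean = sum(chunk) / len(chunk)
        smoothed.extend([mean] * len(chunk))
    return smoothed


data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
print(smooth_by_bin_means(data, 3))
# [9.75, 9.75, 9.75, 9.75, 38.75, 38.75, 38.75, 38.75, 145.75, 145.75, 145.75, 145.75]
```

Using the bin median or the nearest bin boundary instead of the mean follows the same pattern; only the line that computes `mean` changes.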
