# Binning in Data Mining

Data binning, also called bucketing, is a data pre-processing method used to reduce the effects of small observation errors. The original data values are divided into small intervals known as bins, and each value is then replaced by a representative value calculated for its bin (for example, the bin mean or median). This has a smoothing effect on the input data and may also reduce the chance of overfitting on small datasets.

There are two methods of dividing data into bins:

1. Equal Frequency Binning: each bin holds the same number of values.
2. Equal Width Binning: each bin spans the same width w = (max - min) / (number of bins), so the bin boundaries are min, min + w, min + 2w, ..., min + nw, where n is the number of bins and min + nw equals max.
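The equal width formula can be worked through in a few lines. The following is a minimal sketch, not a definitive implementation: the function names `equal_width_edges` and `bin_index` are hypothetical, and placing the maximum value in the last bin is an assumption made here so that no value falls outside the bins.

```python
def equal_width_edges(values, n_bins):
    """Return the n_bins + 1 boundaries min, min + w, ..., min + n_bins * w."""
    lo, hi = min(values), max(values)
    w = (hi - lo) / n_bins  # width of each bin
    return [lo + i * w for i in range(n_bins + 1)]


def bin_index(x, lo, w, n_bins):
    """Map x to a bin number in 0..n_bins - 1 (assumption: max goes in the last bin)."""
    if w == 0:
        return 0  # all values identical, so everything shares one bin
    return min(int((x - lo) / w), n_bins - 1)


# Sample values, the same list used in this article's examples
values = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
edges = equal_width_edges(values, 3)
print(edges)  # [5.0, 75.0, 145.0, 215.0]

w = edges[1] - edges[0]
print([bin_index(x, edges[0], w, 3) for x in values])
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2]
```

With min = 5, max = 215 and three bins, w = (215 - 5) / 3 = 70, so nine of the twelve values land in the first bin. Equal width bins can be very unevenly filled when the data is skewed.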

Equal frequency

```
Input: [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
Output:
[5, 10, 11, 13]
[15, 35, 50, 55]
[72, 92, 204, 215]
```
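Once the values are grouped, each one can be replaced by a single value computed for its bin, which is the smoothing step described at the top of this article. The sketch below uses the bin mean as that value; this is an assumption, since the article does not prescribe a particular statistic, and the function name `smooth_by_bin_means` is hypothetical.

```python
from statistics import mean


def smooth_by_bin_means(bins):
    """Replace every value in each bin with that bin's mean.

    Assumes every bin is non-empty; statistics.mean raises an error on
    empty input.
    """
    smoothed = []
    for b in bins:
        bin_mean = mean(b)
        smoothed.append([bin_mean] * len(b))
    return smoothed


# Equal frequency bins for the sample data
bins = [[5, 10, 11, 13], [15, 35, 50, 55], [72, 92, 204, 215]]
for original, replaced in zip(bins, smooth_by_bin_means(bins)):
    print(original, "->", replaced)
```

Each bin collapses to one repeated value (9.75, 38.75 and 145.75 here), which removes small fluctuations inside a bin at the cost of some detail.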

Equal Width

```
Input: [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
Output:
[5, 10, 11, 13, 15, 35, 50, 55, 72]
[92]
[204, 215]
```

Code: Implementation of Binning Technique

```python
# Equal frequency binning: every bin holds the same number of values.
# Assumes arr1 is already sorted.
def equifreq(arr1, m):
    a = len(arr1)
    n = a // m  # values per bin
    for i in range(m):
        start = i * n
        # the last bin also takes any leftover values when a is not divisible by m
        end = (i + 1) * n if i < m - 1 else a
        print(arr1[start:end])


# Equal width binning: every bin spans the same range of values.
def equiwidth(arr1, m):
    min1 = min(arr1)
    w = (max(arr1) - min1) / m  # width of each bin
    # bin boundaries: min, min + w, ..., min + m * w
    arr = [min1 + w * i for i in range(m + 1)]
    arri = []
    for i in range(m):
        last = i == m - 1
        temp = []
        for j in arr1:
            # each bin includes its lower boundary; the last bin also
            # includes the maximum
            if arr[i] <= j and (j < arr[i + 1] or last):
                temp.append(j)
        arri.append(temp)
    print(arri)


# data to be binned
data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
# number of bins
m = 3

print("equal frequency binning")
equifreq(data, m)

print("\n\nequal width binning")
equiwidth(data, m)
```

Output:

```
equal frequency binning
[5, 10, 11, 13]
[15, 35, 50, 55]
[72, 92, 204, 215]


equal width binning
[[5, 10, 11, 13, 15, 35, 50, 55, 72], [92], [204, 215]]
```

