
# Python | Binning method for data smoothing

• Difficulty Level : Easy
• Last Updated : 20 May, 2019

Prerequisite: ML | Binning or Discretization

The binning method is used to smooth data and to handle noisy data. The data is first sorted, and the sorted values are then distributed into a number of buckets, or bins. Because binning methods consult the neighborhood of each value (the other values in the same bin), they perform local smoothing.

There are three approaches to smoothing:

• Smoothing by bin means: each value in a bin is replaced by the mean of the bin.
• Smoothing by bin median: each value in a bin is replaced by the median of the bin.
• Smoothing by bin boundaries: the minimum and maximum values in a bin are identified as the bin boundaries, and each value in the bin is replaced by the closer of the two boundaries.
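The three rules can be sketched for a single bin as follows. This is a minimal illustration that assumes NumPy is available; the helper names `smooth_by_mean`, `smooth_by_median`, and `smooth_by_boundaries` are hypothetical, and sending a value that lies exactly halfway between the boundaries to the maximum is only one possible convention.

```python
import numpy as np

def smooth_by_mean(bin_values):
    """Replace every value in the bin with the bin mean."""
    values = np.asarray(bin_values, dtype=float)
    return np.full_like(values, values.mean())

def smooth_by_median(bin_values):
    """Replace every value in the bin with the bin median."""
    values = np.asarray(bin_values, dtype=float)
    return np.full_like(values, np.median(values))

def smooth_by_boundaries(bin_values):
    """Replace every value with the closer of the bin minimum and maximum."""
    values = np.asarray(bin_values, dtype=float)
    low, high = values.min(), values.max()
    # Assumption: a value exactly halfway between the boundaries goes to the maximum.
    return np.where(values - low < high - values, low, high)

one_bin = [4, 8, 9, 15]
print(smooth_by_mean(one_bin))        # every value becomes 9.0
print(smooth_by_median(one_bin))      # every value becomes 8.5
print(smooth_by_boundaries(one_bin))  # values become 4, 4, 4, 15
```

Note that `np.median` averages the two middle values when a bin holds an even number of elements, which is why the median of this four-value bin is 8.5 rather than one of the original values.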

Approach:

1. Sort the array holding the data set.
2. Divide the sorted values into N intervals, each containing approximately the same number of samples (equal-depth partitioning).
3. For each bin, store the smoothed values (mean, median, or closest boundary) in the corresponding row.
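Steps 1 and 2 can be sketched with NumPy as shown here. This is an illustration rather than the article's own implementation: the function name `equal_depth_bins` is hypothetical, and `np.array_split` is used on the assumption that a data set whose size is not a multiple of N should still produce bins whose sizes differ by at most one.

```python
import numpy as np

def equal_depth_bins(data, n_bins):
    """Sort the data and split it into n_bins bins of (nearly) equal size."""
    sorted_data = np.sort(np.asarray(data, dtype=float))
    # array_split allows uneven splits: the first len(data) % n_bins bins
    # receive one extra element each.
    return np.array_split(sorted_data, n_bins)

# Unsorted prices; sorting happens inside the function.
prices = [21, 4, 34, 8, 25, 9, 28, 15, 29, 21, 26, 24]
for number, bin_values in enumerate(equal_depth_bins(prices, 3), start=1):
    print(f"Bin {number}: {bin_values}")
```

With 12 values and N = 3, each bin receives four values; with 10 values and N = 3, the bin sizes would be 4, 3, and 3.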

Examples:

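```
Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34

Partition into equal-depth bins (N = 3):
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34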

Smoothing by bin means (rounded to the nearest dollar):
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29

Smoothing by bin boundaries:
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34

Smoothing by bin median (taking the upper of the two middle values):
- Bin 1: 9, 9, 9, 9
- Bin 2: 24, 24, 24, 24
- Bin 3: 29, 29, 29, 29
```
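The figures above can be checked with a few lines of NumPy. The sketch below assumes the 12 prices are split into three bins of four values each. It computes the exact bin means, their rounded values, the averaged median that `np.median` returns, and the upper-middle element of each bin. The mean rows in the example match the exact means rounded to whole dollars, and the median rows match the upper-middle element rather than the averaged median.

```python
import numpy as np

prices = np.array([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34], dtype=float)
bins = prices.reshape(3, 4)                  # 3 equal-depth bins of 4 sorted values

exact_means = bins.mean(axis=1)              # 9.0, 22.75, 29.25
rounded_means = np.rint(exact_means)         # 9.0, 23.0, 29.0
averaged_medians = np.median(bins, axis=1)   # 8.5, 22.5, 28.5
upper_middle = bins[:, 2]                    # 9.0, 24.0, 29.0

print("Exact means:      ", exact_means)
print("Rounded means:    ", rounded_means)
print("Averaged medians: ", averaged_medians)
print("Upper-middle item:", upper_middle)
```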

Below is a Python implementation of the above algorithm, applied to one column of the iris data set with bins of five values each:

```python
import numpy as np
from sklearn.datasets import load_iris

# load the iris data set
dataset = load_iris()
a = dataset.data
b = np.zeros(150)

# take the column at index 1 (sepal width, the 2nd of the 4 columns)
for i in range(150):
    b[i] = a[i, 1]

b = np.sort(b)  # sort the array

# create bins: 30 rows (bins) of 5 values each
bin1 = np.zeros((30, 5))
bin2 = np.zeros((30, 5))
bin3 = np.zeros((30, 5))

# Bin mean: every value in a bin is replaced by the bin mean
for i in range(0, 150, 5):
    k = i // 5
    mean = (b[i] + b[i+1] + b[i+2] + b[i+3] + b[i+4]) / 5
    for j in range(5):
        bin1[k, j] = mean
print("Bin Mean: \n", bin1)

# Bin boundaries: every value is replaced by the closer of b[i] (min) and b[i+4] (max)
for i in range(0, 150, 5):
    k = i // 5
    for j in range(5):
        if (b[i+j] - b[i]) < (b[i+4] - b[i+j]):
            bin2[k, j] = b[i]
        else:
            bin2[k, j] = b[i+4]
print("Bin Boundaries: \n", bin2)

# Bin median: the middle of 5 sorted values is b[i+2]
for i in range(0, 150, 5):
    k = i // 5
    for j in range(5):
        bin3[k, j] = b[i+2]
print("Bin Median: \n", bin3)
```
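The explicit loops can also be expressed with array operations. The following is one possible vectorized sketch, not the article's implementation: the function name `smooth_equal_depth` is hypothetical, and it assumes the number of values is an exact multiple of the bin size (as with 150 iris values and bins of 5). It keeps the same tie rule as the loop version, sending equidistant values to the upper boundary.

```python
import numpy as np

def smooth_equal_depth(data, bin_size):
    """Return (means, boundaries, medians), each with one row per bin."""
    values = np.sort(np.asarray(data, dtype=float))
    if values.size % bin_size != 0:
        raise ValueError("number of values must be a multiple of bin_size")
    bins = values.reshape(-1, bin_size)   # each row is one equal-depth bin

    # keepdims=True yields one column per statistic, repeated across the row
    means = np.repeat(bins.mean(axis=1, keepdims=True), bin_size, axis=1)
    medians = np.repeat(np.median(bins, axis=1, keepdims=True), bin_size, axis=1)

    low = bins[:, :1]     # smallest value of each sorted bin
    high = bins[:, -1:]   # largest value of each sorted bin
    # Values strictly closer to the minimum take it; all others take the maximum
    boundaries = np.where(bins - low < high - bins, low, high)
    return means, boundaries, medians

sample = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34, 35, 40, 41]
means, boundaries, medians = smooth_equal_depth(sample, 5)
print("Bin Mean: \n", means)
print("Bin Boundaries: \n", boundaries)
print("Bin Median: \n", medians)
```

With scikit-learn installed, the same function could be applied to the iris column used above by calling `smooth_equal_depth(load_iris().data[:, 1], 5)`.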
