Prerequisite: ML | Binning or Discretization
The binning method is used to smooth data or to handle noisy data. The data is first sorted, and the sorted values are then distributed into a number of buckets, or bins. Because binning methods consult the neighborhood of each value, they perform local smoothing.
There are three approaches to smoothing –
- Smoothing by bin means: each value in a bin is replaced by the mean value of the bin.
- Smoothing by bin median: each value in a bin is replaced by the median value of the bin.
- Smoothing by bin boundaries: the minimum and maximum values in a given bin are identified as the bin boundaries. Each bin value is then replaced by the closest boundary value.
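The three rules can be sketched for a single bin as follows. This is a minimal illustration using only the Python standard library; the function names and the three-value sample bin are hypothetical, and the tie-breaking rule for a value equidistant from both boundaries is an assumption, since the text does not specify one.

```python
from statistics import median


def smooth_by_mean(bin_values):
    """Replace every value in the bin with the bin mean."""
    m = sum(bin_values) / len(bin_values)
    return [m] * len(bin_values)


def smooth_by_median(bin_values):
    """Replace every value in the bin with the bin median."""
    m = median(bin_values)
    return [m] * len(bin_values)


def smooth_by_boundaries(bin_values):
    """Replace every value with the closer of the bin's min and max."""
    lo, hi = min(bin_values), max(bin_values)
    # Assumption: a value equidistant from both boundaries goes to the lower one.
    return [lo if v - lo <= hi - v else hi for v in bin_values]


sample_bin = [4, 8, 15]  # illustrative values only
print(smooth_by_mean(sample_bin))        # [9.0, 9.0, 9.0]
print(smooth_by_median(sample_bin))      # [8, 8, 8]
print(smooth_by_boundaries(sample_bin))  # [4, 4, 15]
```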
Approach:
- Sort the array of the given data set.
- Divide the sorted values into N intervals, each containing approximately the same number of samples (equal-depth partitioning).
- For each bin, store its mean, median, or boundary values in a separate row.
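The sorting and equal-depth partitioning steps might look like the sketch below. The helper name `equal_depth_bins` is hypothetical, and the way leftover values are distributed when the data does not divide evenly is an assumption; the input is this article's price data in shuffled order.

```python
def equal_depth_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of near-equal size."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    ordered = sorted(values)
    base, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # Assumption: the first `extra` bins each take one additional value.
        size = base + (1 if i < extra else 0)
        bins.append(ordered[start:start + size])
        start += size
    return bins


prices = [21, 4, 34, 8, 25, 9, 29, 15, 26, 21, 28, 24]
print(equal_depth_bins(prices, 3))
# [[4, 8, 9, 15], [21, 21, 24, 25], [26, 28, 29, 34]]
```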
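Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34

Partition into equal-depth bins (4 values each):
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34

Smoothing by bin means (rounded to the nearest dollar; the exact means are 9, 22.75, and 29.25):
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29

Smoothing by bin boundaries:
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34

Smoothing by bin median (taking the upper of the two middle values in each four-value bin; averaging the two middle values instead gives 8.5, 22.5, and 28.5):
- Bin 1: 9, 9, 9, 9
- Bin 2: 24, 24, 24, 24
- Bin 3: 29, 29, 29, 29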
Below is a Python implementation of the above algorithm –
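What follows is a minimal end-to-end sketch rather than a definitive implementation. It uses only the standard library; the function name `smooth_bins`, the dictionary layout, the requirement that the data divide evenly into bins, and the tie-breaking rule for boundaries are all assumptions made for illustration.

```python
from statistics import mean, median


def smooth_bins(values, n_bins):
    """Return equal-depth bins smoothed by mean, median, and boundaries.

    Assumption: len(values) is a positive multiple of n_bins, so every
    bin has the same depth.
    """
    ordered = sorted(values)
    if n_bins < 1 or not ordered or len(ordered) % n_bins:
        raise ValueError("len(values) must be a positive multiple of n_bins")
    depth = len(ordered) // n_bins

    result = {"mean": [], "median": [], "boundaries": []}
    for start in range(0, len(ordered), depth):
        row = ordered[start:start + depth]  # one bin per row
        lo, hi = row[0], row[-1]  # row is sorted, so these are min and max
        result["mean"].append([mean(row)] * depth)
        result["median"].append([median(row)] * depth)
        # Assumption: a value equidistant from both boundaries goes to the lower one.
        result["boundaries"].append(
            [lo if v - lo <= hi - v else hi for v in row]
        )
    return result


prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
for method, rows in smooth_bins(prices, 3).items():
    print(method)
    for i, row in enumerate(rows, start=1):
        print(f"  Bin {i}: {row}")
```

On the price data this yields unrounded bin means of 9, 22.75, and 29.25, which the worked example rounds to 9, 23, and 29. Note that `statistics.median` averages the two middle values of an even-sized bin, so it returns 8.5, 22.5, and 28.5 where the worked example lists 9, 24, and 29.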