Feature Scaling – Part 3
Feature Scaling is one of the most important steps of Data Preprocessing. It is applied to the independent variables, or features, of the data. Data often contains features with widely varying magnitudes, and if we do not treat them, many algorithms consider only the magnitude of these features and ignore their units, so features with larger values dominate the result. Scaling brings the data into a comparable range and can also speed up the calculations in an algorithm.
The Robust Scaler uses a method similar to the Min-Max scaler, but it relies on the interquartile range rather than the minimum and maximum, which makes it robust to outliers. This Scaler removes the median and scales the data according to the quantile range (defaults to IQR: Interquartile Range). The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile).
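The formula used is:

$$x_{\text{scaled}} = \frac{x - \operatorname{median}(x)}{Q_3(x) - Q_1(x)}$$

where $Q_1(x)$ and $Q_3(x)$ are the 1st and 3rd quartiles of the feature, so the denominator is the IQR.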
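To make the median-and-IQR scaling concrete, here is a minimal sketch in plain Python using only the standard library. The function name `robust_scale` and the sample values are assumptions chosen for illustration; in practice you would more likely use a library class such as scikit-learn's `RobustScaler`, whose exact quantile interpolation may differ slightly from this sketch.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Subtract the median and divide by the IQR (Q3 - Q1)."""
    # "inclusive" treats the data as the full population and interpolates linearly.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the feature cannot be scaled this way")
    med = median(values)
    return [(v - med) / iqr for v in values]


# Hypothetical feature with one obvious outlier (100.0).
data = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
scaled = robust_scale(data)
print(scaled)
```

Here the median (3.5) and the IQR (2.5) are computed from the middle of the data, so the outlier has no influence on how the other five values are scaled.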
Some more properties of Robust Scaler are:
- Robust Scaler gives each feature a median of 0 and an IQR of 1; it does not guarantee 0 mean and unit variance
- Robust Scaler has no predetermined range, unlike Min-Max Scaler
- Robust Scaler uses quartile ranges and this makes it less sensitive to outliers
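The properties listed above can be checked numerically. The following is a small sketch under stated assumptions: the helper names (`iqr`, `robust_scale`, `min_max_scale`) and the sample data are hypothetical, and the quartiles use the standard library's linear interpolation rather than any particular library's implementation.

```python
from statistics import mean, median, quantiles


def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def robust_scale(values):
    med, spread = median(values), iqr(values)
    return [(v - med) / spread for v in values]


def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature with one outlier.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
robust = robust_scale(data)
minmax = min_max_scale(data)

# Median is about 0 and IQR is about 1 after robust scaling.
print("median:", median(robust), "IQR:", iqr(robust))
# The mean is pulled away from 0 by the outlier.
print("mean:", mean(robust))
# No predetermined range: the outlier lands far outside [-1, 1].
print("robust range:", min(robust), max(robust))
# Min-Max always maps to [0, 1], which squeezes the non-outliers together.
print("min-max range:", min(minmax), max(minmax))
```

With Min-Max scaling the single outlier defines the upper bound, so the remaining values are compressed near 0; with robust scaling they keep a usable spread.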
As you can see in the output, after Robust scaling the distributions are brought onto the same scale and overlap, but the outliers remain outside the bulk of the new distributions. Robust scaling is therefore an effective choice when the data contains outliers that would distort a Min-Max or mean-based scaler.
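The behaviour described here can be reproduced on synthetic data. This is only an illustrative sketch: the two features `x1` and `x2`, their distributions, and the injected outliers are assumptions made up for the example, not the data behind the original output.

```python
import random
from statistics import median, quantiles


def robust_scale(values):
    """Subtract the median and divide by the IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    return [(v - med) / (q3 - q1) for v in values]


random.seed(0)  # fixed seed so the run is repeatable

# Two hypothetical features on very different scales, each with injected outliers.
features = {
    "x1": [random.gauss(20, 2) for _ in range(1000)] + [60.0] * 5,
    "x2": [random.gauss(5000, 400) for _ in range(1000)] + [15000.0] * 5,
}

scaled = {name: robust_scale(col) for name, col in features.items()}

for name, col in scaled.items():
    q1, _, q3 = quantiles(col, n=4, method="inclusive")
    # Both features now share median ~0 and IQR ~1, while the maximum
    # (an injected outlier) still sits far from the bulk of the values.
    print(name, "median:", round(median(col), 3),
          "IQR:", round(q3 - q1, 3), "max:", round(max(col), 1))
```

Both features end up centred at 0 with the same spread, so their bulk overlaps, while the injected outliers stay clearly separated instead of being squeezed into a fixed range.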