# How to Remove Outliers from Multiple Columns in R DataFrame?

Last Updated : 15 Dec, 2023

In this article, we will discuss how to remove outliers from multiple columns of a data frame in the R Programming Language.

To remove outliers from a data frame, we use the interquartile range (IQR) method. This method uses the first and third quartiles to decide whether an observation is an outlier or not.

An observation is considered an outlier if it lies more than 1.5 times the interquartile range above the third quartile or more than 1.5 times the interquartile range below the first quartile.
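As a quick illustration of the rule, the following sketch computes the two cut-offs for a small numeric vector. The vector `values` and the names `lower_fence` and `upper_fence` are made up for this illustration, and `quantile()` is called with its default quantile type.

```r
# illustrative data (hypothetical, not from a real dataset)
values <- c(5, 8, 10, 11, 13, 14, 15, 25, 90, 120, 200)

# first and third quartiles (names = FALSE drops the "25%" / "75%" labels)
q1 <- quantile(values, probs = 0.25, names = FALSE)
q3 <- quantile(values, probs = 0.75, names = FALSE)

# interquartile range
iqr_value <- q3 - q1

# cut-offs beyond which a value counts as an outlier
lower_fence <- q1 - 1.5 * iqr_value
upper_fence <- q3 + 1.5 * iqr_value

print(c(lower = lower_fence, upper = upper_fence))

# values outside the cut-offs
print(values[values < lower_fence | values > upper_fence])
```

For this vector the quartiles are 10.5 and 57.5, so the cut-offs are -60 and 128 and only the value 200 is flagged.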

## Remove Outliers from Multiple Columns in R

To find outliers in R we use the following function. It first calculates the first and third quartiles of the observations with the quantile() function and then takes their difference as the interquartile range.

The function returns TRUE for every observation that lies more than 1.5 times the interquartile range above the third quartile or more than 1.5 times the interquartile range below the first quartile, and FALSE otherwise.

Syntax:

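```r
detect_outlier <- function(x) {
  # first and third quartiles
  Quantile1 <- quantile(x, probs = .25)
  Quantile3 <- quantile(x, probs = .75)

  # interquartile range
  IQR <- Quantile3 - Quantile1

  # TRUE where the value lies outside the 1.5 * IQR cut-offs
  x > Quantile3 + (IQR * 1.5) | x < Quantile1 - (IQR * 1.5)
}
```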

Once the outliers are identified, we remove them by keeping only the rows for which the above function returns FALSE.
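For a single column, the removal step is plain logical indexing. A minimal sketch, assuming a small made-up data frame `df` with one numeric column `score` (both names are hypothetical); the detection function is repeated here only so that the snippet runs on its own.

```r
# same detection logic as in the syntax above
detect_outlier <- function(x) {
  q1 <- quantile(x, probs = 0.25)
  q3 <- quantile(x, probs = 0.75)
  iqr_value <- q3 - q1
  x > q3 + 1.5 * iqr_value | x < q1 - 1.5 * iqr_value
}

# hypothetical data: the last score is far away from the rest
df <- data.frame(id = 1:7, score = c(12, 14, 13, 15, 11, 14, 60))

# TRUE marks the rows whose score is an outlier
flags <- detect_outlier(df$score)

# keep only the rows that are not flagged
cleaned <- df[!flags, ]
print(cleaned)
```

Here only the row with `score` 60 is flagged, so `cleaned` keeps the first six rows.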

### Example 1:

Here is an example where we remove outliers from three columns of the data frame.

```r
# create sample data frame
sample_data <- data.frame(x=c(10, 8, 120, 14, 11, 90, 13, 15, 200, 25, 5),
                          y=c(400, 35, 50, 704, 80, 55, 900, 75, 60, 500, 10),
                          z=c(10, 300, 20, 90, 800, 70, 5, 850, 75, 20, 30))
print("Display original dataframe")
print(sample_data)

# create detect outlier function
detect_outlier <- function(x) {

  # calculate first quartile
  Quantile1 <- quantile(x, probs=.25)

  # calculate third quartile
  Quantile3 <- quantile(x, probs=.75)

  # calculate interquartile range
  IQR = Quantile3 - Quantile1

  # return true or false
  x > Quantile3 + (IQR * 1.5) | x < Quantile1 - (IQR * 1.5)
}

# create remove outlier function
remove_outlier <- function(dataframe, columns = names(dataframe)) {

  # for loop to traverse in columns vector
  for (col in columns) {

    # remove observation if it satisfies outlier function
    dataframe <- dataframe[!detect_outlier(dataframe[[col]]), ]
  }

  # print the cleaned dataframe (print() also returns it invisibly)
  print("Remove outliers")
  print(dataframe)
}

remove_outlier(sample_data, c('x', 'y', 'z'))
```

Output:

```
[1] "Display original dataframe"
     x   y   z
1   10 400  10
2    8  35 300
3  120  50  20
4   14 704  90
5   11  80 800
6   90  55  70
7   13 900   5
8   15  75 850
9  200  60  75
10  25 500  20
11   5  10  30
[1] "Remove outliers"
     x   y   z
1   10 400  10
2    8  35 300
3  120  50  20
4   14 704  90
6   90  55  70
7   13 900   5
10  25 500  20
11   5  10  30
```
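Note that the loop in `remove_outlier` recomputes the quartiles after each column has been filtered, so in general the result can depend on the order of the columns. If that is not wanted, one alternative (not the method used in this article) is to flag the outliers of every column on the original data first and then drop all flagged rows in one step. A sketch of that idea, where `remove_outlier_once` is a hypothetical name:

```r
# same detection logic as in the example above
detect_outlier <- function(x) {
  q1 <- quantile(x, probs = 0.25)
  q3 <- quantile(x, probs = 0.75)
  iqr_value <- q3 - q1
  x > q3 + 1.5 * iqr_value | x < q1 - 1.5 * iqr_value
}

# hypothetical single-pass variant: all flags come from the unfiltered data
remove_outlier_once <- function(dataframe, columns = names(dataframe)) {
  # one logical vector of flags per checked column
  flags <- lapply(dataframe[columns], detect_outlier)

  # TRUE for a row that is an outlier in at least one column
  any_flag <- Reduce(`|`, flags)

  # keep the rows that are not flagged anywhere
  dataframe[!any_flag, , drop = FALSE]
}

sample_data <- data.frame(x = c(10, 8, 120, 14, 11, 90, 13, 15, 200, 25, 5),
                          y = c(400, 35, 50, 704, 80, 55, 900, 75, 60, 500, 10),
                          z = c(10, 300, 20, 90, 800, 70, 5, 850, 75, 20, 30))

result <- remove_outlier_once(sample_data, c("x", "y", "z"))
print(result)
```

For the sample data of Example 1 both approaches keep the same eight rows (1, 2, 3, 4, 6, 7, 10 and 11); with other data the two results may differ.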

### Example 2:

Here is an example where we remove outliers from four columns of the data frame.

```r
# create sample data frame
sample_data <- data.frame(x=c(-1, 2, 3, 4, 3, 2, 3, 4, 4, 5, 10),
                          y=c(-4, 3, 5, 7, 8, 5, 9, 7, 6, 5, 10),
                          z=c(-1, 3, 2, 9, 8, 7, 0, 8, 7, 2, 13),
                          w=c(10, 0, 1, 0, 1, 0, 1, 0, 2, 2, 10))
print("Display original dataframe")
print(sample_data)

# create detect outlier function
detect_outlier <- function(x) {

  # calculate first quartile
  Quantile1 <- quantile(x, probs=.25)

  # calculate third quartile
  Quantile3 <- quantile(x, probs=.75)

  # calculate interquartile range
  IQR = Quantile3 - Quantile1

  # return true or false
  x > Quantile3 + (IQR * 1.5) | x < Quantile1 - (IQR * 1.5)
}

# create remove outlier function
remove_outlier <- function(dataframe, columns = names(dataframe)) {

  # for loop to traverse in columns vector
  for (col in columns) {

    # remove observation if it satisfies outlier function
    dataframe <- dataframe[!detect_outlier(dataframe[[col]]), ]
  }

  # print the cleaned dataframe (print() also returns it invisibly)
  print("Remove outliers")
  print(dataframe)
}

remove_outlier(sample_data, c('x', 'y', 'z', 'w'))
```

Output:

```
[1] "Display original dataframe"
    x  y  z  w
1  -1 -4 -1 10
2   2  3  3  0
3   3  5  2  1
4   4  7  9  0
5   3  8  8  1
6   2  5  7  0
7   3  9  0  1
8   4  7  8  0
9   4  6  7  2
10  5  5  2  2
11 10 10 13 10
[1] "Remove outliers"
   x y z w
2  2 3 3 0
3  3 5 2 1
4  4 7 9 0
5  3 8 8 1
6  2 5 7 0
7  3 9 0 1
8  4 7 8 0
9  4 6 7 2
10 5 5 2 2
```
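Both examples use complete data. `quantile()` stops with an error when its input contains `NA` unless `na.rm = TRUE` is passed, so a column with missing values needs a little extra care. One possible variant, assuming that rows with a missing value should be kept rather than dropped (`detect_outlier_na` is a hypothetical name, not part of the examples above):

```r
# hypothetical NA-tolerant variant of the detection function
detect_outlier_na <- function(x) {
  # ignore missing values when computing the quartiles
  q1 <- quantile(x, probs = 0.25, na.rm = TRUE)
  q3 <- quantile(x, probs = 0.75, na.rm = TRUE)
  iqr_value <- q3 - q1

  flags <- x > q3 + 1.5 * iqr_value | x < q1 - 1.5 * iqr_value

  # comparisons with NA give NA; treat those as "not an outlier"
  flags[is.na(flags)] <- FALSE
  flags
}

# made-up vector with one missing value and one extreme value
values <- c(2, 3, NA, 4, 3, 2, 50)
print(detect_outlier_na(values))
```

In this sketch only the value 50 is flagged and the `NA` entry is left in place. Whether missing values should be kept or removed depends on the analysis, so treat this as one option rather than a rule.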
