Replace Missing Values by Column Mean in R DataFrame
In this article, we will see how to replace missing values with column means in the R programming language. Missing values in a dataset are usually represented as NA (or NaN for undefined numeric results). Before most analyses, such values need to be either removed or replaced with another value. Replacing missing data with a substitute value is known as data imputation.
Creating dataframe with missing values:
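The article's original data is not shown here, so the following is a minimal sketch with assumed values. Only the column name marks1 comes from the text; marks2, marks3, and all numbers are illustrative.

```r
# Illustrative dataframe; the values and the names marks2/marks3 are assumptions
data <- data.frame(
  marks1 = c(NA, 20, NA, 50, 80),
  marks2 = c(80, 10, NA, 60, 10),
  marks3 = c(70, 20, NA, 30, 40)
)

print(data)

# Count the missing values in each column
na_counts <- colSums(is.na(data))
print(na_counts)
```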
Method 1: Replace columns using mean() function
Let’s see how to impute missing values with each column’s mean using a dataframe and the mean() function. The mean() function calculates the arithmetic mean of the elements of the numeric vector passed to it as an argument.
Syntax of mean() : mean(x, trim = 0, na.rm = FALSE, …)
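- x – an R object, typically a numeric vector
- trim – the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed
- na.rm – TRUE to remove NA values before the mean is computed; with the default FALSE, the result is NA if x contains any NA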
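The effect of na.rm can be checked on a small vector. This is a short sketch; the vector x is made up for illustration.

```r
# Illustrative vector with one missing value
x <- c(10, 20, NA, 30)

m_default <- mean(x)                 # NA, because na.rm defaults to FALSE
m_removed <- mean(x, na.rm = TRUE)   # 20, the mean of 10, 20 and 30

print(m_default)
print(m_removed)
```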
Example 1: Replacing NA for all columns using mean() function
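One possible way to do this, not necessarily the article's original code, is to apply a small replacement function to every column with lapply(). The dataframe values and the helper name impute_mean are assumptions made for illustration.

```r
# Illustrative dataframe; values are assumptions
data <- data.frame(
  marks1 = c(NA, 20, NA, 50, 80),
  marks2 = c(80, 10, NA, 60, 10),
  marks3 = c(70, 20, NA, 30, 40)
)

# Hypothetical helper: replace the NA entries of one column with that column's mean
impute_mean <- function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
}

# data[] keeps the dataframe structure while every column is replaced
data[] <- lapply(data, impute_mean)
print(data)
```

Assigning to data[] rather than data keeps the result a dataframe instead of a plain list.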
Example 2: Replacing Missing Data in all columns Using for-Loop
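A for-loop version might look like the sketch below, which visits each column by index. The dataframe values are assumptions made for illustration.

```r
# Illustrative dataframe; values are assumptions
data <- data.frame(
  marks1 = c(NA, 20, NA, 50, 80),
  marks2 = c(80, 10, NA, 60, 10),
  marks3 = c(70, 20, NA, 30, 40)
)

# Loop over the column positions and fill each column's NA with its own mean
for (i in seq_along(data)) {
  missing <- is.na(data[[i]])
  data[[i]][missing] <- mean(data[[i]], na.rm = TRUE)
}

print(data)
```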
Example 3: Replacing NA for one column
Let’s impute the mean value for the 1st column, i.e., marks1.
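A minimal sketch of single-column imputation follows. The dataframe values are assumptions; only the column name marks1 comes from the text.

```r
# Illustrative dataframe; values are assumptions
data <- data.frame(
  marks1 = c(NA, 20, NA, 50, 80),
  marks2 = c(80, 10, NA, 60, 10),
  marks3 = c(70, 20, NA, 30, 40)
)

# Replace NA only in marks1; the other columns are left untouched
data$marks1[is.na(data$marks1)] <- mean(data$marks1, na.rm = TRUE)

print(data)
```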
Method 2: Replace column using colMeans() function
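colMeans() function is used to compute the mean of each column of a matrix, array, or numeric dataframe.
Syntax of colMeans() : colMeans(x, na.rm = FALSE, dims = 1)
- x: an array of two or more dimensions, or a numeric dataframe
- dims: an integer specifying which dimensions are regarded as ‘columns’ to average over; for colMeans(), the mean is taken over dimensions 1:dims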
- na.rm: TRUE to ignore NA values
Here we use colMeans() to compute the mean of every column in a single call and then replace the NA values in each column with that column’s mean.
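The sketch below shows one way this could be done, assuming an all-numeric dataframe. The dataframe values and the variable name col_means are illustrative.

```r
# Illustrative dataframe; values are assumptions
data <- data.frame(
  marks1 = c(NA, 20, NA, 50, 80),
  marks2 = c(80, 10, NA, 60, 10),
  marks3 = c(70, 20, NA, 30, 40)
)

# Named vector holding one mean per column, ignoring NA
col_means <- colMeans(data, na.rm = TRUE)
print(col_means)

# Fill each column's NA entries with the precomputed mean for that column
for (name in names(data)) {
  data[[name]][is.na(data[[name]])] <- col_means[[name]]
}

print(data)
```

Computing the means once up front avoids calling mean() separately inside the loop.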
Method 3: Replacing NA using apply() function
In this method, we will use the apply() function to replace the NA values in each column.
Syntax of apply() : apply(X, MARGIN, FUN, …)
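- X – an array, including a matrix (a dataframe is coerced to a matrix)
- MARGIN – a vector giving the subscripts the function is applied over; for a matrix, 1 indicates rows and 2 indicates columns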
- FUN – the function to be applied
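A possible apply() version is sketched below, using MARGIN = 2 to work column by column. The dataframe values and the variable name imputed are assumptions made for illustration.

```r
# Illustrative dataframe; values are assumptions
data <- data.frame(
  marks1 = c(NA, 20, NA, 50, 80),
  marks2 = c(80, 10, NA, 60, 10),
  marks3 = c(70, 20, NA, 30, 40)
)

# MARGIN = 2 applies the function to each column
imputed <- apply(data, 2, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})

# apply() returns a matrix here, so convert it back to a dataframe
data <- as.data.frame(imputed)
print(data)
```

Because apply() coerces its input to a matrix, this approach suits dataframes whose columns are all numeric.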