How To Remove A Column In R

Last Updated : 24 Apr, 2024

R is a versatile language widely used in data analysis and statistical computing. A common task when working with data is removing one or more columns from a data frame. This guide shows several ways to do that in R, using both base R and the dplyr package, with an example for each method.

Why Remove Columns?

Removing columns from a data frame is a common task in data preprocessing and cleaning. It might be necessary to remove a column when:

  1. It contains irrelevant information.
  2. It has too many missing or erroneous values.
  3. It is highly correlated with other columns, leading to multicollinearity.
  4. It contains private or sensitive information that should not be kept or shared.
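
As a rough illustration of the second reason, the sketch below measures the share of missing values in each column and keeps only the columns under a cutoff. The data frame `survey`, its column names, and the 50% cutoff are hypothetical choices made for this example, not a recommended rule.

```r
# Hypothetical data with one mostly-missing column
survey <- data.frame(
  ID = 1:5,
  Score = c(10, NA, 30, 40, 50),
  Notes = c(NA, NA, NA, "ok", NA)
)

# Fraction of missing values in each column
na_share <- colMeans(is.na(survey))

# Keep only columns with at most 50% missing values (assumed cutoff)
survey_clean <- survey[, na_share <= 0.5, drop = FALSE]
print(names(survey_clean))
```

A suitable cutoff depends on the data and on the analysis, so treat the value above as a placeholder.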

Let’s explore the methods to remove a column in R.

Using the Base R Syntax

In base R, you can remove columns with negative indexing or with the subset() function. Negative indexing drops columns by position, so to remove a column by name you first look up its position with which().

R
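# Create a sample data frame
df <- data.frame(
  ID = 1:5,
  Name = c("Ali", "Boby", "Charles", "David", "Eva"),
  Age = c(25, 30, 35, 40, 45),
  Gender = c("F", "M", "M", "M", "F")
)
print(df)
# Remove the 'Age' column: which() finds its position, the minus sign drops it.
# Caution: if no column is named 'Age', which() returns integer(0)
# and this expression drops every column instead of none.
df <- df[, -which(names(df) == "Age"), drop = FALSE]
print(df)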

Output:

  ID    Name Age Gender
1  1     Ali  25      F
2  2    Boby  30      M
3  3 Charles  35      M
4  4   David  40      M
5  5     Eva  45      F

  ID    Name Gender
1  1     Ali      F
2  2    Boby      M
3  3 Charles      M
4  4   David      M
5  5     Eva      F
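
The negative-index form above misbehaves when the named column does not exist. The following sketch shows a few base R alternatives that avoid that problem, plus removal by position. The variable names `df1` to `df4` are introduced only for this illustration.

```r
df <- data.frame(
  ID = 1:5,
  Name = c("Ali", "Boby", "Charles", "David", "Eva"),
  Age = c(25, 30, 35, 40, 45),
  Gender = c("F", "M", "M", "M", "F")
)

# Option 1: keep every column whose name is not "Age"
df1 <- df[, names(df) != "Age", drop = FALSE]

# Option 2: keep the set difference of all names and the names to drop
df2 <- df[, setdiff(names(df), "Age"), drop = FALSE]

# Option 3: assign NULL to the column (changes df3 itself)
df3 <- df
df3$Age <- NULL

# By position: drop the third column
df4 <- df[, -3, drop = FALSE]

print(names(df1))
```

Options 1 to 3 leave the data frame unchanged if there is no column called "Age". The `drop = FALSE` argument keeps the result a data frame even when only one column remains.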

Subset Function

The subset() function can also remove columns: pass the column name to its select argument with a minus sign in front.

R
# Create a sample data frame
df <- data.frame(
  ID = 1:5,
  Name = c("Ali", "Boby", "Charles", "David", "Eva"),
  Age = c(25, 30, 35, 40, 45),
  Gender = c("F", "M", "M", "M", "F")
)
df
# Remove the 'Gender' column using subset
df <- subset(df, select = -Gender)
print(df)

Output:

  ID    Name Age Gender
1  1     Ali  25      F
2  2    Boby  30      M
3  3 Charles  35      M
4  4   David  40      M
5  5     Eva  45      F

  ID    Name Age
1  1     Ali  25
2  2    Boby  30
3  3 Charles  35
4  4   David  40
5  5     Eva  45
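
subset() can also drop several columns in one call by wrapping the names in c(). The sketch below assumes the same sample data frame. The name `df_small` is used only for this example.

```r
df <- data.frame(
  ID = 1:5,
  Name = c("Ali", "Boby", "Charles", "David", "Eva"),
  Age = c(25, 30, 35, 40, 45),
  Gender = c("F", "M", "M", "M", "F")
)

# Drop two columns at once by negating a c() of unquoted names
df_small <- subset(df, select = -c(Age, Gender))
print(df_small)
```

The R documentation describes subset() as a convenience function intended for interactive use, so bracket indexing may be the safer choice inside functions and scripts.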

Remove A Column Using dplyr

The dplyr package, part of the tidyverse, provides a convenient way to manipulate data frames. Its select() function removes a column when you put a minus sign in front of the column name.

R
# Load dplyr
library(dplyr)
# Create a sample data frame
df <- data.frame(
  ID = 1:5,
  Name = c("Ali", "Boby", "Charles", "David", "Eva"),
  Age = c(25, 30, 35, 40, 45),
  Gender = c("F", "M", "M", "M", "F")
)
df
# Remove the 'Age' column using dplyr::select
df <- df %>% select(-Age)
print(df)

Output:

  ID    Name Age Gender
1  1     Ali  25      F
2  2    Boby  30      M
3  3 Charles  35      M
4  4   David  40      M
5  5     Eva  45      F

  ID    Name Gender
1  1     Ali      F
2  2    Boby      M
3  3 Charles      M
4  4   David      M
5  5     Eva      F
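
When the names to drop are stored in a character vector, and some of them may not exist in the data frame, the any_of() selection helper can be combined with select(). This is a minimal sketch assuming dplyr 1.0.0 or later. The vector `drop_cols` and the column name "Salary" are hypothetical.

```r
library(dplyr)

df <- data.frame(
  ID = 1:5,
  Name = c("Ali", "Boby", "Charles", "David", "Eva"),
  Age = c(25, 30, 35, 40, 45),
  Gender = c("F", "M", "M", "M", "F")
)

# "Salary" is not a column of df; any_of() skips it without an error
drop_cols <- c("Age", "Salary")
df_out <- df %>% select(-any_of(drop_cols))
print(df_out)
```

Using all_of() instead of any_of() would raise an error for the missing name, which can be preferable when a missing column indicates a mistake.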

Remove Multiple Columns

To remove multiple columns, you can use dplyr::select with the c() function to specify the column names:

R
# Load dplyr
library(dplyr)
# Create a sample data frame
df <- data.frame(
  ID = 1:5,
  Name = c("Ali", "Boby", "Charles", "David", "Eva"),
  Age = c(25, 30, 35, 40, 45),
  Gender = c("F", "M", "M", "M", "F")
)
df

# Remove 'Age' and 'Gender' columns
df <- df %>% select(-c(Age, Gender))
print(df)

Output:

  ID    Name Age Gender
1  1     Ali  25      F
2  2    Boby  30      M
3  3 Charles  35      M
4  4   David  40      M
5  5     Eva  45      F

  ID    Name
1  1     Ali
2  2    Boby
3  3 Charles
4  4   David
5  5     Eva
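
The same result can be reached without dplyr. The sketch below uses the base R `%in%` operator to build a logical mask of columns to keep. The names `to_drop` and `df_base` are introduced only for this illustration.

```r
df <- data.frame(
  ID = 1:5,
  Name = c("Ali", "Boby", "Charles", "David", "Eva"),
  Age = c(25, 30, 35, 40, 45),
  Gender = c("F", "M", "M", "M", "F")
)

# Names of the columns to remove
to_drop <- c("Age", "Gender")

# TRUE for columns to keep, FALSE for columns listed in to_drop
keep <- !(names(df) %in% to_drop)
df_base <- df[, keep, drop = FALSE]
print(df_base)
```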

Remove Columns by Pattern

You can also remove columns based on a pattern in their names, using dplyr selection helpers such as starts_with():

R
# Load dplyr
library(dplyr)
# Create a sample data frame
df <- data.frame(
  ID = 1:5,
  Name = c("Ali", "Boby", "Charles", "David", "Eva"),
  Age = c(25, 30, 35, 40, 45),
  Gender = c("F", "M", "M", "M", "F")
)
df

# Remove columns starting with 'Age' or 'Gender'
df <- df %>% select(-starts_with("Age"), -starts_with("Gender"))
print(df)

Output:

  ID    Name Age Gender
1  1     Ali  25      F
2  2    Boby  30      M
3  3 Charles  35      M
4  4   David  40      M
5  5     Eva  45      F

  ID    Name
1  1     Ali
2  2    Boby
3  3 Charles
4  4   David
5  5     Eva
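
Pattern-based removal is more useful when several columns share a prefix. As a rough base R counterpart, the sketch below matches column names against a regular expression with grepl(). The data frame `people` and its column names are hypothetical.

```r
# Hypothetical data frame with two columns sharing the "Age" prefix
people <- data.frame(
  ID = 1:3,
  Age_Years = c(25, 30, 35),
  Age_Group = c("20s", "30s", "30s"),
  Gender = c("F", "M", "M")
)

# TRUE for every column whose name does not start with "Age"
keep <- !grepl("^Age", names(people))
people_out <- people[, keep, drop = FALSE]
print(people_out)
```

Other regular expressions work the same way, for example "_Group$" to match a suffix.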

Conclusion

Removing columns in R is a fundamental skill for data cleaning and manipulation. You can use various methods, including Base R syntax and the dplyr package, to remove columns by name, by position, or by pattern. Understanding these techniques allows you to manage your data frames effectively and focus on the columns that matter most for your analysis.


