Replace specific values in column using regex in R

Last Updated : 30 Jun, 2021

In this article, we will discuss how to replace specific values in the columns of a data frame using regular expressions in the R programming language.

Method 1 : Using sub() method

The sub() function in R replaces the first match of a pattern in each string with another string. It works on a character vector, so it can be applied directly to a data frame column, and every element of the column is processed in a single call. The pattern can be a single character, a word, or a longer string made up of several words.

Syntax:

sub(pattern, new_string, df$col_name)

Parameters:

pattern – a regular expression, or a character string, to search for. In a regular expression, * means zero or more repetitions of the preceding item, so .* matches any sequence of characters.

new_string – the string to replace the matches with

df$col_name – the column to modify
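
To make the pattern argument more concrete, here is a small sketch on a plain character vector. The vector x and its values are made up for illustration. It shows that .* matches any run of characters, and that fixed = TRUE, a standard argument of sub(), makes the pattern a literal string instead of a regular expression.

```r
# Hypothetical sample values, not taken from the examples in this article
x <- c("geeks", "a.b.c", "gap")

# "^ge.*": starts with "ge", followed by any characters up to the end
regex_result <- sub("^ge.*", "new_String", x)

# With fixed = TRUE the dot is a literal dot, not "any character"
literal_result <- sub(".", "-", x, fixed = TRUE)

# Without fixed = TRUE the dot matches the first character of each string
any_char_result <- sub(".", "-", x)

print(regex_result)     # "new_String" "a.b.c" "gap"
print(literal_result)   # "geeks" "a-b.c" "gap"
print(any_char_result)  # "-eeks" "-.b.c" "-ap"
```

Literal matching is worth considering whenever the text to replace contains characters such as . or * that have a special meaning in regular expressions.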

Example 1:

R

# declaring dataframe 
data_frame <- data.frame(col1 = c("geeks","for","geek","friends")) 
  
print ("Original DataFrame") 
print (data_frame) 
  
# replace every value that starts with "ge" 
data_frame$col1 <- sub("^ge.*", "new_String", data_frame$col1) 
  
print ("Modified DataFrame") 
print (data_frame)

Output

[1] "Original DataFrame"
     col1
1   geeks
2     for
3    geek
4 friends
[1] "Modified DataFrame"
        col1
1 new_String
2        for
3 new_String
4    friends

Note that sub() replaces only the first match in each string. Later occurrences in the same string are left unchanged, as the next example shows.

Example 2:

R

# declaring dataframe 
data_frame <- data.frame(col1 = c("geeks for geeks interviews", 
                                  "suitable 4 placements", 
                                  "interviews placements interviews")) 
  
print ("Original DataFrame") 
print (data_frame) 
  
# replace only the first "interviews" in each value 
data_frame$col1 <- sub("interviews", "programming", data_frame$col1) 
  
print ("Modified DataFrame") 
print (data_frame)

Output

[1] "Original DataFrame"
                              col1
1       geeks for geeks interviews
2            suitable 4 placements
3 interviews placements interviews
[1] "Modified DataFrame"
                               col1
1       geeks for geeks programming
2             suitable 4 placements
3 programming placements interviews
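
When a value should be replaced only if the whole cell matches, the pattern can be anchored at both ends with ^ and $. The following is a minimal sketch of that idea. The data frame df and its values are hypothetical and are used only for illustration.

```r
# Hypothetical data frame for illustration
df <- data.frame(col1 = c("geek", "geeks", "for geek"))

# Unanchored: "geek" is replaced wherever it first appears in each value
partial <- sub("geek", "GFG", df$col1)

# Anchored: only a value that is exactly "geek" is replaced
exact <- sub("^geek$", "GFG", df$col1)

print(partial)  # "GFG" "GFGs" "for GFG"
print(exact)    # "GFG" "geeks" "for geek"
```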

Method 2 : Using gsub() method

The gsub() function takes the same arguments as sub() and, like sub(), accepts regular expressions. The difference is that it replaces every match of the pattern in each string, not just the first one.
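
The difference between the two functions is easiest to see side by side. Here is a minimal sketch on a made-up character vector, where x is an illustrative name:

```r
# Hypothetical sample values with a repeated word
x <- c("interviews placements interviews", "no match here")

# sub() stops after the first match in each string
first_only <- sub("interviews", "programming", x)

# gsub() replaces every match in each string
all_matches <- gsub("interviews", "programming", x)

print(first_only)   # "programming placements interviews" "no match here"
print(all_matches)  # "programming placements programming" "no match here"
```

Strings without a match are returned unchanged by both functions.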

Syntax:

gsub(pattern, new_string, df$col_name)

Parameters:

pattern – a regular expression, or a character string, to search for

new_string – the string to replace the matches with

df$col_name – the column to modify

Example 1:

R

# declaring dataframe 
data_frame <- data.frame(col1 = c("geeks","for","friends","gap","geek")) 
  
print ("Original DataFrame") 
print (data_frame) 
  
# replace every value that starts with "ge" 
data_frame$col1 <- gsub("^ge.*", "new_String", data_frame$col1) 
  
print ("Modified DataFrame") 
print (data_frame)

Output

[1] "Original DataFrame"
     col1
1   geeks
2     for
3 friends
4     gap
5    geek
[1] "Modified DataFrame"
        col1
1 new_String
2        for
3    friends
4        gap
5 new_String

The gsub() function can also be used to add a prefix to every value in a column, by matching the start of each string and inserting the new text there.

Example 2:

R

# declaring dataframe 
data_frame <- data.frame(col1 = c("geeks","for","friends","gap","geek")) 
  
print ("Original DataFrame") 
print (data_frame) 
  
# "^" matches the start of each string, so the prefix is inserted there 
data_frame$col1 <- gsub("^", "GFG ", data_frame$col1) 
  
print ("Modified DataFrame") 
print (data_frame)

Output:

[1] "Original DataFrame"
     col1
1   geeks
2     for
3 friends
4     gap
5    geek
[1] "Modified DataFrame"
         col1
1   GFG geeks
2     GFG for
3 GFG friends
4     GFG gap
5    GFG geek
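
For a plain prefix, a regular expression is not strictly needed. As a point of comparison, the sketch below (with a made-up vector x) checks that anchoring on ^ gives the same result as base R's paste0().

```r
# Hypothetical sample values
x <- c("geeks", "for", "friends")

# Regex approach: insert the prefix at the start of each string
with_regex <- sub("^", "GFG ", x)

# Non-regex approach: concatenate the prefix and the value
with_paste <- paste0("GFG ", x)

print(with_regex)                         # "GFG geeks" "GFG for" "GFG friends"
print(identical(with_regex, with_paste))  # TRUE
```

The regex form becomes more useful when the prefix should be added only to values that match some condition.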

It can also be used to remove all digits from the values of a character column.

Example 3:

R

# declaring dataframe 
data_frame <- data.frame(col1 = c("geeks12 is good","suitable 4 placements", 
                                  "love you 2 much")) 
  
print ("Original DataFrame") 
print (data_frame) 
  
# "[0-9]+" matches each run of one or more digits 
data_frame$col1 <- gsub("[0-9]+", "", data_frame$col1) 
  
print ("Modified DataFrame") 
print (data_frame)

Output:

[1] "Original DataFrame"
                   col1
1       geeks12 is good
2 suitable 4 placements
3       love you 2 much
[1] "Modified DataFrame"
                  col1
1        geeks is good
2 suitable  placements
3       love you  much
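
Removing the digits leaves two consecutive spaces wherever a standalone number used to be. One possible follow-up step, shown here as a sketch and not as part of the original example, collapses runs of spaces with a second gsub() call and trims the ends with trimws().

```r
# Same values as in Example 3
x <- c("geeks12 is good", "suitable 4 placements", "love you 2 much")

# Step 1: remove every run of digits
no_digits <- gsub("[0-9]+", "", x)

# Step 2: collapse runs of spaces into one, then trim both ends
cleaned <- trimws(gsub(" +", " ", no_digits))

print(cleaned)  # "geeks is good" "suitable placements" "love you much"
```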
