Replace specific values in column using regex in R

  • Last Updated : 30 Jun, 2021

In this article, we will discuss how to replace specific values in a column of a dataframe using regular expressions in the R Programming Language.

Method 1 : Using sub() method

The sub() method in R programming language replaces the first match of a pattern in each element of a character vector with another string. It is vectorised, so it can be applied directly to a dataframe column, which makes it convenient for large datasets. The pattern can be a single character, a word, or a string of several words, and it is interpreted as a regular expression by default.

Syntax:

sub (pattern , new_string , df$col-name)

Parameters:

  • pattern – a regular expression, or a character string, to search for. A * in the pattern means zero or more repetitions of the preceding item, so .* matches any run of characters (including none).
  • new_string – the string to replace the matches with
  • df$col-name – the desired column name
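
As a quick check of how the quantifiers in a pattern behave, the sketch below compares * (zero or more of the preceding item) with + (one or more). The sample vector and the variable names are made up for illustration only.

```r
# Hypothetical sample values, chosen only to illustrate the quantifiers
x <- c("g", "ge", "gee", "geeks")

# "e*" allows zero or more "e", so even a bare "g" matches
star <- grepl("^ge*$", x)

# "e+" requires at least one "e"
plus <- grepl("^ge+$", x)

# ".*" matches any run of characters (including none) after the "ge" prefix
replaced <- sub("^ge.*", "new_String", x)

print(star)
print(plus)
print(replaced)
```

Here "geeks" fails both anchored tests because of the trailing "ks", but it is still replaced by the third pattern, since .* consumes the rest of the string.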

Example 1:

R




# declaring dataframe
data_frame <- data.frame(col1 = c("geeks","for","geek","friends"))
  
print ("Original DataFrame")
print (data_frame)
  
data_frame$col1 <- sub("^ge.*", "new_String", data_frame$col1)
  
print ("Modified DataFrame")
print (data_frame)

Output

[1] "Original DataFrame" 
col1
1   geeks 
2     for 
3    geek 
4 friends 
[1] "Modified DataFrame" 
col1 
1 new_String 
2        for 
3 new_String 
4    friends

Note that sub() replaces only the first occurrence of the pattern in each value. In the next example, the second "interviews" in row 3 is left unchanged.

Example 2:

R




# declaring dataframe
data_frame <- data.frame(col1 = c("geeks for geeks interviews",
                                  "suitable 4 placements",
                                  "interviews placements interviews"))
  
print ("Original DataFrame")
print (data_frame)
  
data_frame$col1 <- sub("interviews", "programming", data_frame$col1)
  
print ("Modified DataFrame")
print (data_frame)

Output

[1] "Original DataFrame" 
col1 
1       geeks for geeks interviews 
2            suitable 4 placements 
3 interviews placements interviews
[1] "Modified DataFrame"
col1 
1       geeks for geeks programming 
2             suitable 4 placements 
3 programming placements interviews

Method 2 : Using gsub() method

The gsub() method takes the same arguments as sub() and also interprets the pattern as a regular expression. The difference is that it replaces every occurrence of the pattern in each value, not just the first one.
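
The difference between the two functions can be seen by running both on the same vector. This is a minimal sketch; the vector x and the result names are illustrative.

```r
# Illustrative input: the first value contains the pattern twice
x <- c("interviews placements interviews", "suitable 4 placements")

# sub() stops after the first match in each element
first_only <- sub("interviews", "programming", x)

# gsub() keeps going and replaces every match in each element
all_matches <- gsub("interviews", "programming", x)

print(first_only)
print(all_matches)
```

Values that do not contain the pattern are returned unchanged by both functions.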



Syntax:

gsub (pattern , new_string , df$col-name)

Parameters:

  • pattern – regular expression , or a character string to replace
  • new_string – the string to replace the matches with
  • df$col-name – the desired column name
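
When the pattern is meant as a plain character string and not a regular expression, characters such as . keep their special meaning unless they are escaped or the fixed = TRUE argument of sub() and gsub() is used. The sketch below uses a made-up vector of values to show the difference.

```r
# Hypothetical values: only the first one contains the literal text "4.50"
prices <- c("cost: 4.50", "cost: 4x50")

# As a regular expression, "." matches any single character,
# so "4x50" is replaced as well
regex_result <- gsub("4.50", "N/A", prices)

# fixed = TRUE treats the pattern as a literal string
literal_result <- gsub("4.50", "N/A", prices, fixed = TRUE)

# Escaping the dot gives the same literal behaviour in regex mode
escaped_result <- gsub("4\\.50", "N/A", prices)

print(regex_result)
print(literal_result)
print(escaped_result)
```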

Example 1:

R




# declaring dataframe
data_frame <- data.frame(col1 = c("geeks","for","friends","gap","geek"))
  
print ("Original DataFrame")
print (data_frame)
  
# replace every value that starts with "ge"
data_frame$col1 <- gsub("^ge.*", "new_String", data_frame$col1)
  
print ("Modified DataFrame")
print (data_frame)

Output

[1] "Original DataFrame" 
col1 
1   geeks 
2     for 
3 friends 
4     gap 
5    geek 
[1] "Modified DataFrame" 
col1 
1 new_String 
2        for 
3    friends 
4        gap 
5 new_String

The gsub() method can also be used to modify every value in a column, for example by matching the start of each string and inserting a prefix there.

Example 2:

R




# declaring dataframe
data_frame <- data.frame(col1 = c("geeks","for","friends","gap","geek"))
  
print ("Original DataFrame")
print (data_frame)
  
# "^" matches the start of each string, so the prefix is inserted there
data_frame$col1 <- gsub("^", "GFG ", data_frame$col1)
  
print ("Modified DataFrame")
print (data_frame)

Output:

[1] "Original DataFrame" 
col1 
1   geeks 
2     for 
3 friends 
4     gap 
5    geek 
[1] "Modified DataFrame" 
col1 
1   GFG geeks
2     GFG for
3 GFG friends
4     GFG gap
5    GFG geek

It can also be used to remove digits from the values of a column.

Example 3:

R




# declaring dataframe
data_frame <- data.frame(col1 = c("geeks12 is good","suitable 4 placements",
                                  "love you 2 much"))
  
print ("Original DataFrame")
print (data_frame)
  
# "[0-9]+" matches each run of one or more digits
data_frame$col1 <- gsub("[0-9]+", "", data_frame$col1)
  
print ("Modified DataFrame")
print (data_frame)

Output:

[1] "Original DataFrame" 
col1 
1       geeks12 is good 
2 suitable 4 placements 
3       love you 2 much 
[1] "Modified DataFrame" 
col1 
1        geeks is good 
2 suitable  placements 
3       love you  much
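
Removing digits can leave two consecutive spaces behind, as in rows 2 and 3 of the output above. One possible follow-up, sketched here on a plain vector with illustrative names, is a second gsub() call that collapses runs of spaces, followed by trimws() to drop any leading or trailing whitespace.

```r
# Same strings as in the example above, held in a plain vector
x <- c("geeks12 is good", "suitable 4 placements", "love you 2 much")

# Step 1: remove every run of digits
no_digits <- gsub("[0-9]+", "", x)

# Step 2: collapse runs of spaces into one, then trim both ends
cleaned <- trimws(gsub(" +", " ", no_digits))

print(no_digits)
print(cleaned)
```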


