Replace specific values in column using regex in R
Last Updated: 30 Jun, 2021
In this article, we will discuss how to replace specific values in a column of a data frame using regular expressions in the R programming language.
Method 1 : Using sub() method
The sub() function in R replaces the first occurrence of a pattern in each element of a character vector with another string. It can be applied to a data frame column or to any character vector, and it is vectorized, so it processes every row of the column in one call. The pattern can match a single character or a longer string made up of one or more words.
Syntax:
sub(pattern, new_string, df$col_name)
Parameters:
- pattern – a regular expression, or a literal character string, to match. In a regular expression, a * means zero or more repetitions of the preceding item.
- new_string – the string to replace the match with
- df$col_name – the column to modify
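As a minimal sketch of how the pattern argument is interpreted, the snippet below contrasts a regular-expression pattern with a literal one. The vector x and its values are made up for illustration; fixed = TRUE is a standard argument of sub() that turns off regular-expression interpretation.

```r
# Hypothetical sample values, chosen only for illustration
x <- c("gk", "geek", "a.b.c")

# "e*" means zero or more "e" characters, so "ge*k" matches both "gk" and "geek"
star <- sub("ge*k", "X", x)

# An unescaped "." matches any single character, so the first character is replaced
regex_dot <- sub(".", "-", x)

# With fixed = TRUE the pattern is a literal string, so only a real "." matches
literal_dot <- sub(".", "-", x, fixed = TRUE)

print(star)         # "X"  "X"    "a.b.c"
print(regex_dot)    # "-k" "-eek" "-.b.c"
print(literal_dot)  # "gk" "geek" "a-b.c"
```

When the text to replace contains characters such as . or *, either escape them in the pattern or pass fixed = TRUE.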
Example 1:
R
data_frame <- data.frame(col1 = c("geeks", "for", "geek", "friends"))
print("Original DataFrame")
print(data_frame)
# Replace every value that starts with "ge" with "new_String"
data_frame$col1 <- sub("^ge.*", "new_String", data_frame$col1)
print("Modified DataFrame")
print(data_frame)
Output
[1] "Original DataFrame"
col1
1 geeks
2 for
3 geek
4 friends
[1] "Modified DataFrame"
col1
1 new_String
2 for
3 new_String
4 friends
sub() replaces only the first occurrence of the pattern in each string; any later occurrences in the same string are left unchanged.
Example 2:
R
data_frame <- data.frame(col1 = c("geeks for geeks interviews",
                                  "suitable 4 placements",
                                  "interviews placements interviews"))
print("Original DataFrame")
print(data_frame)
# Only the first "interviews" in each row is replaced
data_frame$col1 <- sub("interviews", "programming", data_frame$col1)
print("Modified DataFrame")
print(data_frame)
Output
[1] "Original DataFrame"
col1
1 geeks for geeks interviews
2 suitable 4 placements
3 interviews placements interviews
[1] "Modified DataFrame"
col1
1 geeks for geeks programming
2 suitable 4 placements
3 programming placements interviews
Method 2 : Using gsub() method
The gsub() function is similar to sub() and accepts the same regular expressions. The difference is that it replaces every occurrence of the pattern in each string, not only the first.
Syntax:
gsub(pattern, new_string, df$col_name)
Parameters:
- pattern – a regular expression, or a literal character string, to match
- new_string – the string to replace the matches with
- df$col_name – the column to modify
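The difference between the two functions is easiest to see when both are applied to the same column. The following is a minimal sketch; the data frame df and its values are hypothetical and exist only for this comparison.

```r
# Hypothetical one-column data frame, used only for illustration
df <- data.frame(col1 = c("interviews placements interviews", "no match here"))

# sub() replaces only the first match in each element
first_only <- sub("interviews", "programming", df$col1)

# gsub() replaces every match in each element
all_matches <- gsub("interviews", "programming", df$col1)

print(first_only)   # "programming placements interviews" "no match here"
print(all_matches)  # "programming placements programming" "no match here"
```

Elements that do not contain the pattern are returned unchanged by both functions.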
Example 1:
R
data_frame <- data.frame(col1 = c("geeks", "for", "friends", "gap", "geek"))
print("Original DataFrame")
print(data_frame)
# "^ge.*" matches any value that starts with "ge"; "gap" does not match
data_frame$col1 <- gsub("^ge.*", "new_String", data_frame$col1)
print("Modified DataFrame")
print(data_frame)
Output
[1] "Original DataFrame"
col1
1 geeks
2 for
3 friends
4 gap
5 geek
[1] "Modified DataFrame"
col1
1 new_String
2 for
3 friends
4 gap
5 new_String
gsub() can also be used to add a prefix to every value in a column by matching the start of each string.
Example 2:
R
data_frame <- data.frame(col1 = c("geeks", "for", "friends", "gap", "geek"))
print("Original DataFrame")
print(data_frame)
# "^" matches the empty position at the start of each string,
# so the replacement is inserted in front of the existing value
data_frame$col1 <- gsub("^", "GFG ", data_frame$col1)
print("Modified DataFrame")
print(data_frame)
Output:
[1] "Original DataFrame"
col1
1 geeks
2 for
3 friends
4 gap
5 geek
[1] "Modified DataFrame"
col1
1 GFG geeks
2 GFG for
3 GFG friends
4 GFG gap
5 GFG geek
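For a plain prefix, the same result can be produced without a regular expression. The sketch below compares the start-of-string anchor with base R's paste0(), which is a different technique from the one used above; the vector x is hypothetical.

```r
# Hypothetical values, used only for illustration
x <- c("geeks", "for", "friends")

# Regex approach: insert the prefix at the start-of-string anchor
with_regex <- sub("^", "GFG ", x)

# Non-regex alternative: concatenate the prefix with each value
with_paste <- paste0("GFG ", x)

print(with_regex)                         # "GFG geeks" "GFG for" "GFG friends"
print(identical(with_regex, with_paste))  # TRUE
```

The regex form becomes more useful when the prefix should be added only to values that match some condition, for example a pattern such as "^ge".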
It can also be used to remove digits from the values in a column.
Example 3:
R
data_frame <- data.frame(col1 = c("geeks12 is good", "suitable 4 placements",
                                  "love you 2 much"))
print("Original DataFrame")
print(data_frame)
# "[0-9]+" matches each run of one or more digits
data_frame$col1 <- gsub("[0-9]+", "", data_frame$col1)
print("Modified DataFrame")
print(data_frame)
Output:
[1] "Original DataFrame"
col1
1 geeks12 is good
2 suitable 4 placements
3 love you 2 much
[1] "Modified DataFrame"
col1
1 geeks is good
2 suitable  placements
3 love you  much
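Removing digits leaves the spaces on either side of them in place, so a value such as "suitable 4 placements" ends up with two consecutive spaces. One possible cleanup, sketched below on a hypothetical vector x, is a second gsub() call that collapses runs of spaces, followed by base R's trimws() to drop any leading or trailing whitespace.

```r
# Hypothetical values, matching the example above
x <- c("geeks12 is good", "suitable 4 placements", "love you 2 much")

# Step 1: remove every run of digits
no_digits <- gsub("[0-9]+", "", x)

# Step 2: collapse runs of spaces into a single space
squeezed <- gsub(" +", " ", no_digits)

# Step 3: strip leading and trailing whitespace, if any
cleaned <- trimws(squeezed)

print(no_digits)  # "geeks is good" "suitable  placements" "love you  much"
print(cleaned)    # "geeks is good" "suitable placements"  "love you much"
```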