Extract Numbers from Character String Vector in R


In this article, we will see how to extract numbers from a character string vector in the R programming language. Base R offers several built-in functions for this. We cover two approaches:

  • Extracting numbers from character string using gsub() function
  • Extracting numbers from character string using gregexpr() & regmatches() functions

Method 1: Using gsub() function.

In this method, we call gsub(), one of R's built-in functions, with three arguments: a pattern that captures the first run of digits in each string, a replacement that refers back to the captured group, and the vector of strings. Because the pattern matches the whole string, each string is replaced by the first number it contains.

gsub() function: This function finds all matches of a pattern in a string and replaces them. If the input is a string vector, it returns a string vector of the same length and with the same attributes.

Syntax: gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

Parameters:

  • pattern: string to be matched; supports regular expressions
  • replacement: string used as the replacement
  • x: string or string vector
  • ignore.case: logical. If TRUE, case is ignored during matching
  • perl: logical. Should Perl-compatible regular expressions be used?
  • fixed: logical. If TRUE, the pattern is a string to be matched as is
  • useBytes: logical. If TRUE, the matching is done byte-by-byte rather than character-by-character
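The effect of ignore.case and fixed can be seen in a small sketch. The sample vector x and the variable names are made up for illustration and are not part of the original example.

```r
# Hypothetical sample data, used only to illustrate the parameters
x <- c("Item.A1", "item.b2")

# ignore.case = TRUE removes "item" regardless of capitalisation
no_prefix <- gsub("item", "", x, ignore.case = TRUE)
print(no_prefix)

# fixed = TRUE treats "." as a literal dot
dot_only <- gsub(".", "-", x, fixed = TRUE)
print(dot_only)

# Without fixed = TRUE, "." is a regex that matches every character
every_char <- gsub(".", "-", x)
print(every_char)
```

In the last call every character is replaced, which is why fixed = TRUE is worth considering whenever the pattern is meant literally.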

For finding numbers in the string the pattern will be:

".*?([0-9]+).*"

Example:

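R

# Sample character vector mixing letters and digits
gfg <- c("7g8ee6ks1", "5f9o1r0", "geeks10")
print(gfg)

# Replace each string with its first run of digits, then convert to numeric
res <- as.numeric(gsub(".*?([0-9]+).*", "\\1", gfg))
print(res)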


Output:

[1] "7g8ee6ks1" "5f9o1r0"   "geeks10"  
[1]  7  5 10
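The leading ".*?" in the pattern is lazy, and that choice determines which digits end up in the capture group. The sketch below compares it with a greedy ".*". It passes perl = TRUE so that the well-defined Perl-compatible semantics apply; that argument is an assumption of this sketch, since the example above relies on the default engine.

```r
# Hypothetical comparison of lazy and greedy prefixes
x <- c("geeks10", "7g8ee6ks1")

# Lazy prefix: stops as early as possible, so the group gets the first number
lazy <- gsub(".*?([0-9]+).*", "\\1", x, perl = TRUE)
print(lazy)

# Greedy prefix: consumes as much as it can, leaving only the last digit
greedy <- gsub(".*([0-9]+).*", "\\1", x, perl = TRUE)
print(greedy)
```

With the greedy prefix, "geeks10" yields "0" rather than "10", because the prefix swallows the "1".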

The time complexity is roughly O(n), where n is the total number of characters across the input strings.

The auxiliary space is also O(n).
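One case the gsub() approach does not cover is a string with no digits at all: the pattern fails to match, the string is returned unchanged, and as.numeric() then produces NA with a warning. A minimal sketch of one way to handle this, assuming a made-up input vector and using perl = TRUE (an assumption of this sketch, not of the original example):

```r
# Hypothetical input: the second element contains no digits
x <- c("abc123def", "nodigits", "9lives")

# Flag the elements that contain at least one digit
has_digit <- grepl("[0-9]", x)

# Start with NA everywhere, then fill in only where a digit exists
first_num <- rep(NA_real_, length(x))
first_num[has_digit] <- as.numeric(
  gsub(".*?([0-9]+).*", "\\1", x[has_digit], perl = TRUE)
)
print(first_num)
```

Checking with grepl() first keeps the NA explicit and avoids the coercion warning.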

Method 2: Using gregexpr() and regmatches() functions

In this method, we call gregexpr() to locate every run of digits in each string and then pass the resulting match data to regmatches() to extract them. Together, these functions return all the numbers present in the vector of strings, not only the first one in each string.

gregexpr() function: This function returns a list of the same length as text. Each element has the same form as the return value of regexpr(), except that the starting positions of every (disjoint) match are given.

Syntax: gregexpr(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

Parameters:

  • pattern: regular expression, or string for fixed=TRUE
  • text: string, the character vector
  • ignore.case: logical. If TRUE, case is ignored during matching
  • perl: logical. Should Perl-compatible regular expressions be used?
  • fixed: logical. If TRUE, pattern is a string to be matched as is. Overrides all conflicting arguments
  • useBytes: logical. If TRUE the matching is done byte-by-byte rather than character-by-character
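To make the return value of gregexpr() more concrete, the sketch below inspects it for a small made-up vector. The variable names are illustrative only.

```r
# Hypothetical input: the second string has no digits
x <- c("a12b3", "none")
m <- gregexpr("[0-9]+", x)

# The result is a list with one element per input string
n_elements <- length(m)
print(n_elements)

# Start positions of the matches in the first string
starts <- as.integer(m[[1]])
print(starts)

# Lengths of those matches, stored in the "match.length" attribute
lens <- as.integer(attr(m[[1]], "match.length"))
print(lens)

# A string without any match is reported as -1
no_match <- as.integer(m[[2]])
print(no_match)
```

Here "12" starts at position 2 with length 2, and "3" starts at position 5 with length 1.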

regmatches() function: This function is used to extract or replace matched sub-strings from match data.

Syntax: regmatches(x, m, invert = FALSE)

Parameters:

  • x: a character vector
  • m: an object with match data
  • invert: logical. If TRUE, extract or replace the non-matched substrings
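The invert parameter can be illustrated with a short sketch on a made-up string. Empty pieces that may appear at the edges of the string are dropped with nzchar() so the result is easy to read.

```r
# Hypothetical input string
x <- "geeks10for20"
m <- gregexpr("[0-9]+", x)

# Default: the matched substrings (the runs of digits)
digits <- regmatches(x, m)[[1]]
print(digits)

# invert = TRUE: the text between the matches
non_digits <- regmatches(x, m, invert = TRUE)[[1]]
# Drop any empty pieces at the start or end
non_digits <- non_digits[nzchar(non_digits)]
print(non_digits)
```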

Example:

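R

# Sample character vector mixing letters and digits
gfg <- c("7g8ee6ks1", "5f9o1r0", "geeks10")

# Locate every run of digits, then extract the matched substrings
gfg_numbers <- regmatches(gfg, gregexpr("[[:digit:]]+", gfg))

# Flatten the list and convert the matches to numeric
as.numeric(unlist(gfg_numbers))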


Output:

[1]  7  8  6  1  5  9  1  0 10
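Calling unlist() merges the numbers from all strings into one vector, so it is no longer clear which number came from which string. If that link matters, one possible variation (a sketch, with illustrative variable names) keeps the list structure:

```r
gfg <- c("7g8ee6ks1", "5f9o1r0", "geeks10")
matches <- regmatches(gfg, gregexpr("[[:digit:]]+", gfg))

# Convert each string's matches separately and label them by source string
per_string <- lapply(matches, as.numeric)
names(per_string) <- gfg
print(per_string)

# First number per string, or NA when a string has no digits
first_num <- vapply(
  per_string,
  function(v) if (length(v) > 0) v[1] else NA_real_,
  numeric(1)
)
print(first_num)
```

The first_num vector reproduces the result of Method 1 from the same match data.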


Last Updated : 03 May, 2023