Extract Numbers from Character String Vector in R

  • Last Updated : 16 May, 2021

In this article, we are going to see how to extract numbers from a character string vector in the R programming language. Base R offers several built-in functions for this. We will cover two approaches:

  • Extracting numbers from character string using gsub() function
  • Extracting numbers from character string using gregexpr() & regmatches() functions

Method 1: Using gsub() function

In this method, we call gsub(), one of R's built-in functions, with a pattern that captures the first run of digits in each string, a replacement that refers back to that captured group, and the character vector itself. gsub() replaces each whole string with its captured digits, so the result contains the first number that occurs in each string. The result is still a character vector, so we wrap the call in as.numeric() to get numeric values.

gsub() function: This function replaces all matches of a pattern in a string. If x is a character vector, it returns a character vector of the same length and with the same attributes.

Syntax: gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)



Parameters:

  • pattern: string to be matched; supports regular expressions
  • replacement: string to substitute for each match
  • x: string or character vector
  • ignore.case: logical. If TRUE, case is ignored during matching
  • perl: logical. Should Perl-compatible regular expressions be used?
  • fixed: logical. If TRUE, the pattern is a string to be matched as is
  • useBytes: logical. If TRUE, the matching is done byte-by-byte rather than character-by-character
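
Before turning to number extraction, a minimal sketch of how gsub() behaves on a vector may help. The input vector x below is made up for illustration and is not part of the original example.

```r
# Hypothetical sample input (an assumption, not from the article)
x <- c("a1b22", "no digits", "333")

# Replace every run of digits with "#"
masked <- gsub("[0-9]+", "#", x)
print(masked)

# Delete every non-digit character, leaving only the digits
digits_only <- gsub("[^0-9]", "", x)
print(digits_only)
```

Note that deleting non-digits glues separate numbers together ("a1b22" becomes "122"), which is why the pattern used in this method relies on a capture group instead.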

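To find the first number in a string, the pattern is:

".*?([0-9]+).*"

Here .*? lazily skips any leading characters, ([0-9]+) captures the first run of digits, and .* consumes the rest of the string. Replacing the whole match with "\\1" therefore leaves only the captured digits.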

Example:

R




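# Sample character vector with digits mixed into the text
gfg <- c("7g8ee6ks1", "5f9o1r0", "geeks10")
print(gfg)

# Keep only the first run of digits in each string, then convert to numeric
res <- as.numeric(gsub(".*?([0-9]+).*", "\\1", gfg))
print(res)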

Output:

[1] "7g8ee6ks1" "5f9o1r0"   "geeks10"  
[1]  7  5 10
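
One caveat: when a string contains no digits, the pattern does not match, gsub() returns that string unchanged, and as.numeric() then yields NA with a coercion warning. A possible way to handle this is sketched below; first_number() is a hypothetical helper introduced here for illustration, not a base R function.

```r
# Hypothetical helper (an assumption, not part of base R):
# returns the first number in each string, or NA when a string has no digits
first_number <- function(x) {
  has_digit <- grepl("[0-9]", x)           # which strings contain a digit?
  out <- rep(NA_real_, length(x))          # default to NA for every string
  out[has_digit] <- as.numeric(
    gsub(".*?([0-9]+).*", "\\1", x[has_digit])
  )
  out
}

res_safe <- first_number(c("7g8ee6ks1", "geeks", "geeks10"))
print(res_safe)
```

Checking with grepl() first means as.numeric() only ever sees strings made of digits, so no warning is raised.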

Method 2: Using gregexpr() and regmatches() functions

In this method, we call gregexpr() to locate every run of digits in each string and then pass the resulting match data to regmatches() to pull out the matched substrings. Unlike Method 1, this returns all the numbers present in each string, not just the first one.

gregexpr() function: This function returns a list of the same length as text. Each element has the same form as the return value of regexpr(), except that it gives the starting positions of every (disjoint) match rather than only the first.



Syntax: gregexpr(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

Parameters:

  • pattern: regular expression, or string for fixed=TRUE
  • text: string, the character vector
  • ignore.case: logical. If TRUE, case is ignored during matching; if FALSE (the default), matching is case-sensitive
  • perl: logical. Should Perl-compatible regular expressions be used?
  • fixed: logical. If TRUE, pattern is a string to be matched as is. Overrides all conflicting arguments
  • useBytes: logical. If TRUE the matching is done byte-by-byte rather than character-by-character

regmatches() function: This function is used to extract or replace matched sub-strings from match data.

Syntax: regmatches(x, m, invert = FALSE)

Parameters:

  • x: a character vector
  • m: an object with match data
  • invert: logical. If TRUE, extract or replace the non-matched substrings
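
To make the match data concrete, here is a minimal sketch of what gregexpr() returns for a single string and how regmatches() uses it. The string s is a made-up input chosen for illustration.

```r
s <- "ab12cd345"                 # hypothetical sample string
m <- gregexpr("[0-9]+", s)       # list with one element per input string

# Start position and length of each run of digits in s
starts <- as.integer(m[[1]])
widths <- as.integer(attr(m[[1]], "match.length"))
print(starts)
print(widths)

# regmatches() turns the positions into the matched substrings
found <- regmatches(s, m)
print(found)

# invert = TRUE returns the pieces between the matches instead
rest <- regmatches(s, m, invert = TRUE)
print(rest)
```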

Example:

R




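# Sample character vector with digits mixed into the text
gfg <- c("7g8ee6ks1", "5f9o1r0", "geeks10")

# Find every run of digits in each string; the result is a list with one element per string
gfg_numbers <- regmatches(gfg, gregexpr("[[:digit:]]+", gfg))

# Flatten the list into a single vector and convert to numeric
as.numeric(unlist(gfg_numbers))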

Output:

[1]  7  8  6  1  5  9  1  0 10
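
Because unlist() flattens the result, the output above no longer shows which number came from which string. If that grouping matters, one option is to convert each list element separately, as in this sketch (the name per_string is chosen here for illustration):

```r
gfg <- c("7g8ee6ks1", "5f9o1r0", "geeks10")

# One character vector of matches per input string
matches <- regmatches(gfg, gregexpr("[[:digit:]]+", gfg))

# Convert each element to numeric while keeping the list structure
per_string <- lapply(matches, as.numeric)
names(per_string) <- gfg         # label each element with its source string

print(per_string)
```

Each list element then holds the numbers from one string, for example per_string[["geeks10"]] contains only 10.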


