Tokenizing a string in C++

Tokenizing a string means splitting it into pieces (tokens) at one or more delimiters. There are many ways to tokenize a string; this article explains four of them:

Using stringstream

A stringstream associates a string object with a stream allowing you to read from the string as if it were a stream.
Below is the C++ implementation : 
 


// Tokenizing a string using stringstream
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
  
using namespace std;
  
int main()
{
      
    string line = "GeeksForGeeks is a must try";
      
    // Vector of string to save tokens
    vector <string> tokens;
      
    // Wrap the line in a stringstream so it can be read like a stream
    stringstream check1(line);
      
    string intermediate;
      
    // Tokenizing w.r.t. space ' '
    while(getline(check1, intermediate, ' '))
    {
        tokens.push_back(intermediate);
    }
      
    // Printing the token vector
    for(size_t i = 0; i < tokens.size(); i++)
        cout << tokens[i] << '\n';
}


Output

GeeksForGeeks
is
a
must
try


Using strtok()

// Splits str[] according to the given delimiters
// and returns the next token. It must be called
// in a loop to get all tokens, and it returns NULL
// when there are no more tokens. Note that strtok()
// modifies str[] by overwriting delimiters with '\0'.
char * strtok(char str[], const char *delims);


Below is the C++ implementation : 
 


// C/C++ program for splitting a string
// using strtok()
#include <stdio.h>
#include <string.h>
  
int main()
{
    char str[] = "Geeks-for-Geeks";
  
    // Returns first token 
    char *token = strtok(str, "-");
  
    // Keep printing tokens until strtok() returns NULL,
    // i.e. until no tokens remain in str[].
    while (token != NULL)
    {
        printf("%s\n", token);
        token = strtok(NULL, "-");
    }
  
    return 0;
}



Output



Geeks
for
Geeks


Another Example of strtok() :
 


// C code to demonstrate working of
// strtok
#include <string.h>
#include <stdio.h>
  
// Driver function
int main()
{
    // Declaration of string
    char gfg[100] = " Geeks - for - geeks - Contribute";
  
    // Declaration of delimiter
    const char s[4] = "-";
    char* tok;
  
    // Use of strtok
    // get first token
    tok = strtok(gfg, s);
  
    // Loop until no tokens remain
    while (tok != NULL) {
        printf(" %s\n", tok);
  
        // Use of strtok
        // go through other tokens
        tok = strtok(NULL, s);
    }
    }
  
    return (0);
}



Output

  Geeks 
  for 
  geeks 
  Contribute


Using strtok_r()

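Like strtok(), strtok_r() parses a string into a sequence of tokens. It is the reentrant version of strtok(): instead of relying on hidden static state, it keeps its position in a caller-supplied pointer, so several strings can be tokenized at the same time. It is specified by POSIX rather than by the C or C++ standard.
Its prototype is: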
 

// The third argument saveptr is a pointer to a char * 
// variable that is used internally by strtok_r() in 
// order to maintain context between successive calls
// that parse the same string.
char *strtok_r(char *str, const char *delim, char **saveptr);


Below is a simple C++ program to show the use of strtok_r() : 
 


// C/C++ program to demonstrate working of strtok_r()
// by splitting string based on space character.
#include<stdio.h>
#include<string.h>
  
int main()
{
    char str[] = "Geeks for Geeks";
    char *token;
    char *rest = str;
  
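    // rest is both the string to scan and the saved context:
    // after each call it points just past the token returned.
    while ((token = strtok_r(rest, " ", &rest)))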
        printf("%s\n", token);
  
    return(0);
}



Output

Geeks
for
Geeks


Using std::sregex_token_iterator

In this method, tokenization is done on the basis of regex matches. It is a good fit when the string has to be split on several different delimiters.

Below is a simple C++ program to show the use of std::sregex_token_iterator:


// C++ program for the above approach
#include <algorithm>
#include <iostream>
#include <regex>
#include <string>
#include <vector>
  
/**
 * @brief Tokenize the given string according to the regex
 * and remove the empty tokens.
 *
 * @param str string to split
 * @param re  regex that matches the delimiters
 * @return std::vector<std::string>
 */
std::vector<std::string> tokenize(const std::string& str,
                                  const std::regex& re)
{
    // Submatch index -1 selects the text between regex matches
    std::sregex_token_iterator it{ str.begin(), str.end(), re, -1 };
    std::vector<std::string> tokenized{ it, {} };
  
    // Additional check to remove empty strings
    tokenized.erase(
        std::remove_if(tokenized.begin(), tokenized.end(),
                       [](std::string const& s) { return s.empty(); }),
        tokenized.end());
  
    return tokenized;
}
  
// Driver Code
int main()
{
    const std::string str = "Break string a,spaces,and,commas";
    // Delimiter: one or more whitespace characters or commas
    const std::regex re(R"([\s,]+)");
    
    // Function Call
    const std::vector<std::string> tokenized = tokenize(str, re);
    
    for (const std::string& token : tokenized)
        std::cout << token << '\n';
    return 0;
}



Output

Break
string
a
spaces
and
commas

Improved By : mani00manu