Tokenizing a string means splitting it into substrings (tokens) separated by one or more delimiters. There are many ways to tokenize a string; this article explains four of them:
Using stringstream
A stringstream associates a string object with a stream, allowing you to read from the string as if it were a stream. Combined with getline() and a delimiter character, it extracts one token per call.
Below is the C++ implementation:
C++
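#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    string line = "GeeksForGeeks is a must try";

    // Vector of strings to hold the tokens
    vector<string> tokens;

    // Wrap the string in a stream so it can be read like input
    stringstream check1(line);
    string intermediate;

    // Read everything up to the next ' ' into intermediate
    while (getline(check1, intermediate, ' '))
    {
        tokens.push_back(intermediate);
    }

    // Print each token on its own line
    for (size_t i = 0; i < tokens.size(); i++)
        cout << tokens[i] << '\n';
}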
Output
GeeksForGeeks
is
a
must
try
Time Complexity: O(n), where n is the length of the string.
Auxiliary Space: O(n - d), where n is the length of the string and d is the number of delimiters (the tokens together hold every non-delimiter character).
Using strtok()
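The strtok() function, declared in <string.h> (<cstring> in C++), has the following prototype:
// Splits str[] according to the given delimiters
// and returns the next token. It needs to be called
// in a loop to get all tokens: pass str on the first
// call and NULL on later calls. It returns NULL
// when there are no more tokens.
char *strtok(char str[], const char *delims);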
Below is the C++ implementation:
C++
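#include <stdio.h>
#include <string.h>

int main()
{
    // strtok() writes '\0' into the string, so it must be a
    // modifiable array rather than a pointer to a string literal.
    char str[] = "Geeks-for-Geeks";

    // First call: pass the string to be tokenized.
    char *token = strtok(str, "-");

    // Later calls: pass NULL to continue where the last call stopped.
    while (token != NULL)
    {
        printf("%s\n", token);
        token = strtok(NULL, "-");
    }
    return 0;
}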
Output
Geeks
for
Geeks
Time Complexity: O(n), where n is the length of the string.
Auxiliary Space: O(1).
Another example of strtok():
C
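#include <string.h>
#include <stdio.h>

int main()
{
    char gfg[100] = " Geeks - for - geeks - Contribute";

    // Delimiter set: only '-' here, so each token keeps the
    // spaces around it (e.g. " Geeks ").
    const char s[4] = "-";
    char *tok;

    // Get the first token, then keep calling with NULL (0).
    tok = strtok(gfg, s);
    while (tok != 0) {
        printf(" %s\n", tok);
        tok = strtok(0, s);
    }
    return (0);
}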
Output
Geeks
for
geeks
Contribute
Time Complexity: O(n), where n is the length of the string.
Auxiliary Space: O(1).
Using strtok_r()
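Like the strtok() function in C, strtok_r() parses a string into a sequence of tokens. strtok_r() is a reentrant version of strtok(): instead of remembering its position in hidden internal state, it stores it in a pointer supplied by the caller. Note that strtok_r() is specified by POSIX rather than by the C or C++ standard, so it may not be available on every platform.
strtok_r() can be called in two ways: on the first call, str points to the string to be parsed; on later calls for the same string, str is NULL and saveptr must be unchanged since the previous call.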
// The third argument saveptr is a pointer to a char *
// variable that is used internally by strtok_r() in
// order to maintain context between successive calls
// that parse the same string.
char *strtok_r(char *str, const char *delim, char **saveptr);
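Below is a simple C++ program to show the use of strtok_r(). It uses a common shortcut: the remainder pointer rest is passed as both the string and the save pointer, so each call effectively starts a fresh parse of the unparsed remainder.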
C++
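#include <stdio.h>
#include <string.h>

int main()
{
    char str[] = "Geeks for Geeks";
    char *token;

    // rest points to the unparsed remainder of str; strtok_r()
    // advances it through &rest after extracting each token.
    char *rest = str;

    // The extra parentheses mark the assignment in the condition as intentional.
    while ((token = strtok_r(rest, " ", &rest)))
        printf("%s\n", token);

    return (0);
}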
Output
Geeks
for
Geeks
Time Complexity: O(n), where n is the length of the string.
Auxiliary Space: O(1).
Using std::sregex_token_iterator
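In this method, tokenization is driven by regular-expression matches. With a submatch index of -1, std::sregex_token_iterator yields the pieces of text that lie between matches of the delimiter pattern. This approach is well suited to cases where several different delimiters are needed.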
Below is a simple C++ program to show the use of std::sregex_token_iterator:
C++
#include <algorithm>
#include <iostream>
#include <regex>
#include <string>
#include <vector>

// Splits str wherever re matches; the text between matches becomes the tokens.
std::vector<std::string> tokenize(const std::string& str, const std::regex& re)
{
    // Submatch index -1 selects the parts of str between matches of re.
    std::sregex_token_iterator it{ str.begin(), str.end(), re, -1 };
    std::vector<std::string> tokenized{ it, {} };

    // A delimiter at the very start of str yields an empty first token,
    // so remove any empty tokens.
    tokenized.erase(
        std::remove_if(tokenized.begin(), tokenized.end(),
                       [](const std::string& s) { return s.empty(); }),
        tokenized.end());
    return tokenized;
}

int main()
{
    const std::string str = "Break string a,spaces,and,commas";

    // One or more whitespace characters or commas act as one delimiter.
    const std::regex re(R"([\s,]+)");

    const std::vector<std::string> tokenized = tokenize(str, re);
    for (const std::string& token : tokenized)
        std::cout << token << '\n';
    return 0;
}
Output
Break
string
a
spaces
and
commas
Time Complexity: O(n * d), where n is the length of the string and d is the number of delimiters.
Auxiliary Space: O(n).
Last Updated: 02 Jan, 2023