Splitting a string by some delimiter is a very common task. For example, we may read a comma-separated list of items from a file and want the individual items in an array.
Almost all programming languages provide a function to split a string by some delimiter.
In C:
// Splits str[] according to the given delimiters
// and returns the next token. It needs to be called
// in a loop to get all tokens: pass the string on the
// first call and NULL on later calls. It returns NULL
// when there are no more tokens.
char *strtok(char str[], const char *delims);
C
#include <stdio.h>
#include <string.h>
int main(void)
{
    char str[] = "Geeks-for-Geeks";
    // First call: pass the string to be tokenized.
    char *token = strtok(str, "-");
    while (token != NULL)
    {
        printf("%s\n", token);
        // Later calls: pass NULL to continue with the same string.
        token = strtok(NULL, "-");
    }
    return 0;
}
Output:
Geeks
for
Geeks
Time complexity: O(n)
Auxiliary Space: O(1), since strtok() tokenizes str[] in place.
In C++
Note: The main disadvantage of strtok() is that it only works with C-style strings, so a C++ string must first be copied into a writable char array.
Many programmers are unaware that C++ has two additional APIs that are more elegant and work directly with C++ strings.
Method 1: Using stringstream API of C++
Prerequisite: stringstream API
A stringstream object can be initialized from a string object. Just like cin, it lets you read the string as a stream of words: operator >> skips whitespace (spaces, tabs, newlines) and extracts one word at a time. Alternatively, the getline() function can tokenize a string on any single-character delimiter.
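Some of the most commonly used functions of stringstream:
clear(): resets the stream's error flags (such as end-of-file) so it can be read again; it does not erase the contents.
str(): returns the stream's contents as a C++ string object; str(s) replaces the contents with s.
operator <<: writes a string (or other value) into the stream.
operator >>: extracts the next word from the stream.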
The code below demonstrates it.
C++
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
// Splits s on whitespace using operator >>.
void simple_tokenizer(const string& s)
{
    stringstream ss(s);
    string word;
    while (ss >> word) {
        cout << word << endl;
    }
}
// Splits s on the single character del using getline().
void adv_tokenizer(const string& s, char del)
{
    stringstream ss(s);
    string word;
    // Test the result of getline() itself; checking eof() before reading
    // prints an extra empty token when s is empty or ends with del.
    while (getline(ss, word, del)) {
        cout << word << endl;
    }
}
int main()
{
    string a = "How do you do!";
    string b = "How$do$you$do!";
    simple_tokenizer(a);
    cout << endl;
    adv_tokenizer(b, '$');
    cout << endl;
    return 0;
}
Output:
How
do
you
do!

How
do
you
do!
Time Complexity: O(n)
Auxiliary Space: O(n)
where n is the length of the input string.
Method 2: Using C++ find() and substr() APIs.
Prerequisite: find function and substr().
This method is more flexible: the delimiter can be any string, even one made of several characters, not just a space (though the default delimiter is a single space). The logic is easy to follow from the code below.
C++
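#include <iostream>
#include <string>
using namespace std;
// Prints every piece of s separated by the delimiter string del.
// Assumes del is non-empty; an empty delimiter would loop forever.
void tokenize(const string& s, const string& del = " ")
{
    size_t start = 0;
    size_t end = s.find(del);
    while (end != string::npos) {
        cout << s.substr(start, end - start) << endl;
        start = end + del.size();  // skip past the delimiter
        end = s.find(del, start);
    }
    cout << s.substr(start) << endl;  // last piece after the final delimiter
}
int main()
{
    string a = "How$%do$%you$%do$%!";
    tokenize(a, "$%");
    cout << endl;
    return 0;
}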
Output:
How
do
you
do
!
Time Complexity: O(n)
Auxiliary Space: O(1)
where n is the length of the input string.
Method 3: Using temporary string
If the delimiter is known to be a single character, you can build each token in a temporary string while scanning the input once. This avoids the repeated find() and substr() calls of Method 2.
C++
#include <iostream>
#include <string>
using namespace std;
// Prints the tokens of str separated by the single character del.
void split(const string& str, char del)
{
    string temp = "";
    for (char c : str) {
        if (c != del) {
            temp += c;             // extend the current token
        }
        else {
            cout << temp << " ";   // the delimiter ends the token
            temp = "";
        }
    }
    cout << temp;                  // print the final token
}
int main()
{
    string str = "geeks_for_geeks";
    char del = '_';
    split(str, del);
    return 0;
}
Output:
geeks for geeks
Time complexity: O(n)
Auxiliary Space: O(n)
In Java:
In Java, split() is a method of the String class.
// regexp is the delimiting regular expression;
// limit, if positive, is the maximum number of returned strings
// (the last one holds the rest of the input)
public String[] split(String regexp, int limit);
// We can also call split() without a limit
public String[] split(String regexp);
Java
public class Test
{
    public static void main(String[] args)
    {
        String str = "Geeks-for-Geeks";
        // limit = 2: at most two parts, so the string is split only once
        for (String val : str.split("-", 2))
            System.out.println(val);
        System.out.println();
        // no limit: split at every "-"
        for (String val : str.split("-"))
            System.out.println(val);
    }
}
Output:
Geeks
for-Geeks

Geeks
for
Geeks
Time complexity: O(n)
Auxiliary Space: O(n), for the array of returned strings.
In Python:
The split() method in Python returns a list of strings after breaking the given string by the specified separator.
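# sep is the separator string (a plain string, not a regular expression);
# if it is omitted or None, the string is split on runs of whitespace.
# maxsplit is the maximum number of splits to make; -1 (the default) means no limit.
str.split(sep=None, maxsplit=-1)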
Python3
line = "Geek1 \nGeek2 \nGeek3"
print(line.split())         # split on any run of whitespace
print(line.split(' ', 1))   # split only at the first space
Output:
['Geek1', 'Geek2', 'Geek3']
['Geek1', '\nGeek2 \nGeek3']
Time Complexity: O(N), since it traverses the string once to find the separators.
Auxiliary Space: O(N), for the list of substrings that is returned.
This article is contributed by Aarti_Rathi and Aditya Chatterjee.