Python Program to Split joined consecutive similar characters

Last Updated : 18 May, 2023

Given a String, our task is to write a Python program to split on the occurrence of a non-similar character.

Input : test_str = ‘ggggffggisssbbbeessssstt’
Output : [‘gggg’, ‘ff’, ‘gg’, ‘i’, ‘sss’, ‘bbb’, ‘ee’, ‘sssss’, ‘tt’]
Explanation : All similar consecutive characters are converted to separate strings.

Input : test_str = ‘ggggffgg’
Output : [‘gggg’, ‘ff’, ‘gg’]
Explanation : All similar consecutive characters are converted to separate strings.

Method #1 : Using join() + list comprehension + groupby()

In this, the characters are grouped on similarity using groupby(), join() is used to reform strings list. List comprehension performs task of iterating constructed groups.

Python3

# Python3 code to demonstrate working of
# Split joined consecutive similar characters
# Using join() + list comprehension + groupby()
from itertools import groupby
 
# initializing string
test_str = 'ggggffggisssbbbeessssstt'
 
# printing original string
print("The original string is : " + str(test_str))
 
# groupby groups the elements, join joining Consecutive groups
res = ["".join(group) for ele, group in groupby(test_str)]
       
# printing result
print("Consecutive split string is : " + str(res))

Output

The original string is : ggggffggisssbbbeessssstt
Consecutive split string is : ['gggg', 'ff', 'gg', 'i', 'sss', 'bbb', 'ee', 'sssss', 'tt']

Time Complexity: O(n)
Auxiliary Space: O(n)

Method #2 : Using finditer() + regex + list comprehension

In this, regex is used to check for consecutive equal sequences. The finditer() performs the task of finding the matching regex in a string.

Python3

# Python3 code to demonstrate working of
# Split joined consecutive similar characters
# Using finditer() + regex + list comprehension
import re
 
# initializing string
test_str = 'ggggffggisssbbbeessssstt'
 
# printing original string
print("The original string is : " + str(test_str))
 
# list comprehension iterates for all the formed groups found by regex
# if consecutive numbers need to search "d" can be used.
res = [iters.group(0) for iters in re.finditer(r"(\D)\1*", test_str)]
       
# printing result
print("Consecutive split string is : " + str(res))

Output

The original string is : ggggffggisssbbbeessssstt
Consecutive split string is : ['gggg', 'ff', 'gg', 'i', 'sss', 'bbb', 'ee', 'sssss', 'tt']

The Time and Space Complexity of all the methods is :

Time Complexity: O(n)

Space Complexity: O(n)

Approach 3: Using for loop
This approach uses a for loop to iterate over the characters in the string and a temporary string to store the consecutive similar characters. The time and space complexity of this approach is O(n).

Python3

def split_consecutive_characters(test_str):
    n = len(test_str)
    result = []
    temp = test_str[0]
    for i in range(1, n):
        if test_str[i] == test_str[i-1]:
            temp += test_str[i]
        else:
            result.append(temp)
            temp = test_str[i]
    result.append(temp)
    return result
 
test_str = 'ggggffggisssbbbeessssstt'
print(split_consecutive_characters(test_str))

Output

['gggg', 'ff', 'gg', 'i', 'sss', 'bbb', 'ee', 'sssss', 'tt']

Time Complexity: O(n), where n is the length of the input string.

Auxiliary Space: O(n)

Approach#4: Using re.findall()

In this approach, we use the re.findall method to find all the consecutive groups of similar characters in the input string. The regular expression pattern r”((\w)\2*)” matches any sequence of characters that are the same, and returns them as groups We then extract the first element of each group (which is the string of consecutive characters) and return them as a list.

Define a regular expression pattern to match consecutive characters.
Use re.findall() method to extract all the groups of consecutive characters from the input string.
Return a list of the matched groups.

Python3

import re
 
def split_consecutive_chars(test_str):
    pattern = r"((\w)\2*)"
    groups = re.findall(pattern, test_str)
    return [group[0] for group in groups]
 
test_str = 'ggggffggisssbbbeessssstt'
print(split_consecutive_chars(test_str))

Output

['gggg', 'ff', 'gg', 'i', 'sss', 'bbb', 'ee', 'sssss', 'tt']

Time Complexity: O(n), where n is the length of the input string. The re.findall() method is usually implemented using efficient string matching algorithms that have linear time complexity.

Space Complexity: O(n), where n is the length of the input string. This is because we need to store the matched groups in a list. The regular expression pattern itself does not require any extra space.

Approach#5: Using recursion:

Algorithm:

Check if the input string is empty. If yes, return an empty list.
Set the variable ‘first’ to the first character of the input string.
Set the variable ‘i’ to 1.
While ‘i’ is less than the length of the input string and the ‘i-th’ character of the input string is equal to the ‘first’ character, increment ‘i’.
Create a new list with the current consecutive characters, which is the ‘first’ character repeated ‘i’ times, and concatenate it with the result of recursively calling the
function on the remaining string after the current consecutive characters.
Return the final list.

Python3

def split_consecutive_characters(test_str):
    if not test_str:
        return []
    first = test_str[0]
    i = 1
    while i < len(test_str) and test_str[i] == first:
        i += 1
    return [first * i] + split_consecutive_characters(test_str[i:])
 
test_str = 'ggggffggisssbbbeessssstt'
# printing original string
print("The original string is : " + str(test_str))
print(split_consecutive_characters(test_str))
#This code is contributed by Jyothi Pinjala

Output

The original string is : ggggffggisssbbbeessssstt
['gggg', 'ff', 'gg', 'i', 'sss', 'bbb', 'ee', 'sssss', 'tt']

Time Complexity: O(n), where n is the length of the input string. This is because we need to iterate through the entire string to split it into consecutive characters.
Space Complexity: O(n), where n is the length of the input string. This is because we are creating a new list for each set of consecutive characters, and we could potentially have n/2 such sets if all characters are consecutive. Additionally, the recursive calls to the function create a call stack with a maximum depth of n/2, as the length of the string decreases by at least half with each recursive call.