Python – Start and End Indices of words from list in String

Given a string and a list of words, the task is to write a Python program that finds the start and end index (both inclusive) of every listed word that occurs in the string.

Input : test_str = "gfg is best for all CS geeks and engineering job seekers", check_list = ["geeks", "engineering", "best", "gfg"]
Output : {'geeks': [23, 27], 'engineering': [33, 43], 'best': [7, 10], 'gfg': [0, 2]}
Explanation : "geeks" starts at index 23 and ends at index 27 (both inclusive), hence the result.

Input : test_str = "gfg is best for all CS geeks and engineering job seekers", check_list = ["geeks", "gfg"]
Output : {'geeks': [23, 27], 'gfg': [0, 2]}
Explanation : "geeks" spans indices 23 to 27 and "gfg" spans indices 0 to 2, hence the result.
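
Because the end index is inclusive, a reported pair [start, end] corresponds to the slice test_str[start:end + 1]. The short check below is a sketch that uses the values from the first example; the name expected is introduced here purely for illustration.

```python
test_str = "gfg is best for all CS geeks and engineering job seekers"

# index pairs copied from the first example's output
expected = {'geeks': [23, 27], 'engineering': [33, 43],
            'best': [7, 10], 'gfg': [0, 2]}

for word, (start, end) in expected.items():
    # end is inclusive, so the slice has to stop at end + 1
    assert test_str[start:end + 1] == word

print("all index pairs match their words")
```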

Method #1 : Using loop + index() + len()

Here, a loop visits each word in the list. index() returns the start index of the word's first occurrence in the string, and the end index is computed as start + len(word) - 1.

Python3

# Python3 code to demonstrate working of
# Start and End Indices of words from list in String
# Using loop + index() + len()

# initializing string
test_str = "gfg is best for all CS geeks and engineering job seekers"

# printing original string
print("The original string is : " + str(test_str))

# initializing check_list
check_list = ["geeks", "engineering", "best", "gfg"]

res = dict()
for ele in check_list:
    if ele in test_str:

        # getting front index
        strt = test_str.index(ele)

        # getting ending index (inclusive)
        res[ele] = [strt, strt + len(ele) - 1]

# printing result
print("Required extracted indices : " + str(res))

Output:

The original string is : gfg is best for all CS geeks and engineering job seekers
Required extracted indices : {'geeks': [23, 27], 'engineering': [33, 43], 'best': [7, 10], 'gfg': [0, 2]}

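Time Complexity: O(n * m), where n is the length of test_str and m is the number of words in check_list, treating each substring search as roughly linear in n.
Auxiliary Space: O(m), for the result dictionary.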

Method #2 : Using dictionary comprehension + len() + index()

This performs the same task as the method above, but builds the result dictionary in a single expression using a dictionary comprehension.

Python3

# Python3 code to demonstrate working of
# Start and End Indices of words from list in String
# Using dictionary comprehension + len() + index()

# initializing string
test_str = "gfg is best for all CS geeks and engineering job seekers"

# printing original string
print("The original string is : " + str(test_str))

# initializing check_list
check_list = ["geeks", "engineering", "best", "gfg"]

# Dictionary comprehension to be used as shorthand for
# forming result Dictionary
res = {key: [test_str.index(key), test_str.index(key) + len(key) - 1]
       for key in check_list if key in test_str}

# printing result
print("Required extracted indices : " + str(res))

Output:

The original string is : gfg is best for all CS geeks and engineering job seekers
Required extracted indices : {'geeks': [23, 27], 'engineering': [33, 43], 'best': [7, 10], 'gfg': [0, 2]}

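Time Complexity: O(n * m), where n is the length of test_str and m is the number of words in check_list. The comprehension performs the same searches as Method #1; calling index() twice per word changes only the constant factor.
Auxiliary Space: O(m), for the result dictionary.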
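
The comprehension calls index() twice for every word. On Python 3.8 or later, an assignment expression (the := operator) can store the first result and reuse it. The following is a minimal sketch of that idea, not a replacement for the code above; the variable name start is an assumption introduced for illustration.

```python
test_str = "gfg is best for all CS geeks and engineering job seekers"
check_list = ["geeks", "engineering", "best", "gfg"]

# (start := ...) stores the index() result so the end index can reuse it.
# List elements are evaluated left to right, so start is set before it is read.
res = {key: [(start := test_str.index(key)), start + len(key) - 1]
       for key in check_list if key in test_str}

print(res)
# {'geeks': [23, 27], 'engineering': [33, 43], 'best': [7, 10], 'gfg': [0, 2]}
```

Note that the name bound by := inside a comprehension remains visible in the enclosing scope after the comprehension finishes.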

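Method #3 : Using loop + find() + len()

This follows the same approach as Method #1, but uses find() instead of index() to get the start index. When the word is absent, find() returns -1 rather than raising ValueError.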

Python3

# Python3 code to demonstrate working of
# Start and End Indices of words from list in String
# Using loop + find() + len()

# initializing string
test_str = "gfg is best for all CS geeks and engineering job seekers"

# printing original string
print("The original string is : " + str(test_str))

# initializing check_list
check_list = ["geeks", "engineering", "best", "gfg"]

res = dict()
for ele in check_list:
    if ele in test_str:

        # getting front index
        strt = test_str.find(ele)

        # getting ending index (inclusive)
        res[ele] = [strt, strt + len(ele) - 1]

# printing result
print("Required extracted indices : " + str(res))

Output

The original string is : gfg is best for all CS geeks and engineering job seekers
Required extracted indices : {'geeks': [23, 27], 'engineering': [33, 43], 'best': [7, 10], 'gfg': [0, 2]}

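Time complexity: O(n * m), where n is the length of test_str and m is the number of words in check_list.
Auxiliary space: O(k), where k is the number of words from check_list found in test_str.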
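
Since find() reports a missing word by returning -1, its return value can stand in for the separate membership test. The sketch below illustrates this and contrasts it with index(); the word "python" is added to the list only as an example of an absent word.

```python
test_str = "gfg is best for all CS geeks and engineering job seekers"
check_list = ["geeks", "python", "gfg"]  # "python" is absent on purpose

res = {}
for ele in check_list:
    strt = test_str.find(ele)  # -1 means the word was not found
    if strt != -1:
        res[ele] = [strt, strt + len(ele) - 1]

print(res)
# {'geeks': [23, 27], 'gfg': [0, 2]}

# index() signals a miss with an exception instead of -1
try:
    test_str.index("python")
except ValueError:
    print("index() raised ValueError")
```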

Method #4: Using regular expression module re.finditer()

This method uses the finditer() method from the regular expression module to search for all occurrences of the words in the check_list in the given string test_str. For each match, it extracts the start and end indices and stores them in a dictionary.

Import the regular expression module re.
Initialize a string variable named test_str with a given value.
Print the original string using the print() function.
Initialize a list named check_list with some words to be searched in the given string.
Initialize an empty dictionary named res to store the result.
For each word in check_list, search for all its occurrences in the given string using the finditer() method from the regular expression module.
For each match, get the start index of the match using the start() method and store it in a variable named start_index.
Get the end index of the match by subtracting 1 from the end index returned by the end() method of the match object and store it in a variable named end_index.
Check if the current word is already present in the result dictionary. If yes, append the current match indices to the list of indices already stored for the word in the dictionary. If not, add the current match indices to the dictionary for the word as a new list.
Print the final result dictionary using the print() function.

Python3

import re

# initializing string
test_str = "gfg is best for all CS geeks and engineering job seekers"

# printing original string
print("The original string is : " + str(test_str))

# initializing check_list
check_list = ["geeks", "engineering", "best", "gfg"]

# initializing result dictionary
res = {}

# searching for all occurrences of words in check_list using regular expression
for ele in check_list:

    # re.escape() makes the word match literally, even if it contains
    # regex metacharacters such as "+" or "."
    for match in re.finditer(re.escape(ele), test_str):

        # getting start index of match
        start_index = match.start()

        # getting end index of match (inclusive)
        end_index = match.end() - 1

        # adding match indices to result dictionary
        if ele in res:
            res[ele].append((start_index, end_index))
        else:
            res[ele] = [(start_index, end_index)]

# printing result
print("Required extracted indices : " + str(res))

Output

The original string is : gfg is best for all CS geeks and engineering job seekers
Required extracted indices : {'geeks': [(23, 27)], 'engineering': [(33, 43)], 'best': [(7, 10)], 'gfg': [(0, 2)]}

Time complexity: O(n * m), where n is the length of the string test_str and m is the number of words in check_list.
Auxiliary space: O(k * l), where k is the number of words in check_list and l is the maximum number of occurrences of any word in test_str.
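
The substring searches in this article also match inside longer words, so "geeks" would be found within "geeksforgeeks". If only whole-word matches are wanted, one option is to wrap the escaped word in \b word-boundary anchors. The following is a minimal sketch under that assumption; the sample string and the name pattern are introduced here for illustration.

```python
import re

test_str = "geeksforgeeks helps geeks learn"
check_list = ["geeks", "learn"]

res = {}
for ele in check_list:
    # \b anchors the match at word boundaries; re.escape() keeps the word literal
    pattern = r"\b" + re.escape(ele) + r"\b"
    for match in re.finditer(pattern, test_str):
        res.setdefault(ele, []).append((match.start(), match.end() - 1))

print(res)
# {'geeks': [(20, 24)], 'learn': [(26, 30)]}
```

The two occurrences of "geeks" inside "geeksforgeeks" are skipped because they are not delimited by word boundaries. \b is defined in terms of word characters, so words that begin or end with punctuation may need a different boundary test.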

Method #5: Using split() + index() + len()

Step-by-step approach:

Initialize the string and the check_list.
Initialize an empty dictionary to store the start and end indices of the words found in the string.
For each word in check_list, split the string into whitespace-separated tokens and compare the word with every token.
Locate each token with index(), searching from the end of the previous token so that repeated tokens map to their own positions.
For each token equal to the word, append a tuple of the start and end indices to the word's list in the dictionary.
Print the result dictionary.

Python3

# Python3 code to demonstrate working of
# Start and End Indices of words from list in String
# Using split() + index() + len()

# initializing string
test_str = "gfg is best for all CS geeks and engineering job seekers"

# printing original string
print("The original string is : " + str(test_str))

# initializing check_list
check_list = ["geeks", "engineering", "best", "gfg"]

# initialize result dictionary
res = {}

# iterate over the words in check_list and compare each one
# with the whitespace-separated tokens of test_str
for word in check_list:

    # position from which the next token is searched
    pos = 0
    for val in test_str.split():

        # locate this token at or after the end of the previous one
        start_idx = test_str.index(val, pos)
        end_idx = start_idx + len(val) - 1
        pos = end_idx + 1

        if val == word:
            if word in res:
                res[word].append((start_idx, end_idx))
            else:
                res[word] = [(start_idx, end_idx)]

# print result dictionary
print("Required extracted indices  : " + str(res))

Output

The original string is : gfg is best for all CS geeks and engineering job seekers
Required extracted indices  : {'geeks': [(23, 27)], 'engineering': [(33, 43)], 'best': [(7, 10)], 'gfg': [(0, 2)]}

Time complexity: O(n*m), where n is the length of the string and m is the length of the check_list.
Auxiliary space: O(k), where k is the number of words found in the string.
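
A list comprehension over enumerate() can also collect every start position by testing each character index of the string. The sketch below is an alternative formulation rather than the code above: it matches substrings instead of whole tokens, and overlapping occurrences are reported too. The sample string and the name spans are assumptions made for illustration.

```python
test_str = "gfg is best for gfg fans"
check_list = ["gfg", "best", "python"]

res = {}
for word in check_list:
    # every index i at which the string continues with word
    spans = [(i, i + len(word) - 1)
             for i, _ in enumerate(test_str)
             if test_str.startswith(word, i)]
    if spans:
        res[word] = spans

print(res)
# {'gfg': [(0, 2), (16, 18)], 'best': [(7, 10)]}
```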
