Bigram formation from a given Python list

When working on text classification or other natural language processing tasks, we often need to form bigrams, that is, pairs of adjacent words. When a suitable library is not available, it is useful to know how to build them by hand. Let’s discuss certain ways in which this can be achieved.

Method #1 : Using list comprehension + enumerate() + split()
The combination of the above three functions can be used to achieve this task. split() breaks each string into words, enumerate() supplies each word together with its index so that it can be paired with the word that follows, and the list comprehension combines the logic and collects the pairs.

Python3

```python
# Python3 code to demonstrate
# Bigram formation
# using list comprehension + enumerate() + split()

# initializing list
test_list = ['geeksforgeeks is best', 'I love it']

# printing the original list
print("The original list is : " + str(test_list))

# using list comprehension + enumerate() + split()
# for Bigram formation
res = [(x, i.split()[j + 1]) for i in test_list
       for j, x in enumerate(i.split()) if j < len(i.split()) - 1]

# printing result
print("The formed bigrams are : " + str(res))
```

Output :

The original list is : ['geeksforgeeks is best', 'I love it']
The formed bigrams are : [('geeksforgeeks', 'is'), ('is', 'best'), ('I', 'love'), ('love', 'it')]

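Time Complexity: The code produces O(n*m) bigrams, where n is the number of strings in test_list and m is the maximum number of words in a string. Because i.split() is called again for every word, the work per string grows roughly quadratically with its number of words rather than linearly.
Auxiliary Space: O(n*m), the size of the res list, which holds one tuple per bigram.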
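
The comprehension in Method #1 calls split() again for every word it visits. As a minimal sketch (the helper name `bigrams_split_once` is hypothetical and not part of the original code), each string can instead be split once and the resulting word list reused:

```python
def bigrams_split_once(strings):
    """Return bigrams of adjacent words, splitting each string only once."""
    res = []
    for s in strings:
        words = s.split()  # split once and reuse the list
        # pair each word with its successor; the last word has none
        res.extend((w, words[j + 1]) for j, w in enumerate(words[:-1]))
    return res


test_list = ['geeksforgeeks is best', 'I love it']
print("The formed bigrams are : " + str(bigrams_split_once(test_list)))
```

The result is the same list of tuples as above; strings with fewer than two words simply contribute no bigrams.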

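Method #2 : Using zip() + split() + list comprehension
The pairing that enumerate() and indexing performed in the above method can also be done by zip(). Zipping the word list without its last element against the word list without its first element yields each word together with its successor. This avoids index arithmetic and splits each string only twice, so it does less work than Method #1.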

Python3

```python
# Python3 code to demonstrate
# Bigram formation
# using zip() + split() + list comprehension

# initializing list
test_list = ['geeksforgeeks is best', 'I love it']

# printing the original list
print("The original list is : " + str(test_list))

# using zip() + split() + list comprehension
# for Bigram formation
res = [i for j in test_list
       for i in zip(j.split(" ")[:-1], j.split(" ")[1:])]

# printing result
print("The formed bigrams are : " + str(res))
```

Output :

The original list is : ['geeksforgeeks is best', 'I love it']
The formed bigrams are : [('geeksforgeeks', 'is'), ('is', 'best'), ('I', 'love'), ('love', 'it')]
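
One detail worth illustrating is that split(" ") and split() behave differently when words are separated by more than one space, and this changes the bigrams. The sketch below uses a hypothetical helper `bigrams_zip` and a made-up input containing a double space. It also pairs words with zip(words, words[1:]), which stops at the shorter argument, so the [:-1] slice is not strictly needed.

```python
def bigrams_zip(strings, sep=None):
    """Form bigrams with zip(); sep=None splits on any run of whitespace."""
    res = []
    for s in strings:
        words = s.split(sep)
        # zip stops at the shorter input, so words needs no [:-1] slice
        res.extend(zip(words, words[1:]))
    return res


sample = ['I  love it']  # note the double space (illustrative input)
print(bigrams_zip(sample))       # whitespace runs collapse into one separator
print(bigrams_zip(sample, " "))  # an empty "word" appears between the two spaces
```

With the default separator the sample gives ('I', 'love') and ('love', 'it'); with " " it also produces pairs containing an empty string. On Python 3.10 and later, itertools.pairwise(words) yields the same pairs as zip(words, words[1:]).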

Method #3 : Using reduce():
Algorithm:

1. Initialize the input list “test_list”.
2. Print the original list “test_list”.
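3. Call reduce() with a lambda that, for each string in the input list, forms its bigrams using a list comprehension and enumerate().
4. Concatenate each string's bigram tuples onto the accumulator, starting from an empty list, to build the result list “res”.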
5. Print the formed bigrams in the list “res”.

Python3

```python
from functools import reduce

# initializing list
test_list = ['geeksforgeeks is best', 'I love it']

# printing the original list
print("The original list is : " + str(test_list))

# using reduce() method to form bigrams
res = reduce(lambda acc, s: acc + [(w, s.split()[i + 1])
                                   for i, w in enumerate(s.split())
                                   if i < len(s.split()) - 1],
             test_list, [])

# printing result
print("The formed bigrams are : " + str(res))
# This code is contributed by Jyothi pinjala.
```

Output
```
The original list is : ['geeksforgeeks is best', 'I love it']
The formed bigrams are : [('geeksforgeeks', 'is'), ('is', 'best'), ('I', 'love'), ('love', 'it')]
```

Time complexity:
The code produces O(n*m) bigrams, where n is the number of strings in the input list and m is the maximum number of words in any string, because it visits each string once and each word of that string once. The actual running time is higher than O(n*m) for two reasons: the lambda calls s.split() again for every word, and acc + [...] builds a new list on every step, so earlier results are copied once per remaining string.

Space complexity:
The space complexity of the code is O(n*m), where n is the number of strings in the input list and m is the maximum number of words in any string. The result list “res” stores one tuple per bigram, and a string with m words (m at least 1) yields m - 1 bigrams, so the output holds at most n*(m - 1) tuples.
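
A minimal sketch of an alternative that avoids rebuilding the accumulator list at each reduce() step is shown below. It is not the method used above: it flattens the per-string bigrams with itertools.chain.from_iterable, and `string_bigrams` and `all_bigrams` are hypothetical helper names introduced only for illustration.

```python
from itertools import chain


def string_bigrams(s):
    """Return an iterator of (word, next_word) pairs for one string."""
    words = s.split()
    return zip(words, words[1:])


def all_bigrams(strings):
    """Flatten the bigrams of every string into a single list."""
    return list(chain.from_iterable(string_bigrams(s) for s in strings))


test_list = ['geeksforgeeks is best', 'I love it']
print("The formed bigrams are : " + str(all_bigrams(test_list)))
```

Each string is split once and each bigram is appended once, so the work stays proportional to the total number of words.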
