Python – Remove Non-English characters Strings from List

  • Last Updated : 21 Nov, 2022

Given a list of strings, remove every string that contains non-English characters.

Input : test_list = [‘Good| ????’, ‘??Geeks???’]
Output : []
Explanation : Both contain non-English characters.

Input : test_list = [“Gfg”, “Best”]
Output : [“Gfg”, “Best”]
Explanation : Both are valid English words.

Method #1 : Using regex + findall() + list comprehension

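Here we build a regex character class that matches any character outside the Unicode ranges U+0000 to U+05C0 and U+2100 to U+214F. Using findall() inside a list comprehension, we keep only the strings in which no such character occurs. Note that these ranges also cover scripts such as Greek and Cyrillic, so the check is broader than strictly English.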

Python3




# Python3 code to demonstrate working of
# Remove Non-English characters Strings from List
# Using regex + findall() + list comprehension
import re
 
# initializing list
test_list = ['Gfg', 'Good| ????', "for", '??Geeks???']
 
# printing original list
print("The original list is : " + str(test_list))
 
# keep a string only if findall() finds no character
# outside the allowed Unicode ranges
res = [idx for idx in test_list if not re.findall("[^\u0000-\u05C0\u2100-\u214F]+", idx)]
 
# printing result
print("The extracted list : " + str(res))
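
The ranges in the pattern above extend well past the English alphabet. As a rough sketch, the comparison below runs that pattern next to a stricter ASCII-only one. The helper name drop_if_match, the ASCII_ONLY pattern, and the sample strings (written as escapes for Chinese and Cyrillic text) are assumptions made for illustration, not part of the original method.

```python
import re

# Pattern from the method above: anything outside
# U+0000-U+05C0 and U+2100-U+214F counts as "non-English".
WIDE = re.compile(r"[^\u0000-\u05C0\u2100-\u214F]")

# Hypothetical stricter variant: anything outside 7-bit ASCII.
ASCII_ONLY = re.compile(r"[^\x00-\x7F]")


def drop_if_match(strings, pattern):
    """Keep only the strings in which `pattern` finds nothing."""
    return [s for s in strings if pattern.search(s) is None]


# Illustrative data: plain ASCII, ASCII mixed with Chinese, and Cyrillic.
sample = ["Gfg", "Good| \u4f60\u597d", "for", "\u043f\u0440\u0438\u0432\u0435\u0442"]

wide_result = drop_if_match(sample, WIDE)        # Cyrillic string survives
ascii_result = drop_if_match(sample, ASCII_ONLY)  # only ASCII strings survive

print(wide_result)
print(ascii_result)
```

Which pattern is appropriate depends on whether accented Latin, Greek, or Cyrillic text should count as acceptable input.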

Method #2 : Using regex + search() + filter() + lambda

Here we use search() to test each string against a pattern of English letters and whitespace, and keep only the strings that match. filter() with a lambda applies that test to every element of the list.

Python3




# Python3 code to demonstrate working of
# Remove Non-English characters Strings from List
# Using regex + search() + filter() + lambda
import re
 
# initializing list
test_list = ['Gfg', 'Good| ????', "for", '??Geeks???']
 
# printing original list
print("The original list is : " + str(test_list))
 
# using search() with an anchored pattern so that the whole
# string must consist of English letters and whitespace
res = list(filter(lambda ele: re.search(r"\A[a-zA-Z\s]+\Z", ele) is not None, test_list))
 
# printing result
print("The extracted list : " + str(res))
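
Because search() succeeds as soon as any part of a string matches, it matters whether the pattern has to cover the whole string. The sketch below contrasts the two behaviours; the function names contains_english and only_english and the sample data are hypothetical, chosen only to illustrate the difference.

```python
import re

# English letters and whitespace, one or more characters.
LETTERS_OR_SPACE = re.compile(r"[a-zA-Z\s]+")


def contains_english(strings):
    """Keep strings that contain at least one letter or whitespace run."""
    return [s for s in strings if LETTERS_OR_SPACE.search(s)]


def only_english(strings):
    """Keep strings made up entirely of letters and whitespace."""
    return [s for s in strings if LETTERS_OR_SPACE.fullmatch(s)]


# Illustrative data; \u4f60\u597d is a two-character Chinese greeting.
sample = ["Gfg", "Good| \u4f60\u597d", "for", "\u4f60\u597d", "is best"]

partial = contains_english(sample)  # mixed string is still kept
strict = only_english(sample)       # mixed string is dropped

print(partial)
print(strict)
```

An unanchored search() only drops strings with no English letters at all, while fullmatch() (or an anchored pattern) drops any string that contains a character outside the class.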

The time and space complexity of all the methods is the same:

Time Complexity: O(n), where n is the total number of characters in the list

Auxiliary Space: O(n)

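Method #3 : Using for loop

Unlike the two methods above, this one does not drop whole strings. It loops over each string, keeps only the characters found in the English alphabet, and appends the cleaned string to the result.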

Python3




# Python3 code to demonstrate working of
# Remove Non-English characters Strings from List
 
# initializing list
test_list = ['Gfg', 'Good| ????', "for", '??Geeks???']
 
# printing original list
print("The original list is : " + str(test_list))
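# all lowercase and uppercase English letters
loweralphabets = "abcdefghijklmnopqrstuvwxyz"
upperalphabets = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
x = loweralphabets + upperalphabets

res = []
for i in test_list:
    # rebuild each string from its English letters only
    a = ""
    for j in i:
        if j in x:
            a += j
    res.append(a)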
             
# printing result
print("The extracted list : " + str(res))

Output

The original list is : ['Gfg', 'Good| ????', 'for', '??Geeks???']
The extracted list : ['Gfg', 'Good', 'for', 'Geeks']
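
The loop above can also be written more compactly. The sketch below is one possible version, assuming string.ascii_letters as the alphabet; strip_non_english and remove_non_english are hypothetical names. The second function is a loop-based variant that drops whole strings instead of cleaning them, which is closer to the problem statement.

```python
import string

# Set of the 52 English letters, for fast membership tests.
ENGLISH_LETTERS = set(string.ascii_letters)


def strip_non_english(strings):
    """Return each string with everything except English letters removed."""
    return ["".join(ch for ch in s if ch in ENGLISH_LETTERS) for s in strings]


def remove_non_english(strings):
    """Return only the strings made up entirely of English letters."""
    kept = []
    for s in strings:
        if all(ch in ENGLISH_LETTERS for ch in s):
            kept.append(s)
    return kept


sample = ['Gfg', 'Good| ????', 'for', '??Geeks???']

stripped = strip_non_english(sample)
removed = remove_non_english(sample)

print(stripped)  # ['Gfg', 'Good', 'for', 'Geeks']
print(removed)   # ['Gfg', 'for']
```

Note that both functions treat spaces, digits, and punctuation as non-English, just as the loop above does.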
