Python Program To Remove all control characters
Last Updated: 14 Feb, 2023
In telecommunications and computing, control characters are non-printable characters that are part of a character set. They do not represent a written symbol; instead, they signal an effect such as a line break, a tab, or a backspace. Stripping them is a common clean-up step when processing text. This article discusses three ways to remove control characters from a string in Python.
Example:
Input : test_str = 'Geeks\0\r for \n\bge\tee\0ks\f'
Output : Geeks for geeeks
Explanation : \n, \0, \f, \r, \b and \t are control characters, so they are removed from the string.
Input : test_str = 'G\0\r\n\fg'
Output : Gg
Explanation : \0, \r, \n and \f are control characters, so they are removed from the string, giving Gg as output. Note that \f is a single form-feed character, not the letter f.
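To see why these escape sequences count as control characters, it can help to look at their code points. The short sketch below is only an illustration (the `escapes` dictionary is a name introduced here, not part of the original examples); it prints the code point of each escape used above, all of which are below 32.

```python
# Escape sequences used in the examples, keyed by how they are written in source.
# The name `escapes` is illustrative only.
escapes = {
    "\\0": "\0",  # null
    "\\b": "\b",  # backspace
    "\\t": "\t",  # horizontal tab
    "\\n": "\n",  # line feed
    "\\f": "\f",  # form feed
    "\\r": "\r",  # carriage return
}

for name, char in escapes.items():
    # Each escape is a single character; ord() gives its code point.
    print(name, ord(char))
```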
Method 1: Using translate()
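The control characters \0, \b, \t, \n, \f and \r all fall within the first 32 ASCII code points (0 to 31). A translation table that maps each of these code points to None makes translate() delete them and keep every other character. DEL (code point 127) lies outside this range and is left in place.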
Python3
test_str = 'Geeks\0\r for \n\bge\tee\0ks\f'
print("The original string is : " + str(test_str))
# Map code points 0-31 to None so that translate() deletes them
mapping = dict.fromkeys(range(32))
res = test_str.translate(mapping)
print("String after removal of control characters : " + str(res))
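Output (the first line looks scrambled because the terminal acts on \r, \n, \b and \t when the original string is printed):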
for original string is : Geeks
ge eeks
String after removal of control characters : Geeks for geeeks
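The table in Method 1 covers code points 0 to 31 only. As a hedged sketch of one possible extension, the helper below also deletes DEL (127) and the C1 control range (128 to 159). The names `CONTROL_TABLE` and `strip_controls` are hypothetical and chosen here for illustration; whether the wider range is wanted depends on the input data.

```python
# Assumption: besides code points 0-31, DEL (127) and C1 controls (128-159)
# should also be removed. These names are illustrative, not from the article.
CONTROL_CODE_POINTS = list(range(32)) + list(range(127, 160))

# dict.fromkeys() maps every key to None, which tells translate() to delete it.
CONTROL_TABLE = dict.fromkeys(CONTROL_CODE_POINTS)


def strip_controls(text):
    """Return text with C0, DEL and C1 control characters removed."""
    return text.translate(CONTROL_TABLE)


print(strip_controls('Geeks\0\r for \n\bge\tee\0ks\f\x7f'))  # Geeks for geeeks
```

Building the table once and reusing it avoids recreating the dictionary on every call.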
Method 2: Using unicodedata library
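unicodedata.category() returns the Unicode general category of a character. Characters whose category starts with "C" (such as "Cc" for control characters) are left out of the result string. Note that "C" also covers other non-printing categories such as "Cf" (format), so this filter removes slightly more than control characters alone.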
Python3
import unicodedata
test_str = 'Geeks\0\r for \n\bge\tee\0ks\f'
print("The original string is : " + str(test_str))
# Keep only characters whose general category does not start with "C"
res = "".join(char for char in test_str if unicodedata.category(char)[0] != "C")
print("String after removal of control characters : " + str(res))
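Output (the first line looks scrambled because the terminal acts on \r, \n, \b and \t when the original string is printed):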
for original string is : Geeks
ge eeks
String after removal of control characters : Geeks for geeeks
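A variation on the unicodedata approach, sketched here under the assumption that only true control characters (category "Cc") should go and that some of them, such as newlines and tabs, may need to be preserved. The function name `remove_control_category` and its `keep` parameter are hypothetical and introduced only for this illustration.

```python
import unicodedata


def remove_control_category(text, keep=""):
    """Drop characters whose category is exactly 'Cc', except those listed in keep."""
    return "".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) != "Cc"
    )


# Remove every control character.
print(remove_control_category('Geeks\0\r for \n\bge\tee\0ks\f'))  # Geeks for geeeks

# Preserve tabs and newlines while removing the other control characters.
print(repr(remove_control_category('a\tb\0c\n', keep="\t\n")))  # 'a\tbc\n'
```

Matching "Cc" exactly leaves format characters (category "Cf"), such as the zero-width joiner, untouched, which a test on the first letter "C" would also remove.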
Method 3: Using Regular Expression
Here, the sub() function of the re library removes every character matched by the class [\x00-\x1f], which covers code points 0 to 31 written in \x hexadecimal escape notation.
Python3
import re
test_str = 'Geeks\0\r for \n\bge\tee\0ks\f'
print("The original string is : " + str(test_str))
# Replace every character in the range \x00-\x1f with an empty string
res = re.sub(r'[\x00-\x1f]', '', test_str)
print("String after removal of control characters : " + str(res))
Time Complexity: O(N)
Space Complexity: O(N)
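When the same pattern is applied to many strings, it may be worth compiling it once. The following is a minimal sketch, assuming that DEL (\x7f) and the C1 range (\x80-\x9f) should be stripped as well; the names `CONTROL_RE` and `strip_controls_re` are hypothetical.

```python
import re

# Assumption: widen the class beyond \x00-\x1f to include DEL and C1 controls.
# Compiling once lets the pattern be reused across calls.
CONTROL_RE = re.compile(r'[\x00-\x1f\x7f-\x9f]')


def strip_controls_re(text):
    """Return text with every character matched by CONTROL_RE removed."""
    return CONTROL_RE.sub('', text)


for sample in ['Geeks\0\r for \n\bge\tee\0ks\f', 'G\0\r\n\fg']:
    print(strip_controls_re(sample))
```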