
Python Program To Remove all control characters

Last Updated : 14 Feb, 2023

In telecommunications and computing, control characters are non-printable characters that are part of a character set. They do not represent written symbols; instead, they signal effects such as a line break, a tab stop, or a backspace. In ASCII they occupy code points 0 to 31, plus DEL (127). Stripping them is a common clean-up step when processing text. This article discusses three ways to remove control characters from a Python string.

Example:

Input : test_str = 'Geeks\0\r for \n\bge\tee\0ks\f'

Output : Geeks for geeeks

Explanation : \n, \0, \f, \r, \b and \t are control characters, so they are removed from the string.

Input : test_str = 'G\0\r\n\fg'

Output : Gg

Explanation : \n, \0, \f and \r are control characters, so they are removed from the string. Note that \f is the form feed escape, not the letter f, so only G and g remain.
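To see which characters in an input count as control characters, it can help to list their code points. The snippet below is a minimal sketch, assuming the ASCII definition (code points 0 to 31, plus 127); the variable name `found` is illustrative only.

```python
test_str = 'Geeks\0\r for \n\bge\tee\0ks\f'

# (escaped form, code point) for every ASCII control character in the string
found = [(repr(ch), ord(ch)) for ch in test_str if ord(ch) < 32 or ord(ch) == 127]

for escaped, code_point in found:
    print(escaped, code_point)
```

Every code point printed is below 32, which is the range the methods below target.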

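Method 1: Using translate()

ASCII control characters occupy the first 32 code points (0 to 31). dict.fromkeys(range(32)) builds a mapping from each of these code points to None, and str.translate() deletes every character that maps to None. Note that DEL (127) lies outside this range, so this mapping does not remove it.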

Python3




# Python3 code to demonstrate working of
# Remove all control characters
# Using translate()

# initializing string
test_str = 'Geeks\0\r for \n\bge\tee\0ks\f'

# printing original string; repr() shows the control characters as
# escape sequences instead of sending them to the terminal
print("The original string is : " + repr(test_str))

# using translate() and fromkeys()
# to delete all control characters: code points 0-31 map to None
mapping = dict.fromkeys(range(32))
res = test_str.translate(mapping)

# printing result
print("String after removal of control characters : " + res)


Output:

The original string is : 'Geeks\x00\r for \n\x08ge\tee\x00ks\x0c'
String after removal of control characters : Geeks for geeeks
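The same translate() approach can be widened or narrowed by changing the set of code points in the mapping. The following is a sketch under stated assumptions, not part of the original method: it assumes you also want to drop DEL (127) and the C1 control range (128 to 159), while keeping tabs and newlines. The names `KEEP`, `control_points` and `CONTROL_MAP` are hypothetical.

```python
# Hypothetical extension of Method 1; KEEP and CONTROL_MAP are illustrative names.
KEEP = {ord('\t'), ord('\n')}  # assumption: whitespace worth preserving

# C0 controls (0-31), DEL (127) and C1 controls (128-159), minus the kept ones
control_points = set(range(32)) | {127} | set(range(128, 160))
CONTROL_MAP = dict.fromkeys(control_points - KEEP)

test_str = 'Geeks\0\r for \n\bge\tee\0ks\f\x7f\x85'
res = test_str.translate(CONTROL_MAP)

# repr() makes the surviving tab and newline visible
print(repr(res))
```

Building the mapping once and reusing it avoids rebuilding the dictionary for every string that needs cleaning.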

Method 2: Using unicodedata library

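unicodedata.category() returns the Unicode general category of a character as a two-letter code. Categories beginning with "C" cover control characters (Cc) as well as format (Cf), surrogate (Cs), private-use (Co) and unassigned (Cn) code points. This method keeps only the characters whose category does not start with "C", so it removes all of these, not just control characters.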

Python3




# Python3 code to demonstrate working of
# Remove all control characters
# Using unicodedata library
import unicodedata

# initializing string
test_str = 'Geeks\0\r for \n\bge\tee\0ks\f'

# printing original string; repr() shows the control characters as
# escape sequences instead of sending them to the terminal
print("The original string is : " + repr(test_str))

# skipping all control characters by keeping only the characters
# whose Unicode category does not start with "C"
res = "".join(char for char in test_str if unicodedata.category(char)[0] != "C")

# printing result
print("String after removal of control characters : " + res)


Output:

The original string is : 'Geeks\x00\r for \n\x08ge\tee\x00ks\x0c'
String after removal of control characters : Geeks for geeeks
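Because the check above matches every category that starts with "C", it also strips invisible format characters. The sketch below contrasts that behaviour with a stricter test for category "Cc" only; the input string and the variable names `drop_all_c` and `drop_cc_only` are assumptions chosen for illustration.

```python
import unicodedata

# Soft hyphen (U+00AD) and zero width space (U+200B) are in category "Cf"
test_str = 'Ge\u00adeks\0 for\u200b geeks\n'

# Drops every "C*" category, as in Method 2
drop_all_c = "".join(ch for ch in test_str if unicodedata.category(ch)[0] != "C")

# Drops only true control characters (category "Cc")
drop_cc_only = "".join(ch for ch in test_str if unicodedata.category(ch) != "Cc")

print(repr(drop_all_c))
print(repr(drop_cc_only))
```

Which variant is appropriate depends on whether format characters such as the soft hyphen should survive in the cleaned text.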

Method 3: Using Regular Expression

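The re library's sub() function replaces every match of a pattern. The character class [\x00-\x1f] matches any character with a code point from 0 to 31 (written here as hexadecimal \x escapes), and replacing each match with an empty string removes those control characters.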

Python3




# Python3 code to demonstrate working of
# Remove all control characters
# Using Regular Expression
import re

# initializing string
test_str = 'Geeks\0\r for \n\bge\tee\0ks\f'

# printing original string; repr() shows the control characters as
# escape sequences instead of sending them to the terminal
print("The original string is : " + repr(test_str))

# removing all control characters (code points 0-31)
# using sub()
res = re.sub(r'[\x00-\x1f]', '', test_str)

# printing result
print("String after removal of control characters : " + res)
# This code is contributed by Edula Vinay Kumar Reddy


Time Complexity: O(N), where N is the length of the input string
Space Complexity: O(N), for the result string
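When many strings are cleaned with the same pattern, the regular expression can be compiled once and reused. The following is a minimal sketch, assuming the pattern should also cover DEL (\x7f) and the C1 control range (\x80 to \x9f), which the pattern above does not; `CONTROL_RE` is a hypothetical name.

```python
import re

# Assumed pattern: C0 controls, DEL and C1 controls
CONTROL_RE = re.compile(r'[\x00-\x1f\x7f-\x9f]')

test_str = 'Geeks\0\r for \n\bge\tee\0ks\f\x7f'

# Replace every match with the empty string
res = CONTROL_RE.sub('', test_str)

print(res)
```

The work is still a single pass over the string, so the linear time and space bounds stated above continue to apply.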


