Unicodedata – Unicode Database in Python
The Unicode Character Database (UCD) is defined by Unicode Standard Annex #44, which specifies the character properties of all Unicode characters. The unicodedata module provides access to the UCD and uses the same symbols and names that the UCD defines.
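The UCD is versioned, and each Python release bundles one specific version of it. The module exposes that version as the string unicodedata.unidata_version. A minimal sketch that prints it; the value depends on the interpreter, so no output is shown here.

```python
import unicodedata

# Version of the Unicode Character Database bundled with this interpreter,
# a string of the form "major.minor.patch"
version = unicodedata.unidata_version
print(version)
```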
Functions defined by the module :
- unicodedata.lookup(name)
This function looks up a character by its name. If a character with the given name is found in the database, that character is returned; otherwise, KeyError is raised.
Example :
import unicodedata
print(unicodedata.lookup('LEFT CURLY BRACKET'))
print(unicodedata.lookup('RIGHT CURLY BRACKET'))
print(unicodedata.lookup('ASTERISK'))
Output :
{
}
*
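Because lookup() raises KeyError for unknown names, code that handles user-supplied names usually catches it. The sketch below wraps the call in a hypothetical helper, find_char(), which returns None instead of raising. It also shows that Python string literals accept the same character names through the \N{...} escape.

```python
import unicodedata

def find_char(name):
    """Hypothetical helper: return the character called `name`, or None."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        # No character with this name exists in the database
        return None

print(find_char('ASTERISK'))              # *
print(find_char('NOT A REAL CHARACTER'))  # None
# The \N{...} escape resolves a character name inside a string literal
print('\N{LEFT CURLY BRACKET}' == find_char('LEFT CURLY BRACKET'))  # True
```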
- unicodedata.name(chr[, default])
This function returns the name assigned to the given character as a string. If no name is defined, default is returned if it was given; otherwise, ValueError is raised.
Example :
import unicodedata
print(unicodedata.name('/'))
print(unicodedata.name('|'))
print(unicodedata.name(':'))
Output :
SOLIDUS
VERTICAL LINE
COLON
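Not every code point has a name: control characters such as U+0000 do not, so name() raises ValueError for them unless a default is supplied. A minimal sketch that uses the default argument to build a readable label; label() and its "U+XXXX NAME" format are illustrative choices, not part of the module.

```python
import unicodedata

def label(ch):
    """Hypothetical helper: 'U+XXXX NAME', with a placeholder for unnamed characters."""
    # Passing None as the default suppresses the ValueError
    name = unicodedata.name(ch, None)
    return f"U+{ord(ch):04X} {name if name is not None else '<no name>'}"

for ch in '/|:\x00':
    print(label(ch))
# U+002F SOLIDUS
# U+007C VERTICAL LINE
# U+003A COLON
# U+0000 <no name>
```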
- unicodedata.decimal(chr[, default])
This function returns the decimal value assigned to the given character as an integer. If no value is defined, default is returned if it was given; otherwise, ValueError is raised.
Example :
import unicodedata
print(unicodedata.decimal('9'))
print(unicodedata.decimal('a'))

Output :
9
Traceback (most recent call last):
  File "7e736755dd176cd0169eeea6f5d32057.py", line 3, in <module>
    print(unicodedata.decimal('a'))
ValueError: not a decimal
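decimal() is not limited to ASCII digits; it also returns values for the decimal digits of other scripts, such as the Arabic-Indic digits U+0660 to U+0669. The sketch below uses that in a hypothetical helper, parse_decimal(), passing None as the default so a non-digit can be detected without catching ValueError. Python's built-in int() already accepts such digits; the helper only illustrates how decimal() behaves.

```python
import unicodedata

def parse_decimal(text):
    """Hypothetical helper: convert a string of decimal digits (any script) to int."""
    value = 0
    for ch in text:
        d = unicodedata.decimal(ch, None)  # None instead of ValueError
        if d is None:
            raise ValueError(f"not a decimal digit: {ch!r}")
        value = value * 10 + d
    return value

print(parse_decimal('2020'))                # 2020
print(parse_decimal('\u0661\u0669\u0660'))  # 190 (Arabic-Indic digits)
```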
- unicodedata.digit(chr[, default])
This function returns the digit value assigned to the given character as an integer. If no value is defined, default is returned if it was given; otherwise, ValueError is raised.
Example :
import unicodedata
print(unicodedata.digit('9'))
print(unicodedata.digit('143'))

Output :
9
Traceback (most recent call last):
  File "ad47ae996380a777426cc1431ec4a8cd.py", line 3, in <module>
    print(unicodedata.digit('143'))
TypeError: digit() argument 1 must be a unicode character, not str
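The difference between decimal() and digit() shows up with characters such as superscripts. U+00B2 SUPERSCRIPT TWO has a digit value, but Unicode does not classify it as a plain decimal digit, so decimal() has no value for it. A small comparison, using the default argument so that missing values print as None:

```python
import unicodedata

# '9' is an ordinary decimal digit; U+00B2 is SUPERSCRIPT TWO
for ch in ('9', '\u00b2'):
    print(unicodedata.name(ch),
          unicodedata.decimal(ch, None),
          unicodedata.digit(ch, None))
# DIGIT NINE 9 9
# SUPERSCRIPT TWO None 2
```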
- unicodedata.numeric(chr[, default])
This function returns the numeric value assigned to the given character as a float. Unlike decimal() and digit(), it also covers characters such as fractions and Roman numerals. If no value is defined, default is returned if it was given; otherwise, ValueError is raised.
Example :
import unicodedata
print(unicodedata.numeric('9'))
print(unicodedata.numeric('\u00bd'))

Output :
9.0
0.5
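Because numeric() covers characters whose value is not a single decimal digit, it is the function to use for values such as fractions or Roman numerals. A sketch with a hypothetical numeric_values() helper that maps each character of a string to its value, or to None when it has none:

```python
import unicodedata

def numeric_values(text):
    """Hypothetical helper: numeric value of each character, or None if it has none."""
    return [unicodedata.numeric(ch, None) for ch in text]

# U+00BC VULGAR FRACTION ONE QUARTER, U+216B ROMAN NUMERAL TWELVE, then a letter
print(numeric_values('\u00bc\u216bx'))  # [0.25, 12.0, None]
```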
- unicodedata.category(chr)
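This function returns the general category assigned to the given character as a string. The category is a two-letter code: the first letter is the major class and the second is the subclass. For example, it returns ‘Lu’ for an uppercase letter (‘L’ for letter, ‘u’ for uppercase) and ‘Ll’ for a lowercase letter.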
Example :
import unicodedata
print(unicodedata.category('A'))
print(unicodedata.category('b'))
Output :
Lu
Ll
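Since the first letter of a category code is its major class, a common pattern is to filter text by that letter. The sketch below uses a hypothetical strip_punctuation() helper that drops every character whose category starts with ‘P’ (punctuation), which includes non-ASCII marks such as guillemets:

```python
import unicodedata

def strip_punctuation(text):
    """Hypothetical helper: remove characters in any punctuation category (P*)."""
    return ''.join(ch for ch in text if not unicodedata.category(ch).startswith('P'))

print(strip_punctuation('Hello, world! «Bonjour»'))  # Hello world Bonjour
```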
- unicodedata.bidirectional(chr)
This function returns the bidirectional class assigned to the given character as a string. For example, it returns ‘AN’ (Arabic Number) for U+0660 ARABIC-INDIC DIGIT ZERO and ‘L’ (Left-to-Right) for Latin letters. An empty string is returned if no such value is defined.
Example :
import unicodedata
print(unicodedata.bidirectional('\u0660'))
Output :
AN
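One practical use of the bidirectional class is detecting right-to-left text. The classes ‘R’ (Right-to-Left) and ‘AL’ (Arabic Letter) mark strong right-to-left characters, so a hypothetical has_rtl() helper can simply check for them. Note that Arabic-Indic digits are ‘AN’, not a strong class, so they do not count here.

```python
import unicodedata

RTL_CLASSES = {'R', 'AL'}  # strong right-to-left bidirectional classes

def has_rtl(text):
    """Hypothetical helper: True if text contains a strong right-to-left character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)

print(has_rtl('abc'))         # False
print(has_rtl('abc \u05d0'))  # True  (U+05D0 HEBREW LETTER ALEF, class R)
print(has_rtl('\u0627'))      # True  (U+0627 ARABIC LETTER ALEF, class AL)
```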
- unicodedata.normalize(form, unistr)
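This function returns the normal form form of the Unicode string unistr. Valid values for form are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’. The D forms decompose characters (for example, Ç into C followed by a combining cedilla), the C forms recompose them, and the K forms additionally replace compatibility characters, such as the circled digit ①, with their plain equivalents.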
Example :
from unicodedata import normalize
# ascii() shows non-ASCII code points as escapes, so the decomposition is visible
print(ascii(normalize('NFD', '\u00C7')))
print(ascii(normalize('NFC', 'C\u0327')))
print(ascii(normalize('NFKD', '\u2460')))

Output :
'C\u0327'
'\xc7'
'1'
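Two common uses follow from these forms. Comparing strings after NFC normalization treats canonically equivalent spellings as equal, and decomposing with NFD and then dropping combining marks removes accents. A minimal sketch; strip_accents() is a hypothetical helper, and it relies on unicodedata.combining(), which returns a character's canonical combining class (0 for base characters).

```python
import unicodedata

def strip_accents(text):
    """Hypothetical helper: remove combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

precomposed = '\u00C7a'  # 'Ça' with a single precomposed Ç
decomposed = 'C\u0327a'  # 'Ça' spelled as C + combining cedilla
print(precomposed == decomposed)  # False: different code points
print(unicodedata.normalize('NFC', precomposed)
      == unicodedata.normalize('NFC', decomposed))  # True
print(strip_accents('Ça va, café?'))  # Ca va, cafe?
```

Letters that have no decomposition, such as ø or ł, pass through strip_accents() unchanged, so this sketch is not a full transliteration.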
Last Updated :
19 Nov, 2020