Suppose you are writing an extension module that needs to pass a Python string to a C library function, and you want Unicode to be handled properly. The main issue is that existing C libraries do not understand Python's native representation of Unicode. The challenge, therefore, is to convert the Python string into a form that C libraries can work with more easily.
To illustrate the solution, here are two C functions that operate on string data and print it for debugging and experimentation.
Code #1 : Uses bytes provided in the form char *, int
Code #2 : Uses wide characters in the form wchar_t *, int
For the byte-oriented function print_chars(), Python strings need to be converted to a suitable byte encoding such as UTF-8. Code #3 shows a simple extension function that does this.
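Before looking at the C side, the conversion that print_chars() depends on can be previewed in plain Python. This is a minimal sketch, not the extension itself: utf8_args() and hex_dump() are hypothetical helpers that mimic the (char *, int) pair an s#-style conversion hands to C, with the length counted in bytes rather than characters.

```python
def utf8_args(s: str):
    """Hypothetical stand-in for an s#-style conversion: return the UTF-8
    encoded bytes of `s` and their length in bytes (not characters)."""
    data = s.encode("utf-8")
    return data, len(data)


def hex_dump(data: bytes, length: int) -> str:
    """Format the first `length` bytes as space-separated hex digits,
    in the same style as the print_chars() debugging output."""
    return " ".join(f"{b:02x}" for b in data[:length])


s = "Spicy Jalape\u00f1o"
data, length = utf8_args(s)
print(len(s), length)          # 14 characters but 15 bytes: 'ñ' takes two
print(hex_dump(data, length))  # ... 65 c3 b1 6f
```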
Code #3 :
For library functions that work with the machine-native wchar_t type, the C extension code can be written as shown in Code #4.
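The wchar_t view can be approximated from Python with ctypes. This is a different mechanism from the extension code, but it builds a native, NUL-terminated wchar_t buffer in a similar way. The sketch assumes CPython's ctypes module, and wchar_units() is a hypothetical helper that copies out the raw code units. Where wchar_t is 32 bits (typically Linux and macOS), each unit is a code point; where it is 16 bits (Windows), characters outside the Basic Multilingual Plane become surrogate pairs.

```python
import ctypes
import sys


def wchar_units(s: str) -> list:
    """Hypothetical helper: return the wchar_t code units that a native,
    NUL-terminated wchar_t buffer holds for `s` (terminator excluded)."""
    buf = ctypes.create_unicode_buffer(s)        # native wchar_t array
    width = ctypes.sizeof(ctypes.c_wchar)        # typically 4 on Linux/macOS, 2 on Windows
    raw = ctypes.string_at(ctypes.addressof(buf), ctypes.sizeof(buf))
    units = [int.from_bytes(raw[i:i + width], sys.byteorder)
             for i in range(0, len(raw), width)]
    return units[:-1]                            # drop the trailing NUL


s = "Spicy Jalape\u00f1o"
print(" ".join(f"{u:x}" for u in wchar_units(s)))   # ... 65 f1 6f
print(ctypes.sizeof(ctypes.c_wchar), [hex(u) for u in wchar_units("\U0001F600")])
```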
Code #4 :
Code #5 checks how the extension functions work. Observe that the byte-oriented print_chars() function receives UTF-8 encoded data (ñ arrives as the two bytes c3 b1), whereas print_wchars() receives the Unicode code point values (ñ arrives as f1).
Code #5 :
Output :
53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
53 70 69 63 79 20 4a 61 6c 61 70 65 f1 6f
Consider the nature of the C library being accessed. For many C libraries, it might make more sense to pass bytes instead of a string; Code #6 shows the conversion code for doing so.
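From the Python side, a bytes-only interface moves the choice of encoding to the caller. The helper as_c_bytes() below is hypothetical; it only mirrors, in Python, the behaviour of a byte-oriented argument such as PyArg_ParseTuple()'s y# format unit, which accepts bytes-like objects and rejects str.

```python
def as_c_bytes(obj):
    """Hypothetical: accept only bytes-like objects and return (data, length),
    roughly the (char *, int) pair a byte-oriented C function expects."""
    if isinstance(obj, str):
        raise TypeError("expected a bytes-like object, not str; encode it first")
    view = memoryview(obj)          # raises TypeError for non-buffer objects
    data = view.tobytes()
    return data, len(data)


s = "Spicy Jalape\u00f1o"
print(as_c_bytes(s.encode("utf-8")))   # the caller chooses the encoding
print(as_c_bytes(bytearray(b"abc")))
try:
    as_c_bytes(s)
except TypeError as exc:
    print("rejected:", exc)
```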
Code #6 :
If you still want to pass strings, keep in mind that Python 3 uses an adaptable string representation that does not map directly onto C libraries using the standard types char * or wchar_t *. Thus, to present string data to C, some kind of conversion is almost always necessary. The s# and u# format codes of PyArg_ParseTuple() perform such conversions safely. Note that u# depended on the legacy Py_UNICODE representation and was removed in Python 3.12; on current versions, PyUnicode_AsWideCharString() is the usual way to obtain a wchar_t * buffer.
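The adaptable representation described above corresponds to CPython's PEP 393 "flexible string representation", in which every character of a given string is stored with the same width (1, 2 or 4 bytes), chosen by the string's widest character. The sketch below assumes CPython and uses a hypothetical char_width() helper; absolute sys.getsizeof() values vary between versions, so it compares only the growth per added character.

```python
import sys


def char_width(ch: str, n: int = 10) -> int:
    """Estimate bytes per character (CPython only) by growing a string
    made of `ch` by one character and comparing object sizes."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


for ch in ("a", "\u00f1", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}", char_width(ch))
```

None of these widths is guaranteed to match char * (UTF-8) or wchar_t *, which is why a conversion step is needed.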
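Whenever such a conversion is made, a copy of the converted data is attached to the original string object so that it can be reused later (in CPython 3.12 and later, only the UTF-8 copy is cached). Code #7 shows this by checking the size of the string object before and after each call; the exact sizes depend on the CPython version and platform.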
Code #7 :
Output :
Size : 87
53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
Size : 103
53 70 69 63 79 20 4a 61 6c 61 70 65 f1 6f
Size : 163