Suppose an extension module needs to pass a NULL-terminated string to a C library. This article shows how to do that with Python’s Unicode string implementation. Many C library functions operate on NULL-terminated strings declared as type char *.
The code below defines a C function that is used to illustrate and test the problem. The function (Code #1) simply prints the hex representation of each character, so that the passed strings can be easily debugged.
Code #1 :
48 65 6c 6c 6f
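To check the expected bytes without building the extension, a rough Python stand-in for the C helper can be sketched as follows. The name `hex_dump` is hypothetical and is not part of the original code; it only reproduces the space-separated hex format shown above.

```python
def hex_dump(data: bytes) -> str:
    """Return the bytes of `data` as space-separated two-digit hex values."""
    # Hypothetical helper: mimics the output format of the C debugging function.
    return " ".join(f"{byte:02x}" for byte in data)


print(hex_dump(b"Hello"))  # 48 65 6c 6c 6f
```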
To call such a C function from Python, there are a few choices. The first is to restrict it to operate only on bytes, using the “y” conversion code to PyArg_ParseTuple(), as shown in the code below.
Code #2 :
Let’s see how the resulting function operates, and how bytes with embedded NULL bytes and Unicode strings are rejected.
Code #3 :
48 65 6c 6c 6f 20 57 6f 72 6c 64
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be bytes without null bytes, not bytes
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface
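The acceptance rules of the “y” code can be approximated in pure Python. This is only a model, not the real argument parser: `check_y_argument` is a hypothetical name, and the error messages are illustrative. Note that Python 3.5 and later raise ValueError for embedded NULL bytes, whereas the older output above shows TypeError.

```python
def check_y_argument(arg):
    """Model the "y" rules: bytes-like input only, no embedded NULL bytes."""
    # memoryview() raises TypeError for objects without the buffer
    # interface, which includes str.
    data = bytes(memoryview(arg))
    if b"\x00" in data:
        # Python 3.5+ reports this as ValueError; older versions used TypeError.
        raise ValueError("embedded null byte")
    return data


print(check_y_argument(b"Hello World"))

for bad in (b"Hello\x00World", "Hello World"):
    try:
        check_y_argument(bad)
    except (TypeError, ValueError) as exc:
        print(type(exc).__name__, "for", repr(bad))
```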
If you want to pass Unicode strings instead, use the “s” format code to
PyArg_ParseTuple() as shown below.
Code #4 :
The code above (Code #4) automatically converts all strings to a NULL-terminated UTF-8 encoding, as shown in the code below.
Code #5 :
48 65 6c 6c 6f 20 57 6f 72 6c 64
53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str without null characters, not str
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str, not bytes
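The second line of output can be reproduced from plain Python, which makes the effect of the UTF-8 conversion easier to see. The sketch below assumes the test string was 'Spicy Jalapeño' (this is what the hex bytes decode to). The single character ñ becomes the two bytes c3 b1, so the C side receives one more byte than the string has characters.

```python
text = "Spicy Jalape\u00f1o"      # 'Spicy Jalapeño'
encoded = text.encode("utf-8")    # the byte sequence the C function would see

print(len(text), "characters ->", len(encoded), "bytes")
print(" ".join(f"{byte:02x}" for byte in encoded))
```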
If you are working with a PyObject * and can’t use PyArg_ParseTuple(), the code below shows how to check for and extract a suitable char * reference from both a bytes object and a string object.
Code #6 : Conversion from bytes
Code #7 : Conversion to UTF-8 bytes from a string
Both conversions guarantee NULL-terminated data, but neither checks for embedded NULL bytes elsewhere inside the string. That needs to be checked separately if it is important.
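Why the missing check matters can be shown with ctypes, whose `c_char_p` reads a buffer the way C code would, stopping at the first NULL byte. The sketch below uses a hypothetical helper, `has_embedded_nul`, that compares the length C would see with the real length of the data. In a C extension, the same idea is usually expressed by comparing strlen() of the pointer against the size reported by the object.

```python
import ctypes


def has_embedded_nul(data: bytes) -> bool:
    """Report whether C code would see less than the full buffer."""
    # c_char_p(...).value reads up to the first NULL byte, as strlen() would.
    visible_to_c = ctypes.c_char_p(data).value
    return len(visible_to_c) != len(data)


print(ctypes.c_char_p(b"Hello\x00World").value)  # b'Hello'
print(has_embedded_nul(b"Hello World"))          # False
print(has_embedded_nul(b"Hello\x00World"))       # True
```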
Note : There is a hidden memory overhead associated with using the “s” format code to PyArg_ParseTuple() that is easy to overlook. When code uses this conversion, a UTF-8 copy of the string is created and permanently attached to the original string object. If the string contains non-ASCII characters, this increases the size of the string object, and the extra memory is held until the string is garbage collected.
Code #8 :
Size : 87
53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
Size : 103
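The same effect can be observed without compiling an extension. The sketch below is CPython-specific and rests on one assumption: that calling the C-API function PyUnicode_AsUTF8() through `ctypes.pythonapi` triggers the same cached UTF-8 copy that the “s” format code creates. The exact sizes depend on the Python version and platform, so no particular numbers are claimed here.

```python
import ctypes
import sys

# Assumption: PyUnicode_AsUTF8() creates the same cached UTF-8 copy
# that the "s" format code relies on (CPython only).
as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
as_utf8.argtypes = [ctypes.py_object]
as_utf8.restype = ctypes.c_char_p

s = "Spicy Jalape\u00f1o"          # contains a non-ASCII character
size_before = sys.getsizeof(s)
utf8 = as_utf8(s)                  # forces creation of the cached UTF-8 copy
size_after = sys.getsizeof(s)

print("Size :", size_before)
print(utf8)
print("Size :", size_after)        # larger than before on CPython
```

For a pure ASCII string, CPython can reuse the existing character data as the UTF-8 form, so no size increase is expected in that case.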