Passing NULL-Terminated Strings to C Libraries
Suppose you are writing an extension module that needs to pass a NULL-terminated string to a C library. This article shows how to do that with Python’s Unicode string implementation. Many C libraries provide functions that operate on NULL-terminated strings declared as type char *.
The code below defines a C function that is used to illustrate and test the problem. The C function (Code #1) simply prints the hex representation of each character so that the passed strings can be easily debugged.
Code #1 :
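A minimal sketch of such a debugging function is given here. The names print_chars and fprint_chars are illustrative assumptions; the variant that takes a FILE * is included only so the output can be redirected and checked.

```c
#include <stdio.h>

/* Write each byte of a NULL-terminated string to `out` as two hex digits,
   each followed by a space, then end the line. */
void fprint_chars(FILE *out, const char *s)
{
    while (*s) {
        /* Cast to unsigned char so bytes >= 0x80 are not sign-extended. */
        fprintf(out, "%02x ", (unsigned char)*s);
        s++;
    }
    fprintf(out, "\n");
}

/* Convenience wrapper that prints to standard output. */
void print_chars(const char *s)
{
    fprint_chars(stdout, s);
}
```

Calling print_chars("Hello") prints one hex pair per byte of the string and stops at the terminating NULL.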
Output when the function is given the string "Hello":
48 65 6c 6c 6f
There are a few ways to call such a C function from Python. The first is to restrict it to operate only on bytes by using the “y” conversion code to PyArg_ParseTuple(), as shown in the code below.
Code #2 :
Let’s see how the resulting function operates, and how bytes with embedded NULL bytes and Unicode strings are rejected.
Code #3 :
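Output for a plain bytes argument:
48 65 6c 6c 6f 20 57 6f 72 6c 64
Bytes containing an embedded NULL byte are rejected:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be bytes without null bytes, not bytes
A Unicode string is rejected as well:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface
The exact exception types and messages depend on the Python version. Since Python 3.5, the embedded NULL case raises ValueError rather than TypeError.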
If you want to pass Unicode strings instead, use the “s” format code to PyArg_ParseTuple() as shown below.
Code #4 :
With the “s” format code (Code #4), all strings are automatically converted to a NULL-terminated UTF-8 encoding, as the output below shows.
Code #5 :
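Output for an ASCII string and for a string containing a non-ASCII character:
48 65 6c 6c 6f 20 57 6f 72 6c 64
53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
A string containing an embedded NULL character is rejected:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str without null characters, not str
A bytes object is rejected as well:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str, not bytes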
If you are working with a PyObject * and can’t use PyArg_ParseTuple(), the code below shows how to check for and extract a suitable char * reference from both a bytes object and a string object.
Code #6 : Conversion from bytes
Code #7 : Conversion to UTF-8 bytes from a string
Both conversions guarantee NULL-terminated data, but neither checks for embedded NULL bytes elsewhere inside the string. If that matters for your application, you need to check for it yourself.
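One way to perform that check is sketched below in plain C. It assumes the length of the data is known separately from the pointer, for example from PyBytes_Size() for a bytes object. The helper name has_embedded_null is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if any of the first `len` bytes of `data` is a NULL byte,
   otherwise 0. `len` is the data length and excludes the terminating NULL. */
int has_embedded_null(const char *data, size_t len)
{
    return memchr(data, '\0', len) != NULL;
}
```

If the check reports an embedded NULL, a C library that relies on the terminator would see only the part of the string before it.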
Note : There is a hidden memory overhead associated with using the “s” format code to PyArg_ParseTuple() that is easy to overlook. When code uses this conversion, a UTF-8 copy of the string is created and permanently attached to the original string object. If the string contains non-ASCII characters, this increases the size of the string object until it is garbage collected.
Code #8 :
Size of the string object before the call, the hex output of the call, and the size afterwards:
Size : 87
53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
Size : 103