Passing NULL-Terminated Strings to C Libraries

Last Updated : 29 Mar, 2019

Suppose you are writing an extension module that needs to pass a NULL-terminated string to a C library. This article shows how to do it with Python’s Unicode string implementation. C libraries have many functions that operate on NULL-terminated strings declared as type char *.

The C function below (Code #1) is used to illustrate and test the problem. It simply prints the hex representation of each character so that the passed strings can be easily debugged.

Code #1 :

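#include <stdio.h> 
  
/* Print each byte of a NULL-terminated string as hex */
void print_chars(char *s) 
{ 
    while (*s) 
    { 
        printf("%2x ", (unsigned char) *s); 
        s++; 
    } 
    printf("\n"); 
} 
  
int main(void) 
{ 
    print_chars("Hello"); 
    return 0; 
} 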

Output :

48 65 6c 6c 6f

There are a few ways to call such a C function from Python. The first is to restrict it to operate only on bytes, using the “y” conversion code to PyArg_ParseTuple(), as shown in the code below.

Code #2 :

static PyObject * py_print_chars(PyObject * self, PyObject * args) 
{ 
    char * s; 
    if (! PyArg_ParseTuple(args, "y", &s)) 
    { 
        return NULL; 
    } 
    print_chars(s); 
    Py_RETURN_NONE; 
} 

Let’s see how the resulting function operates and how bytes with embedded NULL bytes and Unicode strings are rejected. The exact exception type and wording vary between Python versions.

Code #3 :

# Each call is entered separately at the interactive prompt 
print_chars(b'Hello World') 
  
# Embedded NULL byte: rejected 
print_chars(b'Hello\x00World') 
  
# Unicode string: rejected 
print_chars('Hello World') 

Output :

48 65 6c 6c 6f 20 57 6f 72 6c 64

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be bytes without null bytes, not bytes

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface

If you want to pass Unicode strings instead, use the “s” format code to PyArg_ParseTuple() as shown below.

Code #4 :

static PyObject *py_print_chars(PyObject *self, PyObject *args) 
{ 
    char *s; 
    if (!PyArg_ParseTuple(args, "s", &s)) 
    { 
        return NULL; 
    } 
    print_chars(s); 
    Py_RETURN_NONE; 
} 

The code above (Code #4) automatically converts all strings to a NULL-terminated UTF-8 encoding, as the following calls show.

Code #5 :

# Each call is entered separately at the interactive prompt 
print_chars('Hello World') 
  
# Non-ASCII character is passed as UTF-8 
print_chars('Spicy Jalape\u00f1o') 
   
# Embedded NULL character: rejected 
print_chars('Hello\x00World') 
   
# bytes: rejected 
print_chars(b'Hello World') 

Output :

48 65 6c 6c 6f 20 57 6f 72 6c 64

53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str without null characters, not str

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str, not bytes
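
The second line of output contains the two-byte UTF-8 sequence c3 b1 for the character ñ. One quick way to confirm which bytes the C function should receive is to encode the string in Python and compare. This is only a sketch for comparison; expected_bytes is a hypothetical helper name and is not part of the extension.

```python
def expected_bytes(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return " ".join(f"{byte:02x}" for byte in text.encode("utf-8"))


# Compare this line with what print_chars() prints for the same string
print(expected_bytes("Spicy Jalape\u00f1o"))
```

If the two dumps differ, the string was probably not encoded as UTF-8 on its way to the C function.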

If you are working with a PyObject * and can’t use PyArg_ParseTuple(), the code below shows how to check for and extract a suitable char * reference from both a bytes object and a string object.

Code #6 : Conversion from bytes

// Some Python Object 
PyObject *obj; 
  
// Conversion from bytes  
{ 
    char *s; 
    s = PyBytes_AsString(obj); 
    if (!s) 
    { 
        /* TypeError already raised */
        return NULL;  
    } 
    print_chars(s); 
} 

Code #7 : Conversion to UTF-8 bytes from a string

{ 
  
    PyObject *bytes; 
    char *s; 
  
    if (!PyUnicode_Check(obj)) 
    { 
        PyErr_SetString(PyExc_TypeError, "Expected string"); 
        return NULL; 
    } 
  
    bytes = PyUnicode_AsUTF8String(obj); 
    if (!bytes) 
    { 
        /* Encoding error already raised */
        return NULL; 
    } 
    s = PyBytes_AsString(bytes); 
    print_chars(s); 
    Py_DECREF(bytes); 
} 

Both conversions guarantee NULL-terminated data, but neither checks for embedded NULL bytes elsewhere inside the string. If that matters for your application, you need to check for it yourself.
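
If embedded NULL bytes matter, one option is to validate on the Python side before the object ever reaches Code #6 or Code #7. The following is a minimal sketch, assuming the caller controls the call site; to_c_string is a hypothetical helper, not part of any library. On the C side, a comparable check would compare strlen(s) against the length reported by PyBytes_Size().

```python
def to_c_string(value) -> bytes:
    """Return bytes that are safe to treat as a NULL-terminated C string.

    Hypothetical helper: str is encoded as UTF-8, bytes pass through,
    and anything containing an embedded NULL byte is rejected.
    """
    if isinstance(value, str):
        data = value.encode("utf-8")
    elif isinstance(value, (bytes, bytearray)):
        data = bytes(value)
    else:
        raise TypeError("expected str or bytes")
    if b"\x00" in data:
        # A C function would silently stop reading at this byte
        raise ValueError("embedded NULL byte")
    return data


print(to_c_string("Spicy Jalape\u00f1o"))
try:
    to_c_string(b"Hello\x00World")
except ValueError as exc:
    print("rejected:", exc)
```

Rejecting the value outright, rather than truncating it, keeps the behavior consistent with what the “y” and “s” format codes do.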

Note : There is a hidden memory overhead associated with using the “s” format code to PyArg_ParseTuple() that is easy to overlook. When code uses this conversion, a UTF-8 copy of the string is created and permanently attached to the original string object. If the string contains non-ASCII characters, this makes the string object larger, and the extra memory is not released until the string itself is garbage collected.

Code #8 :

import sys 
s = 'Spicy Jalape\u00f1o'
print("Size :", sys.getsizeof(s)) 
  
# passing string 
print_chars(s) 
  
# size has increased 
print("Size :", sys.getsizeof(s)) 

Output :

Size : 87

53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f

Size : 103
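
The same growth can be observed without compiling an extension. The sketch below assumes CPython, where ctypes.pythonapi exposes the C API function PyUnicode_AsUTF8(), which creates and caches the UTF-8 form on the string object; the “s” format code is assumed to rely on the same cache. Exact sizes vary by Python version and platform, so none are shown here.

```python
import ctypes
import sys

# PyUnicode_AsUTF8() returns a pointer to the string's cached UTF-8 bytes,
# creating the cache on first use (CPython-specific behaviour).
as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
as_utf8.argtypes = [ctypes.py_object]
as_utf8.restype = ctypes.c_char_p

s = 'Spicy Jalape\u00f1o'
before = sys.getsizeof(s)
encoded = as_utf8(s)   # triggers the UTF-8 conversion and caching
after = sys.getsizeof(s)

print("Size before :", before)
print("UTF-8 bytes :", encoded)
print("Size after  :", after)
```

For a pure ASCII string the two sizes should match, since CPython can reuse the existing storage as the UTF-8 form.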
