Passing NULL-Terminated Strings to C Libraries

An extension module often needs to pass a NULL-terminated string to a C library. This article shows how to do that with Python’s Unicode string implementation. C libraries have many functions that operate on NULL-terminated strings declared as type char *.

The code below defines a C function used to illustrate and test the problem. The function (Code #1) simply prints the hex representation of each character so that the passed strings can be easily debugged.

Code #1 :




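#include <stdio.h>

/* Print each byte of a NULL-terminated string as hex */
void print_chars(const char *s)
{
    while (*s)
    {
        printf("%2x ", (unsigned char) *s);
        s++;
    }
    printf("\n");
}

int main(void)
{
    print_chars("Hello");
    return 0;
}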



Output :

48 65 6c 6c 6f

 
There are a few choices for calling such a C function from Python. The first is to restrict it to operate only on bytes, using the “y” conversion code to PyArg_ParseTuple() as shown in the code below.

Code #2 :


static PyObject *py_print_chars(PyObject *self, PyObject *args)
{
    char *s;

    /* "y" accepts only bytes and rejects embedded NULL bytes */
    if (!PyArg_ParseTuple(args, "y", &s))
    {
        return NULL;
    }
    print_chars(s);
    Py_RETURN_NONE;
}



 
Let’s see how the resulting function operates and how bytes with embedded NULL bytes and Unicode strings are rejected.

Code #3 :


# print_chars is assumed to be imported from the compiled extension module
print_chars(b'Hello World')

# Embedded NULL byte: rejected
print_chars(b'Hello\x00World')

# Unicode string: rejected
print_chars('Hello World')



Output :

48 65 6c 6c 6f 20 57 6f 72 6c 64

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be bytes without null bytes, not bytes

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface
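
A caller that holds a str can still use a bytes-only function by encoding the text explicitly first. The sketch below shows one way to do that on the Python side. `as_c_bytes` is a hypothetical helper name, and UTF-8 is an assumed default encoding, not something the “y” code requires.

```python
def as_c_bytes(value, encoding="utf-8"):
    """Return a bytes object to hand to a function that parses with "y".

    Hypothetical helper for illustration; not part of any library.
    """
    if isinstance(value, str):
        # Encode text explicitly, since "y" refuses str objects
        return value.encode(encoding)
    if isinstance(value, (bytes, bytearray)):
        # Copy to an immutable bytes object
        return bytes(value)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


print(as_c_bytes("Jalape\u00f1o"))   # text is encoded to UTF-8 bytes
print(as_c_bytes(b"Hello World"))    # bytes pass through unchanged
```

The helper does not look for embedded NULL bytes, because the “y” conversion already rejects them, as the traceback above shows.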

 
If you want to pass Unicode strings instead, use the “s” format code to PyArg_ParseTuple() as shown below.

Code #4 :


static PyObject *py_print_chars(PyObject *self, PyObject *args)
{
    char *s;

    /* "s" accepts only str and yields NULL-terminated UTF-8 */
    if (!PyArg_ParseTuple(args, "s", &s))
    {
        return NULL;
    }
    print_chars(s);
    Py_RETURN_NONE;
}



 
The code above (Code #4) automatically converts all strings to a NULL-terminated UTF-8 encoding, as shown in the code below.

Code #5 :



print_chars('Hello World')

# Non-ASCII text is encoded as UTF-8
print_chars('Spicy Jalape\u00f1o')

# Embedded NULL character: rejected
print_chars('Hello\x00World')

# Bytes object: rejected
print_chars(b'Hello World')



Output :

48 65 6c 6c 6f 20 57 6f 72 6c 64

53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str without null characters, not str

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str, not bytes
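
The UTF-8 bytes in the second line of this output can be checked from pure Python, without compiling anything. The sketch below mimics the formatting of print_chars(); `hex_dump` is a hypothetical name used only for this illustration.

```python
def hex_dump(data):
    """Format each byte as hex, like the "%2x " loop in print_chars()."""
    return " ".join(format(byte, "2x") for byte in data)


text = "Spicy Jalape\u00f1o"
encoded = text.encode("utf-8")

# The string has fewer characters than encoded bytes
print(len(text), "characters,", len(encoded), "bytes")
print(hex_dump(encoded))
```

The single character U+00F1 becomes the two bytes c3 b1, which is why the encoded form is one byte longer than the string has characters.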

 
If you are working with a PyObject * and can’t use PyArg_ParseTuple(), the code below shows how to check and extract a suitable char * reference from both a bytes object and a string object.

Code #6 : Conversion from bytes


// Some Python object (assumed to be set elsewhere)
PyObject *obj;

// Conversion from bytes
{
    char *s;
    s = PyBytes_AsString(obj);
    if (!s)
    {
        /* TypeError already raised */
        return NULL;
    }
    print_chars(s);
}



 
Code #7 : Conversion to UTF-8 bytes from a string


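{
    PyObject *bytes;
    char *s;

    if (!PyUnicode_Check(obj))
    {
        PyErr_SetString(PyExc_TypeError, "Expected string");
        return NULL;
    }

    bytes = PyUnicode_AsUTF8String(obj);
    if (!bytes)
    {
        /* Encoding failed; exception already raised */
        return NULL;
    }
    s = PyBytes_AsString(bytes);
    print_chars(s);
    /* s points into bytes, so release it only after use */
    Py_DECREF(bytes);
}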



Both conversions guarantee NULL-terminated data, but neither checks for embedded NULL bytes elsewhere inside the string. If that matters, the check must be done separately.
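
To see why embedded NULL bytes matter, it helps to model what a C function that stops at the first NULL byte actually receives. The Python sketch below is only an illustration of that behaviour, not extension code; `c_string_view` and `has_embedded_null` are hypothetical names.

```python
def c_string_view(data):
    """Return the part of `data` that a NULL-terminated C string would expose."""
    # C string functions stop at the first zero byte
    return data.split(b"\x00", 1)[0]


def has_embedded_null(data):
    """Report whether `data` contains a zero byte anywhere."""
    return b"\x00" in data


payload = "Hello\x00World".encode("utf-8")

print(has_embedded_null(payload))   # True
print(c_string_view(payload))       # b'Hello'
```

Everything after the first zero byte is silently invisible to the C function, so a check of this kind belongs before the call whenever truncated data would be a problem.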

Note : There is a hidden memory overhead associated with using the “s” format code to PyArg_ParseTuple() that is easy to overlook. When code uses this conversion, a UTF-8 copy of the string is created and permanently attached to the original string object. If the string contains non-ASCII characters, this increases the size of the object, and the extra memory is held until the string itself is garbage collected.

Code #8 :


import sys

s = 'Spicy Jalape\u00f1o'
print("Size :", sys.getsizeof(s))

# Passing the string attaches a UTF-8 copy to it
print_chars(s)

# The reported size has grown
print("Size :", sys.getsizeof(s))



Output :

Size : 87

53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f

Size : 103
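
The 16-byte growth in this output is consistent with a stored UTF-8 copy of the string plus one terminating NULL byte. The sketch below only does that arithmetic. It assumes, as a CPython implementation detail that may differ between versions, that the attached copy holds exactly the encoded bytes and a terminator; `utf8_cache_size` is a hypothetical name.

```python
def utf8_cache_size(text):
    """Estimate the bytes needed for a NULL-terminated UTF-8 copy of `text`."""
    # Encoded length plus one byte for the terminator (assumed layout)
    return len(text.encode("utf-8")) + 1


s = "Spicy Jalape\u00f1o"
print(utf8_cache_size(s))   # 16, the difference between 103 and 87
```

The absolute values reported by sys.getsizeof() depend on the Python version and build, so only the difference is meaningful here.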

