Passing NULL-Terminated Strings to C Libraries

  • Last Updated : 29 Mar, 2019

Suppose you are writing an extension module that needs to pass a NULL-terminated string to a C library. This article shows how to do that with Python’s Unicode string implementation. Many C libraries have functions that operate on NULL-terminated strings declared as type char *.

The code below defines the C function used to illustrate and test the problem. The C function (Code #1) simply prints the hex representation of each character so that the passed strings can be easily debugged.

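Code #1 :

#include <stdio.h>

/* Print each byte of a NULL-terminated string as hex digits */
void print_chars(char *s)
{
    while (*s)
    {
        printf("%2x ", (unsigned char) *s);
        s++;
    }
    printf("\n");
}

/* Small driver so the function can be tested on its own */
int main(void)
{
    print_chars("Hello");
    return 0;
}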

Output :

48 65 6c 6c 6f

 
There are a few ways to call such a C function from Python. The first is to restrict it to operate only on bytes, using the “y” conversion code to PyArg_ParseTuple(), as shown in the code below.

Code #2 :

static PyObject *py_print_chars(PyObject *self, PyObject *args)
{
    char *s;

    /* "y" accepts only bytes objects without embedded NULL bytes */
    if (!PyArg_ParseTuple(args, "y", &s))
    {
        return NULL;
    }
    print_chars(s);
    Py_RETURN_NONE;
}

 
Let’s see how the resulting function operates and how bytes with embedded NULL bytes and Unicode strings are rejected. The exact exception type and wording vary between Python versions.

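Code #3 :

# print_chars is assumed to be imported from the compiled extension module.
# Run each call separately (e.g. in the interactive interpreter), since the
# second and third calls raise exceptions.
print_chars(b'Hello World')

print_chars(b'Hello\x00World')

print_chars('Hello World')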

Output :

48 65 6c 6c 6f 20 57 6f 72 6c 64

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be bytes without null bytes, not bytes

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface
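
The reason embedded NULL bytes are rejected can be seen in plain C. The sketch below is an illustration only: format_chars() is a hypothetical helper (not part of the article's extension module) that runs the same loop as print_chars() but writes into a caller-supplied buffer, so the result can be inspected. Like any code that walks a char * until it hits a zero byte, it never sees anything after the first NULL.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: same loop as print_chars(), but the hex dump goes
   into `out` (capacity `cap` bytes) instead of stdout.
   Returns the number of input bytes that were visited. */
size_t format_chars(const char *s, char *out, size_t cap)
{
    size_t visited = 0;
    size_t used = 0;

    assert(s != NULL && out != NULL && cap > 0);
    out[0] = '\0';

    /* Each byte needs 3 characters ("xx ") plus room for the terminator */
    while (*s && used + 4 <= cap)
    {
        used += (size_t) snprintf(out + used, cap - used, "%2x ",
                                  (unsigned char) *s);
        s++;
        visited++;
    }
    return visited;
}
```

Called on the 11-byte literal "Hello\0World", the helper visits only the 5 bytes of "Hello"; the remaining bytes are silently dropped. Rejecting such input up front avoids that silent truncation.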

 
If you want to pass Unicode strings instead, use the “s” format code to PyArg_ParseTuple() as shown below.

Code #4 :

static PyObject *py_print_chars(PyObject *self, PyObject *args)
{
    char *s;

    /* "s" accepts str objects and hands back a UTF-8 encoded char * */
    if (!PyArg_ParseTuple(args, "s", &s))
    {
        return NULL;
    }
    print_chars(s);
    Py_RETURN_NONE;
}

 
The code above (Code #4) automatically converts every string to a NULL-terminated UTF-8 encoding, as shown in the code below.

Code #5 :

# print_chars is assumed to be imported from the compiled extension module.
# Run each call separately, since the last two calls raise exceptions.
print_chars('Hello World')

# UTF-8 encoding
print_chars('Spicy Jalape\u00f1o')

print_chars('Hello\x00World')

print_chars(b'Hello World')

Output :

48 65 6c 6c 6f 20 57 6f 72 6c 64

53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str without null characters, not str

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str, not bytes
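
To see where the two bytes c3 b1 in the output come from, here is a minimal sketch of UTF-8 encoding for a single code point. utf8_encode() is a hypothetical helper written for this illustration; it is not how CPython performs the conversion internally, only a demonstration of the encoding rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes up to 4 bytes into `out` and returns the number of bytes written,
   or 0 if `cp` is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80)                      /* 1 byte: 0xxxxxxx */
    {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800)                     /* 2 bytes: 110xxxxx 10xxxxxx */
    {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000)                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    {
        if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            return 0;                   /* surrogates are not encodable */
        }
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF)                 /* 4 bytes: 11110xxx then 3 x 10xxxxxx */
    {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For U+00F1 (the ñ in the example) this yields the two bytes 0xC3 0xB1, while ASCII characters such as U+0041 stay a single byte. U+0000 encodes to a single zero byte, which is why a str containing '\x00' cannot be handed to C as a NULL-terminated string.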

 
If you are working with a PyObject * and can’t use PyArg_ParseTuple(), the code below shows how to check for and extract a suitable char * reference from both a bytes object and a string object.

Code #6 : Conversion from bytes

// Some Python object
PyObject *obj;

// Conversion from bytes
{
    char *s;
    s = PyBytes_AsString(obj);
    if (!s)
    {
        /* TypeError already raised */
        return NULL;
    }
    print_chars(s);
}

 
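Code #7 : Conversion to UTF-8 bytes from a string

{
    PyObject *bytes;
    char *s;

    if (!PyUnicode_Check(obj))
    {
        PyErr_SetString(PyExc_TypeError, "Expected string");
        return NULL;
    }

    /* Returns a new bytes object, or NULL with an exception set */
    bytes = PyUnicode_AsUTF8String(obj);
    if (!bytes)
    {
        return NULL;
    }
    s = PyBytes_AsString(bytes);
    print_chars(s);

    /* s points into bytes, so release it only after the last use of s */
    Py_DECREF(bytes);
}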

Both conversions guarantee NULL-terminated data, but neither checks for embedded NULL bytes elsewhere inside the string. If that matters, you need to check for it yourself.
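
One possible way to perform that check, sketched in plain C so it can be tried without building an extension: compare against the real length of the data rather than relying on the terminator. has_embedded_nul() is a hypothetical helper name. In an extension module, the pointer and length would typically come from a call such as PyBytes_AsStringAndSize() or PyUnicode_AsUTF8AndSize(); that wiring is assumed here and not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first `len` bytes of `buf`
   contain a zero byte, 0 otherwise. `len` is the length of the data
   itself and does not count the trailing terminator. */
int has_embedded_nul(const char *buf, size_t len)
{
    assert(buf != NULL);
    return memchr(buf, '\0', len) != NULL;
}
```

If the helper reports an embedded NULL, the caller can raise an exception instead of passing a string to the C library that would be silently truncated.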

Note : There is a hidden memory overhead associated with using the “s” format code to PyArg_ParseTuple() that is easy to overlook. When code uses this conversion, a UTF-8 copy of the string is created and permanently attached to the original string object. If the original string contains non-ASCII characters, its size increases and stays larger until the string is garbage collected.

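Code #8 :

import sys

# print_chars is assumed to be imported from the compiled extension module
s = 'Spicy Jalape\u00f1o'
print("Size :", sys.getsizeof(s))

# Passing the string creates the attached UTF-8 copy
print_chars(s)

# The reported size has increased
print("Size :", sys.getsizeof(s))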

Output :

Size : 87

53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f

Size : 103


