C strings conversion to Python

For a C string represented as a char *, int pair, you must decide whether the data should be presented to Python as a raw byte string or as a Unicode string.

Bytes objects can be built using Py_BuildValue() as follows:




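// Required before Python.h so that "#" formats take a Py_ssize_t length
#define PY_SSIZE_T_CLEAN
#include <Python.h>

// Pointer to C string data
char *s; 
  
// Length of data in bytes
Py_ssize_t len; 
  
// Make a bytes object (NULL is returned on failure)
PyObject *obj = Py_BuildValue("y#", s, len);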


 
To create a Unicode string when it is known that s points to data encoded as UTF-8, the following code can be used:




PyObject *obj = Py_BuildValue("s#", s, len);
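
When it is not certain that s really holds UTF-8, a pre-check can help decide between "s#" and "y#". The following is a minimal sketch in plain C, not a definitive implementation. The helper name is_valid_utf8 is hypothetical and is not part of the Python C API. It rejects truncated sequences, overlong encodings, surrogates, and code points above U+10FFFF.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the Python C API):
   returns 1 if the len bytes at s form well-formed UTF-8, else 0. */
static int is_valid_utf8(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    while (i < len) {
        unsigned char c = p[i];
        unsigned long cp, min;
        size_t n, k;

        if (c < 0x80) {                     /* single-byte (ASCII) */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {    /* lead byte of a 2-byte sequence */
            n = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {    /* lead byte of a 3-byte sequence */
            n = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {    /* lead byte of a 4-byte sequence */
            n = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                       /* stray continuation or invalid lead byte */
        }

        if (len - i <= n)
            return 0;                       /* sequence is cut off by the end of the buffer */

        for (k = 1; k <= n; k++) {
            if ((p[i + k] & 0xC0) != 0x80)
                return 0;                   /* not a continuation byte */
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }

        /* Reject overlong forms, out-of-range values, and surrogates. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;

        i += n + 1;
    }
    return 1;
}
```

If the check fails, the data can still be handed to Python as a bytes object using the "y#" format.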


 
If s is encoded in some other known encoding, a string can be made using PyUnicode_Decode():




PyObject *obj = PyUnicode_Decode(s, len, "encoding", "errors");
  
// Example
obj = PyUnicode_Decode(s, len, "latin-1", "strict");
obj = PyUnicode_Decode(s, len, "ascii", "ignore");
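
The "ascii" codec with the "strict" error handler fails on any byte above 0x7F, while "ignore" silently drops such bytes. As a rough sketch in plain C, a check like the one below can show in advance which of the two outcomes to expect. The helper name is_ascii is hypothetical and is not part of the Python C API.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the Python C API):
   returns 1 if every byte is in the ASCII range 0x00-0x7F, else 0. */
static int is_ascii(const char *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        /* Cast first: plain char may be signed on some platforms. */
        if ((unsigned char)s[i] > 0x7F)
            return 0;
    }
    return 1;
}
```

Pure ASCII data decodes to the same text under "ascii", "latin-1", and "utf-8", so the choice of codec only matters when this check returns 0.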


 
If a wide string is represented as a wchar_t *, len pair, there are a few options, as shown below:




// Wide character string
wchar_t *w;
  
// Length in wide characters, not bytes
Py_ssize_t len; 
  
// Option 1 - use Py_BuildValue()
PyObject *obj1 = Py_BuildValue("u#", w, len);
  
// Option 2 - use PyUnicode_FromWideChar()
PyObject *obj2 = PyUnicode_FromWideChar(w, len);


  • The data from C must be explicitly decoded into a string according to some codec.
  • Common encodings include ASCII, Latin-1, and UTF-8.
  • If the encoding is not known, it is best to pass the data to Python as bytes instead.
  • Python always copies the string data it is given when making an object, so the C buffer can be freed or reused afterwards.
  • Also, for better reliability, strings should be created using both a pointer and a size rather than relying on NUL-terminated data.
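
The last point can be illustrated without any Python at all. The sketch below, in plain C, reports how many bytes a NUL-terminated reader would see in a buffer whose real size is known. The helper name visible_length is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes a NUL-terminated reader would see
   in a buffer that really holds len bytes. */
static size_t visible_length(const char *data, size_t len)
{
    /* memchr never reads past len bytes, unlike strlen. */
    const char *nul = memchr(data, '\0', len);

    return nul ? (size_t)(nul - data) : len;
}
```

For the 7-byte buffer "abc\0def", visible_length() returns 3, so code that relies on the terminator would lose the last four bytes, whereas passing the pointer together with the length 7 keeps all of them.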


Last Updated : 02 Apr, 2019