Strings can be converted between C and Python in either direction, but sometimes the encoding of the C data is doubtful or unknown. Suppose, for instance, that the C data is supposed to be UTF-8, but this is not strictly enforced. It is then important to handle such malformed data so that it neither crashes Python nor destroys the string data in the process.
In the code above, the string sdata contains a mix of UTF-8 and malformed data. Nevertheless, calling print_chars(sdata, slen) in C works fine.
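To see why C is untroubled while Python needs care, here is a small pure-Python sketch. It assumes sdata holds the bytes shown in the output further down (UTF-8 "Spicy Jalapeño" followed by a stray 0xae byte), and print_chars_py is a hypothetical stand-in for the C print_chars(): like the C version, it only looks at raw bytes. Decoding the same bytes with Python's default "strict" error handler raises UnicodeDecodeError, while "replace" succeeds but loses the original byte.

```python
# Assumed sample data: UTF-8 "Spicy Jalapeño" plus one stray 0xae byte.
sdata = b"Spicy Jalape\xc3\xb1o\xae"
slen = len(sdata)


def print_chars_py(data: bytes, length: int) -> str:
    """Hypothetical stand-in for the C print_chars(): prints each byte
    as hex, so it never needs to know the encoding."""
    line = " ".join(f"{b:2x}" for b in data[:length])
    print(line)
    return line


dump = print_chars_py(sdata, slen)

# The default 'strict' handler rejects the malformed byte outright.
strict_failed = False
try:
    sdata.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict:", exc.reason)

# 'replace' succeeds, but 0xae becomes U+FFFD and cannot be recovered.
replaced = sdata.decode("utf-8", "replace")
print("replace:", ascii(replaced))
print("bytes preserved:", replaced.encode("utf-8") == sdata)
```

Neither behaviour is acceptable here: one aborts the conversion, the other silently corrupts the data.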
Now suppose one wants to convert the contents of sdata into a Python string and later pass that string back to the print_chars() function through an extension. The code given below shows how to do this in a way that exactly preserves the original data, even though it has encoding problems.
Code #3: Using the above Code #2
'Spicy Jalapeño\udcae'
53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f ae
Here, one can see that the malformed C string was decoded into a Python string without errors, with the invalid byte 0xae carried along as the escape character \udcae. When that string was passed back into C, it was encoded into a byte string containing exactly the same bytes as the original C string.
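The output above matches what Python's "surrogateescape" error handler produces: each byte that is not valid UTF-8 is decoded to a lone surrogate in the range U+DC80 to U+DCFF (0xae becomes \udcae), and encoding with the same handler turns it back into the original byte. Assuming the extension passes "surrogateescape" as the errors argument of PyUnicode_Decode() and PyUnicode_AsEncodedString(), the pure-Python sketch below reproduces the round trip. The helper names c_to_py and py_to_c are hypothetical, and the extra samples are made up for illustration.

```python
def c_to_py(raw: bytes) -> str:
    """Decode C bytes; invalid bytes become lone surrogates U+DC80..U+DCFF."""
    return raw.decode("utf-8", "surrogateescape")


def py_to_c(text: str) -> bytes:
    """Encode back to bytes; the same handler restores the escaped bytes."""
    return text.encode("utf-8", "surrogateescape")


samples = [
    b"Spicy Jalape\xc3\xb1o\xae",  # stray continuation byte
    b"\xff\xfe then ASCII",        # bytes that never appear in UTF-8
    b"cut off \xe2\x82",           # truncated multi-byte sequence
]

for raw in samples:
    text = c_to_py(raw)
    back = py_to_c(text)
    print(ascii(text), " ".join(f"{b:02x}" for b in back))
    assert back == raw  # exact byte-for-byte round trip

# The surrogates cannot be encoded strictly, so the handler is needed on
# the way back into C as well as on the way in.
try:
    c_to_py(samples[0]).encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode:", exc.reason)
```

The design choice is symmetry: using the same handler in both directions is what guarantees that no byte is altered or dropped.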