Open In App

Convert Unicode String to a Byte String in Python

Last Updated : 30 Jan, 2024
Improve
Improve
Like Article
Like
Save
Share
Report

Python is a versatile programming language known for its simplicity and readability. Unicode support is a crucial aspect of Python, allowing developers to handle characters from various scripts and languages. However, there are instances where you might need to convert a Unicode string to a regular string. In this article, we will explore five different methods to achieve this in Python.

Convert A Unicode String to a Byte String In Python

Below, are the ways to convert a Unicode String to a Byte String In Python.

  • Using encode() with UTF-8
  • Using encode() with a Different Encoding
  • Using bytes() Constructor
  • Using str.encode() Method

Convert A Unicode to a Byte String Using encode() with UTF-8

In this example, a Unicode string containing English and Chinese characters is encoded to a byte string using UTF-8 encoding. The resulting `bytes_representation` is printed, demonstrating the transformation of the mixed-language Unicode string into its byte representation suitable for storage or transmission in a UTF-8 encoded format.

Python3




unicode_string = "Hello, 你好"
  
bytes_representation = unicode_string.encode('utf-8')
  
print(bytes_representation)


Output

b'Hello, \xe4\xbd\xa0\xe5\xa5\xbd'

Unicode String To Byte String Using encode() with a Different Encoding

In this example, the Unicode string is encoded into a byte string using UTF-16 encoding, resulting in a sequence of bytes that represents the mixed-language string. The byte string is then printed to demonstrate the UTF-16 encoded representation of the Unicode characters.

Python3




unicode_string = "Hello, 你好"
  
byte_string_utf16 = unicode_string.encode('utf-16')
  
# Displaying the byte string
print(byte_string_utf16)


Output

b'\xff\xfeH\x00e\x00l\x00l\x00o\x00,\x00 \x00`O}Y'

Convert A Unicode String To Byte String Using bytes() Constructor

In this example, the Unicode string is converted to a byte string using the bytes() constructor with UTF-8 encoding. The resulting `byte_string_bytes` represents the UTF-8 encoded byte sequence of the mixed-language Unicode string.

Python3




unicode_string = "Hello, 你好"
  
byte_string_bytes = bytes(unicode_string, 'utf-8')
  
# Displaying the byte string
print(byte_string_bytes)


Output

b'Hello, \xe4\xbd\xa0\xe5\xa5\xbd'

Python Unicode String To Byte String Using str.encode() Method

In this example, the Unicode string is transformed into a byte string using the str.encode() method with UTF-8 encoding. The resulting `byte_string_str_encode` represents the UTF-8 encoded byte sequence of the mixed-language Unicode string.

Python3




unicode_string = "Hello, 你好"
  
byte_string_str_encode = str.encode(unicode_string, 'utf-8')
  
# Displaying the byte string
print(byte_string_str_encode)


Output

b'Hello, \xe4\xbd\xa0\xe5\xa5\xbd'



Like Article
Suggest improvement
Share your thoughts in the comments

Similar Reads