char8_t Data Type in C++20
Last Updated: 15 May, 2023
C++20, the revision of the C++ standard published in 2020, introduced a number of new features. One of them is the char8_t data type, which is designed specifically to represent UTF-8 encoded characters. Let's look at what char8_t is and how it differs from the other character data types.
char8_t Data Type
In earlier versions of C++, characters were represented by the char data type. Whether char is signed or unsigned is implementation-defined, and the type is not tied to any particular character set or encoding. Before C++20, even a UTF-8 string literal such as u8"text" had type const char[N], so nothing in the type system distinguished UTF-8 text from text in any other narrow encoding. As a result, UTF-8 strings could easily be mixed up with other char strings.
C++20 addresses this with char8_t, a distinct type whose underlying type is unsigned char: it has the same size, signedness, and alignment as unsigned char (8 bits on all mainstream platforms). It is intended specifically for UTF-8 code units, and in C++20 the u8 character and string literals produce char8_t values, so UTF-8 text can be told apart from other narrow text at the type level.
How is char8_t different from other character data types?
What distinguishes char8_t from char and wchar_t is that it is intended specifically for UTF-8 code units and is a type of its own. A char may be signed or unsigned depending on the implementation, and it is not declared to represent any particular character set. The size and encoding of wchar_t are also implementation-defined and vary with the system and compiler: it is commonly 16 bits on Windows and 32 bits on Linux and macOS. By contrast, char8_t is always unsigned and always has the same size as unsigned char. Note that a single char8_t does not hold a wider range of characters than char: it stores one UTF-8 code unit, and characters outside the ASCII range take two to four code units.
Example 1: Using char8_t data type to store a UTF-8 encoded character.
C++
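#include <cstdio>

int main()
{
    // u8'A' is a UTF-8 character literal; in C++20 its type is char8_t.
    char8_t c = u8'A';
    // c is promoted to int when passed to printf, so %c prints it as a character.
    std::printf("%c", c);
    return 0;
}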
Output
A
In this example, we declare a char8_t variable named c and initialize it with the UTF-8 character literal u8'A'. Because c is promoted to int when it is passed to printf, the %c conversion prints it to the console as the character A.
Example 2: Using char8_t data type to store a string.
C++
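#include <cstdio>
#include <string>

int main()
{
    std::u8string str = u8"Welcome to GeeksforGeeks !";
    // c_str() returns const char8_t*, but %s expects a char pointer,
    // so reinterpret the UTF-8 bytes as const char* before printing.
    std::printf("%s", reinterpret_cast<const char*>(str.c_str()));
    return 0;
}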
Output
Welcome to GeeksforGeeks !
In this example, a std::u8string named str is initialized with the UTF-8 string literal u8"Welcome to GeeksforGeeks !", and its contents are then printed to the console.
Conclusion
In conclusion, C++20 adds a new data type, char8_t, designed specifically to represent UTF-8 encoded characters. Unlike char, whose signedness is implementation-defined, and wchar_t, whose size is implementation-defined, char8_t is a distinct unsigned type with the same size as unsigned char. By using char8_t for UTF-8 data, developers make the encoding of their text explicit in the type system, which helps prevent UTF-8 strings from being confused with strings in other encodings.