char8_t Data Type in C++ 20

Last Updated : 15 May, 2023

C++20, the version of the C++ standard published in 2020, added the char8_t data type as one of its new features. This data type was created specifically to represent UTF-8 encoded characters (more precisely, UTF-8 code units). Let’s understand what char8_t is and how it differs from other character data types.

char8_t Data Type

Before C++20, UTF-8 text was usually stored in the char data type. A char is always exactly one byte in size, but whether it is signed or unsigned is implementation-defined, and it is not tied to any particular character set or encoding. As a result, nothing in the type of a char string tells the reader or the compiler whether it holds UTF-8 or some other encoding.

The char8_t data type was introduced in C++20 to solve this problem. It is a distinct type with the same size, signedness, and alignment as unsigned char, so on typical platforms it is an 8-bit unsigned integer type. Each char8_t value holds one UTF-8 code unit, which makes it explicit in the type system that the data is UTF-8 encoded.

How is the char8_t data type different from other character data types?

What distinguishes char8_t from other character data types such as char and wchar_t is that it is deliberately designed to represent UTF-8 encoded text. The char type is not tied to any particular character set and its signedness is implementation-defined, whereas char8_t is always unsigned and always denotes UTF-8. The size of wchar_t is also implementation-defined and can differ with the system and compiler being used (for example, 16 bits on Windows and 32 bits on most Linux systems). Note that char8_t does not store a wider range of characters than char: both hold one byte, and a character outside the ASCII range still takes two to four code units in UTF-8. What char8_t adds is type safety, because the compiler treats it as a type separate from char.

Example 1: Using char8_t data type to store a UTF-8 encoded character.

C++

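// C++ Program to demonstrate using the char8_t data
// type to store a UTF-8 encoded character.
#include <cstdio>

int main()
{
    // u8'A' is a single UTF-8 code unit of type char8_t
    char8_t c = u8'A';
    // %c expects an int, so convert explicitly
    std::printf("%c", static_cast<int>(c));

    return 0;
}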


Output

A

In this example, we declare a char8_t variable named c and initialize it with the UTF-8 encoded character ‘A’. We then output the value of c to the console.

Example 2: Using char8_t data type to store a string.

C++

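// C++ Program to demonstrate using
// char8_t data type to store a string
#include <cstdio>
#include <string>

int main()
{
    // The u8 prefix must be written directly before the quote
    std::u8string str = u8"Welcome to GeeksforGeeks !";
    // %s expects a const char*, so view the char8_t bytes as char
    std::printf("%s", reinterpret_cast<const char*>(str.c_str()));

    return 0;
}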


Output

Welcome to GeeksforGeeks !

In this example, a std::u8string variable named str is initialized with the string “Welcome to GeeksforGeeks !”. The value of str is then printed to the console.

Conclusion

In conclusion, C++20 introduced a new data type called char8_t that is made specifically to represent UTF-8 encoded characters. It differs from other character data types like char and wchar_t in that it is a distinct, always-unsigned type with the same size as unsigned char. By using char8_t for UTF-8 data, developers can make the encoding explicit in their code and avoid ambiguity about what a string contains.


