Character Literals in C++17: Unicode and the UTF-8 Prefix

Last Updated : 15 May, 2023

A character literal represents a single character enclosed in single quotes, such as 'a', 'z', or '0'. Character literals are not new in C++17: ordinary and wide (L) literals have been part of the language from the start, and C++11 added the u and U prefixes for char16_t and char32_t. What C++17 adds is the u8 prefix for character literals, for example u8'a', which denotes a single UTF-8 code unit. Combined with universal character names (the \u and \U escape sequences), these prefixes let a literal name any Unicode character, provided the literal's type is wide enough to hold the value. Unicode is a standard for the representation of characters in computer systems, which includes characters from most of the writing systems of the world, along with many other symbols.

How to use Unicode?

To denote a Unicode character in a character literal, use the escape sequence \u followed by exactly four hexadecimal digits of the character's code point. For example, '\u03C0' names U+03C0, the Greek letter pi. For characters outside the Basic Multilingual Plane (BMP), such as U+1F600, use the escape sequence \U followed by exactly eight hexadecimal digits: '\U0001F600' names the grinning face emoji. Note that an unprefixed literal has type char, which usually cannot hold such a character in a single unit; a U prefix (type char32_t) is needed to store the code point itself.

Syntax:

'character'

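where character is a single character enclosed in single quotes. It can be an ordinary character, an escape sequence that represents a special character, such as \n for newline or \t for tab, or a universal character name such as \u03C0. An optional prefix (L, u, U, or, since C++17, u8) written directly before the opening quote, with no space in between, selects the type of the literal.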

Examples of using Character Literal in C++

Example 1: 

C++14




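// C++ program to demonstrate character literals with Unicode
// escapes and the UTF-8 prefix in C++17
#include <iostream>

int main()
{
    // Greek letter pi: does not fit in a single char, so compilers
    // such as GCC treat it as a multicharacter literal of type int
    std::cout << "Unicode: " << '\u03C0' << '\n';
    // Grinning face: same caveat as above
    std::cout << "Beyond BMP: " << '\U0001F600' << '\n';
    // Greek letter lambda as a UTF-8 string literal; the prefix must
    // touch the opening quote (u8"...", not u8 "...")
    std::cout << "UTF-8: " << u8"\u03BB" << '\n';
    return 0;
}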


Output:

Unicode: 53120
Beyond BMP: -257976192
UTF-8: λ

Explanation: 

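The first two lines print numbers rather than characters. Neither pi nor the grinning face fits in a single char, so the compiler that produced this output treated each literal as a multicharacter literal of type int whose value packs the character's UTF-8 bytes: 53120 is 0xCF80 (bytes CF 80), and -257976192 is 0xF09F9880 (bytes F0 9F 98 80) read as a signed 32-bit integer. This value is implementation-defined, and other compilers may warn about or reject these literals. The third line uses a u8 string literal, which stores lambda as UTF-8 bytes, so a terminal that expects UTF-8 displays the character. The program is meant to be compiled as C++17; from C++20 onward, u8 literals have type char8_t and can no longer be streamed to std::cout directly. To store a whole code point in one character literal, use a wider type such as char32_t with the U prefix.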

Example 2: 

C++




// C++ Program to demonstrate character literals with
// Unicode escapes and type prefixes in C++17
#include <iostream>

int main()
{
    char basicChar = 'a';
    wchar_t wideChar = L'\u00E9';
    // The U prefix must touch the opening quote (U'...', not U '...').
    // All three variables are char32_t and hold a full code point.
    char32_t utf8Char = U'\u03A9';
    char32_t utf16Char = U'\U0001F60A';
    char32_t utf32Char = U'\U0001F609';

    std::cout << "Basic character: " << basicChar
              << std::endl;
    std::wcout << "Wide character: " << wideChar
               << std::endl;
    // In C++17 a char32_t is streamed as its numeric value
    std::cout << "UTF-8 character: " << utf8Char
              << std::endl;
    std::wcout << "UTF-16 character: " << utf16Char
               << std::endl;
    std::wcout << "UTF-32 character: " << utf32Char
               << std::endl;

    return 0;
}


Output:

Basic character: a
Wide character: �
UTF-8 character: 937
UTF-16 character: 128522
UTF-32 character: 128521

Explanation: 

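Despite their names, utf8Char, utf16Char, and utf32Char are all declared as char32_t and initialized with U-prefixed literals, so each stores a complete Unicode code point (the largest valid code point is U+10FFFF) rather than UTF-8 or UTF-16 code units. In C++17, streaming a char32_t prints its numeric value, which is why the output shows 937 (0x3A9), 128522 (0x1F60A), and 128521 (0x1F609) instead of the characters. The wide character may appear as a replacement character, as in the output above, possibly because no locale was configured for wide output. The types that correspond to the other encodings are char with the u8 prefix (char8_t from C++20) for UTF-8 and char16_t with the u prefix for UTF-16.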


