mbrtowc() function in C/C++

Last Updated : 10 May, 2019

The mbrtowc() function in C/C++ converts a multibyte sequence to a wide character. It is declared in <wchar.h> (<cwchar> in C++). The function examines at most max bytes starting at pmb, converts the multibyte character found there to a value of type wchar_t, stores that value at the location pointed to by pwc (if pwc is not a null pointer), and returns the length in bytes of the multibyte character. If pmb points to a null character, the function resets the shift state and returns zero after storing the wide null character at pwc.

Syntax:

size_t mbrtowc (wchar_t* pwc, const char* pmb, size_t max, mbstate_t* ps)

Parameters: The function accepts four parameters as described below:

pwc : pointer to the location where the resulting wide character will be written
pmb : pointer to the multibyte character string used as input
max : limit on the number of bytes in pmb that can be examined
ps : pointer to the conversion state used when interpreting the multibyte string
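The roles of the byte limit and the conversion state are easiest to see when a single multibyte character arrives in two pieces. The following is a minimal sketch, not a definitive implementation: the helper names use_utf8_locale and convert_in_two_chunks are hypothetical, and the list of locale names is an assumption that depends on the system. When the first call sees only part of a character, mbrtowc() returns the special value (size_t)-2 and remembers the partial input in the state object, so a second call with the same state can finish the conversion.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: tries a few common UTF-8 locale names.
// Which names exist is system-dependent (assumption).
bool use_utf8_locale()
{
    const char* names[] = { "en_US.utf8", "en_US.UTF-8", "C.UTF-8" };
    for (const char* name : names)
        if (std::setlocale(LC_ALL, name) != nullptr)
            return true;
    return false;
}

// Hypothetical helper: converts one multibyte character that arrives
// in two chunks (the first `split` bytes, then the remaining bytes).
// Returns the value of the last mbrtowc() call that was made.
std::size_t convert_in_two_chunks(const char* bytes, std::size_t total,
                                  std::size_t split, wchar_t* out)
{
    assert(bytes != nullptr && out != nullptr && split < total);

    std::mbstate_t ps = std::mbstate_t(); // initial conversion state

    // First call: only `split` bytes may be examined.
    std::size_t r = std::mbrtowc(out, bytes, split, &ps);
    if (r != static_cast<std::size_t>(-2))
        return r; // complete character, null character, or error

    // Second call: same state object, remaining bytes.
    return std::mbrtowc(out, bytes + split, total - split, &ps);
}
```

For instance, with the three UTF-8 bytes of U+6C34 (E6 B0 B4) and split set to 1, the second call should return 2, the number of bytes that completed the character, provided a UTF-8 locale was selected first.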

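Return value: The function returns one of four values:

0 if the character converted from pmb is the null wide character. If pmb is a null pointer, the call is equivalent to mbrtowc(NULL, "", 1, ps)
the number of bytes [1…max] of the multibyte character successfully converted from pmb
(size_t)-2 if the first max bytes of pmb form an incomplete (but so far valid) multibyte character; nothing is stored at pwc
(size_t)-1 if an encoding error occurs; errno is set to EILSEQ and nothing is stored at pwc

Note: The return type size_t is unsigned, so none of the returned values is less than zero. (size_t)-1 and (size_t)-2 are the two largest values of size_t, and the result should be compared against them explicitly rather than tested for being negative.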
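The four kinds of return value can be told apart with explicit comparisons against (size_t)-1 and (size_t)-2. The sketch below shows one way to do this under stated assumptions: the names Result, classify and decode are hypothetical and introduced only for illustration, and decode relies on whatever locale the caller has already selected with setlocale().

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <string>

// Hypothetical names, introduced only for this illustration.
enum class Result { NullChar, Converted, Incomplete, Invalid };

// Maps a raw mbrtowc() return value to one of the four cases.
Result classify(std::size_t r)
{
    if (r == static_cast<std::size_t>(-1)) return Result::Invalid;    // errno is EILSEQ
    if (r == static_cast<std::size_t>(-2)) return Result::Incomplete; // more bytes needed
    if (r == 0) return Result::NullChar;                              // null character
    return Result::Converted;                                         // r bytes consumed
}

// Decodes a null-terminated multibyte string using the current locale.
// Returns false if the input is invalid or ends in the middle of a character.
bool decode(const char* s, std::wstring& out)
{
    assert(s != nullptr);

    std::mbstate_t ps = std::mbstate_t(); // initial conversion state
    std::size_t left = std::strlen(s);    // bytes still to examine
    out.clear();

    while (left > 0) {
        wchar_t wc;
        std::size_t r = std::mbrtowc(&wc, s, left, &ps);
        if (classify(r) != Result::Converted)
            return false;
        out.push_back(wc);
        s += r;
        left -= r;
    }
    return true;
}
```

Keeping the result in a size_t and classifying it before using it as a length avoids advancing the input pointer by one of the two special values.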

The programs below illustrate the function:
Program 1:

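// C++ program to illustrate
// mbrtowc() function
#include <clocale>
#include <cstring>
#include <cwchar>
#include <iostream>
using namespace std;

// Function to convert a multibyte
// sequence to wide characters
void print_(const char* s)
{
    // initial conversion state
    mbstate_t ps = mbstate_t();

    // one past the last byte of the string
    const char* end = s + strlen(s);

    // mbrtowc() returns size_t, so keep the result unsigned
    size_t len;
    wchar_t pwc;

    // convert and print one character per iteration; stop at
    // the end of the string ((size_t)-2 when no bytes are left),
    // at a null character (0) or at an invalid sequence ((size_t)-1)
    while ((len = mbrtowc(&pwc, s, end - s, &ps)) != 0
           && len != (size_t)-1 && len != (size_t)-2) {
        wcout << "Next " << len
              << " bytes are the character " << pwc << '\n';
        s += len;
    }
}

// Driver code
int main()
{
    // select a UTF-8 locale; the name is system-dependent
    setlocale(LC_ALL, "en_US.utf8");

    // UTF-8 narrow multibyte encoding
    // (before C++20 u8"..." has type const char[];
    // in C++20 it is const char8_t[] and needs a cast)
    const char* str = u8"z\u00df\u6c34\U0001d10b";

    print_(str);
    return 0;
}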

Output:

Next 1 bytes are the character z
Next 2 bytes are the character ß
Next 3 bytes are the character 水
Next 4 bytes are the character 𝄋

Program 2:

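// C++ program to illustrate
// mbrtowc() function
// with different UTF-8 characters
#include <clocale>
#include <cstring>
#include <cwchar>
#include <iostream>
using namespace std;

// Function to convert a multibyte
// sequence to wide characters
void print_(const char* s)
{
    // initial conversion state
    mbstate_t ps = mbstate_t();

    // one past the last byte of the string
    const char* end = s + strlen(s);

    // mbrtowc() returns size_t, so keep the result unsigned
    size_t len;
    wchar_t pwc;

    // convert and print one character per iteration; stop at
    // the end of the string ((size_t)-2 when no bytes are left),
    // at a null character (0) or at an invalid sequence ((size_t)-1)
    while ((len = mbrtowc(&pwc, s, end - s, &ps)) != 0
           && len != (size_t)-1 && len != (size_t)-2) {
        wcout << "Next " << len
              << " bytes are the character " << pwc << '\n';
        s += len;
    }
}

// Driver code
int main()
{
    // select a UTF-8 locale; the name is system-dependent
    setlocale(LC_ALL, "en_US.utf8");

    // UTF-8 bytes written as hex escapes; an ordinary string
    // literal is enough here and also compiles as C++20
    const char* str = "\xE2\x88\x83y\xE2\x88\x80x\xC2\xAC";

    print_(str);
    return 0;
}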

Output:

Next 3 bytes are the character ∃
Next 1 bytes are the character y
Next 3 bytes are the character ∀
Next 1 bytes are the character x
Next 2 bytes are the character ¬
