
java.nio.charset.CharsetEncoder Class in Java

Last Updated : 26 Nov, 2023

For character encoding and decoding, Java offers a number of classes in the ‘java.nio.charset’ package. The ‘CharsetEncoder’ class of this package performs the important task of encoding. In this article, let us understand this class, its syntax, its methods, and some examples of error handling and optimization techniques.

What is a CharsetEncoder?

The ‘CharsetEncoder’ class belongs to the ‘java.nio.charset’ package.

Each CharsetEncoder is tied to one specific character encoding, represented by a Charset, and converts sequences of characters into bytes in that encoding. The class is commonly used for tasks such as writing text to files, transmitting data over a network, and converting data between different character encodings.

In other words, a CharsetEncoder translates character input into byte output: text in Java's internal character representation, which is UTF-16, is converted into the byte representation of the chosen encoding (e.g. UTF-8).
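The difference between Java's UTF-16 chars and the encoded bytes is easiest to see by counting both. The sketch below is illustrative (the class name CharsVsBytes and the helper encodedLength are hypothetical); it encodes strings with a fresh encoder from Charset.newEncoder() and prints the character and byte counts.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsVsBytes {

    // Encodes the text with a fresh encoder for the given charset and
    // returns how many bytes the encoded form occupies.
    static int encodedLength(Charset cs, String text) {
        try {
            ByteBuffer bytes = cs.newEncoder().encode(CharBuffer.wrap(text));
            return bytes.remaining();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("not encodable in " + cs, e);
        }
    }

    public static void main(String[] args) {
        String text = "Caf\u00e9 \u03a9";   // 6 UTF-16 chars
        String emoji = "\uD83D\uDE00";     // U+1F600: one code point, two chars

        System.out.println(text.length());                                  // 6
        System.out.println(encodedLength(StandardCharsets.UTF_8, text));    // 8
        System.out.println(encodedLength(StandardCharsets.UTF_16BE, text)); // 12
        System.out.println(emoji.length());                                 // 2
        System.out.println(encodedLength(StandardCharsets.UTF_8, emoji));   // 4
    }
}
```

The emoji occupies two Java chars (a surrogate pair) even though it is a single Unicode character, and UTF-8 encodes that pair as four bytes.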

Syntax of CharsetEncoder

public abstract class CharsetEncoder extends Object

Constructors of CharsetEncoder

Both constructors are protected, so only subclasses call them: each charset implementation supplies its own encoder subclass, and application code obtains an encoder through Charset.newEncoder().

  • protected CharsetEncoder(Charset cs, float averageBytesPerChar, float maxBytesPerChar): Initializes a new encoder for the given charset with the specified average and maximum number of bytes produced per input character. Its replacement is the single byte '?'.
  • protected CharsetEncoder(Charset cs, float averageBytesPerChar, float maxBytesPerChar, byte[] replacement): Initializes a new encoder for the given charset with the specified average and maximum bytes per character and the given replacement byte sequence, which is written in place of characters that cannot be encoded when the error action is REPLACE.
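Because the constructors are protected, they are only reached through a subclass. The sketch below shows the shape of such a subclass under clearly stated assumptions: AsciiOnlyEncoder and CustomEncoderDemo are hypothetical toy classes, not part of the JDK, and the encoder borrows the existing US-ASCII Charset object only because the constructor requires one. A real charset provider would pair its encoder with its own Charset subclass and return it from newEncoder().

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical toy encoder: writes chars below U+0080 as single bytes and
// reports every other char as unmappable. It reuses the US-ASCII Charset
// object only so that the protected constructor has a charset to refer to.
class AsciiOnlyEncoder extends CharsetEncoder {

    AsciiOnlyEncoder() {
        // 3-argument constructor: 1 byte per char on average and at most;
        // the replacement defaults to the single byte '?'
        super(StandardCharsets.US_ASCII, 1.0f, 1.0f);
    }

    @Override
    protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
        while (in.hasRemaining()) {
            char c = in.get(in.position()); // peek without consuming
            if (c >= 0x80) {
                // Leave the position on the bad char; the inherited encode()
                // applies the configured CodingErrorAction to it
                return CoderResult.unmappableForLength(1);
            }
            if (!out.hasRemaining()) {
                return CoderResult.OVERFLOW; // caller must make room in 'out'
            }
            out.put((byte) c);
            in.position(in.position() + 1); // consume the char
        }
        return CoderResult.UNDERFLOW; // all input consumed
    }
}

public class CustomEncoderDemo {

    // Encodes text with the toy encoder, replacing unmappable chars with '?'
    static String encodeReplacing(String text) {
        CharsetEncoder enc = new AsciiOnlyEncoder()
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            ByteBuffer bb = enc.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[bb.remaining()];
            bb.get(bytes);
            return new String(bytes, StandardCharsets.US_ASCII);
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeReplacing("A\u03a9B")); // A?B
    }
}
```

Only encodeLoop() has to be written: the inherited encode() methods handle buffer growth, error actions and replacement around it.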

Methods of CharsetEncoder

The methods of CharsetEncoder, with their return types and descriptions:

  • float averageBytesPerChar(): Returns the average number of bytes that will be produced for each character of input.
  • boolean canEncode(char c): Tells whether this encoder can encode the given character.
  • boolean canEncode(CharSequence cs): Tells whether this encoder can encode the given character sequence.
  • Charset charset(): Returns the charset that created this encoder.
  • ByteBuffer encode(CharBuffer in): Convenience method that encodes the remaining content of a single input character buffer into a newly allocated byte buffer. It throws a CharacterCodingException when it finds a malformed or unmappable character and the corresponding action is REPORT.
  • CoderResult encode(CharBuffer in, ByteBuffer out, boolean endOfInput): Encodes as many characters as possible from the given input buffer and writes the results to the given output buffer; endOfInput tells the encoder whether more input may follow.
  • protected abstract CoderResult encodeLoop(CharBuffer in, ByteBuffer out): Encodes one or more characters into one or more bytes. Every concrete encoder implements this method.
  • CoderResult flush(ByteBuffer out): Flushes this encoder, writing any final bytes held in its internal state to the output buffer.
  • protected CoderResult implFlush(ByteBuffer out): Flushes this encoder. Subclasses override it to write their final bytes; the default implementation does nothing and returns CoderResult.UNDERFLOW.
  • protected void implReset(): Resets this encoder, clearing any charset-specific internal state. The default implementation does nothing.
  • boolean isLegalReplacement(byte[] repl): Tells whether the given byte array is a legal replacement value for this encoder.
  • float maxBytesPerChar(): Returns the maximum number of bytes that will be produced for each character of input.
  • CharsetEncoder onMalformedInput(CodingErrorAction newAction): Changes this encoder's action for malformed-input errors.
  • CharsetEncoder onUnmappableCharacter(CodingErrorAction newAction): Changes this encoder's action for unmappable-character errors.
  • CharsetEncoder reset(): Resets this encoder, clearing any internal state.
  • byte[] replacement(): Returns this encoder's replacement value.
  • CharsetEncoder replaceWith(byte[] newReplacement): Changes this encoder's replacement value.
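Several of the methods above describe the encoder rather than encode anything. The following sketch (EncoderInfo and describe are illustrative names) prints that metadata for three standard charsets. The bytes-per-character figures are estimates chosen by each charset implementation, so the exact values printed may differ between JDKs.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncoderInfo {

    // Prints what a fresh encoder reports about itself and its charset
    static void describe(Charset cs) {
        CharsetEncoder enc = cs.newEncoder();
        System.out.println("charset():             " + enc.charset());
        System.out.println("averageBytesPerChar(): " + enc.averageBytesPerChar());
        System.out.println("maxBytesPerChar():     " + enc.maxBytesPerChar());
        System.out.println("replacement():         " + Arrays.toString(enc.replacement()));
        System.out.println("isLegalReplacement(?): "
                + enc.isLegalReplacement(new byte[] {(byte) '?'}));
        // canEncode() may change the encoder's state, so it should not be
        // called while an encoding operation is in progress
        System.out.println("canEncode(U+03A9):     " + enc.canEncode('\u03a9'));
        System.out.println("canEncode(Caf+U+00E9): " + enc.canEncode("Caf\u00e9"));
        System.out.println();
    }

    public static void main(String[] args) {
        describe(StandardCharsets.US_ASCII);   // neither Omega nor e-acute
        describe(StandardCharsets.ISO_8859_1); // e-acute, but no Omega
        describe(StandardCharsets.UTF_8);      // every Unicode character
    }
}
```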

Inherited Methods

CharsetEncoder also inherits the methods of java.lang.Object, such as equals(), hashCode() and toString().

Examples of CharsetEncoder Class

Example 1: Basic use of CharsetEncoder

In this example, the input string is encoded into bytes using the CharsetEncoder with UTF-8 character encoding.

It shows how to construct a CharsetEncoder, wrap the input text in a CharBuffer, encode the characters, and print the encoded data. It also includes basic error handling for problems that may occur during encoding.

Java




// Java Program to construct a
// CharsetEncoder and encode a CharBuffer
import java.nio.*;
import java.nio.charset.*;

// Driver class
public class Main {

    // Main method
    public static void main(String[] args) {

        // Get the UTF-8 Charset
        Charset ch = StandardCharsets.UTF_8;

        // Initialize a CharsetEncoder
        CharsetEncoder ec = ch.newEncoder();

        // Input string
        String str = "CharsetEncoder Example";

        // Wrap the input text in a CharBuffer
        CharBuffer charBuffer = CharBuffer.wrap(str);

        try {
            // Encode the characters; the result is ready for reading
            ByteBuffer bf = ec.encode(charBuffer);

            // Copy only the encoded bytes (position to limit);
            // bf.array() may contain unused trailing capacity
            byte[] bytes = new byte[bf.remaining()];
            bf.get(bytes);

            // Decode with the same charset to print the text back
            String ans = new String(bytes, StandardCharsets.UTF_8);
            System.out.println(ans);
        }
        catch (CharacterCodingException e) {
            // Thrown for malformed or unmappable input
            e.printStackTrace();
        }
    }
}


Output:

CharsetEncoder Example
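A ByteBuffer returned by encode(CharBuffer) can be backed by an array that is larger than the encoded data, because the buffer is allocated from an estimate of the output size and only the range between its position and limit holds output. Converting the whole of array() to a String can therefore pick up trailing zero bytes. The sketch below (EncodedBytes and toByteArray are hypothetical names) copies exactly the remaining bytes instead.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodedBytes {

    // Returns exactly the bytes between position and limit of the buffer
    static byte[] toByteArray(ByteBuffer buf) {
        byte[] bytes = new byte[buf.remaining()];
        buf.duplicate().get(bytes); // duplicate() leaves buf's position untouched
        return bytes;
    }

    public static void main(String[] args) throws CharacterCodingException {
        String text = "CharsetEncoder Example";
        ByteBuffer bf = StandardCharsets.UTF_8.newEncoder()
                .encode(CharBuffer.wrap(text));

        // The backing array can be longer than the data it holds
        System.out.println("backing array length: " + bf.array().length);
        System.out.println("encoded bytes:        " + bf.remaining());

        byte[] exact = toByteArray(bf);
        System.out.println(new String(exact, StandardCharsets.UTF_8));
        System.out.println(Arrays.equals(exact,
                text.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```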

Example 2: Error Handling

UTF-8 can encode every Unicode character, so an encoder only runs into unmappable characters when the target charset covers a smaller repertoire. US-ASCII, for example, contains only the 128 characters U+0000 to U+007F, so a symbol such as ‘Ω’ (U+03A9) cannot be mapped to it. By default, a CharsetEncoder reports such a character by throwing an UnmappableCharacterException. In the example below, the input string contains ‘Ω’ and the encoder uses US-ASCII. We pass the constant ‘CodingErrorAction.REPLACE’ to the ‘onUnmappableCharacter’ method and set the replacement bytes with ‘replaceWith’, so unmappable characters are substituted instead of raising an error. (UTF-8 itself can fail only on malformed input, such as an unpaired surrogate char; that case is configured separately with ‘onMalformedInput’.)

In the code below, whenever we encounter ‘Ω’, it is replaced by ‘?’, a fallback character indicating that the symbol could not be encoded.

Java

// Java Program for Error handling
// Using onUnmappableCharacter
import java.nio.*;
import java.nio.charset.*;

// Driver Class
public class Main {

    // Main method
    public static void main(String[] args) {

        // US-ASCII covers only U+0000 to U+007F,
        // so it cannot map the Greek letter Omega
        Charset ch = StandardCharsets.US_ASCII;

        // Initialize a CharsetEncoder
        CharsetEncoder ec = ch.newEncoder();

        // Input string containing Omega (U+03A9),
        // an unmappable character in US-ASCII
        String str = "Charset \u03a9 Encoder";

        // Replace each unmappable character with a question mark
        // instead of throwing UnmappableCharacterException
        ec.onUnmappableCharacter(CodingErrorAction.REPLACE);
        ec.replaceWith("?".getBytes(StandardCharsets.US_ASCII));

        // Wrap the string into a CharBuffer
        CharBuffer cb = CharBuffer.wrap(str);

        try {
            // Encode the characters
            ByteBuffer bf = ec.encode(cb);

            // Copy only the encoded bytes and decode them for printing
            byte[] bytes = new byte[bf.remaining()];
            bf.get(bytes);
            String ans = new String(bytes, StandardCharsets.US_ASCII);
            System.out.println("Encoded String: " + ans);
        }
        catch (CharacterCodingException e) {
            // Handle the exception
            System.err.println("Error: " + e.getMessage());
        }
    }
}


Output:

Encoded String: Charset ? Encoder
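To compare the available error actions, the sketch below (ErrorActionDemo and encodeWith are illustrative names) encodes the same string with US-ASCII three times: REPORT, the default for encoders returned by newEncoder(), throws an exception; REPLACE substitutes the replacement bytes; IGNORE silently drops the character.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ErrorActionDemo {

    // Encodes text to US-ASCII using the given action for unmappable chars
    static String encodeWith(CodingErrorAction action, String text) {
        CharsetEncoder enc = StandardCharsets.US_ASCII.newEncoder()
                .onUnmappableCharacter(action);
        try {
            ByteBuffer bb = enc.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[bb.remaining()];
            bb.get(bytes);
            return new String(bytes, StandardCharsets.US_ASCII);
        } catch (CharacterCodingException e) {
            return "error: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        String text = "Charset \u03a9 Encoder";
        System.out.println(encodeWith(CodingErrorAction.REPORT, text));  // error: UnmappableCharacterException
        System.out.println(encodeWith(CodingErrorAction.REPLACE, text)); // Charset ? Encoder
        System.out.println(encodeWith(CodingErrorAction.IGNORE, text));  // Charset  Encoder
    }
}
```

By contrast, the convenience method Charset.encode() always replaces malformed and unmappable input with the charset's default replacement, so use a CharsetEncoder directly when an error should be detected rather than hidden.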

How to Optimize the Encoding?

Now that we understand how encoding works with the CharsetEncoder class, let us look at how to improve efficiency and performance when dealing with large volumes of data.

  • Buffer Management: The convenience method encode(CharBuffer), used in the examples above, allocates a new ByteBuffer on every call and may have to grow it when its initial size estimate is too small. For large inputs, call encode(CharBuffer, ByteBuffer, boolean) with buffers you allocate yourself, just large enough for the expected data, so that frequent reallocations are avoided.
  • Reuse Buffers and Encoders: Instead of creating new CharBuffer, ByteBuffer and CharsetEncoder instances for every operation, reuse them: call clear() on a buffer and reset() on the encoder before starting the next operation. This reduces memory allocation and garbage-collection work. Keep in mind that a CharsetEncoder is not safe for use by multiple concurrent threads, so give each thread its own instance.
  • Bulk Encoding: Pass as much input as possible to each encode() call, for example by wrapping a whole CharSequence with CharBuffer.wrap(), instead of encoding one character at a time. This minimizes the number of encoding calls and their per-call overhead.
  • Precompute Buffer Size: To prevent unnecessary resizing, allocate the ByteBuffer with the right capacity up front. The input length multiplied by maxBytesPerChar() gives a worst-case size for the encoded output, while averageBytesPerChar() gives a typical estimate.
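The buffer-reuse and bulk-encoding advice can be combined into a loop around encode(CharBuffer, ByteBuffer, boolean) and flush(). The sketch below (ChunkedEncoding, encode and drain are hypothetical names) pushes a whole string through one small, reused heap ByteBuffer, draining it whenever the encoder reports overflow. It assumes the buffer can hold the longest byte sequence produced for a single character; for UTF-8 that is 4 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ChunkedEncoding {

    // Encodes text through one fixed-size, reusable heap buffer.
    // Assumes out.capacity() fits the longest byte sequence for one char
    // (4 bytes in UTF-8); a smaller buffer would overflow without progress.
    static byte[] encode(CharsetEncoder enc, CharSequence text, ByteBuffer out)
            throws CharacterCodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        CharBuffer in = CharBuffer.wrap(text);
        enc.reset();   // start the reused encoder from a clean state
        out.clear();   // start the reused buffer empty

        // Step 1: encode all input; endOfInput is true because text is complete
        while (true) {
            CoderResult cr = enc.encode(in, out, true);
            if (cr.isUnderflow()) break;                          // input consumed
            if (cr.isOverflow()) { drain(out, sink); continue; }  // make room
            cr.throwException();                                  // bad input
        }
        // Step 2: let the encoder write any bytes it still holds internally
        while (enc.flush(out).isOverflow()) {
            drain(out, sink);
        }
        drain(out, sink);
        return sink.toByteArray();
    }

    // Moves the bytes written so far into the sink and empties the buffer
    private static void drain(ByteBuffer out, ByteArrayOutputStream sink) {
        out.flip();
        sink.write(out.array(), out.arrayOffset() + out.position(), out.remaining());
        out.clear();
    }

    public static void main(String[] args) throws CharacterCodingException {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        ByteBuffer out = ByteBuffer.allocate(8); // deliberately small, reused
        String text = "Gr\u00fc\u00dfe, \u4e16\u754c";

        byte[] encoded = encode(enc, text, out);
        System.out.println(encoded.length); // 15
        System.out.println(Arrays.equals(encoded, text.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```

In a real program the drain step would typically write to a channel or stream; the in-memory sink here only makes the result easy to check.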

In this article, we covered the syntax, constructors and methods of the CharsetEncoder class, and explored through examples how to use it for basic encoding, error handling and efficient encoding of larger data in Java applications.


