Compact Strings in Java 9 with Examples

Prerequisites: String

Compact Strings is one of the performance enhancements introduced in the JVM as part of JDK 9. Up to JDK 8, every String object was represented internally as a char[] that held the characters of the string.

Why do we need Compact Strings?

  • Up to JDK 8, Java stored the contents of a String object in a char[]. A char in Java is 2 bytes wide because Java uses UTF-16 internally, so every character took 2 bytes.
  • If a String contains only English text, each character can be represented using a single byte; we don’t need 2 bytes per character. Some characters do require 2 bytes, but the characters in most strings need only 1 byte because they fall within the LATIN-1 character set. So there is scope to reduce memory consumption and improve performance.
  • Java 9 introduced the concept of compact Strings. When a String object is created and all of its characters can be represented using 1 byte each (the LATIN-1 representation), Java internally stores them in a byte[] with 1 byte per character. If any character requires more than 1 byte, every character of that String is stored using 2 bytes (the UTF-16 representation).
  • This change to the internal implementation of String is known as Compact Strings. It reduces the memory consumption of String objects and improves their performance.
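The rule described above (1 byte per character only when every character fits in LATIN-1) can be illustrated with a small check. This is a minimal sketch, not JDK code: the class Latin1Check and the method fitsInLatin1 are hypothetical names chosen for this example, and the check simply tests whether each UTF-16 code unit is at most 0xFF.

```java
// Hypothetical helper for illustration; not part of the JDK.
class Latin1Check {

    // Returns true when every char of s has a value of at most 0xFF,
    // i.e. the whole string could be stored with 1 byte per character.
    static boolean fitsInLatin1(String s) {
        return s.chars().allMatch(c -> c <= 0xFF);
    }

    public static void main(String[] args) {
        // Plain English text: all characters are within LATIN-1.
        System.out.println(fitsInLatin1("Geeksforgeeks"));       // prints true

        // \u20AC is the euro sign, which is outside LATIN-1.
        System.out.println(fitsInLatin1("Geeksforgeeks\u20AC")); // prints false
    }
}
```

A single character outside LATIN-1 is enough to make the whole string fail the check, which mirrors the all-or-nothing choice between the two representations.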

String class internal implementation before Java 9:

Java 8 or before

import java.io.Serializable;
  
public final class String
    implements Serializable,
               Comparable<String>,
               CharSequence {
  
    // The value is used
    // for character storage.
    private final char value[];
}



Note: The code above shows that before Java 9, a String object was always backed by a char[]. Even if every character of a String could be represented using 1 byte, Java still created a char[] rather than a byte[], which consumed more memory.



JDK developers found that most strings can be represented using only the Latin-1 character set. A Latin-1 character can be stored in one byte, which is exactly half the size of a char. Storing such strings with 1 byte per character reduces their memory footprint, which in turn improves the performance of String.

String class internal implementation from Java 9

Java 9 and after

import java.io.Serializable;
  
public final class String
    implements Serializable,
               Comparable<String>,
               CharSequence {
  
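    // The value is used for character storage,
    // encoded as either LATIN-1 or UTF-16.
    private final byte[] value;
  
    // Identifies the encoding used
    // for the bytes in value.
    private final byte coder;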
}



Note: How does String distinguish between the LATIN-1 and UTF-16 representations? The JDK developers introduced a final byte field, coder, that records which encoding is used for the bytes in value. The possible values of coder are:

static final byte LATIN1 = 0;
static final byte UTF16 = 1;
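To make the role of coder concrete, here is a simplified model of the idea. It is a sketch under stated assumptions, not the JDK source: the class CompactModel and its methods are hypothetical, and the big-endian byte order used for the UTF-16 case is chosen only for this example.

```java
// Simplified model of a compact string. NOT the JDK implementation;
// the class name, methods, and byte order are assumptions for illustration.
class CompactModel {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    final byte[] value; // character data, 1 or 2 bytes per character
    final byte coder;   // tells how the bytes in value are encoded

    CompactModel(char[] chars) {
        // Decide whether every character fits in a single byte.
        boolean latin1 = true;
        for (char c : chars) {
            if (c > 0xFF) {
                latin1 = false;
                break;
            }
        }
        if (latin1) {
            // LATIN-1: one byte per character.
            value = new byte[chars.length];
            for (int i = 0; i < chars.length; i++) {
                value[i] = (byte) chars[i];
            }
            coder = LATIN1;
        } else {
            // UTF-16: two bytes per character (big-endian in this sketch).
            value = new byte[chars.length * 2];
            for (int i = 0; i < chars.length; i++) {
                value[2 * i] = (byte) (chars[i] >> 8);
                value[2 * i + 1] = (byte) chars[i];
            }
            coder = UTF16;
        }
    }

    // Number of characters: value.length for LATIN1, value.length / 2 for UTF16.
    int length() {
        return value.length >> coder;
    }

    // Reads one character back, interpreting the bytes according to coder.
    char charAt(int index) {
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    public static void main(String[] args) {
        CompactModel a = new CompactModel("Geeksforgeeks".toCharArray());
        CompactModel b = new CompactModel("Geeksforgeeks\u20AC".toCharArray());
        System.out.println(a.coder + " " + a.value.length); // prints 0 13
        System.out.println(b.coder + " " + b.value.length); // prints 1 28
    }
}
```

In this model every read goes through coder, so the same byte[] field can serve both representations without changing the public behaviour of the string.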

Thus, the String implementation used from Java 9 onwards, known as Compact Strings, performs better than the earlier one: a string that contains only LATIN-1 characters needs approximately half the heap space for its character data compared with JDK 8 and earlier.

Let’s see the difference in the memory used by a String object before Java 9 and from Java 9:

// Program to illustrate the memory
// used by String before Java 9
  
public class Geeks {
    public static void main(String[] args)
    {
        String s
            = new String("Geeksforgeeks");
    }
}



Key points to note when we are running on Java 8 or earlier:

  • Here, we created a String object with 13 characters, each of which can be represented using 1 byte (the LATIN-1 representation).
  • If we run the above program with JDK 8 or earlier, the String is represented internally as a char[], because those versions always store strings as UTF-16.
  • A char[] is not needed here, since each character fits in 1 byte. Nevertheless, a char[] is created instead of a byte[], and 2 bytes are allocated in heap memory for each character: 26 bytes of character data instead of 13. This is a waste of heap memory.

// Program to illustrate the memory
// used by String from Java 9
  
public class Geeks {
    public static void main(String[] args)
    {
  
        String s1 = new String("Geeksforgeeks");
        String s2 = new String("Geeksforgeeks€");
    }
}



Key points to note when we are running on Java 9:

  • From Java 9, a String object is always backed by a byte[], and the coder field records whether that array holds LATIN-1 or UTF-16 data. Here we created String object s1 with 13 characters and object s2 with 14 characters.
  • Each character in s1 can be represented using 1 byte. That’s why s1 is stored as a byte[] of 13 bytes with the LATIN1 coder.
  • s2 contains one additional character, €. We can’t represent the € character using the LATIN-1 character set; it needs 2 bytes. That’s why Java uses UTF-16 to represent the characters in s2.
  • For s2, the byte[] therefore holds 2 bytes per character (28 bytes for 14 characters) with the UTF16 coder. No char[] is created.
  • This is how the String implementation from Java 9, known as Compact Strings, is better than String before Java 9 in terms of memory consumption and performance.
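The byte counts for the two strings in this example can be worked out with a short program. This is a rough sketch, not a measurement: the class PayloadSize and its methods are hypothetical, and the numbers cover only the character data, ignoring object and array headers.

```java
// Hypothetical helper that estimates character-data size only.
// Object headers and array headers are deliberately ignored.
class PayloadSize {

    // Layout before Java 9: a char[] with 2 bytes per character.
    static int charArrayBytes(String s) {
        return 2 * s.length();
    }

    // Layout from Java 9: 1 byte per character when every character
    // fits in LATIN-1, otherwise 2 bytes per character.
    static int compactBytes(String s) {
        boolean latin1 = s.chars().allMatch(c -> c <= 0xFF);
        return latin1 ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        String s1 = "Geeksforgeeks";
        String s2 = "Geeksforgeeks\u20AC"; // \u20AC is the euro sign

        // s1: 13 characters, all within LATIN-1.
        System.out.println(charArrayBytes(s1) + " -> " + compactBytes(s1)); // prints 26 -> 13

        // s2: 14 characters, one of them outside LATIN-1.
        System.out.println(charArrayBytes(s2) + " -> " + compactBytes(s2)); // prints 28 -> 28
    }
}
```

The saving applies only to strings such as s1; a string such as s2 takes the same amount of character data in both layouts.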
