In the past, we dealt with only one character set, known as ASCII (American Standard Code for Information Interchange). ASCII uses 7 bits to represent 128 characters, including upper- and lowercase English letters, digits, and a variety of punctuation and device-control characters. As a result, a large part of the world's population could not use their own writing systems on a computer. Unicode was invented to solve this problem. It is a superset of ASCII that covers the characters of the world's writing systems, including accents and other diacritical marks and control codes like tab and carriage return, and it assigns each one a standard number called a Unicode code point, or in Go, a rune. The rune type is an alias for int32.
Important Points:
- Always remember that a string is a sequence of bytes, not of runes. A string may still contain Unicode text encoded in UTF-8, and because Go source code is always encoded as UTF-8, string literals written in source code need no separate encoding step.
- UTF-8 encodes every Unicode code point in 1 to 4 bytes: 1 byte for ASCII characters and 2 to 4 bytes for all other runes.
- ASCII contains 128 elements in total, identified by the code points 0-127. (An 8-bit byte can hold 256 values, but ASCII defines only the first 128.) Here, a code point is the number that identifies a single character.
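The difference between bytes and runes is easy to check directly. The following is a minimal sketch; the helper name `counts` is hypothetical and exists only for this illustration. It compares `len`, which counts bytes, with `utf8.RuneCountInString` from the standard `unicode/utf8` package, which counts runes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts reports the number of bytes and the number of runes in s.
// (counts is a hypothetical helper used only in this sketch.)
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	samples := []string{
		"Go",     // ASCII only: 2 bytes, 2 runes
		"\u00e9", // e with acute accent: 2 bytes, 1 rune
		"♄",      // U+2644: 3 bytes, 1 rune
		"😀",      // U+1F600: 4 bytes, 1 rune
	}
	for _, s := range samples {
		b, r := counts(s)
		fmt.Printf("%s: %d byte(s), %d rune(s)\n", s, b, r)
	}
}
```

Only the ASCII string has the same byte and rune count; every other sample needs more than one byte per rune.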
Example:
♄
It is a rune with hexadecimal value 0x2644 (code point U+2644).
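As a small sketch of this example, the program below prints the rune ♄ as a character, as a code point, and as a plain integer. The helper name `describe` is an assumption made for illustration.

```go
package main

import "fmt"

// describe formats a rune as character, code point, and decimal value.
// (describe is a hypothetical helper used only in this sketch.)
func describe(r rune) string {
	return fmt.Sprintf("%c %U %d", r, r, r)
}

func main() {
	r := '♄' // the same value as '\u2644' or the integer 0x2644

	fmt.Println(describe(r))                // ♄ U+2644 9796
	fmt.Println(r == 0x2644, r == '\u2644') // true true
}
```

Because rune is an alias for int32, the literal can be compared directly with an integer constant.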
Rune Literal
A rune literal represents a rune constant: an integer value that identifies a Unicode code point. In Go, a rune literal is written as one or more characters enclosed in single quotes, such as 'g' or '\t'. Any character may appear between the quotes except a newline and an unescaped single quote. A single quoted character represents the Unicode value of that character itself, while a multi-character sequence beginning with a backslash encodes the value in a different format (for example, '\x41', '\u0041', and '\101' all denote U+0041). Not every sequence that starts with a backslash is legal in a rune literal. Apart from the numeric forms, only the following single-character escapes represent special values:
Character | Unicode | Description |
---|---|---|
\a | U+0007 | Alert or bell |
\b | U+0008 | Backspace |
\f | U+000C | Form feed |
\n | U+000A | Line feed or newline |
\r | U+000D | Carriage return |
\t | U+0009 | Horizontal tab |
\v | U+000B | Vertical tab |
\\ | U+005C | Backslash |
\' | U+0027 | Single quote (legal only in rune literals) |
\" | U+0022 | Double quote (legal only in string literals) |
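The escapes in the table can be checked with a short program. This is a minimal sketch rather than a complete tour of the syntax; the helper name `codePoint` is hypothetical. It also shows the numeric escape forms (`\x` with two hexadecimal digits, `\u` with four, `\U` with eight, and a backslash with three octal digits), which are other ways to write the same code point.

```go
package main

import "fmt"

// codePoint formats a rune in U+XXXX notation.
// (codePoint is a hypothetical helper used only in this sketch.)
func codePoint(r rune) string {
	return fmt.Sprintf("%U", r)
}

func main() {
	// Single-character escapes from the table.
	fmt.Println(codePoint('\a'), codePoint('\n'), codePoint('\t'))
	fmt.Println(codePoint('\\'), codePoint('\''))

	// Numeric escapes: each of these denotes U+0041, the letter A.
	fmt.Println('\x41' == 'A', '\101' == 'A', '\u0041' == 'A', '\U00000041' == 'A')
}
```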
Example 1:
```go
// Simple Go program to illustrate
// how to create a rune
package main

import (
	"fmt"
	"reflect"
)

func main() {

	// Creating a rune
	rune1 := 'B'
	rune2 := 'g'
	rune3 := '\a'

	// Displaying rune and its type
	fmt.Printf("Rune 1: %c; Unicode: %U; Type: %s", rune1,
		rune1, reflect.TypeOf(rune1))

	fmt.Printf("\nRune 2: %c; Unicode: %U; Type: %s", rune2,
		rune2, reflect.TypeOf(rune2))

	fmt.Printf("\nRune 3: Unicode: %U; Type: %s", rune3,
		reflect.TypeOf(rune3))
}
```
Output:
```
Rune 1: B; Unicode: U+0042; Type: int32
Rune 2: g; Unicode: U+0067; Type: int32
Rune 3: Unicode: U+0007; Type: int32
```
Example 2:
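The listing for this example is not reproduced here. The following is a minimal sketch of a program that would print the output shown below, assuming it ranges over a slice of runes. The variable name `slc` and the helper `line` are assumptions, not the original code.

```go
// Sketch of a Go program that prints each rune of a
// slice with its code point and position
package main

import "fmt"

// line formats one output row for rune r at index i.
// (line is a hypothetical helper used only in this sketch.)
func line(r rune, i int) string {
	return fmt.Sprintf("Character: %c, Unicode:%U, Position:%d", r, r, i)
}

func main() {

	// Creating a slice of runes (assumed input)
	slc := []rune{'♛', '♠', '♧', '♡', '♬'}

	// Ranging over a []rune yields element indexes 0, 1, 2, ...
	for i, r := range slc {
		fmt.Println(line(r, i))
	}
}
```

Ranging over the equivalent string instead of a `[]rune` would report byte offsets (0, 3, 6, 9, 12) rather than positions 0 to 4, because each of these characters takes 3 bytes in UTF-8.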
Output:
```
Character: ♛, Unicode:U+265B, Position:0
Character: ♠, Unicode:U+2660, Position:1
Character: ♧, Unicode:U+2667, Position:2
Character: ♡, Unicode:U+2661, Position:3
Character: ♬, Unicode:U+266C, Position:4
```