Early computing relied largely on ASCII (American Standard Code for Information Interchange), which uses 7 bits to represent 128 characters: upper- and lowercase English letters, digits, punctuation marks, and device-control characters. With so few characters available, most of the world's writing systems could not be represented. Unicode was created to solve this problem. Unicode is a superset of ASCII that aims to cover the characters of the world's writing systems, including accents and other diacritical marks, as well as control codes such as tab and carriage return. It assigns each character a standard number called a "Unicode code point", which Go calls a "rune". The rune type is an alias for int32.
Important Points:
- Always remember that a string is a sequence of bytes, not of runes. A string may hold Unicode text encoded in UTF-8, and because Go source code is itself UTF-8, text written in a string literal is already UTF-8 encoded; no separate encoding step is needed.
- UTF-8 encodes every Unicode code point in 1 to 4 bytes: ASCII characters take 1 byte, and all other code points take 2 to 4 bytes.
- ASCII defines 128 characters, with code points 0 to 127. An 8-bit byte can hold 256 values, but the upper 128 are not part of ASCII. Here, a code point is the number that identifies a single character.
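The difference between bytes and runes is easiest to see by measuring a string both ways. The following is a minimal sketch, not part of the original article; the helper name byteAndRuneCounts and the sample string "Go♄" are chosen purely for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts is a hypothetical helper: it returns the length of s
// in bytes and the number of runes (code points) that s contains.
func byteAndRuneCounts(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "Go♄" // two ASCII letters (1 byte each) plus U+2644 (3 bytes)
	b, r := byteAndRuneCounts(s)
	fmt.Printf("bytes: %d, runes: %d\n", b, r) // bytes: 5, runes: 3

	// Ranging over a string decodes UTF-8 and yields (byte offset, rune).
	for i, c := range s {
		fmt.Printf("offset %d: %c uses %d byte(s)\n", i, c, utf8.RuneLen(c))
	}
}
```

Note that len reports bytes, so indexing a string with s[i] gives a single byte rather than a whole character; ranging over the string or converting it to []rune is the usual way to work with code points.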
Example:
♄
It is a rune with hexadecimal value 0x2644 (code point U+2644).
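To check a rune's numeric value yourself, you can print it with different format verbs. This is a small illustrative sketch; the helper name describe is an assumption and not something from the article or the standard library.

```go
package main

import "fmt"

// describe is a hypothetical helper that formats a rune as its character,
// its code point in U+XXXX form, and its plain hexadecimal value.
func describe(r rune) string {
	return fmt.Sprintf("%c %U 0x%X", r, r, r)
}

func main() {
	saturn := '♄'
	fmt.Println(describe(saturn))    // ♄ U+2644 0x2644
	fmt.Println(saturn == 0x2644)    // true: a rune is just an int32 value
	fmt.Println(len(string(saturn))) // 3: bytes needed to encode it in UTF-8
}
```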
Rune Literal
It represents a rune constant: an integer value that identifies a Unicode code point. In Go, a rune literal is written as one or more characters enclosed in single quotes, such as 'g' or '\t'. Between the quotes you may place any character except a newline or an unescaped single quote. A single quoted character stands for the Unicode value of that character, while multi-character sequences that begin with a backslash encode values in other formats (for example, '\x41' or '\u2644'). Not every sequence that starts with a backslash is legal in a rune literal. Apart from the numeric escapes, only the following single-character escapes represent special values:
Character | Unicode | Description |
\a | U+0007 | alert or bell |
\b | U+0008 | backspace |
\f | U+000C | form feed |
\n | U+000A | line feed or newline |
\r | U+000D | carriage return |
\t | U+0009 | horizontal tab |
\v | U+000B | vertical tab |
\\ | U+005C | backslash |
\' | U+0027 | single quote |
\" | U+0022 | double quote (legal only in string literals) |
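Besides the single-character escapes in the table, Go also accepts numeric escapes in rune literals. The sketch below writes the same code point several ways to show that they are equivalent; the function name escapedForms is hypothetical and used only for this illustration.

```go
package main

import "fmt"

// escapedForms is a hypothetical helper that returns the rune 'A'
// written once plainly and once with each numeric escape syntax.
func escapedForms() []rune {
	return []rune{
		'A',          // the character itself
		'\x41',       // \x followed by exactly 2 hexadecimal digits
		'\101',       // \ followed by exactly 3 octal digits
		'\u0041',     // \u followed by exactly 4 hexadecimal digits
		'\U00000041', // \U followed by exactly 8 hexadecimal digits
	}
}

func main() {
	for _, r := range escapedForms() {
		fmt.Printf("%c %U\n", r, r) // every line prints: A U+0041
	}

	// The single-character escapes from the table are ordinary runes too.
	fmt.Printf("%U %U %U\n", '\t', '\\', '\'') // U+0009 U+005C U+0027
}
```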
Example 1:
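package main

import (
	"fmt"
	"reflect"
)

func main() {
	// Rune literals; each one has the default type rune (int32).
	rune1 := 'B'
	rune2 := 'g'
	rune3 := '\a' // alert/bell control character, so only its code point is printed

	// %c prints the character and %U prints its code point as U+XXXX.
	fmt.Printf("Rune 1: %c; Unicode: %U; Type: %s", rune1,
		rune1, reflect.TypeOf(rune1))
	fmt.Printf("\nRune 2: %c; Unicode: %U; Type: %s", rune2,
		rune2, reflect.TypeOf(rune2))
	fmt.Printf("\nRune 3: Unicode: %U; Type: %s", rune3,
		reflect.TypeOf(rune3))
}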
Output:
Rune 1: B; Unicode: U+0042; Type: int32
Rune 2: g; Unicode: U+0067; Type: int32
Rune 3: Unicode: U+0007; Type: int32
Example 2:
Output:
Character: ♛, Unicode:U+265B, Position:0
Character: ♠, Unicode:U+2660, Position:1
Character: ♧, Unicode:U+2667, Position:2
Character: ♡, Unicode:U+2661, Position:3
Character: ♬, Unicode:U+266C, Position:4
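The program behind Example 2 is not included here. The following is a minimal sketch of one program that would print the output above, assuming the five runes are stored in a slice and printed with their index; the helper name describeRune and the variable names are illustrative, not the original author's code.

```go
package main

import "fmt"

// describeRune is a hypothetical helper that formats one output line:
// the character, its Unicode code point, and its position in the slice.
func describeRune(r rune, pos int) string {
	return fmt.Sprintf("Character: %c, Unicode:%U, Position:%d", r, r, pos)
}

func main() {
	// Assumed input: a slice holding the five runes seen in the output.
	runes := []rune{'♛', '♠', '♧', '♡', '♬'}

	// Ranging over a []rune yields the element index, not a byte offset.
	for i, r := range runes {
		fmt.Println(describeRune(r, i))
	}
}
```

Because the runes live in a []rune rather than a string, the position printed is the element index; ranging over the equivalent string would instead report byte offsets 0, 3, 6, 9 and 12.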
Last Updated: 11 Apr, 2022