Character encoding is a method of defining a mapping between bytes and text. To display an HTML document correctly, the browser must know which character encoding the document was saved in.
Commonly used character encodings include:
- ASCII Character Set: One of the earliest character encoding standards. Its major limitation is that it defines only 128 characters (7 bits), so it cannot represent accented or non-Latin characters.
- ANSI Character Set: The common name for Windows-1252, an extension of the standard ASCII character set. It uses 8 bits and therefore has room for 256 characters.
- ISO-8859-1 Character Set: The default character encoding in HTML 2.0. It is also an extension of the ASCII standard and uses a full byte (8 bits) per character to add international characters.
- UTF-8 Character Set: An encoding of Unicode that covers almost all of the characters and symbols in the world, which removes the 256-character limit of ANSI and ISO-8859-1. UTF-8 is the recommended character encoding for HTML5.
The HTML5 specification encourages developers to use the UTF-8 character set.
In UTF-8, a character is encoded as 1 to 4 bytes. ASCII characters take a single byte, so plain ASCII text is already valid UTF-8. UTF-8 is also the preferred encoding for email and web pages.
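To make the 1 to 4 byte range concrete, here is a small illustrative sketch. The four sample characters are chosen only for this example; each is written as a numeric character reference, and the comment beside it shows the bytes the literal character would occupy in a file saved as UTF-8.

```html
<!-- Sample characters and their UTF-8 byte sequences (hexadecimal) -->
<p>&#x41;</p>    <!-- A,  U+0041:  1 byte  (41) -->
<p>&#xE9;</p>    <!-- é,  U+00E9:  2 bytes (C3 A9) -->
<p>&#x20AC;</p>  <!-- €,  U+20AC:  3 bytes (E2 82 AC) -->
<p>&#x1F600;</p> <!-- 😀, U+1F600: 4 bytes (F0 9F 98 80) -->
```

The numeric references themselves consist only of ASCII characters, so they display the same way regardless of the encoding the file is saved in.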
- Character encoding can be specified in the meta tag in HTML.
- The meta tag specifies metadata about the web page and is not displayed on the page itself.
- The meta tag helps search engines understand what a web page is about.
- The meta tag should be placed inside the head element of the HTML document.
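As a rough illustration of the points above, the sketch below places a meta tag inside the head element. The title and description text are made-up placeholders, not taken from any real page.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Metadata: read by browsers and search engines, not rendered on the page -->
  <meta name="description" content="A short tutorial on HTML character encoding (placeholder text)">
  <title>Example page</title>
</head>
<body>
  <!-- Only the content inside body is displayed -->
  <p>Visible page content goes here.</p>
</body>
</html>
```

Opening this file in a browser shows only the paragraph text; the description appears only in the page source.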
1. For HTML4
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
2. For HTML5
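HTML5 allows a shorter declaration that uses the charset attribute of the meta tag. Although UTF-8 is the recommended encoding for HTML5, a browser may fall back to a different encoding when none is declared, so it is safer to specify it explicitly.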
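A minimal sketch of the HTML5 form, assuming the file itself is saved as UTF-8; the title text is a placeholder:

```html
<head>
  <!-- Short HTML5 declaration; place it early, within the first 1024 bytes of the file -->
  <meta charset="UTF-8">
  <title>Example page</title>
</head>
```

The declaration only tells the browser how to decode the bytes. It does not convert the file, so the editor or server must actually save the document as UTF-8.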