How to indicate the character set used by a document in HTML?
For a web browser to interpret the characters in an HTML document correctly, it must know which character encoding the document uses, such as ASCII, ISO-8859-1, or UTF-8. The character set of an HTML document is indicated with the charset attribute of a <meta> tag inside the <head> element.
In HTML 4, ISO-8859-1 was generally treated as the default character set. HTML 4 also supported UTF-8, declared like this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
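In HTML5, developers are encouraged to use the UTF-8 character set, which covers almost all of the world's characters and symbols. UTF-8 is therefore the recommended encoding for HTML5 documents. It should still be declared explicitly, because a browser may fall back to a legacy encoding when it finds no declaration.
UTF stands for Unicode Transformation Format. The ‘8’ in UTF-8 means that it works in 8-bit units: each character is encoded as one to four 8-bit bytes.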
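The variable length of UTF-8 can be seen with a small sketch like the one below. It assumes a browser that supports the standard TextEncoder API, which always encodes to UTF-8. The sample characters and the element id "out" are chosen only for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>UTF-8 byte lengths</title>
</head>
<body>
  <pre id="out"></pre>
  <script>
    // TextEncoder always produces UTF-8 bytes.
    const encoder = new TextEncoder();

    // Sample characters, written as escapes so the result does not
    // depend on how this file itself is saved:
    // "A", e with acute accent, the euro sign, and an emoji.
    const samples = ["A", "\u00E9", "\u20AC", "\u{1F600}"];

    // Report how many 8-bit bytes each character needs.
    const lines = samples.map(function (ch) {
      const bytes = encoder.encode(ch);
      return ch + " uses " + bytes.length + " byte(s)";
    });

    document.getElementById("out").textContent = lines.join("\n");
  </script>
</body>
</html>
```

Opened in a browser, the page should report 1, 2, 3, and 4 bytes for the four samples respectively.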
Ways to indicate the character set: The charset attribute (short for character set) names the character encoding used in an HTML document. It can be specified in the following two ways:
1. Using Meta tag:
- The meta tag is used to specify the character encoding in an HTML document.
- The meta tag defines metadata about the HTML document, which is not displayed on the web page.
- Metadata also helps search engines understand what that particular web page is about.
<head> <meta charset="UTF-8"> </head>
2. Using Script tag:
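- The charset attribute of the script tag specifies the character encoding of an external script file.
- The script tag defines a client-side script.
- The script tag refers to an external script file through the ‘src’ attribute.
- In HTML5 the charset attribute on the script tag is obsolete, so it is mainly found in older pages. Serving the script file itself as UTF-8 is preferred.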
<script src="script.js" charset="UTF-8"></script>
Here, charset is the attribute that names the character encoding. It tells the web browser which encoding to use when converting the bytes of the file into human-readable text for display.
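To illustrate why the declared encoding matters, the following sketch encodes one character as UTF-8 bytes and then decodes the same bytes with two different encodings. It assumes a browser that supports the standard TextEncoder and TextDecoder APIs. The sample character and the element id "out" are chosen only for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Decoding with the wrong charset</title>
</head>
<body>
  <pre id="out"></pre>
  <script>
    // Encode e with acute accent (U+00E9) as UTF-8.
    // This yields the two bytes 0xC3 0xA9.
    const bytes = new TextEncoder().encode("\u00E9");

    // Decode the same bytes with the correct and with a wrong encoding.
    const right = new TextDecoder("utf-8").decode(bytes);
    const wrong = new TextDecoder("windows-1252").decode(bytes);

    document.getElementById("out").textContent =
      "Decoded as UTF-8: " + right + "\n" +
      "Decoded as windows-1252: " + wrong;
  </script>
</body>
</html>
```

Decoded as UTF-8, the bytes give back the original character. Decoded as windows-1252, each byte is read as a separate character, so the single letter turns into two unrelated symbols. This is the kind of garbled text a reader sees when a page's declared charset does not match its actual encoding.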
Example: The code section below shows both tags in use.
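A minimal sketch of a page that uses both tags is given here. It assumes that an external file named script.js, as in the snippet above, sits in the same folder as the page. The title and body text are placeholders.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Declares the character encoding of this HTML document.
       Placed first in the head so the browser sees it early. -->
  <meta charset="UTF-8">
  <title>Charset example</title>

  <!-- Declares the encoding of the external script file.
       script.js is assumed to exist next to this page. -->
  <script src="script.js" charset="UTF-8"></script>
</head>
<body>
  <h1>Character set demo</h1>
  <!-- Non-ASCII characters display correctly when the file is
       saved as UTF-8, matching the declaration above. -->
  <p>Symbols: &copy; &euro; é ñ ü</p>
</body>
</html>
```

The declaration only describes the file; the file must also be saved in that encoding by the editor for the characters to display as intended.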