Different types of Coding Schemes to represent data

Last Updated : 08 Feb, 2024

Any text-based data is stored by the computer as bits (a series of 1s and 0s), following a specified coding scheme. A coding scheme is a standard that tells the machine which byte sequence represents which character. Specifying the coding scheme in use is important because, without it, the machine could interpret the given bytes as different characters than intended. For example, 0x6B is interpreted as the character 'k' in ASCII, but as the character ',' in the less commonly used EBCDIC coding scheme.
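The 0x6B example can be checked with a minimal Python sketch. It assumes Python's built-in `cp037` codec as the EBCDIC variant; EBCDIC has many code pages, and `cp037` (US/Canada) is only one of them, chosen here for illustration.

```python
# Decode the same single byte under two different coding schemes.
raw = bytes([0x6B])

# ASCII maps 0x6B to the lowercase letter k.
as_ascii = raw.decode("ascii")

# Assumption: cp037 stands in for "EBCDIC"; other EBCDIC code pages exist.
as_ebcdic = raw.decode("cp037")

print(f"0x6B as ASCII:  {as_ascii!r}")
print(f"0x6B as EBCDIC: {as_ebcdic!r}")
```

The byte never changes; only the scheme used to interpret it does, which is why the scheme has to be known in advance.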

  • ASCII (American Standard Code for Information Interchange): ASCII is among the most widespread coding schemes. Developed by the American Standards Association, it was introduced in 1963 as ASA X3.4-1963. It defines 128 characters, 0x00 to 0x7F, each represented by 7 bits. Some commonly used ranges are:

    Characters    Decimal    Hexadecimal
    0-9           48-57      30-39
    A-Z           65-90      41-5A
    a-z           97-122     61-7A

    The remaining code values are assigned to control characters, punctuation, and other special characters.

  • UTF-32 (Unicode Transformation Format, 32-bit): UTF-32 is a fixed-length coding scheme in which every character is always represented by exactly 4 bytes. It can represent all of Unicode's 1,112,064 valid code points. Because of its large space requirements, it is rarely used for storage or transmission, where the more compact UTF-8 and UTF-16 are generally preferred.
  • UTF-16 (Unicode Transformation Format, 16-bit): UTF-16 is a variable-length coding scheme that uses either 2 or 4 bytes to represent a character. It can represent all of Unicode's 1,112,064 valid code points.
  • UTF-8 (Unicode Transformation Format, 8-bit): Introduced in 1993, UTF-8 is a variable-length coding scheme that uses 1, 2, 3, or 4 bytes to represent a character. It can represent all of Unicode's code points. UTF-8 is a superset of ASCII: the first 128 characters, 0x00 to 0x7F, are encoded exactly as in ASCII, so UTF-8 is backward compatible with ASCII. To indicate whether consecutive bytes belong to the same character or to different characters, the leading bits of each byte act as markers: a single-byte character starts with 0, the first byte of a multi-byte character starts with 110, 1110, or 11110 (for 2, 3, or 4 bytes), and every continuation byte starts with 10.
  • ISCII (Indian Script Code for Information Interchange): ISCII is an 8-bit coding scheme that can accommodate the characters used by various Indian scripts. The first 128 characters are the same as ASCII, and the upper 128 values (0x80 to 0xFF) are used for ISCII-specific characters.
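The size differences between the Unicode schemes listed above can be seen with a short Python sketch. The helper names `encoded_lengths` and `utf8_bit_patterns` and the sample characters are illustrative choices, not part of any standard. The big-endian codec variants (`utf-16-be`, `utf-32-be`) are used so that no byte order mark is added to the counts.

```python
def encoded_lengths(ch: str) -> dict:
    """Return the number of bytes one character occupies in each scheme."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The -be variants avoid the byte order mark that plain
        # "utf-16" / "utf-32" would prepend.
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


def utf8_bit_patterns(ch: str) -> list:
    """Return each UTF-8 byte of a character as an 8-bit binary string."""
    return [format(b, "08b") for b in ch.encode("utf-8")]


# Sample characters: Latin letter, accented letter, Devanagari letter, emoji.
for ch in ["A", "\u00e9", "\u0905", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch), utf8_bit_patterns(ch))

# ASCII text produces identical bytes under ASCII and UTF-8.
same_bytes = "k".encode("ascii") == "k".encode("utf-8")
print("ASCII and UTF-8 agree for 'k':", same_bytes)
```

In the printed bit patterns, the leading bits of each UTF-8 byte show whether it starts a character or continues one, while UTF-32 reports 4 bytes for every sample.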
