Introduction to Data Compression

Last Updated : 27 Jul, 2021

This article gives an overview of data compression, illustrates the idea with a simple coding example, and introduces the concept of entropy.

Overview :
Data compression is an important area of research. It deals with the art and science of storing information in a compact form. Most users will have come across it through the many software packages that compress files. Compression reduces the cost of storage and of transmission, and because there is less data to read or send, it can also speed up the programs that handle it. Compression is achieved by removing redundancy, that is, the unnecessary repetition of data. Coding redundancy refers to the redundancy that results from a suboptimal coding technique.

Method illustration :

To illustrate coding redundancy, let's assume that there are six symbols and that a binary code is used to assign a unique codeword (address) to each of them, as shown in the table below.
A binary code requires at least three bits to encode six symbols, because two bits give only four distinct codewords. It can also be observed that the codewords 110 and 111 are not used at all, and that every symbol costs three bits no matter how probable it is. This shows that the binary code is not efficient, and hence a more efficient code is required.

Symbols	W1	W2	W3	W4	W5	W6
Probability	0.3	0.3	0.1	0.1	0.08	0.02
Binary code	000	001	010	011	100	101
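The fixed-length code in the table can be reproduced with a few lines of Python. This is only a minimal sketch: the variable names (`symbols`, `fixed_code`, `unused`) are chosen for illustration and are not part of the original article.

```python
import math

symbols = ["W1", "W2", "W3", "W4", "W5", "W6"]

# Smallest codeword length that gives every symbol a distinct codeword.
bits = math.ceil(math.log2(len(symbols)))

# Assign consecutive binary numbers, zero-padded to the fixed length.
fixed_code = {s: format(i, f"0{bits}b") for i, s in enumerate(symbols)}

# Codewords of this length that are left over and never used.
unused = [format(i, f"0{bits}b") for i in range(len(symbols), 2 ** bits)]

print(bits)
print(fixed_code)
print(unused)
```

With six symbols the sketch arrives at three bits per symbol and leaves 110 and 111 unassigned, which matches the table.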

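An efficient code is one that uses the minimum number of bits to represent the information. The disadvantage of the binary code above is that it is a fixed-length code: every symbol gets the same number of bits. A Huffman code is better, as it is a variable-length code that assigns shorter codewords to more probable symbols.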
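As a rough illustration of a variable-length code, the sketch below builds a Huffman code for the six symbols with Python's `heapq` module. Two things are assumptions of the sketch rather than part of the article: the probabilities in the table sum to 0.9, so they are normalized here to sum to 1, and the function name `huffman_code` and the tie-breaking order are illustrative. The exact codewords depend on how ties are broken, but the average length does not.

```python
import heapq
from fractions import Fraction

# Probabilities as listed in the table. They sum to 0.9, so this sketch
# normalizes them (an assumption, not something the article states).
raw = {"W1": "0.3", "W2": "0.3", "W3": "0.1",
       "W4": "0.1", "W5": "0.08", "W6": "0.02"}
total = sum(Fraction(v) for v in raw.values())
probs = {s: Fraction(v) / total for s, v in raw.items()}

def huffman_code(probs):
    """Return a dict mapping each symbol to its Huffman codeword."""
    # Heap entries: (weight, tie_breaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least probable groups into one.
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, counter, merged))
        counter += 1
    return heap[0][2]

code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(code)
print(float(avg_len))
```

Under these assumptions the average codeword length comes out at 7/3, roughly 2.33 bits per symbol, compared with 3 bits for the fixed-length code.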
Coding techniques are related to the concepts of entropy and information content, which are studied in the field of information theory. Information theory deals with the uncertainty present in a message; the uncertainty associated with a symbol is called its information content. For a symbol that occurs with probability p_i, the information content is given as

                                 \log_2 (1/p_i)  or  -\log_2 p_i
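The formula can be evaluated directly. The following is a small sketch, with the helper name `information_content` chosen for illustration, applied to a few probability values taken from the table as given.

```python
import math

def information_content(p):
    """Information content, in bits, of a symbol with probability p (0 < p <= 1)."""
    return -math.log2(p)

for symbol, p in [("W1", 0.3), ("W3", 0.1), ("W6", 0.02)]:
    print(symbol, round(information_content(p), 3))
```

The less probable a symbol is, the larger its information content, which is why an efficient code can afford to spend more bits on rare symbols and fewer on common ones.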

Entropy :

Entropy is defined as a measure of the uncertainty (disorder) present in the information, that is, the average information content per symbol. It is given as follows:

                                    H = -\sum_{i} p_i \log_2 p_i
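A minimal sketch of this formula in Python follows. The function name `entropy` is illustrative. Because the probabilities in the table sum to 0.9 rather than 1, the sketch normalizes them first; that step is an assumption and not part of the article.

```python
import math

def entropy(probs):
    """Entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Table values, normalized so that they sum to 1 (an assumption).
raw = [0.3, 0.3, 0.1, 0.1, 0.08, 0.02]
total = sum(raw)
probs = [p / total for p in raw]

print(round(entropy(probs), 3))
```

For comparison, four equally likely symbols give an entropy of exactly 2 bits, the same as a two-bit fixed-length code, so no compression is possible in that case.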

Entropy is a non-negative quantity and specifies the minimum average number of bits per symbol necessary to encode the information. Thus, coding redundancy is given as the difference between the average number of bits used for coding and the entropy.

coding redundancy = Average number of bits - Entropy
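To make the relation concrete, the sketch below computes the coding redundancy of the three-bit binary code from the table. The helper names `entropy` and `average_length` are illustrative, and the table probabilities (which sum to 0.9) are normalized to sum to 1, which is an assumption of the sketch.

```python
import math

# Table values, normalized so that they sum to 1 (an assumption).
raw = {"W1": 0.3, "W2": 0.3, "W3": 0.1, "W4": 0.1, "W5": 0.08, "W6": 0.02}
total = sum(raw.values())
probs = {s: p / total for s, p in raw.items()}

# Fixed-length binary code taken from the table.
fixed_code = {"W1": "000", "W2": "001", "W3": "010",
              "W4": "011", "W5": "100", "W6": "101"}

def entropy(probs):
    """Entropy in bits of a symbol -> probability mapping."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def average_length(probs, code):
    """Average number of bits per symbol under the given code."""
    return sum(p * len(code[s]) for s, p in probs.items())

redundancy = average_length(probs, fixed_code) - entropy(probs)
print(round(redundancy, 3))
```

Under these assumptions the fixed-length code spends roughly 0.8 bits per symbol more than the entropy requires, and that excess is what a better code can remove.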

By removing redundancy, any information can be stored in a compact manner. This is the basis of data compression.
