We have discussed Huffman Encoding in a previous post. In this post, decoding is discussed.
Input Data: AAAAAABCCCCCCDDEEEEE
Frequencies: A: 6, B: 1, C: 6, D: 2, E: 5
Encoded Data: 0000000000001100101010101011111111010101010
Huffman Tree: ‘#’ is the special character used for internal nodes, as the character field is not needed for internal nodes.
               #(20)
             /       \
        #(12)         #(8)
       /    \        /    \
    A(6)   C(6)   E(5)   #(3)
                        /    \
                     B(1)   D(2)
Code of ‘A’ is ‘00’, code of ‘C’ is ‘01’, code of ‘E’ is ‘10’, code of ‘B’ is ‘110’, and code of ‘D’ is ‘111’.
Decoded Data: AAAAAABCCCCCCDDEEEEE
Input Data: geeksforgeeks
Characters with their Huffman codes:
e 10, f 1100, g 011, k 00, o 010, r 1101, s 111
Encoded Huffman data: 01110100011111000101101011101000111
Decoded Huffman Data: geeksforgeeks
To decode the encoded data, we need the Huffman tree. We iterate through the encoded data bit by bit. To find the character corresponding to the current bits, we use the following steps:
- We start from the root and do the following until a leaf is found.
- If the current bit is 0, we move to the left child of the current node.
- If the current bit is 1, we move to the right child of the current node.
- If we reach a leaf node during the traversal, we print the character stored in that leaf and then continue with the remaining encoded data, starting again from the root.
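The steps above can be sketched in C++ as follows. This is a minimal sketch, not the article's own listing: the `Node` type, the `treeFromCodes` helper (which rebuilds a tree from a table of codes), and the `decode` function are hypothetical names introduced for illustration. It assumes the tree has at least two distinct characters, so the root is never a leaf.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical node type: a leaf stores a character, an internal node only links.
struct Node {
    char ch = '#';  // '#' is a placeholder for internal nodes
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
    bool isLeaf() const { return !left && !right; }
};

// Illustrative helper: rebuild a decoding tree from a table of prefix codes.
std::unique_ptr<Node> treeFromCodes(const std::map<char, std::string>& codes) {
    std::unique_ptr<Node> root(new Node());
    for (const auto& entry : codes) {
        Node* cur = root.get();
        for (char bit : entry.second) {
            // '0' extends the path to the left, anything else to the right.
            std::unique_ptr<Node>& child = (bit == '0') ? cur->left : cur->right;
            if (!child) child.reset(new Node());
            cur = child.get();
        }
        cur->ch = entry.first;  // the end of the path is the leaf for this character
    }
    return root;
}

// Walk the tree bit by bit: 0 goes left, 1 goes right.
// At each leaf, emit its character and restart from the root.
std::string decode(const Node* root, const std::string& bits) {
    assert(root != nullptr && !root->isLeaf());  // assumption: at least two symbols
    std::string out;
    const Node* cur = root;
    for (char bit : bits) {
        cur = (bit == '0') ? cur->left.get() : cur->right.get();
        if (cur == nullptr) break;  // malformed input: no such branch in the tree
        if (cur->isLeaf()) {
            out += cur->ch;
            cur = root;
        }
    }
    return out;
}
```

Calling `decode` on a tree built from the codes e 10, f 1100, g 011, k 00, o 010, r 1101, s 111 with the encoded data from the example yields the original string. Each bit is visited once, so decoding is linear in the number of encoded bits.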
The code below takes a string as input, encodes it, and stores the result in a variable holding the encoded string. It then decodes that string and prints the original input.
Below is the implementation of the above approach:
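A minimal C++ sketch of such a program is given here. It is one possible way to write it under stated assumptions, not a definitive implementation: the names `HuffNode`, `HuffmanTree`, `ByFreq`, `buildTree`, `encode`, and `decode` are hypothetical, the input is assumed to contain at least two distinct characters, and the encoded data is kept as a string of '0' and '1' characters for readability.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <vector>

struct HuffNode {
    char ch;                // character for leaves, '#' for internal nodes
    int freq;               // frequency of the character or of the whole subtree
    const HuffNode* left;
    const HuffNode* right;
};

struct HuffmanTree {
    std::vector<std::unique_ptr<HuffNode>> pool;  // owns every node
    const HuffNode* root = nullptr;
    std::map<char, std::string> codes;            // character -> bit string
};

// Comparator that turns std::priority_queue into a min-heap on frequency.
struct ByFreq {
    bool operator()(const HuffNode* a, const HuffNode* b) const {
        return a->freq > b->freq;
    }
};

// Record the root-to-leaf path of every character: left adds '0', right adds '1'.
void assignCodes(const HuffNode* n, const std::string& prefix,
                 std::map<char, std::string>& codes) {
    if (!n->left && !n->right) { codes[n->ch] = prefix; return; }
    assignCodes(n->left, prefix + "0", codes);
    assignCodes(n->right, prefix + "1", codes);
}

HuffmanTree buildTree(const std::string& text) {
    std::map<char, int> freq;
    for (char c : text) ++freq[c];
    assert(freq.size() >= 2);  // assumption: at least two distinct characters

    HuffmanTree t;
    auto make = [&t](char ch, int f, const HuffNode* l,
                     const HuffNode* r) -> const HuffNode* {
        t.pool.push_back(std::unique_ptr<HuffNode>(new HuffNode{ch, f, l, r}));
        return t.pool.back().get();
    };
    std::priority_queue<const HuffNode*, std::vector<const HuffNode*>, ByFreq> pq;
    for (const auto& kv : freq) pq.push(make(kv.first, kv.second, nullptr, nullptr));
    // Repeatedly merge the two least frequent subtrees under a new internal node.
    while (pq.size() > 1) {
        const HuffNode* a = pq.top(); pq.pop();
        const HuffNode* b = pq.top(); pq.pop();
        pq.push(make('#', a->freq + b->freq, a, b));
    }
    t.root = pq.top();
    assignCodes(t.root, "", t.codes);
    return t;
}

std::string encode(const HuffmanTree& t, const std::string& text) {
    std::string bits;
    for (char c : text) bits += t.codes.at(c);
    return bits;
}

std::string decode(const HuffmanTree& t, const std::string& bits) {
    std::string out;
    const HuffNode* cur = t.root;
    for (char bit : bits) {
        cur = (bit == '0') ? cur->left : cur->right;
        if (!cur->left && !cur->right) {  // leaf reached: emit and restart at root
            out += cur->ch;
            cur = t.root;
        }
    }
    return out;
}
```

A round trip would look like `HuffmanTree t = buildTree(s); decode(t, encode(t, s))`, which returns `s`. Ties between equal frequencies can be broken in different ways, so the individual codes produced by this sketch may differ from the codes shown in the example, while the total encoded length stays the same.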
Output:
Characters with their Huffman codes:
e 10
f 1100
g 011
k 00
o 010
r 1101
s 111
Encoded Huffman data: 01110100011111000101101011101000111
Decoded Huffman Data: geeksforgeeks
Time complexity: building the Huffman tree with a priority queue takes O(k log k), where k is the number of distinct characters in the input string. Counting frequencies takes O(n) for an input of n characters, and encoding and decoding take time proportional to the number of encoded bits. Since k is at most n, the overall cost is bounded by O(n log n). Auxiliary space: O(k) for the tree, the frequency map, and the code map, in addition to the space for the encoded string itself.
In the C++ implementation, the time spent on tree construction is dominated by the priority queue operations, and the space is dominated by the tree and the maps used to store the frequencies and codes of characters. The recursive functions used to print and store codes also contribute to the space usage, since their recursion depth equals the height of the tree, which is at most k - 1.
Comparing input file size and output file size:
We can compare the size of the input file with the size of the Huffman-encoded output in a simple way. Let’s say our input is the string “geeksforgeeks”, stored in a file input.txt.
Input File Size:
Total number of characters, i.e. input length: 13
Size: 13 characters * 8 bits = 104 bits, or 13 bytes.
Output File Size:
Character | Frequency | Binary Huffman Value
e         | 4         | 10
f         | 1         | 1100
g         | 2         | 011
k         | 2         | 00
o         | 1         | 010
r         | 1         | 1101
s         | 2         | 111
So to calculate output size:
e: 4 occurrences * 2 bits = 8 bits
f: 1 occurrence * 4 bits = 4 bits
g: 2 occurrences * 3 bits = 6 bits
k: 2 occurrences * 2 bits = 4 bits
o: 1 occurrence * 3 bits = 3 bits
r: 1 occurrence * 4 bits = 4 bits
s: 2 occurrences * 3 bits = 6 bits
Total: 35 bits, which rounds up to 5 bytes
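The same arithmetic can be sketched in C++. The function names `encodedBits` and `bytesFor` are hypothetical, and the sketch assumes every character in the frequency table has an entry in the code table.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Total encoded length in bits: sum of frequency * code length over all characters.
std::size_t encodedBits(const std::map<char, int>& freq,
                        const std::map<char, std::string>& codes) {
    std::size_t bits = 0;
    for (const auto& kv : freq) {
        assert(kv.second >= 0);  // frequencies are counts, so never negative
        // codes.at throws std::out_of_range if a character has no code.
        bits += static_cast<std::size_t>(kv.second) * codes.at(kv.first).size();
    }
    return bits;
}

// Number of whole bytes needed to hold the given number of bits (rounds up).
std::size_t bytesFor(std::size_t bits) {
    return (bits + 7) / 8;
}
```

With the frequencies and codes from the table above, `encodedBits` gives 35 and `bytesFor(35)` gives 5, against 104 bits (13 bytes) for the unencoded input.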
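Hence, the encoded data needs 35 bits instead of 104 bits, roughly one third of the original size. Note that this count covers only the encoded bits; the Huffman tree (or code table) that the decoder needs is not included. The above method can also help us to determine the value of N, i.e. the length of the encoded data.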
This article is contributed by Harshit Sidhwa.