Huffman Coding is one of the most popular lossless data compression techniques. This article aims at diving deep into the Huffman Coding and its implementation in Python.
What is Huffman Coding?
Huffman Coding is an approach used in lossless data compression with the primary objective of delivering reduced transit size without any loss of meaningful data content. There is a key rule that lies in Huffman coding: use shorter codes for frequent letters and longer ones for uncommon letters. Through the employment of the binary tree, called the Huffman tree, the frequency of each character is illustrated, where each leaf node represents the character and its frequency. Shorter codes which are arrived at by traveling from the tree root to the leaf node representing the character, are the ones that are applied.
Implementation of Huffman Coding in Python
1. Define the Node Class:
First of all, we define a class to instantiate Huffman nodes. Every node includes the details such as character frequency, rank of its left and right children.
2. Build the Huffman Tree:
Now, we design a function to construct our Huffman tree. We apply priority queue (heap) to link the nodes according to the lowest frequencies, and when the only one node is left there, it roots the Huffman tree.
3. Generate Huffman Codes:
For that, next we move across the Huffman tree to produce the Huffman codes for each character. Raw binary code is built from the left (with no preceding bit) to the right (bit is one), until occurring upon the character's leaf.
Below is the implementation of above approach:
# Python program for Huffman Coding
import heapq
class Node:
def __init__(self, symbol=None, frequency=None):
self.symbol = symbol
self.frequency = frequency
self.left = None
self.right = None
def __lt__(self, other):
return self.frequency < other.frequency
def build_huffman_tree(chars, freq):
# Create a priority queue of nodes
priority_queue = [Node(char, f) for char, f in zip(chars, freq)]
heapq.heapify(priority_queue)
# Build the Huffman tree
while len(priority_queue) > 1:
left_child = heapq.heappop(priority_queue)
right_child = heapq.heappop(priority_queue)
merged_node = Node(frequency=left_child.frequency + right_child.frequency)
merged_node.left = left_child
merged_node.right = right_child
heapq.heappush(priority_queue, merged_node)
return priority_queue[0]
def generate_huffman_codes(node, code="", huffman_codes={}):
if node is not None:
if node.symbol is not None:
huffman_codes[node.symbol] = code
generate_huffman_codes(node.left, code + "0", huffman_codes)
generate_huffman_codes(node.right, code + "1", huffman_codes)
return huffman_codes
# Given example
chars = ['a', 'b', 'c', 'd', 'e', 'f']
freq = [4, 7, 15, 17, 22, 42]
# Build the Huffman tree
root = build_huffman_tree(chars, freq)
# Generate Huffman codes
huffman_codes = generate_huffman_codes(root)
# Print Huffman codes
for char, code in huffman_codes.items():
print(f"Character: {char}, Code: {code}")
Output
Character: f, Code: 0 Character: a, Code: 1000 Character: b, Code: 1001 Character: c, Code: 101 Character: d, Code: 110 Character: e, Code: 111
Time Complexity: O(N * logN)
Auxiliary Space: O(N)
The Huffman Coding is an effective algorithm for data compression because it saved both storage space and transmission time. Through the effective assignment of symbol codes of variable length to the symbols by following the rule of higher frequency to lower code, Huffman coding optimizes the compression ratio and maintain the soundness of the data.