Huffman Coding in Python

Huffman Coding is one of the most popular lossless data compression techniques. This article aims at diving deep into the Huffman Coding and its implementation in Python.

What is Huffman Coding?

Huffman Coding is an approach used in lossless data compression with the primary objective of delivering reduced transit size without any loss of meaningful data content. There is a key rule that lies in Huffman coding: use shorter codes for frequent letters and longer ones for uncommon letters. Through the employment of the binary tree, called the Huffman tree, the frequency of each character is illustrated, where each leaf node represents the character and its frequency. Shorter codes which are arrived at by traveling from the tree root to the leaf node representing the character, are the ones that are applied.

Implementation of Huffman Coding in Python

1. Define the Node Class:

First of all, we define a class to instantiate Huffman nodes. Every node includes the details such as character frequency, rank of its left and right children.

2. Build the Huffman Tree:

Now, we design a function to construct our Huffman tree. We apply priority queue (heap) to link the nodes according to the lowest frequencies, and when the only one node is left there, it roots the Huffman tree.

3. Generate Huffman Codes:

For that, next we move across the Huffman tree to produce the Huffman codes for each character. Raw binary code is built from the left (with no preceding bit) to the right (bit is one), until occurring upon the character's leaf.

Below is the implementation of above approach:

Python

# Python program for Huffman Coding
import heapq

class Node:
    def __init__(self, symbol=None, frequency=None):
        self.symbol = symbol
        self.frequency = frequency
        self.left = None
        self.right = None

    def __lt__(self, other):
        return self.frequency < other.frequency

def build_huffman_tree(chars, freq):
  
    # Create a priority queue of nodes
    priority_queue = [Node(char, f) for char, f in zip(chars, freq)]
    heapq.heapify(priority_queue)

    # Build the Huffman tree
    while len(priority_queue) > 1:
        left_child = heapq.heappop(priority_queue)
        right_child = heapq.heappop(priority_queue)
        merged_node = Node(frequency=left_child.frequency + right_child.frequency)
        merged_node.left = left_child
        merged_node.right = right_child
        heapq.heappush(priority_queue, merged_node)

    return priority_queue[0]

def generate_huffman_codes(node, code="", huffman_codes={}):
    if node is not None:
        if node.symbol is not None:
            huffman_codes[node.symbol] = code
        generate_huffman_codes(node.left, code + "0", huffman_codes)
        generate_huffman_codes(node.right, code + "1", huffman_codes)

    return huffman_codes

# Given example
chars = ['a', 'b', 'c', 'd', 'e', 'f']
freq = [4, 7, 15, 17, 22, 42]

# Build the Huffman tree
root = build_huffman_tree(chars, freq)

# Generate Huffman codes
huffman_codes = generate_huffman_codes(root)

# Print Huffman codes
for char, code in huffman_codes.items():
    print(f"Character: {char}, Code: {code}")

Output

Character: f, Code: 0
Character: a, Code: 1000
Character: b, Code: 1001
Character: c, Code: 101
Character: d, Code: 110
Character: e, Code: 111

Time Complexity: O(N * logN)
Auxiliary Space: O(N)

The Huffman Coding is an effective algorithm for data compression because it saved both storage space and transmission time. Through the effective assignment of symbol codes of variable length to the symbols by following the rule of higher frequency to lower code, Huffman coding optimizes the compression ratio and maintain the soundness of the data.

Article Tags :

DSA

python