Find the k most frequent words from a file

Last Updated : 13 Dec, 2023

Given a book of words. Assume you have enough main memory to accommodate all words. design a data structure to find top K maximum occurring words. The data structure should be dynamic so that new words can be added.
A simple solution is to use Hashing. Hash all words one by one in a hash table. If a word is already present, then increment its count. Finally, traverse through the hash table and return the k words with maximum counts.
We can use Trie and Min Heap to get the k most frequent words efficiently. The idea is to use Trie for searching existing words adding new words efficiently. Trie also stores count of occurrences of words. A Min Heap of size k is used to keep track of k most frequent words at any point of time(Use of Min Heap is same as we used it to find k largest elements in this post).
Trie and Min Heap are linked with each other by storing an additional field in Trie ‘indexMinHeap’ and a pointer ‘trNode’ in Min Heap. The value of ‘indexMinHeap’ is maintained as -1 for the words which are currently not in Min Heap (or currently not among the top k frequent words). For the words which are present in Min Heap, ‘indexMinHeap’ contains, index of the word in Min Heap. The pointer ‘trNode’ in Min Heap points to the leaf node corresponding to the word in Trie.
Following is the complete process to print k most frequent words from a file.
Read all words one by one. For every word, insert it into Trie. Increase the counter of the word, if already exists. Now, we need to insert this word in min heap also. For insertion in min heap, 3 cases arise:
1. The word is already present. We just increase the corresponding frequency value in min heap and call minHeapify() for the index obtained by “indexMinHeap” field in Trie. When the min heap nodes are being swapped, we change the corresponding minHeapIndex in the Trie. Remember each node of the min heap is also having pointer to Trie leaf node.
2. The minHeap is not full. we will insert the new word into min heap & update the root node in the min heap node & min heap index in Trie leaf node. Now, call buildMinHeap().
3. The min heap is full. Two sub-cases arise.
….3.1 The frequency of the new word inserted is less than the frequency of the word stored in the head of min heap. Do nothing.
….3.2 The frequency of the new word inserted is greater than the frequency of the word stored in the head of min heap. Replace & update the fields. Make sure to update the corresponding min heap index of the “word to be replaced” in Trie with -1 as the word is no longer in min heap.
4. Finally, Min Heap will have the k most frequent words of all words present in given file. So we just need to print all words present in Min Heap.

CPP

// A program to find k most frequent words in a file
#include <ctype.h>
#include <stdio.h>
#include <string.h>
 
#define MAX_CHARS 26
#define MAX_WORD_SIZE 30
 
// A Trie node
struct TrieNode {
    bool isEnd; // indicates end of word
    unsigned
        frequency; // the number of occurrences of a word
    int indexMinHeap; // the index of the word in minHeap
    TrieNode* child[MAX_CHARS]; // represents 26 slots each
                                // for 'a' to 'z'.
};
 
// A Min Heap node
struct MinHeapNode {
    TrieNode* root; // indicates the leaf node of TRIE
    unsigned frequency; // number of occurrences
    char* word; // the actual word stored
};
 
// A Min Heap
struct MinHeap {
    unsigned capacity; // the total size a min heap
    int count; // indicates the number of slots filled.
    MinHeapNode*
        array; // represents the collection of minHeapNodes
};
 
// A utility function to create a new Trie node
TrieNode* newTrieNode()
{
    // Allocate memory for Trie Node
    TrieNode* trieNode = new TrieNode;
 
    // Initialize values for new node
    trieNode->isEnd = 0;
    trieNode->frequency = 0;
    trieNode->indexMinHeap = -1;
    for (int i = 0; i < MAX_CHARS; ++i)
        trieNode->child[i] = NULL;
 
    return trieNode;
}
 
// A utility function to create a Min Heap of given capacity
MinHeap* createMinHeap(int capacity)
{
    MinHeap* minHeap = new MinHeap;
 
    minHeap->capacity = capacity;
    minHeap->count = 0;
 
    // Allocate memory for array of min heap nodes
    minHeap->array = new MinHeapNode[minHeap->capacity];
 
    return minHeap;
}
 
// A utility function to swap two min heap nodes. This
// function is needed in minHeapify
void swapMinHeapNodes(MinHeapNode* a, MinHeapNode* b)
{
    MinHeapNode temp = *a;
    *a = *b;
    *b = temp;
}
 
// This is the standard minHeapify function. It does one
// thing extra. It updates the minHapIndex in Trie when two
// nodes are swapped in in min heap
void minHeapify(MinHeap* minHeap, int idx)
{
    int left, right, smallest;
 
    left = 2 * idx + 1;
    right = 2 * idx + 2;
    smallest = idx;
    if (left < minHeap->count
        && minHeap->array[left].frequency
               < minHeap->array[smallest].frequency)
        smallest = left;
 
    if (right < minHeap->count
        && minHeap->array[right].frequency
               < minHeap->array[smallest].frequency)
        smallest = right;
 
    if (smallest != idx) {
        // Update the corresponding index in Trie node.
        minHeap->array[smallest].root->indexMinHeap = idx;
        minHeap->array[idx].root->indexMinHeap = smallest;
 
        // Swap nodes in min heap
        swapMinHeapNodes(&minHeap->array[smallest],
                         &minHeap->array[idx]);
 
        minHeapify(minHeap, smallest);
    }
}
 
// A standard function to build a heap
void buildMinHeap(MinHeap* minHeap)
{
    int n, i;
    n = minHeap->count - 1;
 
    for (i = (n - 1) / 2; i >= 0; --i)
        minHeapify(minHeap, i);
}
 
// Inserts a word to heap, the function handles the 3 cases
// explained above
void insertInMinHeap(MinHeap* minHeap, TrieNode** root,
                     const char* word)
{
    // Case 1: the word is already present in minHeap
    if ((*root)->indexMinHeap != -1) {
        ++(minHeap->array[(*root)->indexMinHeap].frequency);
 
        // percolate down
        minHeapify(minHeap, (*root)->indexMinHeap);
    }
 
    // Case 2: Word is not present and heap is not full
    else if (minHeap->count < minHeap->capacity) {
        int count = minHeap->count;
        minHeap->array[count].frequency
            = (*root)->frequency;
        minHeap->array[count].word
            = new char[strlen(word) + 1];
        strcpy(minHeap->array[count].word, word);
 
        minHeap->array[count].root = *root;
        (*root)->indexMinHeap = minHeap->count;
 
        ++(minHeap->count);
        buildMinHeap(minHeap);
    }
 
    // Case 3: Word is not present and heap is full. And
    // frequency of word is more than root. The root is the
    // least frequent word in heap, replace root with new
    // word
    else if ((*root)->frequency
             > minHeap->array[0].frequency) {
 
        minHeap->array[0].root->indexMinHeap = -1;
        minHeap->array[0].root = *root;
        minHeap->array[0].root->indexMinHeap = 0;
        minHeap->array[0].frequency = (*root)->frequency;
 
        // delete previously allocated memory and
        delete[] minHeap->array[0].word;
        minHeap->array[0].word = new char[strlen(word) + 1];
        strcpy(minHeap->array[0].word, word);
 
        minHeapify(minHeap, 0);
    }
}
 
// Inserts a new word to both Trie and Heap
void insertUtil(TrieNode** root, MinHeap* minHeap,
                const char* word, const char* dupWord)
{
    // Base Case
    if (*root == NULL)
        *root = newTrieNode();
 
    // There are still more characters in word
    if (*word != '\0')
        insertUtil(&((*root)->child[tolower(*word) - 97]),
                   minHeap, word + 1, dupWord);
    else // The complete word is processed
    {
        // word is already present, increase the frequency
        if ((*root)->isEnd)
            ++((*root)->frequency);
        else {
            (*root)->isEnd = 1;
            (*root)->frequency = 1;
        }
 
        // Insert in min heap also
        insertInMinHeap(minHeap, root, dupWord);
    }
}
 
// add a word to Trie & min heap. A wrapper over the
// insertUtil
void insertTrieAndHeap(const char* word, TrieNode** root,
                       MinHeap* minHeap)
{
    insertUtil(root, minHeap, word, word);
}
 
// A utility function to show results, The min heap
// contains k most frequent words so far, at any time
void displayMinHeap(MinHeap* minHeap)
{
    int i;
 
    // print top K word with frequency
    for (i = 0; i < minHeap->count; ++i) {
        printf("%s : %d\n", minHeap->array[i].word,
               minHeap->array[i].frequency);
    }
}
 
// The main function that takes a file as input, add words
// to heap and Trie, finally shows result from heap
void printKMostFreq(FILE* fp, int k)
{
    // Create a Min Heap of Size k
    MinHeap* minHeap = createMinHeap(k);
 
    // Create an empty Trie
    TrieNode* root = NULL;
 
    // A buffer to store one word at a time
    char buffer[MAX_WORD_SIZE];
 
    // Read words one by one from file. Insert the word in
    // Trie and Min Heap
    while (fscanf(fp, "%s", buffer) != EOF)
        insertTrieAndHeap(buffer, &root, minHeap);
 
    // The Min Heap will have the k most frequent words, so
    // print Min Heap nodes
    displayMinHeap(minHeap);
}
 
// Driver program to test above functions
int main()
{
    int k = 5;
    FILE* fp = fopen("file.txt", "r");
    if (fp == NULL)
        printf("File doesn't exist ");
    else
        printKMostFreq(fp, k);
    return 0;
}

Java

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
 
class TrieNode {
    boolean isEnd;
    int frequency;
    int indexMinHeap;
    TrieNode[] child;
 
    public TrieNode() {
        isEnd = false;
        frequency = 0;
        indexMinHeap = -1;
        child = new TrieNode[26]; // Represents 26 slots for 'a' to 'z'
    }
}
 
class MinHeapNode {
    TrieNode root;
    int frequency;
    String word;
 
    public MinHeapNode(TrieNode root, int frequency, String word) {
        this.root = root;
        this.frequency = frequency;
        this.word = word;
    }
}
 
class MinHeap {
    int capacity;
    int count;
    MinHeapNode[] array;
 
    public MinHeap(int capacity) {
        this.capacity = capacity;
        this.count = 0;
        this.array = new MinHeapNode[capacity];
    }
}
 
public class KMostFrequentWords {
 
    // Utility function to create a new Trie node
    public static TrieNode newTrieNode() {
        TrieNode trieNode = new TrieNode();
        return trieNode;
    }
 
    // Utility function to create a Min Heap of given capacity
    public static MinHeap createMinHeap(int capacity) {
        MinHeap minHeap = new MinHeap(capacity);
        return minHeap;
    }
 
    // Build Min Heap
    public static void buildMinHeap(MinHeap minHeap) {
        int n = minHeap.count - 1;
        for (int i = (n - 1) / 2; i >= 0; --i)
            minHeapify(minHeap, i);
    }
 
    // Insert a word into the Min Heap
    public static void insertInMinHeap(MinHeap minHeap, TrieNode root, String word) {
        if (root.indexMinHeap != -1) {
            minHeap.array[root.indexMinHeap].frequency += 1;
            minHeapify(minHeap, root.indexMinHeap);
        } else if (minHeap.count < minHeap.capacity) {
            int count = minHeap.count;
            minHeap.array[count] = new MinHeapNode(root, root.frequency, word);
            root.indexMinHeap = minHeap.count;
            minHeap.count += 1;
            buildMinHeap(minHeap);
        } else if (root.frequency > minHeap.array[0].frequency) {
            minHeap.array[0].root.indexMinHeap = -1;
            minHeap.array[0].root = root;
            minHeap.array[0].root.indexMinHeap = 0;
            minHeap.array[0].frequency = root.frequency;
            minHeap.array[0].word = word;
            minHeapify(minHeap, 0);
        }
    }
 
    // Insert a word into Trie and Min Heap
    public static void insertUtil(TrieNode root, MinHeap minHeap, String word, String dupWord) {
        if (root == null)
            root = newTrieNode();
 
        if (!word.isEmpty())
            insertUtil(root.child[word.toLowerCase().charAt(0) - 'a'], minHeap, word.substring(1), dupWord);
        else {
            if (root.isEnd)
                root.frequency += 1;
            else {
                root.isEnd = true;
                root.frequency = 1;
            }
            insertInMinHeap(minHeap, root, dupWord);
        }
    }
 
    // Wrapper for inserting into Trie and Min Heap
    public static void insertTrieAndHeap(String word, TrieNode root, MinHeap minHeap) {
        insertUtil(root, minHeap, word, word);
    }
 
    // Display the contents of the Min Heap
    public static void displayMinHeap(MinHeap minHeap) {
        for (int i = 0; i < minHeap.count; ++i) {
            System.out.println(minHeap.array[i].word + ": " + minHeap.array[i].frequency);
        }
    }
 
    // Standard Min Heapify function
    public static void minHeapify(MinHeap minHeap, int idx) {
        int left, right, smallest;
 
        left = 2 * idx + 1;
        right = 2 * idx + 2;
        smallest = idx;
        if (left < minHeap.count && minHeap.array[left].frequency < minHeap.array[smallest].frequency)
            smallest = left;
 
        if (right < minHeap.count && minHeap.array[right].frequency < minHeap.array[smallest].frequency)
            smallest = right;
 
        if (smallest != idx) {
            minHeap.array[smallest].root.indexMinHeap = idx;
            minHeap.array[idx].root.indexMinHeap = smallest;
 
            MinHeapNode temp = minHeap.array[smallest];
            minHeap.array[smallest] = minHeap.array[idx];
            minHeap.array[idx] = temp;
 
            minHeapify(minHeap, smallest);
        }
    }
 
    // Main function to read words from a file and print k most frequent words
    public static void printKMostFreq(String filePath, int k) throws FileNotFoundException {
        MinHeap minHeap = createMinHeap(k);
        TrieNode root = null;
 
        try (Scanner scanner = new Scanner(new File(filePath))) {
            while (scanner.hasNext()) {
                String[] words = scanner.next().split("\\s+");
                for (String word : words) {
                    insertTrieAndHeap(word, root, minHeap);
                }
            }
        }
 
        displayMinHeap(minHeap);
    }
 
    public static void main(String[] args) {
        int k = 5;
        String filePath = "/file.txt";
        try {
            printKMostFreq(filePath, k);
        } catch (FileNotFoundException e) {
            System.out.println("File doesn't exist.");
        }
    }
}

Python3

from queue import PriorityQueue
 
MAX_CHARS = 26
MAX_WORD_SIZE = 30
 
# A Trie node
class TrieNode:
    def __init__(self):
        self.isEnd = False
        self.frequency = 0
        self.indexMinHeap = -1
        self.child = [None] * MAX_CHARS
 
# A Min Heap node
class MinHeapNode:
    def __init__(self, root, frequency, word):
        self.root = root
        self.frequency = frequency
        self.word = word
 
# A Min Heap
class MinHeap:
    def __init__(self, capacity):
        self.capacity = capacity
        self.count = 0
        self.array = [None] * capacity
 
# A utility function to create a new Trie node
def newTrieNode():
    trieNode = TrieNode()
    return trieNode
 
# A utility function to create a Min Heap of given capacity
def createMinHeap(capacity):
    minHeap = MinHeap(capacity)
    return minHeap
 
# A standard function to build a heap
def buildMinHeap(minHeap):
    n = minHeap.count - 1
    for i in range((n - 1) // 2, -1, -1):
        minHeapify(minHeap, i)
 
# Inserts a word to heap, the function handles the 3 cases explained above
def insertInMinHeap(minHeap, root, word):
    # Case 1: the word is already present in minHeap
    if root.indexMinHeap != -1:
        minHeap.array[root.indexMinHeap].frequency += 1
        # Percolate down
        minHeapify(minHeap, root.indexMinHeap)
    # Case 2: Word is not present and heap is not full
    elif minHeap.count < minHeap.capacity:
        count = minHeap.count
        minHeap.array[count] = MinHeapNode(root, root.frequency, word)
        root.indexMinHeap = minHeap.count
        minHeap.count += 1
        buildMinHeap(minHeap)
    # Case 3: Word is not present and heap is full. And frequency of word
    # is more than root. The root is the least frequent word in heap,
    # replace root with a new word
    elif root.frequency > minHeap.array[0].frequency:
        minHeap.array[0].root.indexMinHeap = -1
        minHeap.array[0].root = root
        minHeap.array[0].root.indexMinHeap = 0
        minHeap.array[0].frequency = root.frequency
        minHeap.array[0].word = word
        minHeapify(minHeap, 0)
 
# Inserts a new word to both Trie and Heap
def insertUtil(root, minHeap, word, dupWord):
    if root is None:
        root = newTrieNode()
     
    if word:
        insertUtil(root.child[ord(word[0]) - ord('a')], minHeap, word[1:], dupWord)
    else:
        if root.isEnd:
            root.frequency += 1
        else:
            root.isEnd = True
            root.frequency = 1
        insertInMinHeap(minHeap, root, dupWord)
 
# Add a word to Trie and min heap. A wrapper over the insertUtil
def insertTrieAndHeap(word, root, minHeap):
    insertUtil(root, minHeap, word, word)
 
# A utility function to show results. The min heap
# contains the k most frequent words so far, at any time
def displayMinHeap(minHeap):
    for i in range(minHeap.count):
        print(f"{minHeap.array[i].word}: {minHeap.array[i].frequency}")
 
# The main function that takes a file as input, adds words to heap
# and Trie, finally shows results from the heap
def printKMostFreq(file_path, k):
    # Create a Min Heap of Size k
    minHeap = createMinHeap(k)
 
    # Create an empty Trie
    root = None
 
    # Read words one by one from the file. Insert the word in Trie and Min Heap
    with open(file_path, 'r') as file:
        for line in file:
            words = line.split()
            for word in words:
                insertTrieAndHeap(word, root, minHeap)
 
    # The Min Heap will have the k most frequent words, so print Min Heap nodes
    displayMinHeap(minHeap)
     
# This is the standard minHeapify function. It updates the minHeapIndex in Trie when two nodes are swapped in the min heap
def minHeapify(minHeap, idx):
    left = 2 * idx + 1
    right = 2 * idx + 2
    smallest = idx
 
    if left < minHeap.count and minHeap.array[left].frequency < minHeap.array[smallest].frequency:
        smallest = left
 
    if right < minHeap.count and minHeap.array[right].frequency < minHeap.array[smallest].frequency:
        smallest = right
 
    if smallest != idx:
        # Update the corresponding index in the Trie node
        minHeap.array[smallest].root.indexMinHeap = idx
        minHeap.array[idx].root.indexMinHeap = smallest
 
        # Swap nodes in the min heap
        minHeap.array[smallest], minHeap.array[idx] = minHeap.array[idx], minHeap.array[smallest]
        minHeapify(minHeap, smallest)
 
# Driver program to test the above functions
if __name__ == "__main__":
    k = 5
    file_path = "/file.txt"
    try:
        printKMostFreq(file_path, k)
    except FileNotFoundError:
        print("File doesn't exist.")

C#

using System;
using System.IO;
 
public class TrieNode
{
    public bool isEnd;
    public int frequency;
    public int indexMinHeap;
    public TrieNode[] child;
 
    public TrieNode()
    {
        isEnd = false;
        frequency = 0;
        indexMinHeap = -1;
        child = new TrieNode[26]; // 26 slots for 'a' to 'z'
    }
}
 
public class MinHeapNode
{
    public TrieNode root;
    public int frequency;
    public string word;
 
    public MinHeapNode(TrieNode root, int frequency, string word)
    {
        this.root = root;
        this.frequency = frequency;
        this.word = word;
    }
}
 
public class MinHeap
{
    public int capacity;
    public int count;
    public MinHeapNode[] array;
 
    public MinHeap(int capacity)
    {
        this.capacity = capacity;
        count = 0;
        array = new MinHeapNode[capacity];
    }
}
 
public class KMostFrequentWords
{
    // Constants defining the maximum characters and word size
    private const int MAX_CHARS = 26;
    private const int MAX_WORD_SIZE = 30;
 
    // Utility function to create a new Trie node
    private static TrieNode NewTrieNode()
    {
        return new TrieNode();
    }
 
    // Utility function to create a Min Heap of given capacity
    private static MinHeap CreateMinHeap(int capacity)
    {
        return new MinHeap(capacity);
    }
 
    // Function to build the min heap
    private static void BuildMinHeap(MinHeap minHeap)
    {
        int n = minHeap.count - 1;
        for (int i = (n - 1) / 2; i >= 0; --i)
        {
            MinHeapify(minHeap, i);
        }
    }
 
    // Helper function for min heapify
    private static void MinHeapify(MinHeap minHeap, int idx)
    {
        // Implementation of the min heapify operation
        // to maintain the min heap property
    }
 
    // Function to insert a word into the min heap
    private static void InsertInMinHeap(MinHeap minHeap, TrieNode root, string word)
    {
        // Logic to insert a word into the min heap
    }
 
    // Utility function to recursively insert a word into the Trie and Min Heap
    private static void InsertUtil(TrieNode root, MinHeap minHeap, string word, string dupWord)
    {
        // Logic to insert a word into Trie and Min Heap
    }
 
    // Function to add a word to Trie and Min Heap
    private static void InsertTrieAndHeap(string word, TrieNode root, MinHeap minHeap)
    {
        InsertUtil(root, minHeap, word, word);
    }
 
    // Function to display the contents of the Min Heap
    private static void DisplayMinHeap(MinHeap minHeap)
    {
        // Logic to display the contents of the Min Heap
    }
 
    // Function to find and print k most frequent words in a file
    public static void PrintKMostFreq(string filePath, int k)
    {
        MinHeap minHeap = CreateMinHeap(k);
        TrieNode root = null;
 
        try
        {
            // Reading words from the file and inserting into Trie and Min Heap
            using (StreamReader file = new StreamReader(filePath))
            {
                string line;
                while ((line = file.ReadLine()) != null)
                {
                    string[] words = line.Split(' ');
                    foreach (string word in words)
                    {
                        InsertTrieAndHeap(word, root, minHeap);
                    }
                }
            }
        }
        catch (FileNotFoundException)
        {
            Console.WriteLine("File doesn't exist.");
            return;
        }
 
        // Displaying the k most frequent words in the Min Heap
        DisplayMinHeap(minHeap);
    }
 
    // Main method to test the functionality
    public static void Main()
    {
        int k = 5; // Number of most frequent words to find
        string filePath = "/file.txt"; // Path to the file
        PrintKMostFreq(filePath, k); // Finding and printing k most frequent words
    }
}

Javascript

// JavaScript Code
 
const fs = require('fs');
 
class TrieNode {
    constructor() {
        this.isEnd = false;
        this.frequency = 0;
        this.indexMinHeap = -1;
        this.child = new Array(26).fill(null); // 26 slots for 'a' to 'z'
    }
}
 
class MinHeapNode {
    constructor(root, frequency, word) {
        this.root = root;
        this.frequency = frequency;
        this.word = word;
    }
}
 
class MinHeap {
    constructor(capacity) {
        this.capacity = capacity;
        this.count = 0;
        this.array = new Array(capacity);
    }
}
 
// Utility function to create a new Trie node
function newTrieNode() {
    return new TrieNode();
}
 
// Utility function to create a Min Heap of given capacity
function createMinHeap(capacity) {
    return new MinHeap(capacity);
}
 
// Function to build the min heap
function buildMinHeap(minHeap) {
    const n = minHeap.count - 1;
    for (let i = Math.floor((n - 1) / 2); i >= 0; --i) {
        minHeapify(minHeap, i);
    }
}
 
// Helper function for min heapify
function minHeapify(minHeap, idx) {
    let left, right, smallest;
 
    left = 2 * idx + 1;
    right = 2 * idx + 2;
    smallest = idx;
 
    if (left < minHeap.count && minHeap.array[left].frequency < minHeap.array[smallest].frequency)
        smallest = left;
 
    if (right < minHeap.count && minHeap.array[right].frequency < minHeap.array[smallest].frequency)
        smallest = right;
 
    if (smallest !== idx) {
        // Update the corresponding index in Trie node.
        minHeap.array[smallest].root.indexMinHeap = idx;
        minHeap.array[idx].root.indexMinHeap = smallest;
 
        // Swap nodes in min heap
        [minHeap.array[smallest], minHeap.array[idx]] = [minHeap.array[idx], minHeap.array[smallest]];
 
        minHeapify(minHeap, smallest);
    }
}
 
// Function to insert a word into the min heap
function insertInMinHeap(minHeap, root, word) {
    // Logic to insert a word into the min heap
}
 
// Utility function to recursively insert a word into the Trie and Min Heap
function insertUtil(root, minHeap, word, dupWord) {
    // Logic to insert a word into Trie and Min Heap
}
 
// Function to add a word to Trie and Min Heap
function insertTrieAndHeap(word, root, minHeap) {
    insertUtil(root, minHeap, word, word);
}
 
// Function to display the contents of the Min Heap
function displayMinHeap(minHeap) {
    // Logic to display the contents of the Min Heap
}
 
// Function to find and print k most frequent words in a file
function printKMostFreq(filePath, k) {
    const minHeap = createMinHeap(k);
    let root = null;
 
    try {
        // Reading words from the file and inserting into Trie and Min Heap
        const fileContent = fs.readFileSync(filePath, 'utf-8');
        const lines = fileContent.split('\n');
 
        lines.forEach(line => {
            const words = line.split(' ');
            words.forEach(word => {
                insertTrieAndHeap(word, root, minHeap);
            });
        });
    } catch (err) {
        console.log('File doesn\'t exist.');
        return;
    }
 
    // Displaying the k most frequent words in the Min Heap
    displayMinHeap(minHeap);
}
 
// Main method to test the functionality
const k = 5; // Number of most frequent words to find
const filePath = "file.txt"; // Path to the file
printKMostFreq(filePath, k); // Finding and printing k most frequent words

Output:

your : 3
well : 3
and : 4
to : 4
Geeks : 6

The above output is for a file with following content.

Welcome to the world of Geeks 
This portal has been created to provide well written well thought and well explained 
solutions for selected questions If you like Geeks for Geeks and would like to contribute 
here is your chance You can write article and mail your article to contribute at 
geeksforgeeks org See your article appearing on the Geeks for Geeks main page and help 
thousands of other Geeks

Suggest improvement

Find a Fixed Point (Value equal to index) in a given array

Find k closest elements to a given value

Share your thoughts in the comments

Most Common Searching Algorithms

Other Searching Algorithms

Comparisons between Searching Algorithms

Library implementations of Searching algorithms

Easy problems on Searching algorithms

Medium problems on Searching algorithms

Hard problems on Searching algorithms

Find the k most frequent words from a file

CPP

Java

Python3

C#

Javascript

Please Login to comment...

Similar Reads

What kind of Experience do you want to share?