
Minimum size of subset of Strings with frequency more than half of Array

Last Updated : 27 Oct, 2023

Given an array of strings Arr, the task is to find the smallest subset of distinct strings such that the total number of occurrences of the selected strings in Arr is more than half of the array's size. In other words, find the minimum set of distinct strings that together account for over 50% of the array's elements.
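Because counts are integers, "more than half of the array" can be checked without fractions. The sketch below (the helper names `exceeds_half` and `threshold` are hypothetical, introduced only for illustration) shows that the smallest count strictly greater than n / 2 is n // 2 + 1, which is the threshold the implementations in this article compute.

```python
def exceeds_half(count: int, n: int) -> bool:
    """True if `count` is strictly greater than half of `n` (count > n / 2)."""
    # Multiply instead of dividing to stay in integer arithmetic.
    return 2 * count > n


def threshold(n: int) -> int:
    """Smallest integer count that is strictly greater than n / 2."""
    return n // 2 + 1


# Check that the two formulations agree for small array sizes:
# threshold(n) exceeds half, and one less than it does not.
for n in range(0, 20):
    t = threshold(n)
    assert exceeds_half(t, n) and not exceeds_half(t - 1, n)

print(threshold(9), threshold(5))  # 5 3
```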

Examples:

Input: Arr = [‘shoes’, ‘face’, ‘pizza’, ‘covid’, ‘shoes’, ‘covid’, ‘covid’, ‘face’, ‘shoes’]
Output: [‘covid’, ‘shoes’]
Explanation: Frequency of the strings is as follows: ‘shoes’ : 3, ‘covid’ : 3, ‘face’ : 2, ‘pizza’ : 1
So ‘shoes’ (3) + ‘covid’ (3) = 6, which is greater than half the size of the array (9 / 2 = 4.5).

Input: Arr = [‘java’, ‘python’, ‘java’, ‘python’, ‘python’]
Output: [‘python’]
Explanation: Frequency of the strings is as follows: ‘python’ : 3, ‘java’ : 2.
So ‘python’ (3) alone is greater than half the size of the array (5 / 2 = 2.5).
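The minimum size is well defined, but the subset achieving it need not be unique: in the first example ‘shoes’ + ‘face’ (3 + 2 = 5) also exceeds 4.5. A brute-force sketch like the one below (the function name `all_minimal_subsets` is hypothetical) tries every combination of distinct strings, smallest first, and can be used to double-check small examples. It is exponential in the number of distinct strings, so it is only meant for tiny inputs.

```python
from collections import Counter
from itertools import combinations


def all_minimal_subsets(arr):
    """Return (minimum size, every subset of that size covering > half of arr).

    Exponential brute force: only suitable for verifying tiny examples.
    """
    freq = Counter(arr)
    distinct = sorted(freq)  # sorted so the tuples come out in a fixed order
    for size in range(1, len(distinct) + 1):
        winners = [
            combo
            for combo in combinations(distinct, size)
            if 2 * sum(freq[s] for s in combo) > len(arr)  # strictly > half
        ]
        if winners:
            return size, winners
    return 0, []  # empty input: no subset can exceed half


arr1 = ["shoes", "face", "pizza", "covid", "shoes", "covid", "covid", "face", "shoes"]
arr2 = ["java", "python", "java", "python", "python"]
print(all_minimal_subsets(arr1))
# (2, [('covid', 'face'), ('covid', 'shoes'), ('face', 'shoes')])
print(all_minimal_subsets(arr2))
# (1, [('python',)])
```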

Source: MindTickle Off-Campus Full Time Interview Experience

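Approach #1:

Count how often each string occurs by iterating through Arr and storing the counts in a dictionary: add a key with value 1 the first time a string appears, and increment its value on every later occurrence. Then sort the entries by frequency in decreasing order and take the strings one by one, adding their counts to a running total, until the total exceeds half the array's size. Taking the most frequent strings first is what makes the subset minimal: for any k, no k distinct strings can have a larger combined count than the k most frequent ones.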
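If only the minimum size is needed (as in the title), the greedy step reduces to prefix sums over the counts sorted in decreasing order: the answer is the position of the first prefix sum that exceeds half the array. A minimal sketch using `itertools.accumulate` follows; the function name `min_subset_size` is hypothetical.

```python
from collections import Counter
from itertools import accumulate


def min_subset_size(arr):
    """Size of the smallest set of distinct strings covering more than half of arr."""
    # Counts from largest to smallest, so each prefix sum is the best
    # possible total for that many distinct strings.
    counts = sorted(Counter(arr).values(), reverse=True)
    for size, total in enumerate(accumulate(counts), start=1):
        if 2 * total > len(arr):  # strictly more than half
            return size
    return 0  # empty input: no subset exists


print(min_subset_size(["shoes", "face", "pizza", "covid",
                       "shoes", "covid", "covid", "face", "shoes"]))  # 2
print(min_subset_size(["java", "python", "java", "python", "python"]))  # 1
```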

Code:

Below is the implementation of the above approach:

Python3




def min_subset_to_exceed_half(arr):
    # Initialize a dictionary to store string frequencies
    frequency = {}
 
    # Calculate the threshold frequency to exceed half of the array
    max_freq = (len(arr) // 2) + 1
 
    # Initialize a list to store the selected strings
    max_freq_strings = []
 
    # Count the frequency of each string in the array
    for string in arr:
        if string in frequency:
            frequency[string] += 1
        else:
            frequency[string] = 1
 
    # Sort the dictionary by frequency in descending order
    sorted_frequency = dict(
        sorted(frequency.items(), key=lambda item: item[1], reverse=True))
 
    # Initialize a variable to keep track of the current frequency sum
    curr_freq = 0
 
    # Iterate through the sorted dictionary and select strings until the threshold is reached
    for i in sorted_frequency:
        max_freq_strings.append(i)
        curr_freq += sorted_frequency[i]
 
        # Stop once the running total reaches the threshold, i.e. exceeds half of the array
        if curr_freq >= max_freq:
            break
 
    return max_freq_strings
 
 
# Driver Code
arr = ["shoes", "face", "pizza", "covid",
       "shoes", "covid", "covid", "face", "shoes"]
# Calling and printing the result
print(*min_subset_to_exceed_half(arr))  # Output: shoes covid


Javascript




// JavaScript code for the above approach:
function minSubsetToExceedHalf(arr) {
    // Initialize a Map to store string frequencies
    const frequency = new Map();
 
    // Calculate the threshold frequency to exceed half of the array
    const maxFreq = Math.floor(arr.length / 2) + 1;
 
    // Initialize an array to store the selected strings
    const maxFreqStrings = [];
 
    // Count the frequency of each string in the array
    for (const string of arr) {
        if (frequency.has(string)) {
            frequency.set(string, frequency.get(string) + 1);
        } else {
            frequency.set(string, 1);
        }
    }
 
    // Sort the Map by frequency in descending order
    const sortedFrequency = new Map(
        [...frequency.entries()].sort((a, b) => b[1] - a[1])
    );
 
    // Initialize a variable to keep track of the current frequency sum
    let currFreq = 0;
 
    // Iterate through the sorted Map and select strings until the threshold is reached
    for (const [key, value] of sortedFrequency) {
        maxFreqStrings.push(key);
        currFreq += value;
 
        // Stop once the running total reaches the threshold, i.e. exceeds half of the array
        if (currFreq >= maxFreq) {
            break;
        }
    }
 
    return maxFreqStrings;
}
 
// Driver Code
const arr = ["shoes", "face", "pizza", "covid", "shoes", "covid", "covid", "face", "shoes"];
 
// Calling and printing the result
console.log(minSubsetToExceedHalf(arr).join(' '));


Output

shoes covid


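Time Complexity: O(N + K log K), where N is the number of strings in the array and K is the number of distinct strings. Since K is at most N, this is O(N log N) in the worst case.

Auxiliary Space: O(K) for the frequency dictionary and its sorted copy.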
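The O(K log K) term comes from the comparison sort. Since every frequency is an integer between 1 and N, a bucket sort by frequency can replace it and bring the whole algorithm down to O(N) time with O(N) extra space. This is an alternative technique, not the article's method; the sketch below (function name `min_subset_bucket` is hypothetical) keeps the same greedy selection.

```python
from collections import Counter


def min_subset_bucket(arr):
    """Greedy selection using buckets indexed by frequency instead of sorting."""
    n = len(arr)
    freq = Counter(arr)
    buckets = [[] for _ in range(n + 1)]  # buckets[c] holds strings seen c times
    for s, c in freq.items():
        buckets[c].append(s)

    chosen, total = [], 0
    for c in range(n, 0, -1):  # read buckets from the highest frequency down
        for s in buckets[c]:
            chosen.append(s)
            total += c
            if 2 * total > n:  # strictly more than half
                return chosen
    return chosen  # only reached when arr is empty


arr = ["shoes", "face", "pizza", "covid", "shoes", "covid", "covid", "face", "shoes"]
print(*min_subset_bucket(arr))  # shoes covid
```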

Approach #2: Using collections.Counter()

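collections.Counter from the Python standard library counts the frequency of every element in a single call, replacing the manual dictionary loop from Approach #1. The resulting Counter can also report the frequency of any single element when needed. The rest of the algorithm (sort by frequency, then accumulate counts until the total exceeds half the array) is unchanged.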
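A short illustration of the Counter features mentioned above, using the first example's array: indexing gives a single element's frequency (absent elements report 0 instead of raising KeyError), and `most_common(n)` returns the n most frequent elements, with ties kept in the order first encountered.

```python
from collections import Counter

arr = ["shoes", "face", "pizza", "covid", "shoes", "covid", "covid", "face", "shoes"]
frequency = Counter(arr)

print(frequency["covid"])        # 3  (frequency of a single element)
print(frequency["missing"])      # 0  (absent keys count as zero)
print(frequency.most_common(2))  # [('shoes', 3), ('covid', 3)]
```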

Code:

Below is the implementation of the above approach:

Python3




from collections import Counter
 
 
def min_subset_to_exceed_half(arr):
    # Count the frequency of each string in the array using Counter
    frequency = Counter(arr)
 
    # Calculate the threshold frequency to exceed half of the array
    max_freq = (len(arr) // 2) + 1
 
    # Initialize a list to store the selected strings
    max_freq_strings = []
 
    # Sort the Counter by frequency in descending order
    sorted_frequency = dict(
        sorted(frequency.items(), key=lambda item: item[1], reverse=True))
 
    # Initialize a variable to keep track of the current frequency sum
    curr_freq = 0
 
    # Iterate through the sorted dictionary and select strings until the threshold is reached
    for i in sorted_frequency:
        max_freq_strings.append(i)
        curr_freq += sorted_frequency[i]
 
        # Stop once the running total reaches the threshold, i.e. exceeds half of the array
        if curr_freq >= max_freq:
            break
 
    return max_freq_strings
 
 
# Driver Code
arr = ["shoes", "face", "pizza", "covid",
       "shoes", "covid", "covid", "face", "shoes"]
# Calling and printing the result
print(*min_subset_to_exceed_half(arr))  # Output: shoes covid


Output

shoes covid


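Time Complexity: O(N + K log K), where N is the number of strings in the array and K is the number of distinct strings. Since K is at most N, this is O(N log N) in the worst case.

Auxiliary Space: O(K) for the Counter and its sorted copy.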
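Counter already provides a descending-by-count ordering through `most_common()` (called with no argument it lists every element), so the explicit `sorted(...)` call in Approach #2 can be dropped. A sketch of that variant, with the hypothetical name `min_subset_most_common`; the complexity is the same, since `most_common()` sorts internally.

```python
from collections import Counter


def min_subset_most_common(arr):
    """Approach #2 with Counter.most_common() doing the descending sort."""
    threshold = len(arr) // 2 + 1  # smallest count strictly above half
    chosen, total = [], 0
    # most_common() with no argument yields every (string, count) pair,
    # highest count first; ties keep first-encountered order.
    for string, count in Counter(arr).most_common():
        chosen.append(string)
        total += count
        if total >= threshold:
            break
    return chosen


print(*min_subset_most_common(["shoes", "face", "pizza", "covid",
                                "shoes", "covid", "covid", "face", "shoes"]))
# shoes covid
print(*min_subset_most_common(["java", "python", "java", "python", "python"]))
# python
```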


