Open In App

Flajolet Martin Algorithm

The Flajolet-Martin algorithm is also known as probabilistic algorithm which is mainly used to count the number of unique elements in a stream or database . This algorithm was invented by Philippe Flajolet and G. Nigel Martin in 1983 and since then it has been used in various applications such as , data mining and database management.

The basic idea to which Flajolet-Martin algorithm is based on is to use a hash function to map the elements in the given dataset to a binary string, and to make use of the length of the longest null sequence in the binary string as an estimator for the number of unique elements to use as a value element.



The steps for the Flajolet-Martin algorithm are:

The accuracy of Flajolet Martin Algorithm is determined by the length of the binary strings and the number of hash functions it uses. Generally, with increse in the length of the binary strings or using more hash functions in algorithm can often increase the algorithm’s accuracy.



The Flajolet Martin Algorithm is especially used for big datasets that cannot be kept in memory or analysed with regular methods. This algorithm , by using good probabilistic techniques, can provide a precise estimate of the number of unique elements in the data set by using less computing.

Code:




import random
import math
 
def trailing_zeros(x):
    """ Counting number of trailing zeros
    in the binary representation of x."""
    if x == 0:
        return 1
    count = 0
    while x & 1 == 0:
        count += 1
        x >>= 1
    return count
 
def flajolet_martin(dataset, k):
    """Number of distinct elements using
    the Flajolet-Martin Algorithm."""
    max_zeros= 0
    for i in range(k):
        hash_vals = [trailing_zeros(random.choice(dataset))
                     for _ in range(len(dataset))]
        max_zeros = max(max_zeros, max(hash_vals))
     
    return 2 ** max_zeros
 
# Example
dataset = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
dist_num = flajolet_martin(dataset, 10)
print("Estimated number of distinct elements:", dist_num)

Output :

'Estimated number of distinct elements:', 8

The output of the code will vary from run to run due to the random nature of the Flajolet Martin Algorithm. The output of this example should be close to 10.

Need of this Algorithm

The Flajolet-Martin algorithm, can be used to determine how many unique elements are there in a database. It is very helpful in situations where the size of the memory is large and it is difficult to process the complete dataset. The following are some of the main uses and benefits of the Flajolet-Martin algorithm:

Disadvantages

Conclusion

Finally, the Flajolet Martin Algorithm is an effective method for calculating the count of number of unique items in huge data sets. This approach works by mapping the data set elements to the binary strings using hash functions, and then make use of the length of the longest run of zeros in the binary strings as an estimation for the number of unique elements. While this process is also known as probabilistic as it will not always offer a precise count, it is an effective method for finding insights of big data sets that would be difficult or impossible to analyse using regular methods.


Article Tags :