Math is beautiful. Prime numbers are unique and Hashing is very popular in Computer Science.

Published on 2023-01-30

Could be Interesting! Big Data

No, we are not talking about Hash browns. 😗

In this write-up, I would like to pay my appreciation to Hash Function with Flajolet-Martin Algorithm.

Flajolet-Martin Algorithm

Flajolet-Martin Algorithm is used to estimate the number of distinct elements by hashing the elements of the universal set to a bit-string that is sufficiently long. The length of the bit-string must be sufficient that there are more possible results of the hash function than there are elements of the universal set. For example, 64 bits is sufficient to hash URLs.

You might be wondering why estimation? Can't we get the exact number of distinct elements? It is not always possible to get the exact number, especially in the context of real-time data processing and analytics. This is where Flajolet-Martin Algorithm shines.

Step wise solution of FM algorithm

  1. Select a Hash function h so that each element in the set is mapped to a string to at least log n bits
  2. For each element x, r(x) = length of trailing zeroes in h(x)
  3. R = max(r(x))
  4. Distinct elements = 2R

FM Algorithm is sensitive to the parameters of hash function and thus, the results from FM algorithm can vary significantly depending on the hash function values.

To improve the accuracy of algorithm - Introduce more hash functions and get the lengths of trailing zeros - Means by bucketing: results from (1) are split into buckets and compute means from each bucket - Take Median of means: sort the results from (2) and take the median as approximate value for the count of distinct elements

:D See you again!