Published on 2023-01-30
Could be Interesting! Big DataNo, we are not talking about Hash browns. 😗
In this write-up, I would like to pay my appreciation to Hash Function with Flajolet-Martin Algorithm.
Flajolet-Martin Algorithm is used to estimate the number of distinct elements by hashing the elements of the universal set to a bit-string that is sufficiently long. The length of the bit-string must be sufficient that there are more possible results of the hash function than there are elements of the universal set. For example, 64 bits is sufficient to hash URLs.
You might be wondering why estimation? Can't we get the exact number of distinct elements? It is not always possible to get the exact number, especially in the context of real-time data processing and analytics. This is where Flajolet-Martin Algorithm shines.
FM Algorithm is sensitive to the parameters of hash function and thus, the results from FM algorithm can vary significantly depending on the hash function values.
To improve the accuracy of algorithm - Introduce more hash functions and get the lengths of trailing zeros - Means by bucketing: results from (1) are split into buckets and compute means from each bucket - Take Median of means: sort the results from (2) and take the median as approximate value for the count of distinct elements
:D See you again!