Have you ever wondered how fast "username is not available" is returned when creating a new account?

Published on 2023-01-30

Could be Interesting! Big Data

In any system, there are millions and millions of users record. How could a system tells username is not available almost instantly. Enter Bloom Filter. 😎

Concept Checking availability of a username is a set membership problem where the set is a list of all the registered usernames. Both linear search and binary search are viable options, but they come with their own trade-offs: linear search would be painful and binary search would require multiple steps.

Bloom filter is a data structure that can do the job! The price we pay for efficiency is that a Bloom Filter is a probabilistic data structure and therefore, it can only tell us the element either definitely is not in the set or maybe in the set. In other words, Bloom Filter may produce false positives but never false negatives. The base data structure of a Bloom Filter is a Bit Vector (0 or 1).

How it works Bloom Filter uses Hash Function to allocate the value (e.g. username) to the Bit Vector. Then, when it comes to checking availability of a username, we simply use the Hash Function again to see if the space in the Bit Vector is filled. If filled, the username maybe taken because another element or some combination of other elements could have set the same bits.

The probability of false positives can be controlled by adjusting the size of the Bit Vector and the number of hash functions used. The question is how big should we make Bloom Filter? It depends. A large filter will have less false positives and a smaller one, more.

Further Reading Scalable Bloom Filters