0
2.6kviews
Explain Flajolet-Martin algorithm to count the distinct elements in a stream.

Explain Flajolet-Martin algorithm to count the distinct elements in a stream.

0
293views

Flajolet-Martin :-

• Flajolet-Martin algorithm approximates the number of unique objects in a stream or a database in one pass.

• If the stream contains n elements with m of them unique, this algorithm runs in O(n) time and needs O(log(m)) memory. So the real innovation here is the memory usage, in that an exact, brute-force algorithm would need O(m) memory.

• It gives an approximation for the number of unique objects, along with a standard deviation σ, which can then be used to determine bounds on the approximation with a desired maximum error σ, if needed.

The Flajolet-Martin algorithm :-

• Create a bit vector (bit array) of sufficient length L, such that $2^L$ > n, the number of elements in the stream. Usually a 64-bit vector is sufficient since $2^{64}$ is quite large for most purposes.

• The i-th bit in this vector/array represents whether we have seen a hash function value whose binary representation ends in $0^i$ . So initialize each bit to 0.

• Generate a good, random hash function that maps input (usually strings) to natural numbers.

• Read input. For each word, hash it and determine the number of trailing zeros. If the number of trailing zeros is k, set the k-th bit in the bit vector to 1.

• Once input is exhausted, get the index of the first 0 in the bit array (call this R). By the way, this is just the number of consecutive 1s plus one.

• Calculate the number of unique words as $2^R$/Φ, where Φ is 0.77351.