0
3.4kviews
Suppose a data stream consists of the integers 1,3,2,1,2,3,4,3,1,2,3,1. Let the Hash function being used is h(x) = (6x+1) mod 5;

estimate the number of distinct in this stream using Flajolet- Martin algorithm

2 Answers
1
87views
  • PCY algorithm exploits the observation that there may be much unused space in main memory on the first pass of apraisy.

  • In first pass only a bash function is applied on pair of item so that they bashes to a bucket.

  • we hash each pair and add 1 to the bucket.

  • At the end of 1st pass each bucket has count.

  • If count of bucket is > = support then the bucket is frequent bucket.

  • we can define a candidate pair C2 to be those pair { i , j } such that

1) i and j are frequent items.

2) { i , j } hashes to a frequent bucket.

enter image description here

  • Multistage uses several successive hash tables to reduce further the number of candidate pairs.

  • But requires more passes than normal PCY.

enter image description here

  • Document data store are used for storing, retrieving and managing document oriented information.

  • Hierarchical data structures can be directly stored in document database.

  • Document data store uses a key structure.

  • Document path is used like a key to success the leaf values of a document tree structure.

- For eg. Employee [id = '2300'] / Address / street / text ( )

- eg of document store: Mongo DB, Couch DB, Couch Base.

  • In column based store, data is stoned into columns and these columns are logically grouped into column families.

  • splitting the data column size helps in faster retrieval of data

  • example : Big Table, H Base, Hyper Table.

Please log in to add an answer.