Computer Engineering (Semester 7)
Total marks: 80
Total time: 3 Hours
(1) Question 1 is compulsory.
(2) Attempt any three from the remaining questions.
(3) Draw neat diagrams wherever necessary.
Explain the Bloom filter for stream data mining.
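A minimal Python sketch of a Bloom filter used to filter a stream follows. It assumes k hash functions derived by salting SHA-256 with an index, which is an illustrative choice and not a prescribed hash family. The set S of keys, m = 64 and k = 3 are hypothetical example values. A lookup that returns False means the item is definitely not in S. True means it is probably in S, because false positives are possible but false negatives are not.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: an array of m bits plus k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Illustrative assumption: the i-th hash function is SHA-256 of "i:item", mod m.
        return [
            int(hashlib.sha256(f"{i}:{item}".encode()).hexdigest(), 16) % self.m
            for i in range(self.k)
        ]

    def add(self, item: str) -> None:
        # Set the k bits chosen by the hash functions.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False => definitely not in S; True => probably in S.
        return all(self.bits[pos] for pos in self._positions(item))


def false_positive_rate(m: int, k: int, n: int) -> float:
    # Standard approximation after inserting n keys: (1 - e^(-kn/m))^k.
    return (1 - math.exp(-k * n / m)) ** k


# Hypothetical example: let through only stream items whose key is in S.
S = ["alice@example.com", "bob@example.com", "carol@example.com"]
bf = BloomFilter(m=64, k=3)
for key in S:
    bf.add(key)

stream = ["bob@example.com", "mallory@example.com", "alice@example.com"]
passed = [x for x in stream if bf.might_contain(x)]
print(passed, false_positive_rate(64, 3, len(S)))
```

Because each lookup only reads k bits, the filter can keep up with a fast stream while S itself stays on disk.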
Find the Jaccard distance and the cosine distance between the following pair of sets: X = (0, 1, 2, 4, 5, 3) and Y = (5, 6, 7, 9, 10, 8).
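The arithmetic can be checked with the short Python sketch below. It assumes X and Y are treated as sets, so element order does not matter. For the cosine measure it assumes each set is a 0/1 vector over X ∪ Y. Textbooks differ on whether "cosine distance" means 1 − cos θ or the angle θ itself, so both are printed.

```python
import math

X = {0, 1, 2, 4, 5, 3}
Y = {5, 6, 7, 9, 10, 8}


def jaccard_distance(a: set, b: set) -> float:
    # d_J(A, B) = 1 - |A ∩ B| / |A ∪ B|
    return 1 - len(a & b) / len(a | b)


def cosine_similarity(a: set, b: set) -> float:
    # Sets as 0/1 vectors over A ∪ B: dot product = |A ∩ B|, norm of A = sqrt(|A|).
    return len(a & b) / math.sqrt(len(a) * len(b))


print(len(X & Y), len(X | Y))        # 1 11
print(jaccard_distance(X, Y))         # 10/11, about 0.909
sim = cosine_similarity(X, Y)         # 1/6
print(1 - sim)                        # 5/6, about 0.833
print(math.degrees(math.acos(sim)))   # angle form, about 80.4 degrees
```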
Explain the steps of the HITS algorithm.
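The HITS steps can be sketched in Python as an iteration of two mutually recursive updates. Authority scores are computed from hub scores, hub scores are computed from the new authority scores, and both vectors are then normalised. Scaling each vector so that its largest entry is 1 is one common normalisation; dividing by the vector length is another. The four-page graph is hypothetical.

```python
def hits(links: dict[str, list[str]], iterations: int = 20):
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    hub = {n: 1.0 for n in nodes}   # initialise every hub score to 1
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Step 1: authority(p) = sum of hub scores of the pages that link to p.
        auth = {n: 0.0 for n in nodes}
        for u, outs in links.items():
            for v in outs:
                auth[v] += hub[u]
        # Step 2: hub(p) = sum of authority scores of the pages p links to.
        hub = {n: sum(auth[v] for v in links.get(n, [])) for n in nodes}
        # Step 3: normalise; here each vector is divided by its largest entry.
        a_max = max(auth.values()) or 1.0
        h_max = max(hub.values()) or 1.0
        auth = {n: s / a_max for n, s in auth.items()}
        hub = {n: s / h_max for n, s in hub.items()}
    return hub, auth


# Hypothetical graph: A->B, A->C, B->C, C->A, D->C.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
hub, auth = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))  # C A
```

Page C is pointed to by three pages, so it becomes the best authority. Page A points to both B and C, so it becomes the best hub.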
Explain the "Shuffle & Sort" phase and the "Reducer" phase in MapReduce.
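The data flow through the two phases can be simulated in a single Python process. This is not Hadoop's API. The word-count job and the input lines are hypothetical. Shuffle groups every mapper output value under its key. In Hadoop, this step also routes each key to one reducer by partitioning. Sort hands the keys to the reducer in order. The reducer is then called once per key with that key's full value list.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in every input line.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_and_sort(pairs: Iterable[tuple[str, int]]) -> list[tuple[str, list[int]]]:
    # Shuffle: collect all values that share a key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Sort: keys are presented to the reducer in sorted order.
    return sorted(groups.items())


def reduce_phase(grouped: list[tuple[str, list[int]]]) -> list[tuple[str, int]]:
    # Reducer: one call per key, aggregating that key's list of values.
    return [(key, sum(values)) for key, values in grouped]


docs = ["the cat sat", "the dog sat"]
result = reduce_phase(shuffle_and_sort(map_phase(docs)))
print(result)  # [('cat', 1), ('dog', 1), ('sat', 2), ('the', 2)]
```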
Write MapReduce pseudocode to multiply two matrices. Illustrate with an example showing all the steps.
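The one-pass MapReduce matrix multiplication for P = M × N can be simulated in Python as follows. Here M is I × J and N is J × K. The map step sends each m_ij to every output cell (i, k), tagged ("M", j, m_ij). It sends each n_jk to every output cell (i, k), tagged ("N", j, n_jk). The reducer for key (i, k) pairs the M and N values that share the same j and sums their products. The 2 × 2 matrices are hypothetical example data. A two-pass version, which joins on j first and then aggregates, is a common alternative.

```python
from collections import defaultdict


def matmul_mapreduce(M, N):
    I, J, K = len(M), len(M[0]), len(N[0])
    # Map: replicate each entry to every output cell (i, k) it contributes to.
    intermediate = []
    for i in range(I):
        for j in range(J):
            for k in range(K):
                intermediate.append(((i, k), ("M", j, M[i][j])))
    for j in range(J):
        for k in range(K):
            for i in range(I):
                intermediate.append(((i, k), ("N", j, N[j][k])))
    # Shuffle & sort: group the tagged values by output cell (i, k).
    groups = defaultdict(list)
    for key, value in sorted(intermediate):
        groups[key].append(value)
    # Reduce: for each (i, k), sum m_ij * n_jk over matching j.
    P = [[0] * K for _ in range(I)]
    for (i, k), values in groups.items():
        m_vals = {j: v for tag, j, v in values if tag == "M"}
        n_vals = {j: v for tag, j, v in values if tag == "N"}
        P[i][k] = sum(m_vals[j] * n_vals[j] for j in m_vals if j in n_vals)
    return P


M = [[1, 2], [3, 4]]
N = [[5, 6], [7, 8]]
print(matmul_mapreduce(M, N))  # [[19, 22], [43, 50]]
```

For example, the reducer for key (0, 0) receives ("M", 0, 1), ("M", 1, 2), ("N", 0, 5) and ("N", 1, 7). It outputs 1·5 + 2·7 = 19.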
Explain the Hadoop ecosystem and its core components. Explain its physical architecture. State the limitations of Hadoop.
Suppose a data stream consists of the integers 1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1. Using the hash function h(x) = (6x + 1) mod 5, estimate the number of distinct elements in this stream with the Flajolet-Martin algorithm.
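The estimate can be worked out in Python as follows. Each element is hashed, and the tail length r(h(x)) is the number of trailing 0 bits. R is the maximum tail length over the stream, and the estimate is 2^R. One assumption matters here: h(4) = 0, and conventions for the tail length of 0 differ. This sketch gives 0 a tail length of 0, which is a common classroom choice. Treating 0 differently changes the answer.

```python
def trailing_zeros(v: int) -> int:
    # Assumption: h(x) = 0 is given tail length 0 (conventions vary).
    if v == 0:
        return 0
    count = 0
    while v % 2 == 0:
        v //= 2
        count += 1
    return count


def flajolet_martin(stream, h) -> int:
    # R = longest tail of 0s seen; the estimate of the distinct count is 2^R.
    R = max(trailing_zeros(h(x)) for x in stream)
    return 2 ** R


def h(x: int) -> int:
    return (6 * x + 1) % 5


stream = [1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1]
print([h(x) for x in stream])      # [2, 4, 3, 2, 3, 4, 0, 4, 2, 3, 4, 2]
print(flajolet_martin(stream, h))  # 4
```

The values 2 (binary 010), 4 (100) and 3 (011) have tail lengths 1, 2 and 0. So R = 2 and the estimate is 2² = 4.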
Distinguish between the following:
a) PCY and Multistage algorithms
b) Document data store and column-family data store
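To anchor part (a), the Python sketch below shows PCY on its own. In pass 1, PCY counts items and also hashes every pair into a bucket. Between passes, it keeps only a bitmap of the frequent buckets. In pass 2, it counts a pair only if both items are frequent and the pair hashes to a frequent bucket. The Multistage algorithm adds an extra pass between these two. That pass rehashes only the pairs that survived the first bitmap, using a second, independent hash function, and the final pass then requires both bitmaps. The pair hash (i × j) mod num_buckets and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def bucket_of(i: int, j: int, num_buckets: int) -> int:
    # Hypothetical hash function for pairs of integer item IDs.
    return (i * j) % num_buckets


def pcy(baskets, support, num_buckets):
    # Pass 1: count single items and hash each pair of each basket into a bucket.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for basket in baskets:
        item_counts.update(basket)
        for i, j in combinations(sorted(basket), 2):
            bucket_counts[bucket_of(i, j, num_buckets)] += 1
    frequent_items = {i for i, c in item_counts.items() if c >= support}
    # Between passes: replace the bucket counts by a bitmap of frequent buckets.
    bitmap = [c >= support for c in bucket_counts]
    # Pass 2: count a pair only if both items are frequent and its bucket is frequent.
    pair_counts = Counter()
    for basket in baskets:
        for i, j in combinations(sorted(basket), 2):
            if i in frequent_items and j in frequent_items and bitmap[bucket_of(i, j, num_buckets)]:
                pair_counts[(i, j)] += 1
    return {pair: c for pair, c in pair_counts.items() if c >= support}


baskets = [{1, 2, 3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 4}]
print(pcy(baskets, support=3, num_buckets=5))  # {(1, 2): 3}
```

In this data, the pair (2, 3) is never counted in pass 2, because its bucket (6 mod 5 = 1) holds only 2 pairs.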
Give two applications of counting the number of 1s in a long stream of binary values. Using an example stream of binary digits, illustrate how the DGIM algorithm estimates the number of 1s.
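A minimal Python sketch of DGIM over a window of the last N bits follows. Each bucket stores the timestamp of its most recent 1 and its size, which is a power of 2. At most two buckets of each size are kept. When a third bucket of a size appears, the two older ones merge into a bucket of twice the size, and this can cascade. The estimate is the sum of all bucket sizes except the oldest, plus half of the oldest. This bounds the error to at most 50% of the true count. The window size and bit string are hypothetical example data.

```python
class DGIM:
    """DGIM sketch: approximate count of 1s among the last `window` bits."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.time = 0
        # Each bucket is [timestamp of its most recent 1, size], newest first.
        self.buckets: list[list[int]] = []

    def add(self, bit: int) -> None:
        self.time += 1
        # Drop the oldest bucket once its most recent 1 has left the window.
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, [self.time, 1])
        # While three buckets share a size, merge the two older ones.
        i = 0
        while i + 2 < len(self.buckets):
            if not (self.buckets[i][1] == self.buckets[i + 1][1] == self.buckets[i + 2][1]):
                break
            self.buckets[i + 1][1] *= 2   # merged bucket keeps the newer timestamp
            del self.buckets[i + 2]
            i += 1

    def estimate(self) -> float:
        # All bucket sizes except the oldest, plus half of the oldest bucket.
        if not self.buckets:
            return 0.0
        sizes = [size for _, size in self.buckets]
        return sum(sizes[:-1]) + sizes[-1] / 2


bits = [int(c) for c in "1011011000101110110010110"]
dgim = DGIM(window=10)
for b in bits:
    dgim.add(b)
print(dgim.buckets, dgim.estimate(), sum(bits[-10:]))
```

Only O(log N) buckets are stored, and each bucket needs only O(log N) bits. This is why DGIM suits windows too large to keep in memory.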
For the given graph, show how the clique percolation method finds cliques.
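Because the graph for this question is not reproduced here, the Python sketch below runs the clique percolation method with k = 3 on a hypothetical seven-node graph. First it finds all k-cliques, using brute force, which is enough for exam-sized graphs. Two k-cliques are treated as adjacent if they share k − 1 nodes. Each connected group of adjacent cliques, found here with union-find, forms one community.

```python
from itertools import combinations


def k_cliques(adj: dict[str, set[str]], k: int) -> list[frozenset[str]]:
    # Brute force: every k-subset of nodes whose members are pairwise connected.
    nodes = sorted(adj)
    return [frozenset(c) for c in combinations(nodes, k)
            if all(v in adj[u] for u, v in combinations(c, 2))]


def clique_percolation(adj: dict[str, set[str]], k: int = 3) -> list[list[str]]:
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union cliques that share k - 1 nodes (adjacent k-cliques).
    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) == k - 1:
            parent[find(a)] = find(b)
    # Each union-find group of cliques is one community (the union of its nodes).
    communities: dict[int, set[str]] = {}
    for idx, clique in enumerate(cliques):
        communities.setdefault(find(idx), set()).update(clique)
    return [sorted(c) for c in communities.values()]


# Hypothetical graph: triangles ABC, BCD and EFG, plus a bridge edge D-E.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G")]
adj: dict[str, set[str]] = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
print(sorted(clique_percolation(adj, 3)))  # [['A', 'B', 'C', 'D'], ['E', 'F', 'G']]
```

Triangles ABC and BCD share the edge BC, so they percolate into one community. The edge D-E lies in no triangle, so EFG stays separate.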
Consider the web graph given below, consisting of six pages (A, B, C, D, E, F) with directed links as follows:
Assume that the PageRank value of every page m at iteration 0 is PR(m) = 1 and that the teleportation factor is $\beta$ = 0.85. Perform the PageRank algorithm and determine the rank of every page at iteration 2.
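The iterations can be sketched in Python with the update PR(p) = (1 − β) + β · Σ PR(q)/out(q). The sum runs over the pages q that link to p, and out(q) is q's number of out-links. This is an assumption: it is the variant that matches starting every page at PR = 1, so the scores sum to the number of pages. Some textbooks use (1 − β)/N with scores summing to 1 instead. The sketch also assumes synchronous updates, where every page uses the previous iteration's values. The three-page graph is hypothetical; the question's six-page link list would replace `links`.

```python
def pagerank(links: dict[str, list[str]], beta: float = 0.85, iterations: int = 2):
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    pr = {n: 1.0 for n in nodes}  # PR(m) = 1 at iteration 0
    for _ in range(iterations):
        # Synchronous update: the comprehension reads only the previous pr.
        pr = {
            n: (1 - beta) + beta * sum(pr[m] / len(outs) for m, outs in links.items() if n in outs)
            for n in nodes
        }
    return pr


# Hypothetical graph: A->B, A->C, B->C, C->A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print(ranks)  # A about 1.361, B 0.575, C about 1.064
```

For example, at iteration 1, PR(C) = 0.15 + 0.85 · (1/2 + 1/1) = 1.425. At iteration 2, PR(A) = 0.15 + 0.85 · 1.425 = 1.36125.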
Explain clearly how the partition-based SON algorithm performs frequent itemset mining on large data sets. How does this algorithm avoid false negatives?
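The two SON passes can be sketched in Python. In pass 1, the baskets are split into chunks, and each chunk is mined in memory at the scaled threshold p·s, where p is the chunk's fraction of all baskets. The union of these locally frequent itemsets is the candidate set. In pass 2, every candidate is counted over the full data, and only those with support ≥ s are kept. No false negatives are possible: the chunk thresholds p_i·s sum to s, so an itemset below p_i·s in every chunk would have total support below s. The brute-force chunk miner stands in for Apriori or PCY, and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(baskets, threshold, max_size=2):
    # In-memory miner for one chunk (brute force; stands in for Apriori/PCY).
    counts = Counter()
    for basket in baskets:
        for size in range(1, max_size + 1):
            counts.update(combinations(sorted(basket), size))
    return {itemset for itemset, c in counts.items() if c >= threshold}


def son(baskets, support, num_chunks, max_size=2):
    n = len(baskets)
    chunk_size = -(-n // num_chunks)  # ceiling division
    chunks = [baskets[i:i + chunk_size] for i in range(0, n, chunk_size)]
    # Pass 1: union of itemsets frequent within some chunk at threshold p * s.
    candidates = set()
    for chunk in chunks:
        p = len(chunk) / n
        candidates |= frequent_itemsets(chunk, p * support, max_size)
    # Pass 2: count each candidate over all baskets; this removes false positives.
    counts = Counter()
    for basket in baskets:
        items = set(basket)
        for cand in candidates:
            if items.issuperset(cand):
                counts[cand] += 1
    return {c: counts[c] for c in candidates if counts[c] >= support}


baskets = [{1, 2, 3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 4}, {2, 4}, {1, 2, 3}]
print(sorted(son(baskets, support=3, num_chunks=3).items()))
```

In MapReduce terms, pass 1 is the first map/reduce pair, which emits candidates. Pass 2 is the second pair, which counts them.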
Explain collaborative filtering systems. How do they differ from content-based systems?
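A user-based collaborative filtering sketch in Python follows. It predicts a user's rating for an item as the similarity-weighted average of the ratings given by the most similar other users. It uses only the rating matrix, whereas a content-based system would compare item feature profiles with a profile of the user's own tastes. The ratings are hypothetical. Computing cosine similarity only over co-rated items is one assumption among several common choices; another is treating missing ratings as 0.

```python
import math

# Hypothetical user -> {item: rating} matrix.
ratings = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2, "C": 4, "D": 5},
    "u3": {"A": 1, "B": 5, "D": 2},
}


def cosine_sim(r1: dict, r2: dict) -> float:
    # Cosine similarity computed over the items both users have rated.
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    n1 = math.sqrt(sum(r1[i] ** 2 for i in common))
    n2 = math.sqrt(sum(r2[i] ** 2 for i in common))
    return dot / (n1 * n2)


def predict(user: str, item: str, ratings: dict, k: int = 2):
    # The k most similar users who have rated `item`, as (similarity, rating).
    neighbours = sorted(
        ((cosine_sim(ratings[user], r), r[item])
         for u, r in ratings.items() if u != user and item in r),
        reverse=True,
    )[:k]
    num = sum(s * r for s, r in neighbours)
    den = sum(abs(s) for s, _ in neighbours)
    return num / den if den else None


print(predict("u1", "D", ratings))  # about 3.79
```

User u2 rates A, B and C much as u1 does, so u2's rating of 5 for D outweighs u3's rating of 2.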
Clearly explain how the CURE algorithm can be used to cluster big data sets.
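The main CURE steps can be sketched in Python. First, an in-memory sample is clustered hierarchically. Next, each cluster picks up to c well-scattered representative points, and each point is shrunk a fraction α toward the cluster centroid. Finally, a pass over the full data assigns every point to the cluster with the nearest representative. The centroid-linkage merging, the farthest-point heuristic, c = 4, α = 0.2, and the points and sample are all illustrative assumptions.

```python
import math


def centroid(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def agglomerate(sample, k):
    # Step 1: naive hierarchical clustering of the sample; merge the two
    # clusters with the closest centroids until k clusters remain.
    clusters = [[p] for p in sample]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        a, b = min(pairs, key=lambda ij: math.dist(centroid(clusters[ij[0]]),
                                                   centroid(clusters[ij[1]])))
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


def representatives(cluster, c, alpha):
    # Step 2: pick up to c well-scattered points (farthest-point heuristic) ...
    cen = centroid(cluster)
    reps = [max(cluster, key=lambda p: math.dist(p, cen))]
    while len(reps) < min(c, len(cluster)):
        reps.append(max(cluster, key=lambda p: min(math.dist(p, r) for r in reps)))
    # ... then move each one a fraction alpha of the way towards the centroid.
    return [tuple(x + alpha * (m - x) for x, m in zip(r, cen)) for r in reps]


def cure(points, sample, k, c=4, alpha=0.2):
    clusters = agglomerate(sample, k)
    reps = [representatives(cl, c, alpha) for cl in clusters]
    # Step 3: assign every point to the cluster owning its nearest representative.
    return [
        min(range(len(reps)), key=lambda i: min(math.dist(p, r) for r in reps[i]))
        for p in points
    ]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
sample = points[::2]  # hypothetical sample; a real run would sample at random
labels = cure(points, sample, k=2)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Several scattered representatives let a cluster take a non-spherical shape. Shrinking them toward the centroid reduces the pull of outliers.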