## Big Data Analytics - December 2016

### MU Information Technology (Semester 8)

Total marks: --

Total time: --

INSTRUCTIONS

(1) Assume appropriate data and state your reasons

(2) Marks are given to the right of every question

(3) Draw neat diagrams wherever necessary

**1(a)** What are the three Vs of Big Data? Give two examples of big data case studies. Indicate which Vs are satisfied by these case studies.
5 marks

**1(b)** What is the role of a "combiner" in the MapReduce framework? Explain with the help of an example.
5 marks

**1(c)** Using an example, illustrate how the triangular array can be used to optimally store and count pairs in a frequent itemset mining algorithm.
5 marks

**1(d)** List the different issues and challenges in data stream query processing.
5 marks

**2(a)** What are the different data architecture patterns in NoSQL? Explain the "key-value" store and "document" store patterns with relevant examples.
5 marks

**2(b)** Show the MapReduce implementation for the following two tasks using pseudocode.

i) Multiplication of two matrices

ii) Computing Group-by and aggregation of a relational table.
5 marks

**3(a)** Give a formal definition of the Nearest Neighbor problem. Show how finding plagiarism in documents is a Nearest Neighbor problem. What similarity measures can be used?
5 marks

**3(b)** Clearly explain the concept of a Bloom Filter with the help of an example.
5 marks

**4(a)** Suppose a data stream consists of the integers 3, 1, 4, 1, 5, 9, 2, 6, 5. Let the hash function be h(x) = (3x + 1) mod 5. Show how the Flajolet-Martin algorithm will estimate the number of distinct elements in this stream.
5 marks

**4(b)** Clearly explain how the CURE algorithm can be used to cluster big data sets.
5 marks

**5(a)** Define collaborative filtering. Using an example of an e-commerce site like Flipkart or Amazon, describe how it can be used to provide recommendations to users.
5 marks

**5(b)** Define PageRank. Using the web graph shown below, compute the PageRank at every node at the end of the second iteration. Use teleport factor = 0.8.

[Figure: web graph]
5 marks

**6(a)** Explain clearly with diagrams how the PCY algorithm helps to perform frequent itemset mining for large datasets.
5 marks

**6(b)** For the graph given below, use the betweenness factor to find all communities.

[Figure: graph]
5 marks