Question Paper: Big Data Analytics Question Paper - Dec 16 - Information Technology (Semester 8) - Mumbai University (MU)

## Big Data Analytics - Dec 16

### MU Information Technology (Semester 8)

Total marks: 80

Total time: 3 Hours

INSTRUCTIONS

(1) Assume appropriate data and state your reasons

(2) Marks are given to the right of every question

(3) Draw neat diagrams wherever necessary

**1(a)** What are the three Vs of Big Data? Give two examples of big data case studies. Indicate which Vs are satisfied by these case studies.

**1(b)** What is the role of a "combiner" in the MapReduce framework? Explain with the help of an example.

**1(c)** Illustrate through an example how a triangular array can be used to optimally store and count pairs in a frequent itemset mining algorithm.
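As an illustration of the technique named in 1(c), here is a minimal sketch that assumes items are numbered 1..n and that the count for a pair {i, j} with i < j is kept at position k = (i - 1)(n - i/2) + j - i of a one-dimensional array (shifted to 0-based indexing below). The baskets and the helper names `pair_index` and `count_pairs` are hypothetical and chosen only for this example.

```python
from itertools import combinations


def pair_index(i: int, j: int, n: int) -> int:
    """0-based slot of the pair {i, j} (items numbered 1..n) in the triangular array."""
    if i > j:
        i, j = j, i
    # 1-based formula k = (i - 1)(n - i/2) + j - i, written with integer arithmetic.
    return (i - 1) * (2 * n - i) // 2 + (j - i) - 1


def count_pairs(baskets, n: int):
    """Count every item pair using n(n-1)/2 counters instead of an n x n matrix."""
    counts = [0] * (n * (n - 1) // 2)
    for basket in baskets:
        for i, j in combinations(sorted(set(basket)), 2):
            counts[pair_index(i, j, n)] += 1
    return counts


# Hypothetical baskets over items 1..4 (not taken from the question paper).
baskets = [[1, 2, 3], [1, 3], [2, 3, 4], [1, 2, 3, 4]]
counts = count_pairs(baskets, 4)
# Slots are ordered (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).
print(counts)
```

Each pair gets exactly one counter, so roughly half the space of a full square matrix is saved and no pair is counted under two different orderings.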

**1(d)** List the different issues and challenges in data stream query processing.

**2(a)** What are the different data architecture patterns in NoSQL? Explain the "key-value" store and "document" store patterns with relevant examples.

**2(b)** Show the MapReduce implementation for the following two tasks using pseudocode.

i) Multiplication of two matrices

ii) Computing Group-by and aggregation of a relational table.

**3(a)** Give a formal definition of the Nearest Neighbor problem. Show how finding plagiarism in documents is a Nearest Neighbor problem. What similarity measures can be used?

**3(b)** Clearly explain the concept of a Bloom Filter with the help of an example.

**4(a)** Suppose a data stream consists of the integers 3, 1, 4, 1, 5, 9, 2, 6, 5. Let the hash function be h(x) = 3x + 1 mod 5. Show how the Flajolet-Martin algorithm will estimate the number of distinct elements in this stream.
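The estimation procedure asked for in 4(a) can be sketched as follows. This is a minimal illustration, not a model answer: it assumes the hash function is read as h(x) = (3x + 1) mod 5, that the estimate is 2^R where R is the largest number of trailing zero bits seen in any hash value, and that a hash value of 0 is treated as having tail length 0 (conventions differ on this point). The function names are hypothetical.

```python
def trailing_zeros(n: int) -> int:
    """Number of trailing 0 bits in n; a value of 0 is given tail length 0 (assumed convention)."""
    if n == 0:
        return 0
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count


def fm_estimate(stream, h) -> int:
    """Flajolet-Martin estimate 2^R, with R the maximum tail length over the stream."""
    r = max(trailing_zeros(h(x)) for x in stream)
    return 2 ** r


def h(x: int) -> int:
    # Assumed reading of "3x + 1 mod 5" from the question.
    return (3 * x + 1) % 5


stream = [3, 1, 4, 1, 5, 9, 2, 6, 5]
hashes = [h(x) for x in stream]   # [0, 4, 3, 4, 1, 3, 2, 4, 1]
estimate = fm_estimate(stream, h)
print(hashes, estimate)           # the largest tail length is 2 (from 4 = 100 in binary), so the estimate is 4
```

Under these assumptions the estimate is 4, while the stream actually contains 7 distinct values; a single hash function with such a small range gives a coarse estimate, which is why several hash functions are normally combined.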

**4(b)** Clearly explain how the CURE algorithm can be used to cluster big data sets.

**5(a)** Define collaborative filtering. Using an example of an e-commerce site like Flipkart or Amazon, describe how it can be used to provide recommendations to users.

**5(b)** Define PageRank. Using the web graph shown below, compute the PageRank at every node at the end of the second iteration. Use teleport factor = 0.8.

**6(a)** Explain clearly with diagrams how the PCY algorithm helps to perform frequent itemset mining for large datasets.

**6(b)** For the graph given below, use the betweenness factor to find all communities.