Question Paper: Big Data Analytics Question Paper - Dec 16 - Information Technology (Semester 8) - Mumbai University (MU)
Big Data Analytics - Dec 16
MU Information Technology (Semester 8)
Total marks: 80
Total time: 3 Hours
(1) Assume appropriate data and state your reasons
(2) Marks are given to the right of every question
(3) Draw neat diagrams wherever necessary
1(a) What are the three Vs of Big Data? Give two examples of big data case studies. Indicate which Vs are satisfied by these case studies.
1(b) What is the role of a "combiner" in the MapReduce framework? Explain with the help of an example.
1(c) Illustrate, through an example, how the triangular array can be used to optimally store and count pairs in a frequent itemset mining algorithm.
1(d) List the different issues and challenges in data stream query processing.
2(a) What are the different data architecture patterns in NoSQL? Explain the "key value" store and "Document" store patterns with relevant examples.
2(b) Show the MapReduce implementation for the following two tasks using pseudocode. i) Multiplication of two matrices ii) Computing Group-by and aggregation of a relational table.
3(a) Give a formal definition of the Nearest Neighbor problem. Show how finding plagiarism in documents is a Nearest Neighbor problem. What similarity measures can be used?
3(b) Clearly explain the concept of a Bloom Filter with the help of an example.
4(a) Suppose a data stream consists of the integers 3, 1, 4, 1, 5, 9, 2, 6, 5. Let the hash function be h(x) = 3x + 1 mod 5. Show how the Flajolet-Martin Algorithm will estimate the number of distinct elements in this stream.
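The computation asked for in 4(a) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a model answer: the hash is read as (3x + 1) mod 5, a hash value of 0 is given a tail length of 0 (some texts use the bit width instead, which changes the estimate), and the function names are hypothetical.

```python
def hash_value(x: int) -> int:
    # Assumption: "3x + 1 mod 5" is read as (3x + 1) mod 5.
    return (3 * x + 1) % 5


def trailing_zeros(n: int) -> int:
    # Number of trailing zero bits in the binary form of n.
    # Assumption: a hash value of 0 is assigned tail length 0.
    if n == 0:
        return 0
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count


def flajolet_martin_estimate(stream) -> int:
    # R is the largest tail length seen; the estimate is 2 ** R.
    max_tail = max(trailing_zeros(hash_value(x)) for x in stream)
    return 2 ** max_tail


STREAM = [3, 1, 4, 1, 5, 9, 2, 6, 5]
estimate = flajolet_martin_estimate(STREAM)
print(estimate)
```

With a single hash function the estimate is coarse; in practice several hash functions are combined (for example, by taking medians of averages) to reduce the error.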
4(b) Clearly explain how the CURE algorithm can be used to cluster big data sets.
5(a) Define Collaborative filtering. Using an example of an e-commerce site like Flipkart or Amazon, describe how it can be used to provide recommendations to users.
5(b) Define PageRank. Using the web graph shown below, compute the PageRank at every node at the end of the second iteration. Use teleport factor = 0.8.
6(a) Explain clearly with diagrams how the PCY algorithm helps to perform frequent itemset mining for large datasets.
6(b) For the graph given below, use the betweenness factor and find all communities.