Information Technology (Semester 8)
Total marks: 80
Total time: 3 Hours
INSTRUCTIONS
(1) Question 1 is compulsory.
(2) Attempt any three from the remaining questions.
(3) Draw neat diagrams wherever necessary.
1(a) What are the three Vs of Big Data? Give two examples of big data case studies. Indicate which Vs are satisfied by these case studies.
5 marks
1(b) What is the role of a "combiner" in the MapReduce framework? Explain with the help of an example.
5 marks
1(c) Through an example, illustrate how the triangular array can be used to optimally store and count pairs in a frequent itemset mining algorithm.
5 marks
1(d) List the different issues and challenges in data stream query processing.
5 marks
2(a) What are the different data architecture patterns in NoSQL? Explain the "key-value" store and "document" store patterns with relevant examples.
10 marks
2(b) Show the MapReduce implementation of the following two tasks using pseudocode.
i) Multiplication of two matrices.
ii) Computing group-by and aggregation on a relational table.
10 marks
3(a) Give a formal definition of the Nearest Neighbor problem. Show how finding plagiarism in documents is a Nearest Neighbor problem. What similarity measures can be used?
10 marks
3(b) Clearly explain the concept of a Bloom Filter with the help of an example.
10 marks
4(a) Suppose a data stream consists of the integers 3, 1, 4, 1, 5, 9, 2, 6, 5. Let the hash function be h(x) = (3x + 1) mod 5. Show how the Flajolet-Martin algorithm will estimate the number of distinct elements in this stream.
10 marks
4(b) Clearly explain how the CURE algorithm can be used to cluster big data sets.
10 marks
5(a) Define collaborative filtering. Using an example of an e-commerce site like Flipkart or Amazon, describe how it can be used to provide recommendations to users.
10 marks
5(b) Define PageRank. Using the web graph shown below, compute the PageRank of every node at the end of the second iteration. Use a teleport factor of 0.8.
10 marks
6(a) Explain clearly with diagrams how the PCY algorithm helps to perform frequent itemset mining for large datasets.
10 marks
6(b) For the graph given below, use the betweenness factor to find all communities.
10 marks