Data Mining & Business Intelligence : Question Paper Dec 2016 - Information Technology (Semester 6) | Mumbai University (MU)

Data Mining & Business Intelligence - Dec 2016

Information Technology (Semester 6)

(1) Question 1 is compulsory.
(2) Attempt any three from the remaining questions.
(3) Assume data if required.
(4) Figures to the right indicate full marks.
1(a) Define "Data Mining". Enumerate five example applications that can benefit from Data Mining. (5 marks)
1(b) Clearly explain the data preprocessing phase of data mining. (5 marks)
1(c) Describe one hierarchical clustering algorithm using an example dendrogram. (5 marks)
1(d) Explain the concept of a decision support system with the help of an example application. (5 marks)
2(a) Partition the given data into 4 bins using the equi-depth binning method and perform smoothing by each of the following methods:
Smoothing by bin mean
Smoothing by bin median
Smoothing by bin boundaries
Data: 11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21, 22, 23, 24, 30, 40, 45, 45, 71, 72, 73, 75
(10 marks)
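A minimal Python sketch of question 2(a), useful for checking a hand-worked answer. Two conventions are assumptions, not stated in the question: since 23 values do not split evenly into 4 bins, the first bins take one extra value (bin sizes 6, 6, 6, 5), and in smoothing by bin boundaries a value equidistant from both boundaries is mapped to the lower one. Bin means are rounded to two decimals.

```python
from statistics import mean, median

DATA = [11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
        22, 23, 24, 30, 40, 45, 45, 71, 72, 73, 75]


def equi_depth_bins(values, n_bins):
    """Split the sorted values into n_bins bins of (nearly) equal size.
    If the count does not divide evenly, the first bins get one extra value."""
    values = sorted(values)
    size, extra = divmod(len(values), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)
        bins.append(values[start:end])
        start = end
    return bins


def smooth_by_mean(bins):
    # every value in a bin is replaced by the bin mean (rounded to 2 decimals)
    return [[round(mean(b), 2)] * len(b) for b in bins]


def smooth_by_median(bins):
    # every value in a bin is replaced by the bin median
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    # every value is replaced by the nearer of the bin's min and max;
    # ties go to the lower boundary (an assumption)
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are sorted, so these are min and max
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


if __name__ == "__main__":
    bins = equi_depth_bins(DATA, 4)
    print("bins      :", bins)
    print("by mean   :", smooth_by_mean(bins))
    print("by median :", smooth_by_median(bins))
    print("by bounds :", smooth_by_boundaries(bins))
```

With these conventions the bins are [11, 13, 13, 15, 15, 16], [19, 20, 20, 20, 21, 21], [22, 23, 24, 30, 40, 45] and [45, 71, 72, 73, 75]; the bin means are about 13.83, 20.17, 30.67 and 67.2, and the bin medians are 14, 20, 27 and 72. A different split of the 23 values (for example 5, 6, 6, 6) would change the bins and every smoothed value.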
2(b) For the same set of data points as in question 2(a):
a) Find the mean, median and mode.
b) Show a boxplot of the data, clearly indicating the five-number summary.
(10 marks)
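A short sketch for question 2(b) using Python's standard `statistics` module. Quartile conventions differ between textbooks; this assumes the `quantiles` default (`method='exclusive'`), which for these 23 values coincides with taking the median of each half excluding the overall median. The 1.5 * IQR whisker rule is the usual Tukey convention and is also an assumption here.

```python
from statistics import mean, median, mode, quantiles

DATA = [11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
        22, 23, 24, 30, 40, 45, 45, 71, 72, 73, 75]


def five_number_summary(values):
    """(min, Q1, median, Q3, max); quartiles from statistics.quantiles (exclusive method)."""
    q1, q2, q3 = quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


def boxplot_fences(q1, q3, k=1.5):
    """Tukey fences: points outside [Q1 - k*IQR, Q3 + k*IQR] are drawn as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


if __name__ == "__main__":
    print("mean  :", round(mean(DATA), 2))
    print("median:", median(DATA))
    print("mode  :", mode(DATA))
    summary = five_number_summary(DATA)
    print("five-number summary:", summary)
    print("fences:", boxplot_fences(summary[1], summary[3]))
```

For this data the sketch gives mean ≈ 31.48, median 21, mode 20 and five-number summary (11, 16, 21, 45, 75). Both fences (-27.5 and 88.5) lie outside the data, so the whiskers run to 11 and 75 and no point is flagged as an outlier. Plotting libraries may interpolate quartiles differently by default, so a drawn box can have slightly different edges.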
3(a) The table below shows a sample dataset of whether or not a customer responds to a survey. "Outcome" is the class label. Construct a Naive Bayes classifier for the dataset. For a new example (Rural, Semi-detached, Low, No), what will be the predicted class label?

District  House Type     Income  Previous Customer  Outcome
Suburban  Detached       High    No                 Nothing
Suburban  Detached       High    Yes                Nothing
Rural     Detached       High    No                 Responded
Urban     Semi-detached  High    No                 Responded
Urban     Semi-detached  Low     No                 Responded
Urban     Semi-detached  Low     Yes                Nothing
Rural     Semi-detached  Low     Yes                Responded
Suburban  Terrace        High    No                 Nothing
Suburban  Semi-detached  Low     No                 Responded
Urban     Terrace        Low     No                 Responded
Suburban  Terrace        Low     Yes                Responded
Rural     Terrace        High    Yes                Responded
Rural     Detached       Low     No                 Responded
Urban     Terrace        High    Yes                Nothing
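A minimal sketch of a categorical Naive Bayes classifier for question 3(a), using exact fractions so the hand calculation can be checked. Each class C is scored by P(C) multiplied by the product of P(x_i | C), all estimated as relative frequencies from the table. The optional `alpha` parameter (Laplace smoothing) is an addition not asked for in the question, and the function names are illustrative.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# (District, House Type, Income, Previous Customer, Outcome), transcribed from the table
ROWS = [
    ("Suburban", "Detached", "High", "No", "Nothing"),
    ("Suburban", "Detached", "High", "Yes", "Nothing"),
    ("Rural", "Detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "Yes", "Nothing"),
    ("Rural", "Semi-detached", "Low", "Yes", "Responded"),
    ("Suburban", "Terrace", "High", "No", "Nothing"),
    ("Suburban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "Low", "No", "Responded"),
    ("Suburban", "Terrace", "Low", "Yes", "Responded"),
    ("Rural", "Terrace", "High", "Yes", "Responded"),
    ("Rural", "Detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "High", "Yes", "Nothing"),
]


def train(rows):
    """Count class frequencies and (attribute, value, class) co-occurrences."""
    class_counts = Counter(row[-1] for row in rows)
    feature_counts = Counter()
    distinct = defaultdict(set)  # distinct values per attribute, used only for smoothing
    for *features, label in rows:
        for i, value in enumerate(features):
            feature_counts[(i, value, label)] += 1
            distinct[i].add(value)
    return class_counts, feature_counts, distinct


def class_scores(x, model, alpha=0):
    """Unnormalised posterior P(C) * prod_i P(x_i | C) for every class C.
    alpha=0 uses plain relative frequencies; alpha=1 is Laplace smoothing."""
    class_counts, feature_counts, distinct = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_c in class_counts.items():
        score = Fraction(n_c, total)
        for i, value in enumerate(x):
            score *= Fraction(feature_counts[(i, value, label)] + alpha,
                              n_c + alpha * len(distinct[i]))
        scores[label] = score
    return scores


def predict(x, model, alpha=0):
    scores = class_scores(x, model, alpha)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    model = train(ROWS)
    label, scores = predict(("Rural", "Semi-detached", "Low", "No"), model)
    for c, s in scores.items():
        print(c, s, float(s))
    print("predicted:", label)
```

Without smoothing, P(Rural | Nothing) = 0/5 makes the score for Nothing zero, while Responded scores 9/14 * 4/9 * 4/9 * 6/9 * 6/9 = 32/567 ≈ 0.056, so the predicted label is Responded. With `alpha=1` the prediction stays the same.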
(10 marks)
3(b) Briefly explain regression-based classifiers. (10 marks)
4(a) Use the Apriori algorithm to identify the frequent itemsets in the following database. Then extract the strong association rules from these sets. Min. support = 30%, min. confidence = 75%.
TID  Items
01   A, B, D, E, F
02   B, C, E
03   A, B, D, E
04   A, B, C, E
05   A, B, C, D, E, F
06   B, C, D
07   A, B, D, E
(10 marks)
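A compact Python sketch of Apriori for question 4(a). It assumes the seven transactions listed (the TIDs themselves are not needed), reads the 30% threshold as "support count / 7 >= 0.30" (so an itemset must occur in at least 3 transactions), and compares both thresholds in integer arithmetic to avoid floating-point edge cases. The join step takes unions of pairs of frequent (k-1)-itemsets, a simpler variant of the sorted-prefix join that yields the same candidates once the prune step has run.

```python
from itertools import combinations

TRANSACTIONS = [
    {"A", "B", "D", "E", "F"},
    {"B", "C", "E"},
    {"A", "B", "D", "E"},
    {"A", "B", "C", "E"},
    {"A", "B", "C", "D", "E", "F"},
    {"B", "C", "D"},
    {"A", "B", "D", "E"},
]


def apriori(transactions, min_support_pct):
    """Return {frozenset(itemset): support_count} for every frequent itemset."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    def is_frequent(count):
        # count / n >= pct / 100, written with integers only
        return count * 100 >= min_support_pct * n

    items = sorted({item for t in transactions for item in t})
    level = {}
    for item in items:
        s = frozenset([item])
        c = support(s)
        if is_frequent(c):
            level[s] = c
    frequent = dict(level)
    k = 2
    while level:
        # join: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # prune: every (k-1)-subset of a candidate must already be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {}
        for c in candidates:
            count = support(c)
            if is_frequent(count):
                level[c] = count
        frequent.update(level)
        k += 1
    return frequent


def strong_rules(frequent, min_conf_pct):
    """Return a list of (antecedent, consequent, confidence) for every strong rule."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                lhs_count = frequent[lhs]  # any subset of a frequent itemset is frequent
                if count * 100 >= min_conf_pct * lhs_count:
                    rules.append((lhs, itemset - lhs, count / lhs_count))
    return rules


if __name__ == "__main__":
    freq = apriori(TRANSACTIONS, 30)
    for itemset, count in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print("".join(sorted(itemset)), count)
    for lhs, rhs, conf in strong_rules(freq, 75):
        print(",".join(sorted(lhs)), "->", ",".join(sorted(rhs)), f"conf={conf:.2f}")
```

Under these assumptions the frequent itemsets are the single items A, B, C, D and E (F occurs only twice), eight pairs (AB, AD, AE, BC, BD, BE, CE, DE), five triples (ABD, ABE, ADE, BCE, BDE) and one 4-itemset, ABDE, with count 4. A rule X -> Y is kept when support(X ∪ Y) / support(X) >= 0.75; for example A -> B (5/5) is strong while B -> A (5/7) is not.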
4(b) Explain multidimensional and multilevel association rules with examples. (10 marks)
5(a) What is clustering? Explain the k-means clustering algorithm. Suppose the data for clustering is {2, 4, 10, 12, 3, 20, 11, 25}. Taking k = 2, cluster the given data using the k-means algorithm. (10 marks)
5(b) What is an outlier? Describe methods that can be used for outlier analysis. (10 marks)
6(a) Consider the following case study: a telecom company wants to analyze and improve its performance by introducing a series of innovative mobile payment plans. For this case study, design a BI system, clearly explaining all steps from data collection to decision making. (10 marks)
6(b) Clearly explain the working of the DBSCAN algorithm using appropriate diagrams. (10 marks)
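A minimal sketch of the k-means iterations for the 1-D data in question 5(a). The question does not fix the initial centroids; this assumes the first two values, 2 and 4, a common textbook choice. Distance is the absolute difference, and a point equidistant from two centroids joins the lower-numbered cluster (this is how `min` breaks ties).

```python
DATA = [2, 4, 10, 12, 3, 20, 11, 25]


def kmeans_1d(points, initial_centroids, max_iter=100):
    """Plain k-means on 1-D data; returns (centroids, clusters)."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        # assignment step: each point goes to its nearest centroid
        # (min() returns the first minimum, so ties go to the lower index)
        new_assignment = [min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
                          for x in points]
        if new_assignment == assignment:
            break  # no point moved: converged
        assignment = new_assignment
        # update step: each centroid becomes the mean of its current members
        for j in range(len(centroids)):
            members = [x for x, a in zip(points, assignment) if a == j]
            if members:  # leave a centroid unchanged if its cluster is empty
                centroids[j] = sum(members) / len(members)
    clusters = [[x for x, a in zip(points, assignment) if a == j]
                for j in range(len(centroids))]
    return centroids, clusters


if __name__ == "__main__":
    centroids, clusters = kmeans_1d(DATA, initial_centroids=[2, 4])
    print("centroids:", centroids)
    print("clusters :", clusters)
```

Starting from centroids 2 and 4, the first pass gives {2, 3} and {4, 10, 11, 12, 20, 25} (3 is a tie and joins the first cluster), with centroids 2.5 and about 13.67. The second pass gives {2, 3, 4} and {10, 11, 12, 20, 25}, with centroids 3 and 15.6, after which no point changes cluster. In general k-means depends on the starting centroids, so other initial choices can converge to a different clustering.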

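For question 6(b), a small DBSCAN sketch can accompany the diagrams. The points, `eps = 1.0` and `min_pts = 3` below are hypothetical values chosen for illustration, not taken from the question; `min_pts` counts the point itself, and Euclidean distance is assumed. A core point has at least `min_pts` points within `eps`; clusters grow outward from core points, border points join a cluster without expanding it, and points reachable from no core point stay noise (label -1).

```python
from math import dist

# Hypothetical 2-D points for illustration: two dense groups, one border point, one outlier
POINTS = [(2.4, 1.0),
          (1.0, 1.0), (1.5, 1.0), (1.0, 1.5), (1.2, 1.2),
          (8.0, 8.0), (8.5, 8.0), (8.0, 8.5), (8.2, 8.3),
          (4.5, 5.0)]

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id 0, 1, ... or NOISE (-1)."""
    labels = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is also a core point
                seeds.extend(j_neighbours)
    return labels


if __name__ == "__main__":
    print(dbscan(POINTS, eps=1.0, min_pts=3))
```

Here the first point, (2.4, 1.0), has only one other point within `eps`, so it is first marked as noise; it is relabelled as a border point of cluster 0 once the core point (1.5, 1.0) reaches it. The result is two clusters and one noise point, (4.5, 5.0).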