Question Paper: Data Mining & Business Intelligence, Dec 2016, Information Technology (Semester 6), Mumbai University (MU)

## Data Mining & Business Intelligence - Dec 2016

### Information Technology (Semester 6)

TOTAL MARKS: 80
TOTAL TIME: 3 HOURS
(1) Question 1 is compulsory.
(2) Attempt any three from the remaining questions.
(3) Assume data if required.
(4) Figures to the right indicate full marks.
1(a) Define "Data Mining". Enumerate five example applications that can benefit from data mining. (5 marks)

1(b) Clearly explain the data preprocessing phase of data mining. (5 marks)

1(c) Describe one hierarchical clustering algorithm using an example dendrogram. (5 marks)

1(d) Explain the concept of a decision support system with the help of an example application. (5 marks)

2(a) Partition the given data into 4 bins using the equi-depth binning method and perform smoothing according to the following methods:
Smoothing by bin mean
Smoothing by bin median
Smoothing by bin boundaries

Data: 11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21, 22, 23, 24, 30, 40, 45, 45, 71, 72, 73, 75
(10 marks)
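
A minimal Python sketch of the binning and smoothing asked for in 2(a). Because 23 values do not split evenly into 4 bins, it assumes bin sizes of 6, 6, 6 and 5 (the extra values go to the earlier bins), and that a value equidistant from both bin boundaries is replaced by the lower boundary. Other conventions change the answer slightly. The helper names are illustrative, not from the question.

```python
from statistics import mean, median

# Data from Question 2(a)
DATA = [11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
        22, 23, 24, 30, 40, 45, 45, 71, 72, 73, 75]


def equi_depth_bins(values, k):
    """Split the sorted values into k bins holding (nearly) equal counts.

    Assumption: when len(values) is not a multiple of k, the first
    len(values) % k bins each receive one extra value.
    """
    values = sorted(values)
    size, extra = divmod(len(values), k)
    bins, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        bins.append(values[start:end])
        start = end
    return bins


def smooth_by_mean(b):
    # Every value in the bin is replaced by the bin mean (rounded for display).
    return [round(mean(b), 2)] * len(b)


def smooth_by_median(b):
    # Every value in the bin is replaced by the bin median.
    return [median(b)] * len(b)


def smooth_by_boundaries(b):
    # Each value is replaced by the closer of the bin's min and max.
    # Assumption: a value equidistant from both goes to the lower boundary.
    lo, hi = min(b), max(b)
    return [lo if v - lo <= hi - v else hi for v in b]


if __name__ == "__main__":
    for i, b in enumerate(equi_depth_bins(DATA, 4), start=1):
        print(f"Bin {i}: {b}")
        print("  mean:      ", smooth_by_mean(b))
        print("  median:    ", smooth_by_median(b))
        print("  boundaries:", smooth_by_boundaries(b))
```

Under these assumptions the bins are {11 ... 16}, {19 ... 21}, {22 ... 45} and {45 ... 75}, with bin means of about 13.83, 20.17, 30.67 and 67.2 and bin medians of 14, 20, 27 and 72.
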
2(b) For the same set of data points as in Question 2(a):
a) Find the mean, median and mode.
b) Show a boxplot of the data, clearly indicating the five-number summary.
(10 marks)
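
The statistics for 2(b) can be checked with a short Python sketch using only the standard `statistics` module (Python 3.8 or later for `quantiles`). Quartile conventions differ between textbooks; this sketch assumes the module's default `'exclusive'` method, which for this data set agrees with taking the median of each half. The helper names are illustrative.

```python
from statistics import mean, median, mode, quantiles

# Same data as Question 2(a)
DATA = [11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
        22, 23, 24, 30, 40, 45, 45, 71, 72, 73, 75]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Quartiles come from statistics.quantiles with its default 'exclusive'
    method; other quartile conventions can give slightly different values.
    """
    q1, q2, q3 = quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


def iqr_fences(q1, q3):
    # Values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] would be drawn as outliers.
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


if __name__ == "__main__":
    print("mean:  ", round(mean(DATA), 2))
    print("median:", median(DATA))
    print("mode:  ", mode(DATA))
    summary = five_number_summary(DATA)
    print("five-number summary:", summary)
    print("whisker fences:", iqr_fences(summary[1], summary[3]))
```

For this data the sketch gives mean = 724/23 ≈ 31.48, median = 21 and mode = 20, and a five-number summary of (11, 16, 21, 45, 75). The fences are -27.5 and 88.5, so no point is flagged as an outlier and the boxplot whiskers extend to 11 and 75.
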
3(a) The table below shows a sample dataset recording whether or not a customer responded to a survey. "Outcome" is the class label. Construct a Naive Bayes classifier for the dataset. For a new example (Rural, Semi-detached, Low, No), what will be the predicted class label?

| District | House Type | Income | Previous Customer | Outcome |
|---|---|---|---|---|
| Suburban | Detached | High | No | Nothing |
| Suburban | Detached | High | Yes | Nothing |
| Rural | Detached | High | No | Responded |
| Urban | Semi-detached | High | No | Responded |
| Urban | Semi-detached | Low | No | Responded |
| Urban | Semi-detached | Low | Yes | Nothing |
| Rural | Semi-detached | Low | Yes | Responded |
| Suburban | Terrace | High | No | Nothing |
| Suburban | Semi-detached | Low | No | Responded |
| Urban | Terrace | Low | No | Responded |
| Suburban | Terrace | Low | Yes | Responded |
| Rural | Terrace | High | Yes | Responded |
| Rural | Detached | Low | No | Responded |
| Urban | Terrace | High | Yes | Nothing |

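
A sketch of a categorical Naive Bayes classifier for 3(a), in Python with exact `Fraction` arithmetic so the hand calculation can be compared term by term. It estimates P(class) and each P(attribute value | class) by counting rows; the optional `alpha` parameter (Laplace smoothing) is an assumed extension that the question does not ask for. The function names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Training data transcribed from the table in Question 3(a):
# (District, House Type, Income, Previous Customer) -> Outcome
ROWS = [
    (("Suburban", "Detached", "High", "No"), "Nothing"),
    (("Suburban", "Detached", "High", "Yes"), "Nothing"),
    (("Rural", "Detached", "High", "No"), "Responded"),
    (("Urban", "Semi-detached", "High", "No"), "Responded"),
    (("Urban", "Semi-detached", "Low", "No"), "Responded"),
    (("Urban", "Semi-detached", "Low", "Yes"), "Nothing"),
    (("Rural", "Semi-detached", "Low", "Yes"), "Responded"),
    (("Suburban", "Terrace", "High", "No"), "Nothing"),
    (("Suburban", "Semi-detached", "Low", "No"), "Responded"),
    (("Urban", "Terrace", "Low", "No"), "Responded"),
    (("Suburban", "Terrace", "Low", "Yes"), "Responded"),
    (("Rural", "Terrace", "High", "Yes"), "Responded"),
    (("Rural", "Detached", "Low", "No"), "Responded"),
    (("Urban", "Terrace", "High", "Yes"), "Nothing"),
]


def train(rows):
    """Count class frequencies and (attribute index, value, class) co-occurrences."""
    class_counts = Counter(label for _, label in rows)
    cond_counts = Counter((i, v, label) for x, label in rows for i, v in enumerate(x))
    n_values = [len({x[i] for x, _ in rows}) for i in range(len(rows[0][0]))]
    return class_counts, cond_counts, n_values


def scores(model, x, alpha=0):
    """Unnormalised posterior P(class) * prod_i P(x_i | class) for each class.

    alpha=0 is the plain counting estimate; alpha=1 applies Laplace smoothing.
    """
    class_counts, cond_counts, n_values = model
    total = sum(class_counts.values())
    result = {}
    for label, n in class_counts.items():
        p = Fraction(n, total)  # prior P(class)
        for i, v in enumerate(x):
            # likelihood P(x_i = v | class)
            p *= Fraction(cond_counts[(i, v, label)] + alpha, n + alpha * n_values[i])
        result[label] = p
    return result


def predict(model, x, alpha=0):
    s = scores(model, x, alpha)
    return max(s, key=s.get)


if __name__ == "__main__":
    model = train(ROWS)
    query = ("Rural", "Semi-detached", "Low", "No")
    for label, p in scores(model, query).items():
        print(label, p, float(p))
    print("Predicted:", predict(model, query))
```

For the query, the Responded score is 9/14 × 4/9 × 4/9 × 6/9 × 6/9 = 32/567 ≈ 0.056, while the Nothing score is 0 because no "Nothing" row has District = Rural. The predicted label is therefore Responded; with `alpha=1` the prediction is still Responded.
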
(10 marks)

3(b) Briefly explain regression-based classifiers. (10 marks)

4(a) Use the Apriori algorithm to identify the frequent itemsets in the following database. Then extract the strong association rules from these sets. Min. support = 30%, min. confidence = 75%.

| TID | Items |
|---|---|
| 01 | A, B, D, E, F |
| 02 | B, C, E |
| 03 | A, B, D, E |
| 04 | A, B, C, E |
| 05 | A, B, C, D, E, F |
| 06 | B, C, D |
| 07 | A, B, D, E |

(10 marks)
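
A compact Python sketch of the Apriori level-wise search and rule extraction for 4(a). It uses a simplified join (union of any two frequent (k-1)-itemsets of size k) instead of the textbook prefix join; after the subset-pruning step the candidate sets are the same. Function names are illustrative, and the transaction IDs play no role in the computation.

```python
from itertools import combinations

# Transactions from Question 4(a), one set of items per TID
TRANSACTIONS = [
    {"A", "B", "D", "E", "F"},
    {"B", "C", "E"},
    {"A", "B", "D", "E"},
    {"A", "B", "C", "E"},
    {"A", "B", "C", "D", "E", "F"},
    {"B", "C", "D"},
    {"A", "B", "D", "E"},
]


def support_count(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is a fraction (0.30 means 30% of the transactions).
    """
    min_count = min_support * len(transactions)
    items = sorted({i for t in transactions for i in t})
    level = {}
    for i in items:
        n = support_count(frozenset([i]), transactions)
        if n >= min_count:
            level[frozenset([i])] = n
    frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {}
        for c in candidates:
            n = support_count(c, transactions)
            if n >= min_count:
                level[c] = n
        frequent.update(level)
        k += 1
    return frequent


def strong_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for each strong rule."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so its count is known.
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


if __name__ == "__main__":
    freq = apriori(TRANSACTIONS, 0.30)
    for itemset, n in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), n)
    for lhs, rhs, conf in strong_rules(freq, 0.75):
        print(sorted(lhs), "->", sorted(rhs), f"{conf:.0%}")
```

With 7 transactions, 30% support means a count of at least 3 (0.3 × 7 = 2.1), so F (count 2) is dropped in the first pass, and {A, C} and {C, D} are dropped at level 2. The largest frequent itemset is {A, B, D, E} with a count of 4, and rules such as {A, D} → {B, E} (confidence 100%) pass the 75% threshold.
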
4(b) Explain multidimensional and multilevel association rules with examples. (10 marks)

5(a) What is clustering? Explain the k-means clustering algorithm. Suppose the data for clustering is {2, 4, 10, 12, 3, 20, 11, 25}. Considering k = 2, cluster the given data using the k-means algorithm. (10 marks)

5(b) What is an outlier? Describe methods that can be used for outlier analysis. (10 marks)

6(a) Consider the following case study: a telecom company wants to analyze and improve its performance by introducing a series of innovative mobile payment plans. For this case study, design a BI system, clearly explaining all steps from data collection to decision making. (10 marks)

6(b) Clearly explain the working of the DBSCAN algorithm using appropriate diagrams. (10 marks)
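
For the k-means part of Question 5(a), the iterations can be sketched in Python as below. The question does not fix the initial means, so this sketch assumes the first two data points (2 and 4) are used; it also assumes a point equidistant from two means joins the lower-indexed cluster. The function name is illustrative.

```python
# Data from Question 5(a)
DATA = [2, 4, 10, 12, 3, 20, 11, 25]


def kmeans_1d(points, initial_means, max_iter=100):
    """Lloyd's k-means on 1-D points.

    Returns (clusters, means). A point equidistant from two means joins the
    lower-indexed cluster; an empty cluster keeps its previous mean.
    """
    means = list(initial_means)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster with the nearest mean.
        clusters = [[] for _ in means]
        for p in points:
            j = min(range(len(means)), key=lambda idx: abs(p - means[idx]))
            clusters[j].append(p)
        # Update step: recompute each mean from its members.
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:  # means unchanged, so assignments will not change
            break
        means = new_means
    return clusters, means


if __name__ == "__main__":
    # Assumption: the first two data points (2 and 4) are the initial means.
    clusters, means = kmeans_1d(DATA, [2, 4])
    print("clusters:", clusters)
    print("means:   ", means)
```

Starting from means (2, 4), the means move to (2.5, 13.67) and then to (3, 15.6), after which nothing changes, giving the clusters {2, 3, 4} and {10, 11, 12, 20, 25}. A different choice of initial means may converge to a different partition.
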