Question Paper: Data Mining & Business Intelligence : Question Paper May 2016 - Information Technology (Semester 6) | Mumbai University (MU)

## Data Mining & Business Intelligence - May 2016

### Information Technology (Semester 6)

TOTAL MARKS: 80
TOTAL TIME: 3 HOURS
(1) Question 1 is compulsory.
(2) Attempt any three from the remaining questions.
(3) Assume data if required.
(4) Figures to the right indicate full marks.
1(a) Define 'Data Mining'. Enumerate five example applications that can benefit by using Data Mining. (5 marks)
1(b) What is Data Preprocessing? Explain the different methods for the Data Cleansing phase. (5 marks)
1(c) What is hierarchical clustering? Explain any two techniques for finding distance between the clusters in hierarchical clustering. (5 marks)
1(d) Explain the concept of a decision support system with the help of an example application. (5 marks)
2(a) Partition the given data into 4 bins using Equi-depth binning method and perform smoothing according to the following methods.
Smoothing by bin mean
Smoothing by bin median
Smoothing by bin boundaries.
Data: 11,13,13,15,15,16,19,20,20,20,21,21,22,23,24,30,40,45,45,45,71,72,73,75.
(10 marks)
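A minimal Python sketch of how the computation in 2(a) could be set up is given here for reference. The function names are hypothetical, the sketch assumes the number of values divides evenly into the bins, and it assumes that a value equidistant from both bin boundaries is snapped to the lower one (the question does not fix a tie-breaking rule).

```python
from statistics import mean, median

data = [11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
        22, 23, 24, 30, 40, 45, 45, 45, 71, 72, 73, 75]

def equi_depth_bins(values, n_bins):
    """Split the sorted values into n_bins bins of equal size.
    Assumes len(values) is divisible by n_bins."""
    values = sorted(values)
    size = len(values) // n_bins
    return [values[i * size:(i + 1) * size] for i in range(n_bins)]

def smooth_by_mean(bins):
    # Every value in a bin is replaced by the bin mean.
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_median(bins):
    # Every value in a bin is replaced by the bin median.
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    # Every value is replaced by the closer of the bin's min and max.
    # Assumption: ties go to the lower boundary.
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

bins = equi_depth_bins(data, 4)
for name, fn in [("mean", smooth_by_mean),
                 ("median", smooth_by_median),
                 ("boundaries", smooth_by_boundaries)]:
    print(name, fn(bins))
```

With 24 values and 4 bins, each bin holds 6 values, so the first bin is 11, 13, 13, 15, 15, 16.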
2(b) For the same set of data points as in question 2(a):
(a) Find Mean, Median and Mode.
(b) Show a boxplot of the data, clearly indicating the five-number summary.
(10 marks)
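As a rough check on 2(b), the summary statistics can be computed with Python's standard `statistics` module. This is only a sketch: the dictionary name `summary` is hypothetical, and the quartiles use the module's "inclusive" method, which is an assumption, since textbooks differ on the quartile convention and may give slightly different Q1 and Q3 values.

```python
from statistics import mean, median, multimode, quantiles

data = [11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
        22, 23, 24, 30, 40, 45, 45, 45, 71, 72, 73, 75]

# Quartiles under the "inclusive" convention (an assumption; other
# conventions are equally valid for a hand-drawn boxplot).
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

summary = {
    "mean": mean(data),
    "median": median(data),
    "modes": multimode(data),   # all most-frequent values
    "min": min(data),
    "q1": q1,
    "q3": q3,
    "max": max(data),
}
print(summary)
```

Note that `multimode` returns every value that ties for the highest frequency, so data with more than one mode is reported in full.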
3(a) The table below shows a sample dataset of whether a customer responds to a survey or not. 'Outcome' is the class label.
Construct a Decision Tree Classifier for the dataset. For a new example (Rural, Semi-Detached, Low, No), what will be the predicted class label?

| District | House Type | Income | Previous Customers | Outcome |
|----------|------------|--------|--------------------|---------|
| Suburban | Detached | High | No | Nothing |
| Suburban | Detached | High | Yes | Nothing |
| Suburban | Detached | High | No | Responded |
| Urban | Semi-Detached | High | No | Responded |
| Urban | Semi-Detached | Low | No | Responded |
| Urban | Semi-Detached | Low | No | Nothing |
| Rural | Semi-Detached | Low | Yes | Responded |
| Suburban | Terrace | High | No | Nothing |
| Suburban | Semi-Detached | Low | No | Responded |
| Urban | Terrace | Low | No | Responded |
| Suburban | Terrace | Low | Yes | Responded |
| Rural | Terrace | High | Yes | Responded |
| Rural | Detached | Low | No | Responded |
| Urban | Terrace | High | Yes | Nothing |

(10 marks)
3(b) Briefly explain Bagging and Boosting of Classifiers. (10 marks)
4(a) Use the Apriori algorithm to identify the frequent item-sets in the following database. Then extract the strong association rules from these sets.
Min. Support = 30%, Min. Confidence = 75%

| TID | Items |
|-----|-------|
| 01 | A, B, D, E, F |
| 02 | B, C, E |
| 03 | A, B, D, E |
| 04 | A, B, C, E |
| 05 | A, B, C, D, E, F |
| 06 | B, C, D |
| 07 | A, B, D, E |

(10 marks)
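The level-wise procedure asked for in 4(a) can be sketched in Python as follows. This is an illustrative sketch rather than a reference solution: the function names `apriori` and `strong_rules` are hypothetical, support is treated as a fraction of the 7 transactions, and confidence is computed as support(itemset) / support(left-hand side).

```python
from itertools import combinations

transactions = [
    {"A", "B", "D", "E", "F"},
    {"B", "C", "E"},
    {"A", "B", "D", "E"},
    {"A", "B", "C", "E"},
    {"A", "B", "C", "D", "E", "F"},
    {"B", "C", "D"},
    {"A", "B", "D", "E"},
]

def apriori(transactions, min_support):
    """Return {frozenset: support count} for all frequent itemsets."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep those meeting min_support.
        level = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count / n >= min_support:
                level[c] = count
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))]
    return frequent

def strong_rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules

frequent = apriori(transactions, 0.30)
rules = strong_rules(frequent, 0.75)
```

With 7 transactions, a 30% minimum support means an itemset must appear in at least 3 transactions, so item F (2 transactions) is dropped at the first level.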
4(b) Explain multidimensional and multi-level Association rules with examples. (10 marks)
5(a) Use any hierarchical clustering algorithm to cluster the following 8 examples into 3 clusters:
A1=(2, 10), A2=(2, 5), A3=(8, 4), A4=(5, 8),
A5=(7, 5), A6=(6, 4), A7=(1, 2), A8=(4, 9)
(10 marks)
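Since 5(a) allows any hierarchical algorithm, one possible choice is agglomerative clustering with single linkage and Euclidean distance. The sketch below assumes that choice (complete or average linkage would be equally acceptable and may give different clusters), and the function name `single_linkage` is hypothetical.

```python
from math import dist

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}

def single_linkage(points, k):
    """Merge the two closest clusters repeatedly until k remain.
    Cluster distance = smallest Euclidean distance between members."""
    clusters = [[name] for name in points]

    def cluster_distance(c1, c2):
        return min(dist(points[a], points[b]) for a in c1 for b in c2)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: cluster_distance(clusters[p[0]],
                                                         clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

result = single_linkage(points, 3)
print(sorted(sorted(c) for c in result))
```

Recording the distance at which each merge happens would give the heights needed to draw the dendrogram by hand.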
5(b) What is an outlier? Describe methods that can be used for outlier analysis. (10 marks)
6(a) Consider the following case study: An international chain of hotels wants to analyse and improve its performance using several performance indicators: quality of rooms, service facilities, check-in, breakfast, popular time of visits, duration of stay, etc.
For this case study, design a BI system, clearly explaining all steps from data collection to decision making.
(10 marks)
6(b) Clearly explain the working of the DBSCAN algorithm using appropriate diagrams. (10 marks)