Data Mining & Business Intelligence : Question Paper May 2016 - Information Technology (Semester 6)

Data Mining & Business Intelligence - May 2016

Information Technology (Semester 6)

TOTAL MARKS: 80
TOTAL TIME: 3 HOURS

(1) Question 1 is compulsory.
(2) Attempt any three from the remaining questions.
(3) Assume data if required.
(4) Figures to the right indicate full marks.

1(a) Define 'Data Mining'. Enumerate five example applications that can benefit from using Data Mining. (5 marks)

1(b) What is Data Preprocessing? Explain the different methods for the Data Cleansing phase. (5 marks)

1(c) What is hierarchical clustering? Explain any two techniques for finding the distance between clusters in hierarchical clustering. (5 marks)

1(d) Explain the concept of a decision support system with the help of an example application. (5 marks)

2(a) Partition the given data into 4 bins using the equi-depth binning method and perform smoothing according to the following methods:
Smoothing by bin mean
Smoothing by bin median
Smoothing by bin boundaries.
Data: 11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21, 22, 23, 24, 30, 40, 45, 45, 45, 71, 72, 73, 75. (10 marks)

2(b) For the same set of data points as in question 2(a):
(a) Find the mean, median and mode.
(b) Show a boxplot of the data, clearly indicating the five-number summary.
(10 marks)

3(a) The table below shows a sample dataset of whether a customer responds to a survey or not. 'Outcome' is the class label.
Construct a Decision Tree Classifier for the dataset. For a new example (Rural, Semi-Detached, Low, No), what will be the predicted class label?

District	House Type	Income	Previous Customer	Outcome
Suburban	Detached	High	No	Nothing
Suburban	Detached	High	Yes	Nothing
Suburban	Detached	High	No	Responded
Urban	Semi-Detached	High	No	Responded
Urban	Semi-Detached	Low	No	Responded
Urban	Semi-Detached	Low	No	Nothing
Rural	Semi-Detached	Low	Yes	Responded
Suburban	Terrace	High	No	Nothing
Suburban	Semi-Detached	Low	No	Responded
Urban	Terrace	Low	No	Responded
Suburban	Terrace	Low	Yes	Responded
Rural	Terrace	High	Yes	Responded
Rural	Detached	Low	No	Responded
Urban	Terrace	High	Yes	Nothing

(10 marks)

3(b) Briefly explain Bagging and Boosting of classifiers. (10 marks)

4(a) Use the Apriori algorithm to identify the frequent itemsets in the following database. Then extract the strong association rules from these sets.
Min. Support = 30%, Min. Confidence = 75%

TID	Items
01	A, B, D, E, F
02	B, C, E
03	A, B, D, E
04	A, B, C, E
05	A, B, C, D, E, F
06	B, C, D
07	A, B, D, E

(10 marks)

4(b) Explain multidimensional and multilevel association rules with examples. (10 marks)

5(a) Use any hierarchical clustering algorithm to cluster the following 8 examples into 3 clusters:
A1=(2, 10), A2=(2, 5), A3=(8, 4), A4=(5, 8),
A5=(7, 5), A6=(6, 4), A7=(1, 2), A8=(4, 9)
(10 marks)

5(b) What is an outlier? Describe methods that can be used for outlier analysis. (10 marks)

6(a) Consider the following case study: An international chain of hotels wants to analyse and improve its performance using several performance indicators: quality of rooms, service facilities, check-in, breakfast, popular times of visit, duration of stay, etc.
For this case study, design a BI system, clearly explaining all steps from data collection to decision making. (10 marks)

6(b) Clearly explain the working of the DBSCAN algorithm using appropriate diagrams. (10 marks)
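
Note on question 2(a): the equi-depth partition and the three smoothing methods it asks for can be sketched in a few lines of Python. This is only a minimal illustration and not a model answer. The function names are hypothetical, the sketch assumes the number of values divides evenly by the number of bins (24 values, 4 bins here), and it assumes that a value equally close to both boundaries is assigned to the lower one, which is a convention the paper does not specify.

```python
from statistics import mean, median

# Data points as given in question 2(a).
data = [11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
        22, 23, 24, 30, 40, 45, 45, 45, 71, 72, 73, 75]


def equi_depth_bins(values, n_bins):
    """Split the sorted values into n_bins bins holding the same number of values.

    Assumption: len(values) is an exact multiple of n_bins.
    """
    ordered = sorted(values)
    depth = len(ordered) // n_bins
    return [ordered[i * depth:(i + 1) * depth] for i in range(n_bins)]


def smooth_by_mean(bins):
    """Replace every value in a bin with the bin mean (rounded to 2 places)."""
    return [[round(mean(b), 2)] * len(b) for b in bins]


def smooth_by_median(bins):
    """Replace every value in a bin with the bin median."""
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's minimum and maximum.

    Assumption: ties go to the lower boundary.
    """
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are sorted, so these are min and max
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


bins = equi_depth_bins(data, 4)
for name, fn in [("mean", smooth_by_mean),
                 ("median", smooth_by_median),
                 ("boundaries", smooth_by_boundaries)]:
    print(name, fn(bins))
```

With 24 values and 4 bins, each bin holds 6 values. A different tie-breaking rule in boundary smoothing would change only the values that sit exactly midway between a bin's minimum and maximum.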
