Data Mining & Business Intelligence : Question Paper Dec 2016 - Information Technology (Semester 6)

written 8.9 years ago by teamques10

Data Mining & Business Intelligence - Dec 2016

Information Technology (Semester 6)

TOTAL MARKS: 80
TOTAL TIME: 3 HOURS

(1) Question 1 is compulsory.
(2) Attempt any three from the remaining questions.
(3) Assume data if required.
(4) Figures to the right indicate full marks.

1(a) Define "Data Mining". Enumerate five example applications that can benefit from using Data Mining. (5 marks)

1(b) Clearly explain the data preprocessing phase for data mining. (5 marks)

1(c) Describe one hierarchical clustering algorithm using an example dendrogram. (5 marks)

1(d) Explain the concept of a decision support system with the help of an example application. (5 marks)

2(a) Partition the given data into 4 bins using the equi-depth binning method and perform smoothing according to the following methods.
Smoothing by bin mean
Smoothing by bin median
Smoothing by bin boundaries
Data: 11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21, 22, 23, 24, 30, 40, 45, 45, 71, 72, 73, 75 (10 marks)

2(b) For the same set of data points as in question 2(a):
a) Find the mean, median and mode.
b) Draw a boxplot of the data, clearly indicating the five-number summary. (10 marks)

3(a) The table below shows a sample dataset of whether a customer responds to a survey or not. "Outcome" is the class label. Construct a Naive Bayes classifier for the dataset. For a new example (Rural, Semi-detached, Low, No), what will be the predicted class label?

District	House Type	Income	Previous Customer	Outcome
Suburban	Detached	High	No	Nothing
Suburban	Detached	High	Yes	Nothing
Rural	Detached	High	No	Responded
Urban	Semi-detached	High	No	Responded
Urban	Semi-detached	Low	No	Responded
Urban	Semi-detached	Low	Yes	Nothing
Rural	Semi-detached	Low	Yes	Responded
Suburban	Terrace	High	No	Nothing
Suburban	Semi-detached	Low	No	Responded
Urban	Terrace	Low	No	Responded
Suburban	Terrace	Low	Yes	Responded
Rural	Terrace	High	Yes	Responded
Rural	Detached	Low	No	Responded
Urban	Terrace	High	Yes	Nothing
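
A hand-worked answer to 3(a) can be checked with a short script. The following is only a sketch under stated assumptions: probabilities are plain relative frequencies with no Laplace smoothing, the class label is spelled "Responded", and the names ROWS, train and score are illustrative rather than part of the question paper.

```python
from collections import Counter

# Rows copied from the table:
# (District, House Type, Income, Previous Customer, Outcome)
ROWS = [
    ("Suburban", "Detached", "High", "No", "Nothing"),
    ("Suburban", "Detached", "High", "Yes", "Nothing"),
    ("Rural", "Detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "Yes", "Nothing"),
    ("Rural", "Semi-detached", "Low", "Yes", "Responded"),
    ("Suburban", "Terrace", "High", "No", "Nothing"),
    ("Suburban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "Low", "No", "Responded"),
    ("Suburban", "Terrace", "Low", "Yes", "Responded"),
    ("Rural", "Terrace", "High", "Yes", "Responded"),
    ("Rural", "Detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "High", "Yes", "Nothing"),
]


def train(rows):
    """Count class frequencies and (feature index, value, class) frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    feature_counts = Counter()
    for *features, label in rows:
        for i, value in enumerate(features):
            feature_counts[(i, value, label)] += 1
    return class_counts, feature_counts


def score(example, class_counts, feature_counts):
    """Return P(class) * product of P(value | class) for each class (unsmoothed)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        s = n_label / total  # prior
        for i, value in enumerate(example):
            # Assumption: plain relative frequency, so unseen values give 0.
            s *= feature_counts[(i, value, label)] / n_label
        scores[label] = s
    return scores


class_counts, feature_counts = train(ROWS)
scores = score(("Rural", "Semi-detached", "Low", "No"), class_counts, feature_counts)
predicted = max(scores, key=scores.get)
print(scores)
print("Predicted class:", predicted)
```

Under these assumptions the script predicts "Responded": no "Nothing" row has District = Rural, so the unsmoothed score for "Nothing" is zero. A smoothed estimate would avoid the zero but changes the arithmetic expected in a hand-worked answer.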

(10 marks)

3(b) Briefly explain regression-based classifiers. (10 marks)

4(a) Use the Apriori algorithm to identify the frequent itemsets in the following database. Then extract the strong association rules from these sets. Min. Support = 30%, Min. Confidence = 75%

TID	Items
01	A, B, D, E, F
02	B, C, E
03	A, B, D, E
04	A, B, C, E
05	A, B, C, D, E, F
06	B, C, D
07	A, B, D, E
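
The level-wise search asked for in 4(a) can be sketched as follows, as a way to check a hand-worked answer. This is a minimal illustration, not a reference solution: it assumes the seven rows are seven distinct transactions taken in table order, that support is the fraction of transactions containing an itemset, and the names apriori and strong_rules are hypothetical.

```python
from itertools import combinations

# The seven transactions from the table, in table order.
TRANSACTIONS = [
    set("ABDEF"), set("BCE"), set("ABDE"), set("ABCE"),
    set("ABCDEF"), set("BCD"), set("ABDE"),
]
MIN_SUPPORT = 0.30      # from the question
MIN_CONFIDENCE = 0.75   # from the question


def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def apriori():
    """Return {frequent itemset: support count}, built level by level."""
    n = len(TRANSACTIONS)
    items = sorted(set().union(*TRANSACTIONS))
    level = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while level:
        counted = {c: support_count(c) for c in level}
        survivors = {c: cnt for c, cnt in counted.items() if cnt / n >= MIN_SUPPORT}
        frequent.update(survivors)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in survivors for b in survivors if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        level = [
            c for c in candidates
            if all(frozenset(s) in survivors for s in combinations(c, k - 1))
        ]
    return frequent


def strong_rules(frequent):
    """Return {(antecedent, consequent): confidence} for rules meeting MIN_CONFIDENCE."""
    rules = {}
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                confidence = count / frequent[antecedent]
                if confidence >= MIN_CONFIDENCE:
                    rules[(antecedent, itemset - antecedent)] = confidence
    return rules


frequent = apriori()
rules = strong_rules(frequent)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), count)
```

With 7 transactions, a 30% threshold means an itemset must appear in at least 3 of them, so F (2 occurrences) is dropped at the first level.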

(10 marks)

4(b) Explain multidimensional multi-level association rules with examples. (10 marks)

5(a) What is clustering? Explain the k-means clustering algorithm. Suppose the data for clustering is {2, 4, 10, 12, 3, 20, 11, 25}. Considering k = 2, cluster the given data using the k-means algorithm. (10 marks)

5(b) What is an outlier? Describe methods that can be used for outlier analysis. (10 marks)

6(a) Consider the following case study: A telecom company wants to analyze and improve its performance by introducing a series of innovative mobile payment plans. For this case study, design a BI system, clearly explaining all steps from data collection to decision making. (10 marks)

6(b) Clearly explain the working of the DBSCAN algorithm using appropriate diagrams. (10 marks)
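
For the data in 5(a), the assign-and-update loop of k-means can be sketched in a few lines. The question does not specify initial centroids, so this sketch assumes the smallest and largest values (2 and 25) as starting points; the function name kmeans_1d is illustrative, and ties are broken in favour of the lower-indexed centroid.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain k-means on 1-D data: assign to nearest centroid, then recompute means."""
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            # Index of the closest centroid (lowest index wins a tie).
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # New centroid is the cluster mean; an empty cluster keeps its old centroid.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return clusters, centroids


DATA = [2, 4, 10, 12, 3, 20, 11, 25]
# Assumption: initial centroids are the minimum and maximum of the data.
clusters, centroids = kmeans_1d(DATA, [min(DATA), max(DATA)])
print([sorted(c) for c in clusters], centroids)
```

With this initialisation the loop settles on {2, 3, 4, 10, 11, 12} and {20, 25}, with centroids 7 and 22.5. The result depends on the starting centroids: other choices can converge to a different stable partition, such as {2, 3, 4} and {10, 11, 12, 20, 25}.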
