Data Mining & Business Intelligence : Question Paper May 2016 - Information Technology (Semester 6) | Mumbai University (MU)

Data Mining & Business Intelligence - May 2016

Information Technology (Semester 6)

TOTAL MARKS: 80
TOTAL TIME: 3 HOURS
(1) Question 1 is compulsory.
(2) Attempt any three from the remaining questions.
(3) Assume data if required.
(4) Figures to the right indicate full marks.
1(a) Define 'Data Mining'. Enumerate five example applications that can benefit by using Data Mining. (5 marks)
1(b) What is Data Preprocessing? Explain the different methods for the Data Cleansing phase. (5 marks)
1(c) What is hierarchical clustering? Explain any two techniques for finding distance between the clusters in hierarchical clustering. (5 marks)
1(d) Explain the concept of a decision support system with the help of an example application. (5 marks)
2(a) Partition the given data into 4 bins using Equi-depth binning method and perform smoothing according to the following methods.
Smoothing by bin mean
Smoothing by bin median
Smoothing by bin boundaries.
Data: 11,13,13,15,15,16,19,20,20,20,21,21,22,23,24,30,40,45,45,45,71,72,73,75.
(10 marks)
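One way to check a hand-worked answer to 2(a) is the short Python sketch below. It is only an illustration, not a model answer: the helper names `equi_depth_bins` and `smooth_by_boundaries` are made up for this example, and sending a value that is equally far from both boundaries to the lower boundary is an assumption, since textbooks differ on tie handling.

```python
from statistics import mean, median

data = [11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
        22, 23, 24, 30, 40, 45, 45, 45, 71, 72, 73, 75]


def equi_depth_bins(values, n_bins):
    """Sort the values and cut them into n_bins bins of equal size.

    Assumes len(values) is divisible by n_bins (24 / 4 = 6 here).
    """
    ordered = sorted(values)
    depth = len(ordered) // n_bins
    return [ordered[i * depth:(i + 1) * depth] for i in range(n_bins)]


def smooth_by_boundaries(bin_values):
    """Replace each value with the closer of the bin's min and max.

    Assumption: a value equidistant from both goes to the lower boundary.
    """
    lo, hi = bin_values[0], bin_values[-1]
    return [lo if v - lo <= hi - v else hi for v in bin_values]


bins = equi_depth_bins(data, 4)
by_mean = [[round(mean(b), 2)] * len(b) for b in bins]
by_median = [[median(b)] * len(b) for b in bins]
by_boundaries = [smooth_by_boundaries(b) for b in bins]

for b, m, md, bd in zip(bins, by_mean, by_median, by_boundaries):
    print(b, "| mean:", m[0], "| median:", md[0], "| boundaries:", bd)
```

Each bin holds six values. Smoothing by mean or median replaces all six with a single number, while smoothing by boundaries keeps only the bin minimum and maximum.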
2(b) For the same set of data points as in question 2(a):
(a) Find the Mean, Median and Mode.
(b) Show a boxplot of the data, clearly indicating the five-number summary.
(10 marks)
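The statistics asked for in 2(b) can be cross-checked with Python's standard `statistics` module. This is a sketch under one stated assumption: the quartiles are taken as the medians of the lower and upper halves of the sorted data. Other quartile conventions exist and can give slightly different Q1 and Q3 values.

```python
from statistics import mean, median, multimode

data = [11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
        22, 23, 24, 30, 40, 45, 45, 45, 71, 72, 73, 75]

ordered = sorted(data)
half = len(ordered) // 2  # 24 values, so each half has exactly 12

# Five-number summary; Q1 and Q3 use the "median of halves" convention
# (an assumption, see the note above).
summary = {
    "min": ordered[0],
    "q1": median(ordered[:half]),
    "median": median(ordered),
    "q3": median(ordered[-half:]),
    "max": ordered[-1],
}

print("mean:", round(mean(ordered), 2))
print("median:", median(ordered))
print("mode(s):", multimode(ordered))  # lists every most-frequent value
print("five-number summary:", summary)
```

`multimode` is used instead of `mode` because more than one value can share the highest frequency, and a boxplot drawn from `summary` would place the box between `q1` and `q3` with the whiskers reaching towards `min` and `max`.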
3(a) The table below shows a sample dataset of whether a customer responds to a survey or not. 'Outcome' is the class label.
Construct a Decision Tree Classifier for the dataset. For a new example (Rural, Semi-Detached, Low, No), what will be the predicted class label?

  District   House Type      Income   Previous Customers   Outcome
  Suburban   Detached        High     No                   Nothing
  Suburban   Detached        High     Yes                  Nothing
  Suburban   Detached        High     No                   Responded
  Urban      Semi-Detached   High     No                   Responded
  Urban      Semi-Detached   Low      No                   Responded
  Urban      Semi-Detached   Low      No                   Nothing
  Rural      Semi-Detached   Low      Yes                  Responded
  Suburban   Terrace         High     No                   Nothing
  Suburban   Semi-Detached   Low      No                   Responded
  Urban      Terrace         Low      No                   Responded
  Suburban   Terrace         Low      Yes                  Responded
  Rural      Terrace         High     Yes                  Responded
  Rural      Detached        Low      No                   Responded
  Urban      Terrace         High     Yes                  Nothing
(10 marks)
3(b) Briefly explain Bagging and Boosting of classifiers. (10 marks)
4(a) Use the Apriori algorithm to identify the frequent itemsets in the following database. Then extract the strong association rules from these sets.
Min. Support = 30%, Min. Confidence = 75%
TID Items
01 A, B, D, E, F
02 B, C, E
03 A, B, D, E
04 A, B, C, E
05 A, B, C, D, E, F
06 B, C, D
07 A, B, D, E
(10 marks)
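For 4(a), the level-wise idea behind Apriori can be sketched in a few lines of Python. This is an illustrative, unoptimised version rather than a reference implementation: the function names `support_count`, `apriori` and `strong_rules` are hypothetical, and support is interpreted as the fraction of the 7 transactions that contain the itemset.

```python
from itertools import combinations

transactions = [
    {"A", "B", "D", "E", "F"},
    {"B", "C", "E"},
    {"A", "B", "D", "E"},
    {"A", "B", "C", "E"},
    {"A", "B", "C", "D", "E", "F"},
    {"B", "C", "D"},
    {"A", "B", "D", "E"},
]
MIN_SUPPORT = 0.30
MIN_CONFIDENCE = 0.75


def support_count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori():
    """Return {frequent itemset: support count}, built level by level."""
    n = len(transactions)
    current = [frozenset([i]) for i in sorted(set().union(*transactions))]
    frequent, k = {}, 1
    while current:
        # Keep only the candidates that meet the minimum support.
        level = {}
        for c in current:
            count = support_count(c)
            if count / n >= MIN_SUPPORT:
                level[c] = count
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))]
    return frequent


def strong_rules(frequent):
    """Return (antecedent, consequent, confidence) for rules meeting MIN_CONFIDENCE."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= MIN_CONFIDENCE:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


frequent = apriori()
rules = strong_rules(frequent)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}")
```

With 7 transactions, a 30% minimum support means an itemset must occur in at least 3 of them, so F (2 occurrences) is dropped at the first level. Confidence for a rule X -> Y is computed as support(X ∪ Y) / support(X).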
4(b) Explain multidimensional and multilevel association rules with examples. (10 marks)
5(a) Use any hierarchical clustering algorithm to cluster the following 8 examples into 3 clusters:
A1=(2, 10),     A2=(2, 5),     A3=(8, 4),     A4=(5, 8),
A5=(7, 5),     A6=(6, 4),     A7=(1, 2),     A8=(4, 9)
(10 marks)
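Since 5(a) allows any hierarchical algorithm, the sketch below picks agglomerative clustering with single linkage and Euclidean distance. Both are choices made for this illustration, not requirements of the question. The function names `single_link` and `agglomerate` are hypothetical, and complete or average linkage could produce a different grouping.

```python
from math import dist

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}


def single_link(c1, c2):
    """Single-linkage distance: the closest pair of points across two clusters."""
    return min(dist(points[a], points[b]) for a in c1 for b in c2)


def agglomerate(n_clusters):
    """Start with one cluster per point and merge the closest pair until n_clusters remain."""
    clusters = [frozenset([name]) for name in points]
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs,
                   key=lambda p: single_link(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


result = agglomerate(3)
for cluster in result:
    print(sorted(cluster))
```

Printing `single_link` for each merge gives the heights needed to draw the dendrogram by hand.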
5(b) What is an outlier? Describe methods that can be used for outlier analysis. (10 marks)
6(a) Consider the following case study: An international chain of hotels wants to analyze and improve its performance using several performance indicators: quality of rooms, service facilities, check-in, breakfast, popular times of visit, duration of stay, etc.
For this case study, design a BI system, clearly explaining all steps from data collection to decision making.
(10 marks)
6(b) Clearly explain the working of the DBSCAN algorithm using appropriate diagrams. (10 marks)
