Data Mining & Business Intelligence : Question Paper May 2015 - Information Technology (Semester 6) | Mumbai University (MU)

## Data Mining & Business Intelligence - May 2015

### Information Technology (Semester 6)

TOTAL MARKS: 80
TOTAL TIME: 3 HOURS
(1) Question 1 is compulsory.
(2) Attempt any three from the remaining questions.
(3) Assume data if required.
(4) Figures to the right indicate full marks.
1 (a) Describe the different types of attributes one may come across in a data mining data set, with two examples of each type.(5 marks)

1 (b) Explain the different distance measures that can be used to compute the distance between two clusters.(5 marks)

1 (c) Define "Business Intelligence" and "Support System" with examples.(5 marks)

1 (d) Define "Outlier". What are the different types of outliers that occur in a data set?(5 marks)

2 (a) Consider the following data points: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
i) What is the mean of the data? What is the median?
ii) What is the mode of the data?
iii) What is the mid-range of the data?
iv) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
v) Show a box plot of the data.
(10 marks)
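
The hand calculations asked for in 2 (a) can be cross-checked with a short script. This is only a sketch using Python's standard `statistics` module. The quartile convention (the module's default "exclusive" method) and the 1.5 × IQR rule for the box-plot whiskers are assumptions, since the question does not fix either.

```python
import statistics

# Data points from question 2 (a).
data = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean = statistics.mean(data)
median = statistics.median(data)
modes = statistics.multimode(data)            # every value tied for the highest count
mid_range = (min(data) + max(data)) / 2

# Quartiles with the default "exclusive" method (an assumed convention).
q1, q2, q3 = statistics.quantiles(data, n=4)

# Box-plot elements, assuming the common 1.5 * IQR rule for whiskers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
whisker_low = min(x for x in data if x >= lower_fence)
whisker_high = max(x for x in data if x <= upper_fence)

print(f"mean={mean:.2f} median={median} modes={modes} mid-range={mid_range}")
print(f"Q1={q1} Q3={q3} whiskers=({whisker_low}, {whisker_high}) outliers={outliers}")
```

Other quartile conventions (for example `method="inclusive"`) can shift Q1 and Q3 slightly, which is why the question asks for them only roughly.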
2 (b) Design a BI system for fraud detection. Describe all the steps from Data Collection to Decision Making clearly.(10 marks)

3 (a) Illustrate any one classification technique for the following data set. Show how a new tuple with (Homeowner=Yes; Status=Employed; Income=Average) can be classified.

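| Id | Homeowner | Status | Income | Defaulted |
|----|-----------|------------|---------|-----------|
| 1 | Yes | Employed | High | No |
| 2 | No | Business | Average | No |
| 3 | No | Employed | Low | No |
| 4 | Yes | Business | High | No |
| 5 | No | Unemployed | Average | Yes |
| 6 | No | Business | Low | No |
| 7 | Yes | Unemployed | High | No |
| 8 | No | Employed | Average | Yes |
| 9 | No | Business | Low | No |
| 10 | No | Employed | Average | Yes |
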
(10 marks)

3 (b) Why is Data Preprocessing required? Explain the different steps involved in Data Preprocessing.(10 marks)

4 (a) Use K-means to cluster the following data set into 3 clusters.
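
| Protein | 20 | 21 | 15 | 22 | 20 | 25 | 26 | 20 | 18 | 20 |
|---------|----|----|----|----|----|----|----|----|----|----|
| Fat | 9 | 9 | 7 | 17 | 8 | 12 | 14 | 9 | 9 | 9 |
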
(10 marks)
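
The iterations of 4 (a) can be traced with a small plain-Python version of K-means (Lloyd's algorithm). This is a minimal sketch, not a model answer: the question gives no starting centroids, so the three chosen below are an assumption, and a different choice may lead to different clusters.

```python
from math import dist

# Points from question 4 (a), as (protein, fat) pairs.
points = [(20, 9), (21, 9), (15, 7), (22, 17), (20, 8),
          (25, 12), (26, 14), (20, 9), (18, 9), (20, 9)]

def kmeans(points, centroids, max_iter=100):
    """Alternate the assignment and update steps until the centroids stop moving."""
    centroids = [tuple(float(v) for v in c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members;
        # a cluster with no members keeps its previous centroid.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Assumed initial centroids (not given in the question): three of the data points.
initial = [points[2], points[0], points[6]]
labels, centroids = kmeans(points, initial)

for k, c in enumerate(centroids):
    members = [p for p, lab in zip(points, labels) if lab == k]
    print(f"cluster {k}: centroid=({c[0]:.2f}, {c[1]:.2f}) members={members}")
```

Printing the clusters after each pass, rather than only at the end, gives the iteration table that a written answer would normally show.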
4 (b) Describe the different visualization techniques that can be used in Data Mining.(10 marks)

5 (a) Consider the following transaction database:

| TID | Items |
|-----|-------|
| 01 | A,B,C,D |
| 02 | A,B,C,D,E,G |
| 03 | A,C,G,H,K |
| 04 | B,C,D,E,K |
| 05 | D,E,F,H,L |
| 06 | A,B,C,D,L |
| 07 | B,I,E,K,L |
| 08 | A,B,D,E,LK |
| 09 | A,E,E,H,L |
| 10 | B,C,D,F |

Apply the Apriori algorithm with minimum support of 30% and minimum confidence of 70% and find all the association rules in the data set.
(10 marks)
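
The level-wise search asked for in 5 (a) can be sketched in plain Python as below. Two readings of the printed table are assumptions: the entry "LK" in transaction 08 is taken as the two items L and K, and the repeated E in transaction 09 is counted once. The function names `support`, `apriori` and `rules` are illustrative only.

```python
from itertools import combinations

# Transactions from question 5 (a).
transactions = [
    {"A", "B", "C", "D"},
    {"A", "B", "C", "D", "E", "G"},
    {"A", "C", "G", "H", "K"},
    {"B", "C", "D", "E", "K"},
    {"D", "E", "F", "H", "L"},
    {"A", "B", "C", "D", "L"},
    {"B", "I", "E", "K", "L"},
    {"A", "B", "D", "E", "L", "K"},  # assumption: printed "LK" read as L, K
    {"A", "E", "H", "L"},            # assumption: duplicate E counted once
    {"B", "C", "D", "F"},
]

def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(min_support):
    """Return {frequent itemset: support}, built one itemset size at a time."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must be frequent, then check support.
        prev = set(level)
        level = [c for c in candidates
                 if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
                 and support(c) >= min_support]
    return frequent

def rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules meeting min_confidence."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((set(lhs), set(itemset - lhs), sup, confidence))
    return found

frequent = apriori(min_support=0.3)
association_rules = rules(frequent, min_confidence=0.7)
for lhs, rhs, sup, conf in association_rules:
    print(f"{sorted(lhs)} -> {sorted(rhs)}  support={sup:.1f} confidence={conf:.2f}")
```

With 10 transactions, a minimum support of 30% corresponds to a support count of at least 3.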
5 (b) Explain the different methods that can be used to evaluate and compare the accuracy of different classification algorithms.(10 marks)

6 (a) DBSCAN clustering algorithm with an example.(10 marks)

6 (b) Multilevel and Multidimensional Association rules.(10 marks)
