Question Paper: Data Warehouse & Mining : Question Paper May 2013 - Computer Engineering (Semester 8) | Mumbai University (MU)

Data Warehouse & Mining - May 2013

Computer Engineering (Semester 8)

TOTAL MARKS: 100
TOTAL TIME: 3 HOURS
(1) Question 1 is compulsory.
(2) Attempt any four from the remaining questions.
(3) Assume data wherever required.
(4) Figures to the right indicate full marks.
1 (a) State the differences between a data warehouse and a data mart. (5 marks)
1 (b) For a supermarket chain, consider the following dimensions: product, store, time and promotion. The schema contains a central fact table, sales facts, with measures unit_sales, dollars_sales and dollar_cost. Design a STAR schema for this supermarket example. (5 marks)
1 (c) Calculate the maximum number of base fact table records for a warehouse with the values given below:
- Time period: 5 years
- Store: 300 stores reporting daily sales
- Product: 40,000 products in each store (about 4000 sell in each store daily)
(5 marks)
1 (d) Illustrate how the supermarket can use clustering methods to improve sales. (5 marks)


Define the following terms by giving examples:-

2 (a) Factless fact tables (5 marks)
2 (b) Snowflake schema (5 marks)
2 (c) Web Structure Mining (5 marks)
2 (d) Concept Hierarchy (5 marks)

3 (a) Apply Agglomerative Hierarchical Clustering and draw the single-link and average-link dendrograms for the following distance matrix.

  A B C D E
A 0 2 6 10 9
B 2 0 3 9 8
C 6 3 0 7 5
D 10 9 7 0 4
E 9 8 5 4 0
(10 marks)
3 (b) Explain the PageRank technique with its algorithm. (10 marks)
4 (a) Consider a data warehouse for a hospital, where there are three dimensions (1) Doctor (2) Patient (3) Time
and two measures (1) Count & (2) Fees
For this example create an OLAP cube and describe the following OLAP operations:
(1) Slice (2) Dice (3) Rollup (4) Drill Down (5) Pivot
(10 marks)
4 (b) Consider the following transaction database:
TID Items
01 A,B,C,D
02 A,B,C,D,E,G
03 A,C,G,H,K
04 B,C,D,E,K
05 D,E,F,H,L
06 A,B,C,D,L
07 B,I,E,K,L
08 A,B,D,E,K
09 A,E,F,H,L
10 B,C,D,F

Apply the Apriori algorithm with minimum support of 30% and minimum confidence of 70%, and find all the association rules in the data set.
(10 marks)
5 (a) A simple example from the stock market involving only discrete ranges has Profit as the categorical attribute, with values {Up, Down}. The training data is:

AGE COMPETITION TYPE PROFIT
Old Yes Software Down
Old No Software Down
Old No Hardware Down
Mid Yes Software Down
Mid Yes Hardware Down
Mid No Hardware Up
Mid No Software Up
New Yes Software Up
New No Hardware Up
New No Software Up


Apply the decision tree algorithm and show the generated rules. (10 marks)
5 (b) What is meant by ETL? Explain the ETL process in detail. (10 marks)
6 (a) Define multidimensional and multilevel association mining. (10 marks)
6 (b) Explain the role of metadata in a data warehouse. (10 marks)


Write detailed notes on:-

7 (a) Data Warehouse Architecture. (10 marks)
7 (b) K-Means Clustering. (10 marks)
