Question Paper: Data Warehouse & Mining, May 2013 - Computer Engineering (Semester 8) | Mumbai University (MU)

## Data Warehouse & Mining - May 2013

### Computer Engineering (Semester 8)

TOTAL MARKS: 100
TOTAL TIME: 3 HOURS
(1) Question 1 is compulsory.
(2) Attempt any four from the remaining questions.
(3) Assume data wherever required.
(4) Figures to the right indicate full marks.
1 (a) Differentiate between a data warehouse and a data mart. (5 marks)

1 (b) For a supermarket chain, consider the following dimensions: product, store, time, and promotion. The schema contains a central fact table, sales facts, with the measures unit_sales, dollars_sales, and dollar_cost. Design a star schema for this supermarket example. (5 marks)

1 (c) Calculate the maximum number of base fact table records for a warehouse with the following values:
- Time period: 5 years
- Store: 300 stores reporting daily sales
- Product: 40,000 products in each store (about 4,000 sell in each store daily)

(5 marks)

1 (d) Illustrate how the supermarket can use clustering methods to improve sales. (5 marks)

### Define the following terms by giving examples:

2 (a) Factless fact tables (5 marks)

2 (b) Snowflake schema (5 marks)

2 (c) Web Structure Mining (5 marks)

2 (d) Concept Hierarchy (5 marks)

3 (a) Apply agglomerative hierarchical clustering and draw the single-link and average-link dendrograms for the following distance matrix.

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| A | 0 | 2 | 6 | 10 | 9 |
| B | 2 | 0 | 3 | 9 | 8 |
| C | 6 | 3 | 0 | 7 | 5 |
| D | 10 | 9 | 7 | 0 | 4 |
| E | 9 | 8 | 5 | 4 | 0 |

(10 marks)

3 (b) Explain the PageRank technique with its algorithm. (10 marks)

4 (a) Consider a data warehouse for a hospital, where there are three dimensions (1) Doctor (2) Patient (3) Time
and two measures (1) Count & (2) Fees
For this example, create an OLAP cube and describe the following OLAP operations:
(1) Slice (2) Dice (3) Rollup (4) Drill Down (5) Pivot
(10 marks)
4 (b) Consider the following transaction database:
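| TID | Items |
|---|---|
| 01 | A, B, C, D |
| 02 | A, B, C, D, E, G |
| 03 | A, C, G, H, K |
| 04 | B, C, D, E, K |
| 05 | D, E, F, H, L |
| 06 | A, B, C, D, L |
| 07 | B, I, E, K, L |
| 08 | A, B, D, E, K |
| 09 | A, E, F, H, L |
| 10 | B, C, D, F |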

Apply the Apriori algorithm with minimum support of 30% and minimum confidence of 70%, and find all the association rules in the data set.
(10 marks)
5 (a) A simple example from the stock market involving only discrete ranges has Profit as a categorical attribute, with values {up, down}. The training data is:

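| AGE | COMPETITION | TYPE | PROFIT |
|---|---|---|---|
| Old | Yes | Software | Down |
| Old | No | Software | Down |
| Old | No | Hardware | Down |
| Mid | Yes | Software | Down |
| Mid | Yes | Hardware | Down |
| Mid | No | Hardware | Up |
| Mid | No | Software | Up |
| New | Yes | Software | Up |
| New | No | Hardware | Up |
| New | No | Software | Up |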

Apply a decision tree algorithm and show the generated rules. (10 marks)

5 (b) What is meant by ETL? Explain the ETL process in detail. (10 marks)

6 (a) Define multidimensional and multilevel association mining. (10 marks)

6 (b) Explain the role of metadata in a data warehouse. (10 marks)

### Write detailed notes on:

7 (a) Data Warehouse Architecture. (10 marks)

7 (b) K-Means Clustering. (10 marks)