Data Warehouse And Data Mining - May 2013
Computer Engineering (Semester 6)
TOTAL MARKS: 100
TOTAL TIME: 3 HOURS
(1) Question 1 is compulsory.
(2) Attempt any four from the remaining questions.
(3) Assume data wherever required.
(4) Figures to the right indicate full marks.
1 (a) State the differences between a data warehouse and a data mart.(5 marks)
1 (b) For a supermarket chain, consider the following dimensions: product, store, time and promotion. The schema contains a central fact table, sales facts, with the measures unit_sales, dollars_sales and dollar_cost. Design a star schema for this supermarket example.(5 marks)
1 (c) Calculate the maximum number of base fact table records for a warehouse with the values given below:
- Time period: 5 years
- Store: 300 stores reporting daily sales
- Product: 40,000 products in each store (about 4000 sell in each store daily)(5 marks)
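As a rough sketch of the arithmetic behind 1 (c): the bound is the product of the number of days, the number of stores and the number of products that sell per store per day. The figures are taken from the question; the 365-day year (leap days ignored) and the choice of the 4000 daily-selling products rather than the full 40,000 are assumptions, and the function name is illustrative only.

```python
DAYS_PER_YEAR = 365  # assumption: leap days are ignored


def max_base_fact_records(years, stores, products_sold_per_day,
                          days_per_year=DAYS_PER_YEAR):
    """One fact row per (day, store, product sold that day)."""
    return years * days_per_year * stores * products_sold_per_day


# Values from the question: 5 years, 300 stores, about 4000 products
# selling in each store daily.
total = max_base_fact_records(years=5, stores=300, products_sold_per_day=4000)
print(f"{total:,}")  # 2,190,000,000
```

If every one of the 40,000 products were assumed to sell in every store every day, the same formula would give a bound ten times larger.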
1 (d) Illustrate how the supermarket can use clustering methods to improve sales.(5 marks)
Define the following terms by giving examples:-
2 (a) Factless fact tables(5 marks)
2 (b) Snowflake schema(5 marks)
2 (c) Web Structure Mining(5 marks)
2 (d) Concept Hierarchy(5 marks)
3 (a) Apply Agglomerative Hierarchical Clustering and draw the single-link and average-link dendrograms for the following distance matrix.
  | A | B | C | D | E |
A | 0 | 2 | 6 | 10 | 9 |
B | 2 | 0 | 3 | 9 | 8 |
C | 6 | 3 | 0 | 7 | 5 |
D | 10 | 9 | 7 | 0 | 4 |
E | 9 | 8 | 5 | 4 | 0 |
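The merge order for the distance matrix above can be checked with a small script. This is a minimal pure-Python sketch of agglomerative clustering, not a library implementation; the function names are hypothetical, and only the single and average linkage criteria named in the question are supported. The merge heights it returns are the levels at which the dendrogram branches join.

```python
from itertools import combinations

LABELS = ["A", "B", "C", "D", "E"]
DIST = [
    [0, 2, 6, 10, 9],
    [2, 0, 3, 9, 8],
    [6, 3, 0, 7, 5],
    [10, 9, 7, 0, 4],
    [9, 8, 5, 4, 0],
]


def cluster_distance(c1, c2, linkage):
    """Distance between two clusters given as tuples of point indices."""
    pair_dists = [DIST[i][j] for i in c1 for j in c2]
    if linkage == "single":
        return min(pair_dists)                    # closest pair
    if linkage == "average":
        return sum(pair_dists) / len(pair_dists)  # mean over all pairs
    raise ValueError(f"unsupported linkage: {linkage}")


def agglomerate(linkage):
    """Return the list of (merged cluster name, merge height) in order."""
    clusters = [(i,) for i in range(len(LABELS))]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda p: cluster_distance(p[0], p[1], linkage))
        height = cluster_distance(a, b, linkage)
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append(("".join(LABELS[i] for i in merged), height))
    return merges


for linkage in ("single", "average"):
    print(linkage, agglomerate(linkage))
```

With single linkage the merges occur at heights 2, 3, 4 and 5 (AB, then ABC, then DE, then all five); with average linkage they occur at 2, 4, 4.5 and 8 (AB, then DE, then ABC, then all five).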
and two measures (1) Count & (2) Fees
For this example create an OLAP cube and describe the following OLAP operations:
(1) Slice (2) Dice (3) Rollup (4) Drill Down (5) Pivot(10 marks)
4 (b) Consider the following transaction database:
TID | Items |
01 | A,B,C,D |
02 | A,B,C,D,E,G |
03 | A,C,G,H,K |
04 | B,C,D,E,K |
05 | D,E,F,H,L |
06 | A,B,C,D,L |
07 | B,I,E,K,L |
08 | A,B,D,E,K |
09 | A,E,F,H,L |
10 | B,C,D,F |
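A hand-worked Apriori answer for the ten transactions above can be cross-checked with a short script. The sketch below is a minimal level-wise implementation under the thresholds the question states (30% minimum support, 70% minimum confidence); the function names are hypothetical and integer percentages are used to avoid floating-point comparisons.

```python
from itertools import combinations

TRANSACTIONS = [
    set("ABCD"), set("ABCDEG"), set("ACGHK"), set("BCDEK"), set("DEFHL"),
    set("ABCDL"), set("BIEKL"), set("ABDEK"), set("AEFHL"), set("BCDF"),
]
MIN_SUPPORT_PCT = 30     # from the question
MIN_CONFIDENCE_PCT = 70  # from the question


def support_count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def is_frequent(count):
    return count * 100 >= MIN_SUPPORT_PCT * len(TRANSACTIONS)


def frequent_itemsets():
    """Level-wise search: returns {frozenset: support count}."""
    items = sorted(set().union(*TRANSACTIONS))
    level = [frozenset([i]) for i in items
             if is_frequent(support_count(frozenset([i])))]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support_count(s)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(sub) in frequent
                        for sub in combinations(c, k - 1))
                 and is_frequent(support_count(c))]
    return frequent


def association_rules(frequent):
    """Return (antecedent, consequent, confidence) for rules that pass."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                ante_count = frequent[ante]  # subsets of frequent sets are frequent
                if count * 100 >= MIN_CONFIDENCE_PCT * ante_count:
                    rules.append((ante, itemset - ante, count / ante_count))
    return rules


frequent = frequent_itemsets()
rules = association_rules(frequent)
for ante, cons, conf in sorted(rules, key=lambda r: (sorted(r[0]), sorted(r[1]))):
    print(f"{sorted(ante)} -> {sorted(cons)}  confidence={conf:.2f}")
```

With ten transactions, 30% support corresponds to an itemset appearing in at least three of them, so items such as G (two transactions) and I (one transaction) are dropped at the first level.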
Apply the Apriori algorithm with minimum support of 30% and minimum confidence of 70%, and find all the association rules in the data set. (10 marks)
5 (a) A simple example from the stock market involving only discrete ranges has Profit as a categorical attribute with values {up, down}. The training data is:
AGE | COMPETITION | TYPE | PROFIT |
Old | Yes | Software | Down |
Old | No | Software | Down |
Old | No | Hardware | Down |
Mid | Yes | Software | Down |
Mid | Yes | Hardware | Down |
Mid | No | Hardware | Up |
Mid | No | Software | Up |
New | Yes | Software | Up |
New | No | Hardware | Up |
New | No | Software | Up |
Apply a decision tree algorithm and show the generated rules.(10 marks)
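The question does not name a specific algorithm, so the sketch below assumes ID3 with information gain as the splitting criterion, which is one common choice. It is a minimal illustration on the ten training rows above; the function names are hypothetical, and each root-to-leaf path is returned as an IF-THEN rule.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["AGE", "COMPETITION", "TYPE"]
ROWS = [
    ("Old", "Yes", "Software", "Down"),
    ("Old", "No", "Software", "Down"),
    ("Old", "No", "Hardware", "Down"),
    ("Mid", "Yes", "Software", "Down"),
    ("Mid", "Yes", "Hardware", "Down"),
    ("Mid", "No", "Hardware", "Up"),
    ("Mid", "No", "Software", "Up"),
    ("New", "Yes", "Software", "Up"),
    ("New", "No", "Hardware", "Up"),
    ("New", "No", "Software", "Up"),
]


def entropy(rows):
    """Shannon entropy of the class label (last column)."""
    counts = Counter(r[-1] for r in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(rows, idx):
    """Entropy reduction obtained by splitting rows on attribute idx."""
    remainder = 0.0
    for value in {r[idx] for r in rows}:
        subset = [r for r in rows if r[idx] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder


def build_rules(rows, remaining, conditions=()):
    """Grow the tree recursively; return a list of (conditions, label)."""
    labels = {r[-1] for r in rows}
    if len(labels) == 1 or not remaining:
        majority = Counter(r[-1] for r in rows).most_common(1)[0][0]
        return [(conditions, majority)]
    best = max(remaining, key=lambda i: information_gain(rows, i))
    rules = []
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        rules += build_rules(subset,
                             [i for i in remaining if i != best],
                             conditions + ((ATTRIBUTES[best], value),))
    return rules


rules = build_rules(ROWS, [0, 1, 2])
for conditions, label in rules:
    tests = " AND ".join(f"{a} = {v}" for a, v in conditions)
    print(f"IF {tests} THEN PROFIT = {label}")
```

Under these assumptions AGE has the highest gain at the root (about 0.6, against about 0.12 for COMPETITION and 0 for TYPE), and only the Mid branch needs a further split, on COMPETITION.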
5 (b) What is meant by ETL? Explain the ETL process in detail.(10 marks)
6 (a) Define multidimensional and multilevel association mining.(10 marks)
6 (b) Explain the role of metadata in a data warehouse.(10 marks)
Write detailed notes on:-
7 (a) Data Warehouse Architecture.(10 marks)
7 (b) K-Means Clustering.(10 marks)