Question: Explain sequence mining in Transactional databases?
0

Subject: Data Mining And Business Intelligence

Topic: Frequent pattern

Difficulty: High

dmbi(26) • 3.3k views
ADD COMMENTlink
modified 21 months ago by gravatar for awari.swati831 awari.swati831250 written 3.3 years ago by gravatar for ASHISH RAVINDRA SALVE ASHISH RAVINDRA SALVE10
0
  • Sequential pattern mining is trying to find relationships between occurrences of sequential events, to find if there exists any specific order of the occurrences.
  • We can find the sequential patterns of specific individual items also we can find the sequential patterns across different items.

Application:

  • Sequential pattern mining is widely used in analyzing of DNA sequence.
  • Sequential pattern can be widely used in different areas, such as mining user access patterns for the web sites, using the history of symptoms to predict certain kind of disease, also by using sequential pattern mining, the retailers can make the inventory control more efficient.

Challenges on Sequential pattern mining:

  • A huge number of possible sequential patterns are hidden in databases.
  • A mining algorithm should find the complete set of patterns, be highly efficient, scalable.

Example:

  • Consider the database shown in fig 1 (This database has been sorted on customer-id and transaction-time) fig 2 shows this database expressed as a set of customer sequences.

  • With minimum support set to 25%, i.e. a minimum support of 2 customers, two sequences :< ( 30) (90)> and < (30) (40 70)> are maximal among those satisfying the support constraint, and are desired sequential patterns.

  • The sequential pattern < (30) (90)> is supported by customers 1 and 4.Customer 4 buys items (40 70) in between items 30 and 90, but supports the pattern < (30) (90)> since we are looking for patterns that are not necessarily contiguous. The sequential pattern <30((40 70)> is supported by customers 2 and 4. Customer 2 buys 60 along with 40 and 70, but supports this pattern since (40 70) is a subset of (40 60 70).

  • An example of a sequence that does not have minimum support is the sequence < (10 20)(30)>, which is only supported by customers 2. The sequences <(30)>,<(40)>.<(70)>,<(90)>,<(30)(40)>,<(30)(70)>and <(40)(70)>, though having minimum support, are not in the answer because they are not maximal.

Customer Id Transaction Time Items Bought
1
1
June 25’93
June 30’93
30
90
2
2
2
June 10’93
June 15’93
June 20’93
10,20
30
40,60,70
3 June 25’93 30,50,70
4
4
4
June 25’93
June 30’93
June 25’93
30
40,70
90
5 June 12’93 90

Fig1: Database Sorted by Customer Id and Transaction Time

Customer Id Customer Sequence
1
2
3
4
5
< (30) (90)>
< (10 20)(30) (40 60 70)>
<(30,50,70)>
<(30)(40,70)(90)>
<(90)>

Fig2: Customer Sequence Version of the Database

Sequential Patterns with support >25%
< (30) (90)>
<(30)(40,70)>

Fig3: The answer set

Scalable Methods for Mining Sequential Patterns:

  • Apriori-based Approaches:
  1. GSP: (A sequential Pattern Mining Algorithm Based on Candidate Generate and Test) It integrates with time constraints and relaxes the definition of transaction; also consider the knowledge of taxonomies.

  2. SPADE: (Sequential Pattern Discovery using Equivalent Class) SPADE is an algorithm proposed to find frequent sequences using efficient lattice search techniques and simple joins.

  • Pattern –Growth –based Approaches:
  1. PrefixSpan :( Mining Sequential Patterns by Prefix Projections) It mainly employs the method of database projection to make the database projection to make the database for next pass much smaller and consequently make the algorithm more speedy.
ADD COMMENTlink
modified 3.3 years ago by gravatar for Ramnath Ramnath3.8k written 3.3 years ago by gravatar for ASHISH RAVINDRA SALVE ASHISH RAVINDRA SALVE10
Please log in to add an answer.