Date: Tue, 5 Jan 1999 09:41:10 -0500 (EST)
From: Gregory Piatetsky-Shapiro gps
Subject: Discovery of Sequences

I have received quite a few responses to Ismail Parsa's request for tools for Sequential Associations. Robert St. Amant stamant@eos.ncsu.edu points to a UMass web page on multi-stream dependency detection. Inge Jonassen inge@ii.uib.no points to tools for mining biosequences (DNA, RNA, and protein sequences) for conserved patterns. Yves Chauvin yves@netid.com describes their software HMMpro (available for download), which uses machine learning for knowledge discovery in molecular biology. Jonathan D. Becher becher@neovista.com describes how the NeoVista DecisionAR mining engine finds sequential and temporal patterns.

--
Date: Wed, 16 Dec 1998 08:41:12 -0500 (EST)
From: Robert St. Amant stamant@eos.ncsu.edu

Adele Howe, at Colorado State, developed techniques some time ago for detecting dependencies in symbolic traces of plan behavior. Paul Cohen and his students at UMass extended these techniques to multi-stream dependency detection, incremental dependency detection, and other variations. Paul's web page, http://www-eksl.cs.umass.edu, has pointers to the relevant papers. The Lisp code was developed under a DARPA grant and may be available on request (contact Tim Oates at oates@cs.umass.edu).

--
Date: Wed, 16 Dec 1998 15:19:21 +0100 (MET)
From: Inge Jonassen inge@ii.uib.no

This may not be exactly what you asked for, but it is closely related. There are many tools available for mining biosequences (DNA, RNA, and protein sequences) for conserved patterns. See http://www.ebi.ac.uk/~brazma/patterns.html (collection of links) and http://www.ii.uib.no/~inge/patterns.html (short introductory text).

--
Date: Wed, 16 Dec 1998 10:04:06 -0800
From: Yves Chauvin yves@netid.com

You may want to check our web site: http://www.netid.com

We mostly use HMMs (hidden Markov models) to analyze biological sequence data. Our software, HMMpro, is available for download.
HMMpro uses machine learning for knowledge discovery in molecular biology. The HMM concepts can obviously be extended to other types of sequence analysis.

--
Date: Thu, 17 Dec 1998 12:08:37 -0800
From: Jonathan D. Becher becher@neovista.com
Subject: Neovista Tools for Sequence Data Analysis

The NeoVista DecisionAR mining engine does indeed find sequential patterns, which the NeoVista literature calls temporal analysis. In temporal analysis, association rules are computed for items purchased by the same customer in different visits over a period of time. Temporal analysis takes all transactions for a single customer and treats them as a single transaction sequence, so the data must include a customer ID and a date and/or time value for each transaction. Associations are computed for pairs of items and always run forward in time: the first item of a pair is purchased before the second. DecisionAR offers three options for temporal analysis:
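As a rough illustration of the data preparation described above, the Python sketch below groups transaction records by customer ID, orders each customer's visits by date, and enumerates the item pairs that run forward in time. The record layout `(customer_id, visit_date, items)` and the function names `build_sequences` and `forward_pairs` are hypothetical, chosen for this example; they are not DecisionAR's input format or API.

```python
from collections import defaultdict
from datetime import date


def build_sequences(transactions):
    """Hypothetical helper: group (customer_id, visit_date, items) records
    into one date-ordered sequence of visits per customer."""
    by_customer = defaultdict(list)
    for customer_id, visit_date, items in transactions:
        by_customer[customer_id].append((visit_date, items))
    # Sort each customer's visits chronologically; keep only the item lists.
    return {
        cust: [items for _, items in sorted(visits, key=lambda v: v[0])]
        for cust, visits in by_customer.items()
    }


def forward_pairs(sequence):
    """Hypothetical helper: distinct item pairs (a, b) where b is bought in a
    strictly later visit than a, i.e. associations forward in time."""
    pairs = set()
    for i, earlier in enumerate(sequence):
        for later in sequence[i + 1:]:
            for a in earlier:
                for b in later:
                    pairs.add((a, b))
    return pairs


# Hypothetical sample records; the dates are deliberately out of order.
transactions = [
    ("c1", date(1998, 12, 3), ["Y"]),
    ("c1", date(1998, 12, 1), ["X"]),
    ("c2", date(1998, 12, 2), ["X", "Z"]),
]
sequences = build_sequences(transactions)
print(sequences)                       # {'c1': [['X'], ['Y']], 'c2': [['X', 'Z']]}
print(forward_pairs(sequences["c1"]))  # {('X', 'Y')}
```

Items bought in the same visit (X and Z for c2) never form a pair, because only strictly later visits are paired with an earlier one.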
For temporal analysis by occurrence, DecisionAR measures the percentage of customers who purchase item X and later purchase item Y at least once within a specified period of time. In this form of temporal analysis, repeat purchases of items X and Y by the same customer are disregarded, so each customer contributes at most 1 to the count. For example:
      | Visit1 | Visit2 | Visit3 | Visit4 | Visit5 | Count(XY)
Cust1 | X      | Y      | X      | Y      | X      | 1
Cust2 | X      | X      | Y      | Y      | X      | 1
Cust3 | X      | Z      | Y      | X      | Y      | 1
Cust4 | XX     | YY     | Y      | Y      | X      | 1
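A minimal Python sketch of counting by occurrence, using the visit table above. Each visit is a string of single-letter item codes (so "XX" is two purchases of X). The function names and the optional `window` parameter, which here limits the number of later visits examined as a stand-in for the "specified period of time", are hypothetical; DecisionAR's actual implementation is not described in this message.

```python
# Visits from the table above; each string lists the single-letter items
# bought in one visit, so "XX" means X was bought twice in that visit.
SEQUENCES = {
    "Cust1": ["X", "Y", "X", "Y", "X"],
    "Cust2": ["X", "X", "Y", "Y", "X"],
    "Cust3": ["X", "Z", "Y", "X", "Y"],
    "Cust4": ["XX", "YY", "Y", "Y", "X"],
}


def count_by_occurrence(visits, x, y, window=None):
    """Hypothetical helper: 1 if some visit containing x is followed by a
    later visit containing y, else 0. `window` (an assumption) caps how many
    later visits are examined; None means the rest of the sequence."""
    for i, visit in enumerate(visits):
        if x not in visit:
            continue
        end = len(visits) if window is None else i + 1 + window
        if any(y in later for later in visits[i + 1:end]):
            return 1
    return 0


def percent_by_occurrence(sequences, x, y, window=None):
    """Percentage of customers with at least one X-then-Y occurrence."""
    hits = sum(count_by_occurrence(v, x, y, window) for v in sequences.values())
    return 100.0 * hits / len(sequences)


counts = {c: count_by_occurrence(v, "X", "Y") for c, v in SEQUENCES.items()}
print(counts)                                      # {'Cust1': 1, 'Cust2': 1, 'Cust3': 1, 'Cust4': 1}
print(percent_by_occurrence(SEQUENCES, "X", "Y"))  # 100.0
```

Returning as soon as one X-then-Y occurrence is found is what makes later repeat purchases irrelevant to this measure.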
Temporal analysis by subsequent visits answers a different question: what is the probability that a purchase of item Y will follow a purchase of item X in subsequent visits within a specified time period? Here a single customer can contribute more than 1 to the count. For example:
      | Visit1 | Visit2 | Visit3 | Visit4 | Visit5 | Count(XY)
Cust1 | X      | Y      | X      | Y      | X      | 2
Cust2 | X      | X      | Y      | Y      | X      | 1
Cust3 | X      | Z      | Y      | X      | Y      | 2
Cust4 | XX     | YY     | Y      | Y      | X      | 1
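The counts in the subsequent-visits table are consistent with the following rule, sketched in Python: once X has been bought, the next visit containing Y closes one X-then-Y episode, and a later purchase of X opens a new one. This rule is inferred from the example and is an assumption, not NeoVista's documented algorithm; the time-period limit is omitted, and `count_by_subsequent_visits` is a hypothetical name.

```python
# Visits from the table above; each string lists the single-letter items
# bought in one visit, so "XX" means X was bought twice in that visit.
SEQUENCES = {
    "Cust1": ["X", "Y", "X", "Y", "X"],
    "Cust2": ["X", "X", "Y", "Y", "X"],
    "Cust3": ["X", "Z", "Y", "X", "Y"],
    "Cust4": ["XX", "YY", "Y", "Y", "X"],
}


def count_by_subsequent_visits(visits, x, y):
    """Hypothetical helper: count X-then-Y episodes in one customer's visits.

    An episode opens when x is bought and closes at the first later visit
    containing y. This rule is inferred from the example table.
    """
    count = 0
    open_x = False  # True if x was bought since the last counted y
    for visit in visits:
        # Test y before x, so a y bought in the same visit as the x that
        # opens an episode is not counted (associations run forward in time).
        if open_x and y in visit:
            count += 1
            open_x = False
        if x in visit:
            open_x = True
    return count


counts = {c: count_by_subsequent_visits(v, "X", "Y") for c, v in SEQUENCES.items()}
print(counts)  # {'Cust1': 2, 'Cust2': 1, 'Cust3': 2, 'Cust4': 1}
```

Under this rule, Cust2's two consecutive X purchases open only one episode, which is why that customer counts 1 rather than 2.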
In temporal analysis by next visit, a purchase of item Y is counted only if it occurs in the very next visit after a purchase of item X by the same customer. For example:
      | Visit1 | Visit2 | Visit3 | Visit4 | Visit5 | Count(XY)
Cust1 | X      | Y      | X      | Y      | X      | 2
Cust2 | X      | X      | Y      | Y      | X      | 1
Cust3 | X      | Z      | Y      | X      | Y      | 1
Cust4 | XX     | YY     | Y      | Y      | X      | 1
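Counting by next visit only requires comparing adjacent visits. A minimal Python sketch, with the hypothetical function name `count_by_next_visit` and single-letter item codes as an assumption, reproduces the next-visit table:

```python
# Visits from the table above; each string lists the single-letter items
# bought in one visit, so "XX" means X was bought twice in that visit.
SEQUENCES = {
    "Cust1": ["X", "Y", "X", "Y", "X"],
    "Cust2": ["X", "X", "Y", "Y", "X"],
    "Cust3": ["X", "Z", "Y", "X", "Y"],
    "Cust4": ["XX", "YY", "Y", "Y", "X"],
}


def count_by_next_visit(visits, x, y):
    """Hypothetical helper: count adjacent visit pairs where x is bought in
    one visit and y in the very next visit."""
    return sum(1 for cur, nxt in zip(visits, visits[1:]) if x in cur and y in nxt)


counts = {c: count_by_next_visit(v, "X", "Y") for c, v in SEQUENCES.items()}
print(counts)  # {'Cust1': 2, 'Cust2': 1, 'Cust3': 1, 'Cust4': 1}
```

Cust3 drops from 2 to 1 here because the Z visit separates its first X from the following Y.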
For more information, contact NeoVista Software at www.neovista.com.
Jonathan D. Becher
VP, Applications and Technology
becher@neovista.com