The Exhibit session will be held in the Astor Ballroom, 7th Floor. The schedule is as follows:
Contact: Hany Azmy 1450 Palisade Ave. #M1D Fort Lee, NJ 07024 Tel: 201-947-1881 Fax: 201-947-1804 Email: mail@azmy.com Web: www.azmy.com

AZMY Thinkware is exhibiting SuperQuery 2.0 Discovery Edition, a data analysis and mining tool that runs under Windows 95 and NT. Using rule induction technology, SuperQuery searches data tables and reports all interesting patterns and exceptions. The Fact Discovery Engine is easily tuned to meet various analysis needs. SuperQuery is the only tool that allows "remining" of the discovered facts.
SuperQuery also assists in preparing data for analysis by providing a number of facilities for partitioning, classifying, and processing data columns. In addition, SuperQuery helps users explore and analyze their data by automatically displaying graphs and calculating statistics. It contains a number of Wizards that help read, update, and analyze data effortlessly. SuperQuery can access and query a number of databases, spreadsheets, and text files, both directly and through ODBC drivers. A 16-bit version of SuperQuery is also available for Windows 3.x.
SuperQuery is used in many applications including quality control, survey analysis, medical studies, and defense. Bring a sample of your data to our booth and we will show you what SuperQuery can discover for you.
Contact: Miranda Noy 2 Chalfont Square, Old Foundry Road Ipswich, Suffolk IP4 2AJ UK Tel: 44-1473-267103 Fax: 44-1473-267104 Email: mnoy@gentia.com

For Knowledge Discovery (KD) software to succeed, the enthusiasm of the innovator and early-adopter customers must be translated into practical benefits for the majority. This demands that KD software be made easy to use. Users familiar with Web browsers must be able to navigate and manipulate KD projects with ease. The systems must run from the Web, run on mainstream client-server hardware, and address very large data volumes.
K.wiz from Compression Sciences delivers a new KD solution. Combining ease of use with scalability, this advanced client-server framework encompasses all nine stages of the KD process. K.wiz components within this framework deliver data transformation, visualization, and discovery algorithms to the desktop and the Web. External components extend the already powerful range of functionality and ensure that K.wiz can be tailored to each organization's unique demands.
Designed for ease of use, K.wiz provides wizards for the novice user and an expert mode for the experienced knowledge worker. A full range of automation, scheduling agents, and APIs empowers the application developer to capture K.wiz plans and embed them in custom applications.
Compression Sciences is showcasing K.wiz at KDD-98. K.wiz is due for launch this Fall; information and demonstrations are available at the Compression Sciences booth.
Contact: John Sammis PO Box 4555 Ithaca, NY 14852 Tel: 607-257-1000 Fax: 607-257-4146 Email: jsammis@datadesk.com Web: www.datadesk.com

Exhibiting: Data Desk, a data visualization, exploration, and analysis software program.
Contact: Michael Gilman 1500 Hampstead Turnpike East Meadow, NY 11554 Tel: 516-542-8900 Email: info@data-mine.com Web: www.data-mine.com

Data Mining Technologies Inc. has developed a unique new data mining technology and embedded it in a data mining toolkit called Nuggets.
Nuggets uses state-of-the-art, proprietary computer algorithms to search databases for patterns in the form of rules. It differs from other rule induction methods in that it is not statistically based and therefore does not require any statistical assumptions. It handles missing and noisy data. Other tree-building methods build rules by looking at one attribute at a time, which means that the complex non-linear interactions among variables that are the essence of the power of data mining might be overlooked. Nuggets, however, is a true rule induction system that searches for simultaneous attribute interactions. This ensures that all rules are implicitly searched, thereby releasing the full power of data mining.
Nuggets predicts, classifies, segments, and validates, and is easy to use. Ask about its many unique features that simplify your data mining needs.
Contact: 30 Freedom Business Center, Suite 314 King of Prussia, PA 19406 Tel: 610-768-7725 Fax: 610-768-7774 Web: www.isl.co.uk

ISL Decision Systems Inc. is a leading data mining software company. It is part of the ISL Group, which also has affiliates in the U.K. and Singapore and a network of distributors serving countries around the world. Launched in 1994, the Group's award-winning Clementine product was the first enterprise-strength data mining system to be aimed at business users. There are now over 550 clients worldwide.
Clementine is consistently acknowledged by users and analysts as the leading interactive data mining system. The ISL Group has led the way in discovering new applications for data mining, including fraud prevention, pharmaceutical research, consumer buying patterns, financial risk assessment, point-of-sale data analysis, customer profiling, and production process analysis.
Current developments include a collaboration with NCR and Daimler-Benz on an open-standard methodology for data mining; new middleware to provide truly scalable data mining performance on multiple database platforms; and mining of the Web, on the Web.
Contact: Raphaelle Thomas Chemin du Moulon 91190 Gif-sur-Yvette France Email: rthomas@isoft.fr Web: www.isoft.fr

ALICE D'ISOFT is a powerful desktop data mining tool designed for mainstream business users. Based on decision-tree technology, its user-friendly interface and visual data mining approach make it ideal for marketing managers, commercial directors, financial directors, and all decision makers who need to make strategic decisions.
ALICE D'ISOFT accesses company databases directly, segments and classifies the data, and allows decision makers to test their hypotheses. That means answers to questions such as: Who is the target audience for my product? Which types of clients represent a credit risk?
Application areas: population analysis, risk evaluation, data classification, forecasting, quality control.
Contact: Sergei Ananyan 1518 E. Fairwood Drive Bloomington, IN 47408 Tel: 812-339-1646 Fax: 812-339-1646 Email: megaputers@aol.com Web: www.megaputer.ru

PolyAnalyst is a unique data mining solution for Windows NT and 95.
PolyAnalyst is a complete multi-strategy data mining environment based on the latest achievements in the field of automated knowledge discovery in databases. PolyAnalyst presents the discovered relations in explicit symbolic form. A large selection of exploration engines allows the user to predict values of continuous variables, model complex phenomena, determine the most influential independent variables, and solve classification and clustering tasks. An object-oriented design, a point-and-click GUI, versatile visualization and reporting capabilities, a minimum of statistics, and a simple interface to data storage architectures make PolyAnalyst a very easy-to-use system. PolyAnalyst for Windows NT has a solid record of successful applications in marketing, banking, finance, insurance, retailing, and pharmaceuticals. The system is also available in a client/server architecture, while a simplified system, PolyAnalyst Lite, works under Windows 95. A free evaluation copy of the software is available at http://www.megaputer.ru
"Unlike neural network programs, PolyAnalyst displays a symbolic representation of the relationship between the independent and dependent variables. This is a critical advantage for business applications, because managers are reluctant to use a model if they don't understand how it works," says Raymond Burke, Kelley Chair of Business Administration at Indiana University.
Contact: Kerry Martin Salford Systems 8880 Rio San Diego Drive, Suite 1045 San Diego, CA 92108 Tel: 619-543-8880 Fax: 619-543-8888 Email: info@salford-systems.com Web: www.salford-systems.com

CART is a robust and scalable decision-tree tool for data mining, predictive modeling, and data preprocessing. CART automatically discovers cause-and-effect relationships, isolates significant patterns, and forecasts trends. The software's advanced functionality and new resampling technology, deployed via a highly intuitive graphical user interface, generate accurate and reliable predictive trees that graphically depict which factors drive the results and how.
As an affordable, stand-alone application, CART's unique combination of automated solutions (including adjustable misclassification penalties, embedded self-validation procedures, committees of experts, and missing-value surrogates) empowers business users and data analysts to effectively tackle real-world modeling problems. And, when used as a powerful supplemental analysis, CART improves the performance of other data mining techniques (e.g., neural networks and logistic regression).
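The core tree-growing idea CART popularized can be illustrated with a minimal sketch. This is not Salford's implementation, only a generic illustration of impurity-based splitting: a threshold on a numeric attribute is chosen to minimize the weighted Gini impurity of the two resulting child nodes.

```python
# Illustrative sketch (not Salford's CART): choosing a split on one
# numeric attribute by minimizing weighted Gini impurity.

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Return the threshold that minimizes the weighted Gini impurity
    of the two child nodes, along with that impurity."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical example: income thresholds vs. a 0/1 response label
xs = [10, 20, 30, 40, 50, 60]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))  # → (30, 0.0): splitting at 30 separates the classes
```

A full tree grower would apply this recursively to each child node and then prune; the misclassification penalties and surrogate splits mentioned above are refinements on this same impurity search.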
Worldwide, CART has more than 1,000 users found in nearly all industry segments, including marketing, financial services, insurance, retail, healthcare, pharmaceutical, manufacturing, telecommunications, energy, agricultural, and education. In these data-intensive industries, CART is especially efficient at harvesting a high return on companies' investments in large, complex data warehouses. CART applications span market research segmentation, direct marketing, fraud detection, credit scoring, risk management, biomedical research, and manufacturing quality control. Users include AT&T Universal Card Services, Cabela's, Fleet Financial Group, Pfizer Inc., and Sears, Roebuck and Co.
Contact: Rich Rovner SAS Campus Drive Cary, NC 27513 Tel: 919-677-8000 Fax: 919-677-4444 Web: www.sas.com

SAS Institute Inc., the world's largest privately held software company and a leader in data mining software, is presenting Enterprise Miner, a complete process-driven solution for small- and large-scale data mining applications.
Enterprise Miner builds on and extends 22 years of proven analytic software, delivering a complete, GUI-based solution that integrates traditional statistics, computational methods, and artificial intelligence.
Enterprise Miner provides regression, neural networks, decision trees, clustering, associations, sequences, visualization, transformation, outlier handling, assessment methods, automatic scoring, model manager, and a modeling API. An interactive process flow interface presents this and more, allowing statisticians and analysts to develop, assess, and share solutions.
Enterprise Miner can be deployed in a client/server environment for fully scalable processing, and output is Web-enabled, delivering results organization-wide. Yphise, the respected firm of industry analysts specializing in software evaluations, awarded Enterprise Miner top marks in its survey of data mining solutions.
In addition to software, SAS Institute offers training and consulting services to provide numerous paths to data mining expertise. Courses are available on data mining techniques and Enterprise Miner usage. SAS consultants can work on short- or long-term data mining projects, delivering solutions and knowledge transfer.
Contact: Peter van der Putten Baarsjesweg 224 1058 AA Amsterdam The Netherlands Tel: 31-20-6186927 Fax: 31-20-6124504 Email: info@smr.nl Web: www.smr.nl

Sentient Machine Research, a Dutch R&D company founded in 1990, develops software and technology in image processing, multimedia information retrieval, and data mining. At KDD-98 it will present the new 2.0 release of its successful DataDetective data mining environment. DataDetective is built around an efficient fuzzy search-and-match engine and covers the full range of data mining functionalities: predictive modeling, profiling, clustering, and segmentation. (The model generated by DataDetective's modeling assistant 'Targas' finished in a respectable fifth place in last year's KDD-Cup.) Most distinctive within the DataDetective environment is the graphical clustering tool 'Looking Glass', based on proprietary animated clustering algorithms, which allows for a uniquely interactive style of visual data mining.
Contact: Aydin Senkut 2051 N. Shoreline Blvd. Mail Stop 08L-855 Mountain View, CA 94043

MineSet 2.5 (TM)
MineSet (TM) is Silicon Graphics' industry-leading integrated data mining product. MineSet offers a unique combination of scalable performance, an intuitive user interface, unparalleled visualization features, and sophisticated analytics, geared towards both technical and business users.
The Meta Group ranked MineSet third (behind statistical packages) in data mining market share in its January 1998 industry report on Data Warehouse Marketing Trends and Opportunities. MineSet also won the Bronze Miner Award and ranked highest among commercial data mining vendor products in last August's Knowledge Discovery and Data Mining (KDD) competition.
MineSet 2.5 provides the user with a revolutionary paradigm for knowledge discovery by offering parallelized data mining algorithms for faster performance as well as new analytical tools, such as regression, clustering, and decision tables for more intuitive comprehension of data. Combining powerful integrated, interactive tools for data access and transformation, analytical data mining, and visual data mining, MineSet will maximize the value of your data.
Contact: Jim Hayden 4350 Fair Lakes Court Fairfax, VA 22033 Tel: 703-503-1856 Fax: 703-803-1509 Email: jim_hayden@sra.com Web: www.sra.com

SRA International, Inc. offers a complete line of fully scalable data mining tools and professional services, empowering organizations with the ability to discover and detect patterns critical to their success.
SRA's KDD Explorer toolset includes multi-strategy algorithms for discovering associations, classifications, sequences, and clusters, as well as high-speed rule- and sequence-based pattern matching algorithms. These algorithms access relational databases directly for mining data, using parallel computing methods to exploit powerful multiprocessor platforms and rapidly analyze extremely large data sets. Our user interfaces are JDBC-compliant and Java-based, communicating with the RDBMS across distributed networks. KDD Explorer offers serious data mining professionals an integrated, workflow-driven environment for the configuration and execution of algorithms, as well as visualization of results for analysis and interpretation.
SRA's knowledge discovery specialists understand how best to apply these advanced capabilities to enable you to utilize your most strategic asset: electronic information. Together, SRA's KDD Explorer toolset and professional services provide solutions giving you flexibility and power to apply to business areas such as fraud detection and prevention, cost understanding, competitive intelligence, and trend analysis.
SRA International has been creating innovative solutions to practical problems faced by businesses and government agencies for twenty years. We specialize in the fields of:
Contact: Dee Dobbs 10400 N. Tantau Ave. #248-49 Cupertino, CA 95014 Tel: 408-285-3280 Fax: 408-285-3255 Email: dee.dobbs@tandem.com Web: www.tektonic.com

Product: InfoCharger Engine
InfoCharger's speed enables OLAP or data mining tools to work on large volumes of data. InfoCharger is a software component for interactive sessions on inexpensive hardware. InfoCharger allows users to process detailed data instead of working on condensed subtotals, enabling them to discover meaningful patterns. This kind of data analysis is critical for making better decisions and exploiting new business opportunities, including:
Contact: Charles Berger 16 New England Executive Park Burlington, MA 01803 Tel: 781-238-3418 Fax: 781-238-3440 Email: cberger@think.com Web: www.think.com

Darwin is an easy-to-use, Windows-client/UNIX-server, scalable, multi-algorithmic data mining software suite designed to build predictive models from large customer databases. Darwin supports prediction and classification modeling via neural networks, classification and regression trees (C&RT), and k-nearest-neighbor algorithms.
As an open solution, Darwin can run on some of the world's fastest and most powerful computing platforms running Sun Solaris and HP-UX, including parallel-processing and symmetric multiprocessor (SMP) configurations. With Darwin, financial services, telecommunications, and database marketing companies and other large corporations can uncover vital information that was previously undetected because of the sheer size and complexity of their databases.
Darwin also offers a scripting tool that records data mining steps so they can be re-run, automating the data mining process, and a workflow feature that graphically documents the data mining steps and provides information about each step taken.
An optional feature of Darwin is the ability to generate predictive models in C, C++, or Java code for deployment outside of the Darwin environment. These deployable models can be easily integrated into existing systems and procedures, so any new business information can be made available where and when it's most needed, for example, in call centers and Web-based applications.
Contact: Steve Waterhouse 2560 Bancroft Way #213 Berkeley, CA 94704 Tel: 510-548-8978 Fax: 510-845-2292 Email: stevew@ultimode.com Web: www.ultimode.com

ACPro is a data mining tool for automatic segmentation of databases. Segmentation (or clustering) is the discovery of similar groups of records in databases. ACPro is based on the successful NASA AutoClass research program, and was developed in collaboration with the AutoClass team using a NASA commercialization award.
Unlike other segmentation tools, ACPro discovers the optimal number of segments without requiring user specification. It is also an order of magnitude faster than its predecessor, AutoClass. ACPro handles missing data in a coherent manner. It also assigns relevance to the attributes in each segment, which aids understanding.
ACPro has been applied in a number of commercial and scientific settings. It has been used for analysis of telecommunications churn data, market segmentation, and visualization of geological data, and is currently being used at NASA for spectral analysis of rock samples.
ACPro is available with either a command-line interface or a platform-independent GUI for Windows NT/95 and most Unix platforms.
Contact: Scott Sassone Lincoln North Lincoln, MA 01773-1125 Tel: 781-259-5900 Fax: 781-259-5901 Email: ssassone@unica-usa.com Web: www.unica-usa.com

Unica Technologies, Inc. is the leading provider of data mining and predictive modeling software and services for database marketing and customer relationship optimization. MODEL 1, our award-winning product line for marketing applications, includes templates for response modeling, cross selling, customer segmentation, and customer valuation. PRW (Pattern Recognition Workbench), our general-purpose data mining tool, offers twelve algorithms, six methods of intelligent automation and optimization, and three methods of validation.
Both PRW and MODEL 1 are scalable, from the desktop up to NT and Unix SMP servers. These fully integrated applications offer everything needed for data access, pre-processing, modeling, results interpretation, and deployment. APIs are also available for customized applications. Unica wrote the book on data mining: our textbook "Solving Data Mining Problems" is published by Prentice-Hall.
Unica focuses on providing business solutions, not just software. Our Integrated Customer Management Program ensures that Unica's education, training, and consulting services are tailored to meet your needs, providing a demonstrable ROI. Put our worldwide experience to work for you today!
Contact: Mark Yuhn 200 Renaissance Center, Suite 1900 Detroit, MI 48243 Tel: 313-259-9900 Fax: 313-259-1362 Email: mcyuhn@urbanscience.com Web: www.urbanscience.com
GainSmarts utilizes sophisticated predictive modeling technology that can analyze past purchase behavior, demographic and lifestyle characteristics, and promotion and risk information to predict the likelihood of response, as well as to develop an understanding of consumer characteristics.
Economic analysis is then applied to these results, allowing the optimum expenditure of marketing resources to achieve your prospecting, retention or cross-selling objectives. GainSmarts has proven to be extremely effective for businesses worldwide, demonstrating ROI for its users in many application areas.
Additionally, the system received a first-place "Gold Miner" award in the KDD-Cup 97 competition at the Knowledge Discovery and Data Mining Conference, Newport Beach, CA, sponsored by the American Association for Artificial Intelligence (AAAI). The competition involved targeting prospects for a financial services promotion and then comparing predicted vs. actual behavior.
Contact: Abraham Meidan 3 Beit-Hillel St. Tel-Aviv 67017 ISRAEL Email: info@wizsoft.com Web: www.wizsoft.com

WizSoft exhibits two data mining applications, WizWhy and WizRule.
WizWhy is a knowledge discovery application based on a proprietary association rules algorithm. WizWhy reveals all the if-then rules that relate to the dependent variable, and uses these rules in order to issue predictions, summarize the data and reveal unexpected phenomena. WizWhy avoids overfitting by calculating the error probability of each rule.
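WizSoft does not publish its exact formula, but the idea of scoring a rule by its error probability can be sketched in a hedged way: treat the n records on which the rule fires as Bernoulli trials, and ask how likely the rule's observed accuracy would be if the consequent occurred only at its baseline rate. The function name and the binomial-tail choice below are illustrative assumptions, not WizWhy's method.

```python
# Hedged sketch (not WizSoft's actual formula): the binomial tail
# probability of a rule's observed accuracy arising by chance.

from math import comb

def rule_error_probability(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): the probability that a rule
    covering n records with k hits would look this accurate if the
    consequent occurred only at its baseline rate p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# A rule firing on 20 records with 18 hits, against a 50% baseline:
# the tail probability is tiny, so the rule is unlikely to be chance.
print(rule_error_probability(20, 18, 0.5))
```

A mining tool can then report only rules whose tail probability falls below a user-chosen threshold, which is one standard way to guard against overfitting to coincidental patterns.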
WizRule is a data auditing application based on data mining technology. WizRule reveals all the if-then rules and the mathematical formulas that govern the data under analysis, and points at the records that deviate from the set of the discovered rules as cases to be audited. WizRule avoids false alarms by calculating the level of unlikelihood of each deviation.
Both products run on Windows 95/98/NT, read any ODBC-compliant database, and have OCX versions that can be embedded in other applications.
Paper Title: BAYDA: Software for Bayesian Classification and Feature Selection
Development Team: Henry Tirri, Petri Kontkanen, Jussi Lahtinen, Petri Myllymäki, Tomi Silander, University of Helsinki
Telephone: +358-9-708-44173
BAYDA is a Java software package for flexible data analysis in predictive data mining tasks. BAYDA performs fully Bayesian predictive inference of class memberships based on a Naive Bayes model built from the data set. It is well known that the Naive Bayes classifier performs well in predictive data mining tasks when compared to approaches using more complex models. However, the model makes strong independence assumptions that are frequently violated in practice. For this reason, the BAYDA software also provides a feature selection scheme that can be used for analyzing the problem domain and for improving the prediction accuracy of the models constructed by BAYDA. The feature selection can be done either manually or automatically. In manual selection the user has an opportunity to use BAYDA to evaluate different feature subsets by a leave-one-out cross-validation scheme. In the automatic case the program selects the relevant features by using a novel Bayesian criterion.
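The two ingredients BAYDA combines can be sketched in a few lines. This is an illustration, not the BAYDA code: a Naive Bayes classifier over discrete attributes (with add-one smoothing, assuming binary-valued attributes) evaluated by leave-one-out cross-validation; the toy data set is hypothetical.

```python
# Illustrative sketch (not BAYDA itself): Naive Bayes classification
# evaluated by leave-one-out cross-validation.

from collections import Counter

def nb_predict(train, x):
    """train: list of (attribute_tuple, label); x: attribute_tuple.
    Returns the label maximizing P(label) * prod_i P(x_i | label),
    with add-one smoothing (denominator assumes binary attributes)."""
    labels = Counter(y for _, y in train)
    best, best_score = None, -1.0
    for y, ny in labels.items():
        score = ny / len(train)  # class prior
        for i, v in enumerate(x):
            match = sum(1 for xs, yy in train if yy == y and xs[i] == v)
            score *= (match + 1) / (ny + 2)  # smoothed P(x_i = v | y)
        if score > best_score:
            best, best_score = y, score
    return best

def loo_accuracy(data):
    """Leave-one-out estimate of classification accuracy."""
    hits = 0
    for i in range(len(data)):
        train = data[:i] + data[i + 1:]
        x, y = data[i]
        hits += (nb_predict(train, x) == y)
    return hits / len(data)

# Hypothetical toy data: the first attribute determines the class.
data = [((1, 0), "a"), ((1, 1), "a"), ((0, 0), "b"), ((0, 1), "b"),
        ((1, 0), "a"), ((0, 1), "b")]
print(loo_accuracy(data))  # → 1.0 on this cleanly separable toy set
```

BAYDA's feature selection then amounts to repeating such an evaluation over different attribute subsets and keeping the subset that scores best.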
The features of the current version of BAYDA include (1) missing-data handling; (2) an external leave-one-out cross-validated estimate of the classifier performance in graphical format; (3) an "intelligent document"-style graphical interface; (4) forward-selection/backward-elimination feature subset selection; (5) free-format data files (such as the tab-delimited format of SPSS).
BAYDA is available free of charge for research and teaching purposes from www.cs.Helsinki.FI/research/cosco under the section "Software", and it is currently tested on Windows 95/NT, SunOS, and Linux platforms. However, being implemented in 100% Java, it should run on all platforms supporting Java Runtime Environment 1.1.3 or later.
What is Unique about the System? (1) intelligent, adaptive HTML-document interface; (2) Bayesian criterion for variable subset selection; (3) fully Bayesian prediction based on model parameter averaging.
Development Team: Marco Ramoni and Paola Sebastiani, The Open University
Telephone: 413-577-0338
Bayesian Knowledge Discoverer (BKD) is a computer program for discovering Bayesian Belief Networks (BBNs) from (possibly incomplete) databases. A BBN is a directed acyclic graph where nodes represent stochastic variables and directed arcs identify dependencies between a set of parent variables and a child variable. Each dependency is then quantified by a conditional probability distribution shaping the behavioral relationship between the set of parent variables and the child variable. In this way, a BBN provides a dependency model of the underlying domain knowledge and a graphical representation of decision problems, grounded on the solid foundations of probability theory and able to perform prediction, explanation, and classification. Given a database, BKD is able to extract the graphical structure from data, estimate the conditional probability distributions from data, discretize continuous variables, handle missing data, and automatically define network nodes from data. Once generated, the extracted BBN can be used as a self-contained intelligent decision support system able to provide predictions and explanations. A goal-oriented propagation algorithm is included in BKD. A graphical user interface capitalizes on the graphical nature of BBNs to allow the user to easily navigate the dependencies embedded in the database. BKD is currently distributed in over 1,000 copies worldwide.
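The quantification step described above, estimating the conditional probability distribution of a child variable given its parents from complete data, can be sketched as follows. This illustrates the general BBN technique, not the BKD implementation (which also handles missing data via Bound and Collapse); the variable names are hypothetical.

```python
# Illustrative sketch (not BKD): estimating a conditional probability
# table P(child | parents) from a complete-data table.

from collections import Counter

def estimate_cpt(records, parents, child):
    """records: list of dicts mapping variable name -> value.
    Returns {parent_configuration: {child_value: probability}}."""
    joint, marginal = Counter(), Counter()
    for r in records:
        cfg = tuple(r[p] for p in parents)
        joint[(cfg, r[child])] += 1   # count (parents, child) jointly
        marginal[cfg] += 1            # count parent configurations
    cpt = {}
    for (cfg, v), cnt in joint.items():
        cpt.setdefault(cfg, {})[v] = cnt / marginal[cfg]
    return cpt

# Hypothetical sprinkler-style data: wet depends on rain and sprinkler.
records = [
    {"rain": 1, "sprinkler": 0, "wet": 1},
    {"rain": 1, "sprinkler": 0, "wet": 1},
    {"rain": 0, "sprinkler": 1, "wet": 1},
    {"rain": 0, "sprinkler": 0, "wet": 0},
]
cpt = estimate_cpt(records, ["rain", "sprinkler"], "wet")
print(cpt[(1, 0)])  # → {1: 1.0}: whenever it rained, the grass was wet
```

One such table per node, attached to the discovered graph structure, is exactly what turns the graph into a usable prediction and explanation engine.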
What is Unique about the System? BKD is the first available program implementing a Bayesian approach to the discovery of BBNs. BKD uses a novel method, called Bound and Collapse, to efficiently handle missing data.
Development Team: Jie Cheng, University of Ulster
Telephone: +44-1232-366500
A belief network is a powerful knowledge representation and reasoning tool under conditions of uncertainty. In DM systems, it can be used for classification, prediction, and decision support. Because belief networks can handle uncertain information in a natural way, and because the learned knowledge is well structured and easily understood, they have become increasingly popular in DM research. To use belief networks in DM systems, a crucial step is to learn them from large training data sets efficiently and accurately.
PowerConstructor is such a belief network learning tool, comprising a user-friendly interface and a construction engine. The system takes a database table as input and produces the belief network structure as output. The construction engine is based on our three-phase belief network learning algorithm, which takes an information-theoretic approach and has O(N^4) complexity in the number of conditional independence (CI) tests, whereas other algorithms require an exponential number of CI tests. (N is the number of attributes.) We have evaluated our system using a widely accepted benchmark data set with 37 attributes and 10,000 records, as well as other data sets. The results show that our system is the most accurate and efficient system available. The system is available for evaluation at our web site (http://mmr.infj.ulst.ac.uk/jcheng/bnpc.htm and http://infosys.susqu.edu/bnpc/) and has received over 700 downloads from academic and industrial users. From the encouraging feedback we know that some users have already used it to solve real-world problems.
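The information-theoretic primitive behind CI-test-based structure learning can be sketched as follows. This illustrates the general technique, not the PowerConstructor code: conditional mutual information I(X; Y | Z), which is near zero when X and Y are conditionally independent given Z, so an arc between X and Y is unnecessary once Z is accounted for.

```python
# Illustrative sketch (not PowerConstructor): empirical conditional
# mutual information I(X; Y | Z) as a conditional independence test.

from collections import Counter
from math import log2

def cond_mutual_info(triples):
    """triples: list of (x, y, z) observations. Returns I(X; Y | Z) in bits,
    computed from empirical frequencies."""
    n = len(triples)
    pxyz = Counter(triples)
    pxz = Counter((x, z) for x, y, z in triples)
    pyz = Counter((y, z) for x, y, z in triples)
    pz = Counter(z for x, y, z in triples)
    mi = 0.0
    for (x, y, z), c in pxyz.items():
        # term: p(x,y,z) * log2( p(x,y,z) p(z) / (p(x,z) p(y,z)) )
        mi += (c / n) * log2((c * pz[z]) / (pxz[(x, z)] * pyz[(y, z)]))
    return mi

# X and Y both copy Z, so given Z they carry no information about each other:
data = [(0, 0, 0), (0, 0, 0), (1, 1, 1), (1, 1, 1)]
print(cond_mutual_info(data))  # → 0.0
```

A structure learner runs many such tests (in a carefully ordered three-phase schedule, in PowerConstructor's case) and keeps only the arcs whose conditional dependence survives.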
What is Unique about the System?
Development Team: Jiawei Han, Sonny Chee and Jenny Chiang, Simon Fraser University
Telephone: 604-291-4411 or 604-291-5371
A data mining system, DBMiner, has been developed for interactive mining of multiple-level knowledge in large relational databases and data warehouses. The system implements a wide spectrum of data mining functions, including characterization, comparison, association, classification, prediction, clustering, data dispersion analysis, and time-series analysis. It also builds up a user-friendly, interactive data mining environment and a set of knowledge visualization tools. In-depth research has been performed on the efficiency and scalability of data mining methods. Moreover, the research has been extended to spatial data mining, multimedia data mining, financial mining, and Web mining, with several new data mining system prototypes constructed or currently under construction, including GeoMiner, MultiMediaMiner, FinancialMiner, and WebLogMiner. This demo will show the most recent research and development status of the DBMiner system. Hopefully, the system will be available commercially by the time of the demo.
What is Unique about the System? On-line (interactive) analytical mining, multiple integrated data mining functions, integration of data mining with OLAP, and knowledge visualization tools.
Paper Title: An Enhanced KDD Process Model and its Visualisation
Development Team: M. Kolher, J. Chattratichat, Y. Guo and S. Hedvall, Imperial College
Telephone: +44-171-594-83-57
The Kensington System provides an enterprise solution for large-scale data mining in environments where data is logically and geographically distributed over multiple databases. Supported by an intuitive Integrated Programming/Visualisation Tool kit, an analyst explores remote databases and visually defines and executes procedures that model the entire KDD process. The system provides high performance components for the most common data mining tasks, such as classification, prediction, clustering, and association. Generated decision models are evaluated and modified using powerful interactive visualisation techniques.
The system is designed as a 3-tier application based on the Enterprise JavaBeans (EJB) architecture; application servers can be transparently distributed for scalability or replicated for increased availability. Defined KDD procedures and generated decision models are realized as persistent objects, which can easily be reused and shared between group members. Kensington imposes strong security on data transfer and model distribution through secure socket communications. Access control mechanisms protect user/group-specific resources from unauthorized access.
For maximum flexibility and easy deployment, client tools are 100% Java-compliant applets and run securely in Web browsers anywhere on the Internet. A data analyst is therefore not bound to any specific location or computer.
What is Unique about the System?
Paper Titles: TextVis: An Integrated Visual Environment for Text Mining; Text Mining at the Term Level; and Trend Graphs: Visualizing the Evolution of Concept Relationships in Large Document Collections
Development Team: Ronen Feldman, Yonatan Aumann, David Landau, Moshe Fresko, Orly Lipchtat, Yehuda Lindel, Yaron Ben Yehuda, Yonatan Schelr, Amir Zilberstein and Moshe Martziano, Bar-Ilan University and Instinct Software Ltd.
Telephone: +972-3-5318629
TextVis is a visual data mining system for document collections. Such a collection represents an application domain, and the primary goal of the system is to derive patterns that provide knowledge about this domain. Additionally, the derived patterns can be used to browse the collection. TextVis takes a multi-strategy approach to text mining, and enables defining complex analysis schemas from basic components, provided by the system.
An analysis schema is constructed by dragging functional icons from a tool palette onto the workspace and connecting them according to the desired flow of information. The system provides a large collection of basic analysis tools, including frequent sets, associations, concept distributions, and concept correlations. The discovered patterns are presented in a visual interface that allows the user to operate on the results and to access the associated documents. TextVis is a complete text mining system that uses agent technology to access various online information sources, text preprocessing tools to extract relevant information from the documents, a variety of data mining algorithms, and a set of visual browsers to view the results.
What is Unique about the System? A unique collection of tools for text mining; a special set of visual maps; easy customization to match the exact needs of the user; and the ability to build very complex text analysis schemas.
Paper Title: Data Reduction Based on Hyper Relations
Development Team: Hui Wang, University of Ulster
Telephone: +44-1232-368981
Data reduction makes datasets smaller while preserving classification structures of interest. Data reduction is regarded as a main task of data mining, hence any data mining technique can also be regarded as a method for data reduction [Usama Fayyad, 1997]. We propose a general (algebraic) approach to data reduction, which in turn can be used for data mining; a paper describing this approach has been submitted to KDD-98. We have developed a system, called DR, based on this approach. We want to demonstrate DR with respect to the following: (1) data mining can be achieved via direct data reduction; (2) data and models can be uniformly represented by hyper relations; (3) datasets can be significantly reduced in size while the classification structures are preserved; (4) attribute selection and discretization of continuous attributes can be achieved as a by-product of data reduction; (5) missing values and overfitting can be naturally dealt with in DR; (6) DR can outperform C4.5 in many cases on public datasets.
What is Unique about the System? (1) Algebraic and theoretically well founded approach to data mining; (2) Uniform representation of data and models -- hyper relations, a generalization of database relations in the traditional sense -- therefore data mining can be taken to be an operation of database systems; (3) missing values and overfitting are naturally dealt with; (4) attribute selection and continuous attribute discretization can be achieved as a by-product; (5) the model built by DR can be further mined, if needed.
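To give a flavor of point (2): a hyper-tuple generalizes an ordinary database tuple by letting each field hold a set of values, and reduction can merge same-class tuples whenever the merged hyper-tuple covers no tuple of a different class. The greedy sketch below is our own toy illustration of that idea, not the authors' DR algorithm, and the example data are hypothetical:

```python
def merge(h1, h2):
    """Least upper bound of two hyper-tuples: fieldwise set union."""
    return tuple(a | b for a, b in zip(h1, h2))

def covers(h, t):
    """A hyper-tuple covers an ordinary tuple if every field value falls
    inside the corresponding value set."""
    return all(v in field for v, field in zip(t, h))

def reduce_dataset(rows, labels):
    """Greedily merge same-class tuples into hyper-tuples, keeping only
    merges that cover no opposite-class tuple, so the classification
    structure of the data is preserved while the dataset shrinks."""
    hyper = [(tuple({v} for v in r), y) for r, y in zip(rows, labels)]
    changed = True
    while changed:
        changed = False
        for i in range(len(hyper)):
            for j in range(i + 1, len(hyper)):
                (hi, yi), (hj, yj) = hyper[i], hyper[j]
                if yi != yj:
                    continue
                m = merge(hi, hj)
                # Unsafe if any tuple of another class falls inside m.
                if any(covers(m, r) for r, y in zip(rows, labels) if y != yi):
                    continue
                hyper[i] = (m, yi)
                del hyper[j]
                changed = True
                break
            if changed:
                break
    return hyper

# Hypothetical toy data: class is determined by colour alone.
rows = [("red", "small"), ("red", "large"), ("blue", "small"), ("blue", "large")]
labels = ["+", "+", "-", "-"]
reduced = reduce_dataset(rows, labels)   # four tuples collapse to two hyper-tuples
```

Here the two "+" tuples merge into ({red}, {small, large}) and the two "-" tuples into ({blue}, {small, large}); each merged hyper-tuple doubles as a classification rule, illustrating the uniform representation of data and models.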
Paper Title: Fast Computation of 2-Dimensional Depth Contours
Development Team: Raymond T. Ng and Ivy Kwok, University of British Columbia; Ted Johnson, AT&T Research Center
Telephone: 604-822-2394
"One person's noise is another person's signal." For many applications, including the detection of credit card frauds and the monitoring of criminal activities in electronic commerce, an important knowledge discovery problem is the detection of rare/exceptional/outlying events.
In computational statistics, one well-known approach to detect outlying data points in a 2-D dataset is to assign a depth to each data point. Based on the assigned depths, the data points are organized in layers in the 2-D space, with the expectation that shallow layers are more likely to contain outlying points than are the deep layers. One robust notion of depth, called depth contours, was introduced by Tukey [17,18]. ISODEPTH, developed by Ruts and Rousseeuw [16], is an algorithm that computes 2-D depth contours.
In this demo, we show a fast algorithm, called FDC, for computing 2-D depth contours. The idea is that to compute the first k depth contours, it is sufficient to restrict the computation to a small selected subset of data points, instead of examining all data points. Consequently, FDC scales up much better than ISODEPTH. For instance, for 1,000 data points FDC is 4 times faster than ISODEPTH, and for 5,000 points FDC is 50 times faster. While 100,000 points are too many for ISODEPTH to handle, FDC takes about 50 seconds to compute the first 20 depth contours.
Last but not least, ISODEPTH relies on the non-existence of collinear points. Removing all collinear points can be time consuming. FDC is robust against collinear points.
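The depth notion underlying these contours can be illustrated with a small brute-force computation. The sketch below computes the half-space (Tukey) depth of a point by an O(n^2) angular sweep; it is a toy illustration of the concept, not the FDC or ISODEPTH algorithm, and the function name and sample points are ours:

```python
import math

def tukey_depth(p, points):
    """Half-space (Tukey) depth of p: the minimum number of other data
    points contained in a closed half-plane whose boundary line passes
    through p.  The minimizing half-plane can always be rotated until its
    boundary just passes a data point, so it suffices to check, for each
    data direction a, the half-plane holding angles in (a, a + pi]."""
    two_pi, eps = 2 * math.pi, 1e-9
    angles = [math.atan2(q[1] - p[1], q[0] - p[0]) % two_pi
              for q in points if q != p]
    if not angles:
        return 0
    best = len(angles)
    for a in angles:
        # eps guards against floating-point ties at the boundary.
        count = sum(1 for b in angles
                    if eps < (b - a) % two_pi < math.pi + eps)
        best = min(best, count)
    return best

# Shallow depths flag outlying points; deeper layers lie toward the centre.
pts = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1)]
depths = {q: tukey_depth(q, pts) for q in pts}   # corners -> 0, centre -> 2
```

Grouping points by depth yields the layers described above: the depth-k contour encloses exactly the points of depth at least k.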
What is Unique about the System? The last two paragraphs of the above description summarize the key points.
Paper Title: Finding Frequent Substructures in Chemical Compounds
Development Team: Luc Dehaspe, Hendrik Blockeel, Wim Van Laer and Luc De Raedt, Katholieke Universiteit Leuven; Hannu Toivonen, University of Helsinki
Telephone: +32-1632-7658
WARMR is a general-purpose tool for the discovery of frequent patterns: association rules in the simplest case and first-order logic rules in the general case. We will demonstrate how both new and known variants of frequent pattern discovery are handled, and how the user can switch from one setting to another with minor effort.
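In its simplest, propositional setting, the frequent-pattern search reduces to levelwise frequent-itemset discovery over transactions. The sketch below illustrates only that base case (the first-order search over logical facts is far richer); the `apriori` function and example baskets are illustrative, not part of WARMR:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Levelwise frequent-itemset discovery: grow candidate itemsets one
    item at a time, keeping only those whose support meets the threshold
    and whose every subset is itself frequent (anti-monotonicity)."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    frequent, level = {}, [frozenset([i]) for i in sorted(items)]
    while level:
        kept = {c: s for c in level if (s := support(c)) >= min_support}
        frequent.update(kept)
        # Join frequent k-sets that differ in one item, then prune any
        # candidate with an infrequent k-subset.
        cands = {a | b for a, b in combinations(list(kept), 2)
                 if len(a | b) == len(a) + 1}
        level = [c for c in cands if all(c - {i} in kept for i in c)]
    return frequent

# Hypothetical market baskets.
baskets = [{"beer", "chips"}, {"beer", "chips", "salsa"}, {"chips", "salsa"}]
freq = apriori(baskets, min_support=2)
# {beer, chips} and {chips, salsa} survive; {beer, salsa} does not.
```

WARMR generalizes exactly this levelwise scheme from itemsets to first-order patterns over background knowledge.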
As a prototypical example of applications where the additional expressivity offered by WARMR is useful, we consider the discovery of frequent substructures in a biochemical database of compounds that are classified as carcinogenic or not. In this context, patterns concern general properties of the compound, but also more complex features such as bonds between atoms, membership of atoms in chemical groups such as alcohols, and connections between chemical groups. Preparation of the experiment involves representation of the data and background knowledge in a DATALOG format, and the definition of the hypothesis space by means of a declarative language bias formalism.
Other applications with similar needs for highly expressive patterns are taken from the domains of (advanced) market basket analysis, discovery of linguistic knowledge in tree-banks, and telecommunication alarm analysis.
WARMR is freely available for academic purposes upon request.
What is Unique about the System? WARMR discovers useful frequent patterns that are way beyond the complexity of association rules or their known variants. By changing the language bias the user can easily search for different patterns without modifying the algorithm. The very natural facility to add background knowledge further enhances the flexibility of the tool.
Development Team: Jerzy Bala, Mirco Manuci, Srinivas Gutta and Sung Baik, Datamat Systems Research, Inc.; Peter Pachowicz, George Mason University
Telephone: 703-917-0880, ext. 226
A research prototype of the InferView system will be demonstrated. InferView is being developed under a project on data mining and decision support tools for situational awareness sponsored by the Ballistic Missile Organization and the U.S. Army Space and Missile Defense Command. InferView consists of three major components: (i) visualization, (ii) an inference engine, and (iii) a predictor. The user interacts with the system through a visual representation space where various graphical objects are rendered. Graphical objects represent data, knowledge (e.g., induced rules), and query explanations (decisions on unknown data identifications). InferView integrates graphical objects through the use of visually cognitive, human-oriented depictions. A user can also examine non-graphical (i.e., text-based) explanations of posed queries. InferView's transfer of data mining and decision support processes to the visualization space enhances the user's ability to see, explore, and gain decision-making insights.
Two modes of operation will be demonstrated: (i) the data mining mode, in which the inference engine module generates knowledge and represents it as 3D graphical objects, and (ii) the decision support mode, in which the predictor module is used to answer users' queries. In both modes, user-directed navigation, zooming, and other spatially oriented operations will be demonstrated.
What is Unique about the System? The uniqueness of the InferView system is its synergistic integration of advanced computer graphics/visualization and inference-based data generalization techniques.
InferView's knowledge visualization techniques contribute to better human decision-making insights by facilitating spatial operations such as navigation and zooming. A graphically appealing human-computer interface and the capability to visualize large and complex knowledge bases through spatial and graphical depictions of knowledge components add to InferView's uniqueness.
Paper Titles: (1) Integration of Classification and Association Rule Mining; (2) Visual Aided Exploration of Interesting Association Rules
Development Team: Bing Liu, Wynne Hsu, Yiming Ma and Chen Shu, National University of Singapore
Telephone: +65-874-6736
We would like to demonstrate two main themes of our data mining system:
Paper Title: Ranking - Methods for Flexible Evaluation and Efficient Comparison of Classification Performance
Development Team: Dr. Gadi Pinkas, Dr. Yizhak Idan, Rony Paz and Saharon Rosset, Amdocs Inc.
Telephone: +972-3-5765174
The demonstration will present a full modeling and analysis process of churn data coming from a telecommunication operator data warehouse. The demonstration will include:
What is Unique about the System? (1) A visualization and analysis module for combining automated discovery results with human expert knowledge; (2) optimization with respect to revenue flow rather than customer flow. This contrasts with the limitations of current methods, which either introduce value into post-DM analysis or perform pre-DM segmentation by value.
Contact: Traci Taylor 655 Avenue of the Americas New York, NY 10010 Tel: 212-633-3766 Fax: 212-633-3764 Email: t.taylor@elsevier.com Web: www.elsevier.com
Contact: Marcia Kidston 101 Philip Drive Norwell, MA 02061 Tel: 781-871-6600 Fax: 781-871-6528 Email: kluwer@wkap.com Web: www.wkap.nl
Contact: Mary Jo Donnelly WCB/McGraw-Hill 1333 Burr Ridge Parkway Burr Ridge, IL 60521 Tel: 800-634-3963. McGraw-Hill is a leading publisher of computer science texts. We continue our excellence with titles such as Machine Learning by Tom Mitchell, Database Management Systems by Raghu Ramakrishnan, and Database System Concepts by Abraham Silberschatz, Henry Korth, and S. Sudarshan. We invite you to stop by and browse through our catalog and book display, as we will be offering discounts of 10%-30% at this conference.
Contact: Katja Kolinke 340 Pine Street, 6th Floor San Francisco, CA 94104-3205 Tel: 415-392-2665 Fax: 415-982-2665 Email: mkp@mkp.com Web: www.mkp.com. Morgan Kaufmann publishes the finest technical information resources for computer science and engineering professionals. We publish in book and digital form in such areas as databases, networking, computer architecture, human computer interaction, computer graphics, multimedia information systems, artificial intelligence, and software engineering. Many of our books are considered to be the definitive works in their fields.
We believe strongly in seeking out the most authoritative, expert authors. Our family of authors and series editors includes many of the world's most respected computer scientists and engineers and their books often represent the wisdom gained from years of research, development, and teaching.
We believe it is our responsibility to add value to our books by working with authors to improve content and exposition. MK books are extensively peer reviewed and, in the case of textbooks, are often class tested with hundreds of students. All of our books are professionally edited.
Contact: Robin Okun P.O. Box 30130 Phoenix, AZ 85046 Tel: 602-971-1869 Fax: 602-971-2321 Email: info@pcai.com Web: www.pcai.com/pcai/. PC AI Magazine provides the information necessary to help managers, programmers, executives, and other professionals understand the quickly unfolding realm of artificial intelligence (AI) and intelligent applications (IA). PC AI addresses the entire range of personal computers, including the Mac, IBM PC, NeXT, Apollo, and more. PC AI features developments in expert systems, neural networks, object-oriented development, and all other areas of artificial intelligence. Feature articles, product reviews, real-world application stories, and a Buyer's Guide present a wide range of topics in each issue.