KDD-89 Panel on
Research Issues and Applications of Knowledge Discovery

Participants: Larry Kerschberg, Pat Langley, and J. Ross Quinlan


Incorporating Knowledge Discovery into Expert Database Systems

Larry Kerschberg
Professor and Chairman
Department of Information Systems and Systems Engineering
George Mason University
Fairfax, Virginia 22030

Definition of an Expert Database System

An Expert Database System (EDS) [KE86, KE87, KE88] supports applications that require knowledge-directed processing of shared information. Expertise may reside within the system to improve performance by: 1) providing intelligent user dialog management, 2) using database semantic integrity constraints for query optimization, and 3) combining knowledge- and data-driven search techniques in efficient inference schemes. Conversely, expertise may reside outside the system in knowledge-based applications that interpret vast quantities of data and provide recommendations to decision-makers. Thus the goal of EDS research and development is to provide tools and techniques to make databases "active" agents that can reason, and to allow database systems to support artificial intelligence applications that manage and access large knowledge bases and databases.
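As a minimal sketch of point 2 above, the following Python fragment shows how a semantic integrity constraint could let a system answer some queries without scanning the data. The RangeConstraint class, the provably_empty and select functions, and the employee records are hypothetical names and data introduced only for illustration; they are not taken from any particular EDS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeConstraint:
    """Hypothetical integrity constraint: low <= attribute <= high for every tuple."""
    attribute: str
    low: float
    high: float


def provably_empty(constraint, attribute, op, value):
    """Return True when the constraint proves that no tuple satisfies `attribute op value`."""
    if attribute != constraint.attribute:
        return False
    if op == ">":
        return value >= constraint.high
    if op == "<":
        return value <= constraint.low
    if op == "=":
        return not (constraint.low <= value <= constraint.high)
    return False


def select(rows, constraint, attribute, op, value):
    """Answer the query; return the matches and the number of tuples examined."""
    if provably_empty(constraint, attribute, op, value):
        return [], 0  # the constraint rules the query out, so no scan is needed
    tests = {
        ">": lambda a: a > value,
        "<": lambda a: a < value,
        "=": lambda a: a == value,
    }
    matches = [row for row in rows if tests[op](row[attribute])]
    return matches, len(rows)


# Illustrative data and constraint (assumptions, not from the source).
employees = [{"name": "a", "salary": 40000}, {"name": "b", "salary": 90000}]
salary_rule = RangeConstraint("salary", 0, 100000)
```

Under these assumptions, a query for salaries above 150000 is answered as empty with zero tuples examined, while a query for salaries above 50000 falls back to an ordinary scan.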

Knowledge Discovery in Expert Database Systems

The term "Discovery" has various synonyms: Detection, Revelation, Perception, Breakthrough, and Unearthing. All of these terms denote a degree of newness or surprise: a discovery provides new insights into the particular Domain of Discourse.

The perspective with which a database is viewed is important for discovery, because a well-known fact from one perspective might be crucial strategic knowledge from another perspective. For example, Airline Computer Reservation Systems (CRS) maintain the Frequent Flyer Program records of clients. This database can be used to produce highly selective marketing mailings, to discover customer travel patterns and preferences, and to monitor the usage of a CRS by travel agents. Thus the same database may be viewed from differing perspectives, and the discovery of new patterns, events, and relationships may be crucial for strategic decision-making.

The knowledge discovered may have differing representations. For example, suppose we have instrumented a dynamical system to measure the input-output relationship of signals at periodic instants of time. The system is viewed as a "black box" and we wish to discover characterizations of its behavior. We can construct an initial database represented as a relation in the Codd sense with the attributes representing the input-output signal ports.

Each tuple of the relation would contain the input-output signal values measured at a particular instant of time.
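The relation just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the system is taken to have a single input port u and a single output port y, the sample values are invented, and the least-squares gain is just one simple characterization an EDS might discover, not a method prescribed here.

```python
from collections import namedtuple

# One tuple per sampling instant: time t, input signal u, output signal y.
Observation = namedtuple("Observation", ["t", "u", "y"])

# Invented measurements of the "black box" (assumed data).
relation = [
    Observation(0, 0.0, 0.0),
    Observation(1, 1.0, 2.0),
    Observation(2, 2.0, 4.0),
    Observation(3, 3.0, 6.0),
]


def fit_gain(rows):
    """Least-squares estimate of k in the candidate law y = k * u."""
    numerator = sum(r.u * r.y for r in rows)
    denominator = sum(r.u * r.u for r in rows)
    return numerator / denominator


def residuals(rows, gain):
    """Difference between each measured output and the output the law predicts."""
    return [r.y - gain * r.u for r in rows]
```

For the invented data, fit_gain(relation) yields a gain of 2.0 with zero residuals, so the single rule y = 2u summarizes all four tuples.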

One might want to have an EDS discover different knowledge representations:

Supporting Multiple Types of Knowledge in EDS

As indicated by the above example, multiple types of knowledge might be discovered from a database, and an EDS should be able to manage discovered system knowledge. This knowledge represents an alternative and possibly more succinct characterization of system behavior. Therefore, this knowledge may be used to 1) replace portions of the database by incorporating it into the knowledge/data base, 2) infer missing data to complement the database, and 3) support meta-level "what-if" reasoning about the knowledge representations and their possible interactions. These points are being addressed in the INLEN System research project [KMK89] described in this workshop.

References

[KBH89] L. Kerschberg, R. Baum, and J. Hung, "KORTEX: An Expert Database System Shell for a Knowledge-Based Entity Relationship Model," International Conference on the Entity/Relationship Approach, Toronto, October 1989.

[KE86] L. Kerschberg (editor), Expert Database Systems: Proceedings from the First International Workshop, Benjamin/Cummings Publishing Company, Menlo Park, CA, 1986.

[KE87] L. Kerschberg (editor), Expert Database Systems: Proceedings from the First International Conference, Benjamin/Cummings Publishing Company, Menlo Park, CA, 1987.

[KE88] L. Kerschberg (editor), Expert Database Systems: Proceedings from the Second International Conference, George Mason University, Benjamin/Cummings Publishing Company, Menlo Park, CA, 1988.

[KMK89] K. Kaufman, R. Michalski, and L. Kerschberg, "The INLEN System for Extracting Knowledge from Databases: Goals and General Description," IJCAI-89 Workshop on Knowledge Discovery in Databases, Detroit, MI, August 1989.


Issues in Knowledge Discovery

Pat Langley
Department of Computer Science
University of California, Irvine, CA 92717 USA
Langley@ics.uci.edu

In everyday language, the word discovery suggests the acquisition of new knowledge on one's own initiative, without aid from a teacher. Research on machine discovery has focused on such unsupervised learning tasks, much of it drawing inspiration from the history of science. One can identify five broad classes of discovery tasks that have been studied in the literature:

  1. Taxonomy formation, which organizes observations into a hierarchy of classes that group similar events together (Michalski & Stepp, 1983; Fisher, 1987);
  2. Qualitative discovery, which formulates qualitative laws that relate different classes (Lenat, 1977; Jones, 1986);
  3. Quantitative discovery, which induces empirical laws that summarize numeric data (Langley, Bradshaw, & Simon, 1983; Falkenhainer & Michalski, 1986);
  4. Model construction, which infers structural models that explain observations in terms of unobserved components and their structure (Zytkow & Simon, 1986; Rose & Langley, 1986; Rajamoney, 1989); and
  5. Process formation, which infers process theories that explain observations involving changes over time (Falkenhainer, 1988; Kulkarni & Simon, 1988).

Early research in machine discovery focused on the first three problems, which deal with empirical laws. More recently, attention has turned to the last two problems, which deal with theory formation and revision. However, all five issues remain important, and a major challenge for future research is the development of integrated discovery systems that include many of these capabilities (Nordhausen & Langley, in press).
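To make the third task, quantitative discovery, concrete, here is a minimal Python sketch that searches two numeric variables for a constant product or a constant ratio. It is only loosely in the spirit of the invariant-seeking heuristics in the cited work and does not reproduce any cited system's algorithm; the function names, the tolerance, and the pressure and volume data are assumptions made for the example, and x values are assumed to be nonzero.

```python
def is_constant(values, tolerance=1e-6):
    """True when every value lies within a relative tolerance of the mean."""
    mean = sum(values) / len(values)
    return all(abs(v - mean) <= tolerance * max(1.0, abs(mean)) for v in values)


def find_invariant(xs, ys):
    """Return (term, constant) for the first candidate term that is invariant, else None.

    Only two candidate terms are tried: the product x * y and the ratio y / x.
    """
    candidates = {
        "x * y": [x * y for x, y in zip(xs, ys)],
        "y / x": [y / x for x, y in zip(xs, ys)],  # assumes no x is zero
    }
    for term, values in candidates.items():
        if is_constant(values):
            return term, sum(values) / len(values)
    return None


# Invented data obeying an inverse relationship (assumed values).
pressure = [1.0, 2.0, 4.0]
volume = [8.0, 4.0, 2.0]
law = find_invariant(pressure, volume)
```

For the invented data the product is invariant, so law names the term "x * y" with constant 8.0. A fuller system would generate new terms from old ones and repeat the search, which this sketch omits.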

Application to real-world domains should also be a priority, but this will require more robust variants of existing discovery algorithms. Ideally, an applied discovery system must be able to: (a) handle noise and exceptions; (b) incorporate background knowledge to constrain search; and (c) process data in an incremental manner for the sake of efficiency. Some systems with these capabilities already exist, but the field must test them on difficult domains and extend them when they encounter problems. Researchers must also be careful to evaluate their methods using some performance measure like predictive accuracy; anecdotal evidence (e.g., this hierarchy looks good) is much too subjective.
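The following Python sketch illustrates two of these points under simplifying assumptions: incremental processing, where each observation updates the model once and is then discarded, and evaluation by predictive accuracy on held-out cases. The running-centroid learner is a deliberately simple stand-in rather than one of the discovery systems discussed here, and the class name, function name, and data are hypothetical.

```python
from collections import defaultdict


class IncrementalCentroids:
    """Keeps a running mean per class; each observation is processed exactly once."""

    def __init__(self):
        self.count = defaultdict(int)
        self.mean = defaultdict(float)

    def update(self, x, label):
        """Fold one observation into the running mean of its class."""
        self.count[label] += 1
        self.mean[label] += (x - self.mean[label]) / self.count[label]

    def predict(self, x):
        """Predict the class whose running mean is closest to x."""
        return min(self.mean, key=lambda label: abs(x - self.mean[label]))


def predictive_accuracy(model, test_cases):
    """Fraction of held-out cases whose label the model predicts correctly."""
    correct = sum(1 for x, label in test_cases if model.predict(x) == label)
    return correct / len(test_cases)


# Invented training stream and held-out cases (assumed data).
model = IncrementalCentroids()
for x, label in [(1.0, "low"), (2.0, "low"), (9.0, "high"), (11.0, "high")]:
    model.update(x, label)

held_out = [(0.5, "low"), (10.5, "high"), (6.0, "low")]
accuracy = predictive_accuracy(model, held_out)
```

With these invented cases the model gets two of the three held-out predictions right, which gives a number that can be compared across methods in a way that an impression of a "good-looking" result cannot.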

In coming years, discovery tasks should occupy a central role in the application of machine learning technology. This is precisely because discovery methods are designed for domains where there is little a priori knowledge. Supervised learning methods require some tutor, which suggests the presence of a domain expert; in such cases, traditional approaches to knowledge acquisition are possible. Machine discovery holds the promise of automating knowledge acquisition without external aid, and this will be necessary in the coming decades, as science and technology acquire new and poorly understood data.

References

Falkenhainer, B. C., & Michalski, R. S. (1986). Integrating quantitative and qualitative discovery: The ABACUS system. Machine Learning, 1, 367--422.

Falkenhainer, B. C. (1988). Learning from physical analogies: A study in analogy and the explanation process. Doctoral dissertation, Department of Computer Science, University of Illinois, Urbana.

Fisher, D. (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2, 139--172.

Jones, R. (1986). Generating predictions to aid the scientific discovery process. Proceedings of the Fifth National Conference on Artificial Intelligence (pp. 513--522). Philadelphia, PA: Morgan Kaufmann.

Kulkarni, D., & Simon, H. A. (1988). The process of scientific discovery: The strategy of experimentation. Cognitive Science, 12, 139--175.

Lenat, D. B. (1977). Automated theory formation in mathematics. Proceedings of the Fifth International Joint Conference on Artificial Intelligence (pp. 833--841). Cambridge, MA: Morgan Kaufmann.

Langley, P., Bradshaw, G. L., & Simon, H. A. (1983). Rediscovering chemistry with the BACON system. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach. San Mateo, CA: Morgan Kaufmann.

Michalski, R. S., & Stepp, R. (1983). Learning from observation: Conceptual clustering. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach. San Mateo, CA: Morgan Kaufmann.

Nordhausen, B., & Langley, P. (in press). An integrated approach to empirical discovery. In J. Shrager & P. Langley (Eds.), Computational models of discovery and theory formation.

Rajamoney, S. (1989). Explanation-based theory revision: An approach to the problems of incomplete and incorrect theories. Doctoral dissertation, Department of Computer Science, University of Illinois, Urbana.

Rose, D., & Langley, P. (1986). Chemical discovery as belief revision. Machine Learning, 1, 423--451.

Zytkow, J. M., & Simon, H. A. (1986). A theory of historical discovery: The construction of componential models. Machine Learning, 1, 107--136.


Requirements for Knowledge Discovery in Databases

J. R. Quinlan
Basser Department of Computer Science
University of Sydney
Sydney, N.S.W. 2006 Australia

As databases grow in both number and size, the prospect of mining them for new, useful knowledge becomes yet more enticing. The following are some of the points I see as important in the development of this approach: