The perspective from which a database is viewed is important for discovery, because a fact that is well known from one perspective might be crucial strategic knowledge from another. For example, airline Computer Reservation Systems (CRS) maintain the Frequent Flyer Program records of clients. Such a database can be used to produce highly selective marketing mailings, to discover customer travel patterns and preferences, and to monitor the usage of a CRS by travel agents. Thus the same database may be viewed from differing perspectives, and the discovery of new patterns, events, and relationships may be crucial for strategic decision-making.
The knowledge discovered may have differing representations. For example, suppose we have instrumented a dynamical system to measure the input-output relationship of signals at periodic instants of time. The system is viewed as a "black box" and we wish to discover characterizations of its behavior. We can construct an initial database represented as a relation in the Codd sense with the attributes representing the input-output signal ports.
Each tuple of the relation would contain the input-output signal values measured at a particular instant of time.
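The relation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Sample`, `measure`, and `black_box` are hypothetical, the explicit time attribute `t` is an added convenience, and a single input port and a single output port are assumed.

```python
from typing import Callable, List, NamedTuple


class Sample(NamedTuple):
    """One tuple of the relation (attribute names are hypothetical)."""
    t: float  # sampling instant (assumed attribute, not required by the text)
    u: float  # value measured at the input port
    y: float  # value measured at the output port


def measure(system: Callable[[float], float],
            inputs: List[float],
            period: float = 1.0) -> List[Sample]:
    """Build the relation: one tuple per periodic sampling instant."""
    return [Sample(t=k * period, u=u, y=system(u))
            for k, u in enumerate(inputs)]


def black_box(u: float) -> float:
    """Stand-in for the instrumented system; its internals are unknown to the observer."""
    return 2.0 * u + 1.0


relation = measure(black_box, [0.0, 1.0, 2.0, 3.0])

# Projection onto the signal ports, dropping the time attribute.
io_pairs = [(s.u, s.y) for s in relation]
```

A discovery system would take `relation` as its input and try to characterize the mapping from `u` to `y`; the stand-in function exists here only so that the sketch produces data.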
One might want to have an EDS discover different knowledge representations:
[KBH89] L. Kerschberg, R. Baum, and J. Hung, "KORTEX: An Expert Database System Shell for a Knowledge-Based Entity Relationship Model," International Conference on the Entity/Relationship Approach, Toronto, October 1989.
[KE86] L. Kerschberg, (editor), Expert Database Systems: Proceedings from the First International Workshop, Benjamin/Cummings Publishing Company, Menlo Park, CA, 1986.
[KE87] L. Kerschberg, (editor), Expert Database Systems: Proceedings from the First International Conference, Benjamin/Cummings Publishing Company, Menlo Park, CA, 1987.
[KE88] L. Kerschberg, (editor), Expert Database Systems: Proceedings from the Second International Conference, George Mason University, Benjamin/Cummings Publishing Company, Menlo Park, CA, 1988.
[KMK89] K. Kaufman, R. Michalski, and L. Kerschberg, "The INLEN System for Extracting Knowledge from Databases: Goals and General Description," IJCAI-89 Workshop on Knowledge Discovery in Databases, Detroit, MI, August 1989.
In everyday language, the word discovery suggests the acquisition of new knowledge on one's own initiative, without aid from a teacher. Research on machine discovery has focused on such unsupervised learning tasks, much of it drawing inspiration from the history of science. One can identify five broad classes of discovery tasks that have been studied in the literature:
Application to real-world domains should also be a priority, but this will require more robust variants of existing discovery algorithms. Ideally, an applied discovery system must be able to: (a) handle noise and exceptions; (b) incorporate background knowledge to constrain search; and (c) process data in an incremental manner for the sake of efficiency. Some systems with these capabilities already exist, but the field must test them on difficult domains and extend them when they encounter problems. Researchers must also be careful to evaluate their methods using some performance measure like predictive accuracy; anecdotal evidence (e.g., "this hierarchy looks good") is much too subjective.
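As a minimal sketch of what evaluation by predictive accuracy might look like, the following Python fragment scores a predictor on held-out cases. The function names, the majority-class baseline, and the toy data are all illustrative assumptions and are not taken from any system discussed here.

```python
from collections import Counter


def predictive_accuracy(predict, test_set):
    """Fraction of held-out (case, label) pairs that `predict` gets right."""
    if not test_set:
        raise ValueError("test_set must not be empty")
    correct = sum(1 for case, label in test_set if predict(case) == label)
    return correct / len(test_set)


def majority_baseline(train_set):
    """Hypothetical reference predictor: always answer the most frequent training label."""
    most_common = Counter(label for _, label in train_set).most_common(1)[0][0]
    return lambda case: most_common


# Toy data, invented purely for illustration.
train = [((1,), "a"), ((2,), "a"), ((3,), "b")]
test = [((4,), "a"), ((5,), "b"), ((6,), "a"), ((7,), "a")]

baseline = majority_baseline(train)
accuracy = predictive_accuracy(baseline, test)
```

Reporting a discovery method's accuracy alongside such a baseline, measured on cases the method never saw, gives a number that can be compared across systems instead of an impression.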
In coming years, discovery tasks should occupy a central role in the application of machine learning technology. This is precisely because discovery methods are designed for domains where there is little a priori knowledge. Supervised learning methods require some tutor, which suggests the presence of a domain expert; in such cases, traditional approaches to knowledge acquisition are possible. Machine discovery holds the promise of automating knowledge acquisition without external aid, and this will be necessary in the coming decades, as science and technology acquire new and poorly understood data.
Falkenhainer, B. C., & Michalski, R. S. (1986). Integrating quantitative and qualitative discovery: The ABACUS system. Machine Learning, 1, 367--422.
Falkenhainer, B. C. (1989). Learning from physical analogies: A study in analogy and the explanation process. Doctoral dissertation, Department of Computer Science, University of Illinois, Urbana.
Fisher, D. (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2, 139--172.
Jones, R. (1986). Generating predictions to aid the scientific discovery process. Proceedings of the Fifth National Conference on Artificial Intelligence (pp. 513--522). Philadelphia, PA: Morgan Kaufmann.
Kulkarni, D., & Simon, H. A. (1988). The process of scientific discovery: The strategy of experimentation. Cognitive Science, 12, 139--175.
Langley, P., Bradshaw, G. L., & Simon, H. A. (1983). Rediscovering chemistry with the BACON system. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach. San Mateo, CA: Morgan Kaufmann.
Lenat, D. B. (1977). Automated theory formation in mathematics. Proceedings of the Fifth International Joint Conference on Artificial Intelligence (pp. 833--841). Cambridge, MA: Morgan Kaufmann.
Michalski, R. S., & Stepp, R. (1983). Learning from observation: Conceptual clustering. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach. San Mateo, CA: Morgan Kaufmann.
Nordhausen, B., & Langley, P. (in press). An integrated approach to empirical discovery. In J. Shrager & P. Langley (Eds.), Computational models of discovery and theory formation.
Rajamoney, S. (1989). Explanation-based theory revision: An approach to the problems of incomplete and incorrect theories. Doctoral dissertation, Department of Computer Science, University of Illinois, Urbana.
Rose, D., & Langley, P. (1986). Chemical discovery as belief revision. Machine Learning, 1, 423--451.
Zytkow, J. M., & Simon, H. A. (1986). A theory of historical discovery: The construction of componential models. Machine Learning, 1, 107--136.
As databases grow in both number and size, the prospect of mining them for new, useful knowledge becomes yet more enticing. The following are some of the points I see as important in the development of this approach: