Date: Tue, 20 Apr 1999 22:50:36 -0700 (PDT)
From: Hillol Kargupta hillol@eecs.wsu.edu
Subject: Distributed Data Mining Paper

The following distributed data mining paper is currently available from
http://www.eecs.wsu.edu/~hillol/pubs.html

Title: Distributed Multivariate Regression Using Wavelet-based Collective Data Mining
Authors: Daryl E. Hershberger and Hillol Kargupta
School of EECS, Washington State University
Technical Report EECS 99-002

Abstract:
This paper presents a method for distributed multivariate regression using wavelet-based Collective Data Mining (CDM). The method seamlessly blends machine learning and information theory with the statistical methods employed in multivariate regression to provide an effective data mining technique for use in a distributed data and computation environment. Evaluation of the method in terms of model accuracy as a function of the appropriateness of the selected wavelet function, the relative number of non-linear cross-terms, and the sample size demonstrates that accurate multivariate regression models can be generated from distributed, heterogeneous data sets with minimal data communication overhead compared to that required to aggregate a centralized data set. Application of this method to Linear Discriminant Analysis, which is closely related to multivariate regression, produced classification results on the Iris data set that are comparable to those obtained with centralized data analysis.

Hillol Kargupta
School of EECS, Washington State University
http://www.eecs.wsu.edu/~hillol
Copyright © 1999 KDnuggets