The Kingfisher program searches for the best non-redundant dependency rules, i.e. rules that express statistically significant positive dependencies in the data. The rules have the form X->A or X->~A, where X is a set of binary attributes and the consequence is a single attribute or its negation. With classical statistical measures (such as the chi-squared measure and Fisher's p), a positive dependence between X and ~A is the same as a negative dependence between X and A, so the program finds both positive and negative dependencies. Non-redundancy means that no rule X->A (X->~A) is selected if it is suboptimal compared to a more general rule Y->A (Y->~A), where Y is a proper subset of X.
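The duality between positive and negative dependence can be illustrated with a small sketch (the counts and variable names below are hypothetical, not taken from Kingfisher itself): in a 2x2 contingency table, the rule X->A shows positive dependence exactly when P(X,A) > P(X)P(A), and then X->~A necessarily shows negative dependence, and vice versa.

```python
# Hypothetical counts for a 2x2 contingency table of X and A.
n = 100          # total number of rows
n_x = 40         # rows where X holds
n_a = 50         # rows where A holds
n_xa = 30        # rows where both X and A hold

# Positive dependence between X and A: P(X,A) > P(X)P(A), i.e. lift > 1.
lift_a = (n_xa / n) / ((n_x / n) * (n_a / n))

# The cells for the negated consequence ~A are complementary,
# so if X->A is positively dependent, X->~A is negatively dependent.
n_not_a = n - n_a
n_x_not_a = n_x - n_xa
lift_not_a = (n_x_not_a / n) / ((n_x / n) * (n_not_a / n))

print(lift_a, lift_not_a)  # one lift is above 1 exactly when the other is below
```

With these counts, lift for X->A is 1.5 (positive dependence) and lift for X->~A is 0.5 (negative dependence), which is why one statistical measure covers both rule types.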
The current implementation offers two search measures: Fisher's exact test (the p-value) and the chi-squared measure, the latter with or without continuity correction. Fisher's exact test is always recommended, because it produces more accurate results (which hold better on future data) and also makes the search more efficient. In addition, users can easily add their own measures.
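The two measures can be sketched for a single rule as follows. This is not Kingfisher's implementation, only a minimal textbook version for a 2x2 table: the one-sided Fisher p-value is the hypergeometric tail probability of observing at least the given number of co-occurrences with fixed margins, and the chi-squared statistic optionally applies Yates' continuity correction.

```python
from math import comb

def fisher_p(n, n_x, n_a, n_xa):
    """One-sided Fisher's exact test p-value for positive dependence:
    the probability of at least n_xa co-occurrences of X and A under
    independence, with the margins n_x and n_a fixed (hypergeometric tail)."""
    k_max = min(n_x, n_a)
    return sum(comb(n_a, k) * comb(n - n_a, n_x - k)
               for k in range(n_xa, k_max + 1)) / comb(n, n_x)

def chi2(n, n_x, n_a, n_xa, correction=False):
    """Chi-squared statistic for the 2x2 table, optionally with
    Yates' continuity correction."""
    a, b = n_xa, n_x - n_xa
    c, d = n_a - n_xa, n - n_x - n_a + n_xa
    diff = abs(a * d - b * c)
    if correction:
        diff = max(diff - n / 2, 0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# With the illustrative counts n=100, n_x=40, n_a=50, n_xa=30 the
# dependence is clearly significant under both measures.
p = fisher_p(100, 40, 50, 30)
stat = chi2(100, 40, 50, 30)
```

Note that the Fisher p-value is an exact probability while the chi-squared statistic relies on an asymptotic approximation; this is one reason the exact test's results hold better on future data, as stated above.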
The program scales up well without any minimum frequency thresholds or restrictions on rule complexity. For example, all the classical benchmark data sets (such as Chess, Pumsb, Accidents, and Retail) can be analyzed with just 1 GB of memory. If the computation still turns out to be too heavy, one can search for only positive (or only negative) rules, decrease the number of rules searched, set more demanding initial thresholds for the goodness measure, or apply the classical restrictions on rule frequency and complexity.
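To show why a minimum frequency threshold restricts the search, here is a generic level-wise pruning sketch (again not Kingfisher's own algorithm, and the toy data is hypothetical): frequency is anti-monotone, so if an attribute set falls below the threshold, every superset of it does too, and that whole branch of the search space can be skipped.

```python
from itertools import combinations

# Toy transaction data: each row is the set of attributes that hold.
rows = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}, {"c"}]
min_freq = 2  # hypothetical absolute minimum frequency threshold

def freq(itemset):
    """Number of rows in which every attribute of the set holds."""
    return sum(itemset <= row for row in rows)

# Level 1: keep only singletons that reach the threshold.
frequent = [frozenset({x}) for x in {"a", "b", "c"} if freq({x}) >= min_freq]

# Level 2: only combine surviving sets; infrequent pairs are pruned
# and would never be extended further.
pairs = [s | t for s, t in combinations(frequent, 2)
         if freq(s | t) >= min_freq]
```

In this toy data {a,c} occurs in only one row and is pruned, while {a,b} and {b,c} survive; the same anti-monotone pruning is what makes a frequency threshold an effective last resort when an unrestricted search is too heavy.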