KDnuggets Home » News » 2011 » Dec » Software » MINE: Maximal Information Nonparametric Exploration software using MIC  ( < Prev | 11:n31 | Next > )

MINE: Maximal Information Nonparametric Exploration software using MIC


 
  
The breakthrough method from the Reshef brothers (described in a recent Science paper) improves upon the Pearson correlation coefficient and introduces a new criterion, MIC, for finding a wide range of nonlinear associations. The corresponding software is available in Java and R.


The maximal information coefficient (MIC) is a new and very promising measure of two-variable dependence designed specifically for rapid exploration of many-dimensional data sets. MIC is part of a larger family of maximal information-based nonparametric exploration (MINE) statistics, which can be used to identify and characterize important relationships in data.
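To illustrate why an information-based measure can find associations that the Pearson correlation coefficient misses, here is a minimal Python sketch. It is not the MINE software and not the MIC algorithm itself: MIC searches over many grid resolutions and bin placements, whereas this toy uses a single fixed equal-width grid. The function names (`pearson`, `equal_width_bins`, `grid_normalized_mi`), the 4-by-4 grid, and the parabola data are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def equal_width_bins(values, bins):
    """Assign each value to one of `bins` equal-width bins (indices 0..bins-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # constant input: everything lands in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def grid_normalized_mi(xs, ys, bins=4):
    """Mutual information on ONE fixed bins-by-bins grid, divided by log2(bins).

    Simplification: real MIC maximizes this kind of score over many grids.
    """
    bx = equal_width_bins(xs, bins)
    by = equal_width_bins(ys, bins)
    n = len(xs)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    mi = sum(
        (c / n) * math.log2(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )
    return mi / math.log2(bins)


# Hypothetical demo data: a noiseless parabola on a symmetric range.
xs = [i / 100 for i in range(-100, 101)]
ys = [x * x for x in xs]

print("Pearson r:         ", pearson(xs, ys))
print("Grid-normalized MI:", grid_normalized_mi(xs, ys))
```

On this data the Pearson coefficient is essentially zero, because the relationship is symmetric and nonlinear, while the grid-based score is clearly positive. A single fixed grid is a crude stand-in, which is the gap that searching over grids is meant to close.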

A paper, "Detecting Novel Associations in Large Data Sets," describing MINE and applying it to data from global health, genomics, the human microbiome, and Major League Baseball, was published in Science magazine.

MINE was developed by brothers David Reshef and Yakir Reshef, working with Professors Pardis Sabeti and Michael Mitzenmacher.

The MINE software based on this approach is available at www.exploredata.net/.

Maximal Information Coefficient (MIC)

Paper: Detecting Novel Associations in Large Data Sets, David N. Reshef, et al., Science 334, 1518 (2011); DOI: 10.1126/science.1205438

Abstract:

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
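
For readers who want the formula behind the abstract's wording, the paper's definition of MIC can be sketched as follows. The notation is introduced here for illustration and is not used elsewhere in this article: D is a set of n paired observations, G ranges over grids with x columns and y rows, I(D|_G) is the mutual information of the data binned by G, and B(n) is a cap on grid size (the authors report using B(n) = n^{0.6} by default).

```latex
\mathrm{MIC}(D) = \max_{xy < B(n)} \frac{I^{*}(D, x, y)}{\log_2 \min\{x, y\}},
\qquad
I^{*}(D, x, y) = \max_{G \in \mathcal{G}(x, y)} I(D|_G)
```

Dividing by \log_2 \min\{x, y\} keeps each grid's score between 0 and 1, which makes scores from grids of different resolutions comparable.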

See also a very good video that explains MIC visualization of data sets.


 
Related
Data Mining Software
