KDnuggets Home » News » 2010 » Jun » Publications » Data mining explains temporal gene interactions  ( < Prev | 10:n15 | Next > )

Data mining algorithm explains complex temporal interactions among genes

The algorithm reconstructs temporal models of cellular processes from gene expression data.

Researchers at Virginia Tech, New York University (NYU), and the University of Milan, Italy, have created a data mining algorithm they call GOALIE that can automatically reveal how biological processes are coordinated in time.

Biological processes such as cell division, metabolism, and development must be carefully synchronized for proper cell function. How such events are coordinated in time is a complex problem in the field of systems biology. Researchers can gather temporal data on the activity of thousands of genes simultaneously, but interpreting these datasets to understand higher-order phenomena such as cell division requires new analysis tools. The mathematically rigorous data mining algorithm GOALIE (Gene Ontology based Algorithmic Logic and Invariant Extractor) reconstructs temporal models of cellular processes from gene expression data.

[See "Reverse engineering dynamic temporal models of biological processes and their relationships," Proceedings of the National Academy of Sciences (PNAS), online edition.]

The researchers developed and applied their temporal logic-based algorithm using time-course gene expression datasets from the well-studied organism Saccharomyces cerevisiae, the budding yeast also used to leaven bread and to make beer, wine, and distilled spirits. The yeast datasets covered cell division, metabolism, and responses to various stresses. "A key goal of GOALIE is to be able to computationally integrate data from distinct stress experiments even when the experiments had been conducted independently," said Naren Ramakrishnan, professor of computer science at Virginia Tech and lead author.

"GOALIE is part of a broader effort to combine data mining with modeling tools," said Bud Mishra, professor of computer science and mathematics at the Courant Institute of Mathematical Sciences at NYU and corresponding author. Mishra, also a professor of cell biology at the NYU School of Medicine, is an investigator on a $10 million National Science Foundation (NSF) Expeditions grant to develop novel computational reasoning tools for complex systems, ranging from biological organs to complex diseases, as well as engineered systems. "GOALIE can not just mine patterns but also extract entire formal models that can then be used for posing biological questions and reasoning about hypotheses," said Mishra.

One question in the yeast study is how genes organize into groups to perform specific concerted behaviors. "However, these gene groupings are not permanent, but shift as the cell begins orchestrating its next step. These transitions correspond to significant 'regrouping' of genes, which is indicative of a change in cellular state," said Richard Helm, associate professor of biochemistry at Virginia Tech and co-author. Tracking down these transitions in time-course experiments is difficult, especially when thousands of genes are changing their expression levels simultaneously. "When confronted with datasets this large we tend to focus on our 'favorite' genes or processes, leading potentially to a biased viewpoint," said Helm.
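The article does not say how GOALIE detects these regroupings, so the sketch below is only a rough illustration of the general idea, not GOALIE's method. It assumes a simple approach: split the time course into consecutive windows, group genes within each window by linking any two whose expression profiles correlate above a threshold (single-linkage, via union-find), and score each window boundary by the fraction of gene pairs whose "same group" status changes (1 minus the Rand index). The window size, correlation threshold, score cutoff, and the function names `find_transitions`, `group_genes`, and `regrouping_score` are all hypothetical choices made for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two profiles; returns 0.0 if either is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def group_genes(profiles, threshold=0.8):
    """Hypothetical grouping step: link genes whose profiles correlate at or
    above `threshold`; groups are connected components (union-find)."""
    genes = sorted(profiles)
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in combinations(genes, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)
    # Map every gene to its group's root label.
    return {g: find(g) for g in genes}


def regrouping_score(labels_a, labels_b):
    """Fraction of gene pairs whose 'same group' status differs between two
    groupings (1 minus the Rand index). Assumes at least two genes."""
    pairs = list(combinations(sorted(labels_a), 2))
    changed = sum(
        (labels_a[x] == labels_a[y]) != (labels_b[x] == labels_b[y])
        for x, y in pairs
    )
    return changed / len(pairs)


def find_transitions(expression, window=4, threshold=0.8, min_score=0.3):
    """Group genes in consecutive non-overlapping windows and report the
    window start times where the grouping changes by at least `min_score`."""
    n_points = len(next(iter(expression.values())))
    starts = range(0, n_points - window + 1, window)
    groupings = [
        group_genes({g: v[s:s + window] for g, v in expression.items()}, threshold)
        for s in starts
    ]
    transitions = []
    for i in range(1, len(groupings)):
        score = regrouping_score(groupings[i - 1], groupings[i])
        if score >= min_score:
            transitions.append((starts[i], round(score, 3)))
    return transitions


if __name__ == "__main__":
    # Toy data: A/B and C/D co-vary in the first window, A/C and B/D in the second.
    expression = {
        "A": [1, 2, 3, 4, 1, 2, 3, 4],
        "B": [1, 2, 3, 4, 4, 3, 2, 1],
        "C": [4, 3, 2, 1, 1, 2, 3, 4],
        "D": [4, 3, 2, 1, 4, 3, 2, 1],
    }
    print(find_transitions(expression))  # [(4, 0.667)]
```

In the toy data, four of the six gene pairs switch between "grouped" and "not grouped" at time point 4, so that boundary is flagged as a regrouping. Real expression data would call for noise handling, overlapping windows, and a more careful choice of thresholds.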
