*Given ~7400 genetic variables and outcome data, deliver an R package that selects the most powerfully predictive variables, then uses them to accurately predict outcomes. $30K prize from the Cleveland Clinic.*

**InnoCentive Challenge ID 9932794, AWARD: $30K USD**

DEADLINE: Dec 6, 2011

Given genetic variables (expression levels for ~7400 genes) and outcome data, deliver an efficient computer program that selects the most powerfully predictive variables, then uses these to accurately predict outcomes. Acceptable solutions must be implemented as an R package, with source code delivered to the Seeker.

**Challenge Overview**

The Seeker, the Cleveland Clinic, has developed a computational algorithm that predicts survival outcomes for individuals with disease, given a large number of genetic variables. Although the Seeker's reference algorithm is competitive at selecting powerful variables and making accurate predictions, it runs too slowly to be of practical use in research or clinical settings. This Challenge is to deliver an efficient computer program that predicts cancer survival outcomes with accuracy equal to or better than that of the reference algorithm, including 10-fold validation, in less than 15 hours of real-world (wall-clock) time.
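To make the shape of the task concrete, here is a minimal sketch of the loop the overview implies: split the samples into ten folds, select variables using only the training rows of each fold, score predictions on the held-out rows, and check elapsed wall-clock time against the 15-hour budget. It is written in Python purely for illustration (an actual submission must be an R package), and it is not the Seeker's reference algorithm. Several things are assumptions: "10-fold validation" is read as 10-fold cross-validation, variables are ranked by a simple correlation filter, censoring in the survival data is ignored, accuracy is measured by pairwise concordance, and every function name and the toy data are hypothetical.

```python
import random
import time


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def select_variables(X, y, rows, m):
    """Rank columns by |correlation| with the outcome, using the given rows only."""
    ys = [y[r] for r in rows]
    scored = []
    for j in range(len(X[0])):
        c = correlation([X[r][j] for r in rows], ys)
        scored.append((abs(c), j, 1.0 if c >= 0 else -1.0))
    scored.sort(reverse=True)
    return [(j, sign) for _, j, sign in scored[:m]]


def concordance(scores, outcomes):
    """Fraction of pairs (ties in outcome skipped) ordered the same way by both."""
    agree = total = 0.0
    for a in range(len(scores)):
        for b in range(a + 1, len(scores)):
            if outcomes[a] == outcomes[b]:
                continue
            total += 1
            d = (scores[a] - scores[b]) * (outcomes[a] - outcomes[b])
            agree += 1.0 if d > 0 else 0.5 if d == 0 else 0.0
    return agree / total if total else 0.5


def cross_validate(X, y, k=10, m=5, budget_seconds=15 * 3600):
    """k-fold CV with selection inside each fold and a wall-clock budget check."""
    start = time.perf_counter()
    fold_scores = []
    for test_rows in k_fold_indices(len(X), k):
        held_out = set(test_rows)
        train_rows = [r for r in range(len(X)) if r not in held_out]
        chosen = select_variables(X, y, train_rows, m)
        # Signed sum of the selected variables serves as a crude risk score.
        preds = [sum(sign * X[r][j] for j, sign in chosen) for r in test_rows]
        fold_scores.append(concordance(preds, [y[r] for r in test_rows]))
        if time.perf_counter() - start > budget_seconds:
            raise TimeoutError("exceeded wall-clock budget")
    return sum(fold_scores) / k, time.perf_counter() - start


# Toy data (synthetic, not the Challenge data): 100 samples, 50 variables,
# with the outcome driven by columns 3 and 7 plus a little noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(100)]
y = [row[3] - row[7] + 0.1 * rng.gauss(0, 1) for row in X]
mean_c, elapsed = cross_validate(X, y, k=10, m=2)
```

Selecting variables inside each fold, rather than once on the full data, keeps the held-out rows from influencing which variables are chosen, which would otherwise inflate the accuracy estimate. It also means the selection step runs ten times, so its cost counts ten times against the time limit.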

A reference algorithm (written in R), an academic article describing the approach, training data, and more details are available in the Detailed Description. R is a freely available environment for statistical computing; extensions to R (in any programming language) are acceptable for this Challenge.

This Challenge includes "Prodigy II", an online tool that enables Solvers to easily score their own code for accuracy on the blinded data set and compare run-time performance versus other Solvers (i.e., execution time on our standardized test server). A leaderboard shows the top ten Solvers who have tested accurate algorithms with the fastest run-times and their position relative to the required threshold.
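The ranking rule described above can be sketched as a filter followed by a sort. How Prodigy II actually scores and orders entries is not specified here, so this is only one plausible reading: entries below an accuracy threshold are dropped, and the rest are ordered by run time. The function name, field names, threshold, and all sample values are hypothetical, and Python is used for illustration only.

```python
def leaderboard(entries, min_accuracy, top_n=10):
    """Keep entries meeting the accuracy threshold, then rank fastest first."""
    qualified = [e for e in entries if e["accuracy"] >= min_accuracy]
    return sorted(qualified, key=lambda e: e["hours"])[:top_n]


# Made-up entries for illustration; not real Challenge results.
entries = [
    {"solver": "A", "accuracy": 0.71, "hours": 9.5},
    {"solver": "B", "accuracy": 0.64, "hours": 2.0},
    {"solver": "C", "accuracy": 0.70, "hours": 14.0},
]
board = leaderboard(entries, min_accuracy=0.68)
```

Under this reading, the hypothetical solver "B" is the fastest but does not appear on the board, because speed only counts once the accuracy requirement is met.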