KDnuggets Home » News » 2011 » Apr » Software » Starting with R on HHP  ( < Prev | 11:n11 | Next > )

Starting with R on Heritage Health Prize


 
  
This blog post looks at the $3M Heritage Health Prize competition and shows how to get started using the R platform.



CYBAEA Journal, 2011-04-08, Allan Engelhardt
We do not have the full set of data yet, so this is a simple warm-up session to predict the days in hospital in year 2 based on the year 1 data.

Prerequisites

Obviously you need to have R installed, and you should also have signed up for the competition (be sure to read the terms carefully) and downloaded and extracted the release 1 data file.

Data preparation ...

Scoring

#### FUNCTION TO CALCULATE SCORE
HPPScore <- function (p, a) {
    ### Scoring function after
    ### www.heritagehealthprize.com/c/hhp/Details/Evaluation
    sqrt(mean((log(1+p, 10) - log(1+a, 10))^2))
}
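As a quick sanity check, the scoring function can be tried on a few made-up values. The vectors below are invented purely for illustration and are not competition data; the function body is copied from the post, including its base-10 logarithms.

```r
# Scoring function as given in the post (base-10 logs, as in the original code).
HPPScore <- function(p, a) {
    sqrt(mean((log(1 + p, 10) - log(1 + a, 10))^2))
}

# Hypothetical actual days in hospital for five members (not real data).
actual <- c(0, 0, 1, 3, 9)

# A perfect prediction gives a score of exactly 0.
perfect_score <- HPPScore(actual, actual)
print(perfect_score)

# Predicting zero days for everyone gives a simple baseline to compare against.
zero_score <- HPPScore(rep(0, length(actual)), actual)
print(zero_score)
```

Lower is better: any model worth submitting should at least beat the all-zeros baseline on held-out data.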

The author's initial effort, using glm(LengthOfStay), gets a score of 0.278914.
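To give a feel for what a first glm attempt might look like, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the column names (ClaimsY1, LengthOfStayY1, DaysY2), the simulated values, and the Poisson family are hypothetical and are not taken from the author's model or from the competition files.

```r
set.seed(1)  # make the simulated data reproducible

# Scoring function as given in the post.
HPPScore <- function(p, a) {
    sqrt(mean((log(1 + p, 10) - log(1 + a, 10))^2))
}

# Simulated stand-in for the member table; all column names are hypothetical.
n <- 1000
members <- data.frame(
    ClaimsY1       = rpois(n, 3),    # hypothetical: number of claims in year 1
    LengthOfStayY1 = rpois(n, 0.5)   # hypothetical: days in hospital in year 1
)
# Simulated year 2 days in hospital, loosely driven by the year 1 columns.
members$DaysY2 <- rpois(
    n, exp(-1.5 + 0.1 * members$ClaimsY1 + 0.3 * members$LengthOfStayY1)
)

# Fit a Poisson regression (an assumed choice of family) and predict on the
# same rows, so the score below is an in-sample figure.
fit  <- glm(DaysY2 ~ ClaimsY1 + LengthOfStayY1, family = poisson, data = members)
pred <- predict(fit, type = "response")

score <- HPPScore(pred, members$DaysY2)
print(score)
```

Because the score is computed on the rows used for fitting, it will be optimistic; a held-out split would give a fairer estimate.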

Read more.



See also a more medically oriented look at the HHP:
Using R and clinical heuristics to explore the Heritage Health Prize: what do we gain?, by gary
If you were just planning to grind the data set straight through your Weka engine, or simply run an ensemble of 100,000 decision trees (am I allowed to say random forest in my blog?) through your Beowulf cluster, you can stop reading here. If, however, you wonder if an understanding of pathophysiology, epidemiology, and clinical medicine might yield some insight into your approach for analytics in this competition, read on.
An epidemiologist often wants to know: what is the burden of disease in a population? It would be interesting to see the prevalence of each condition in the data set. The number of claims submitted for each condition can be used as a proxy.
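A minimal sketch of that claim count in R might look like the following. The toy table, its column names (MemberID, Condition), and its values are hypothetical and stand in for whatever the real claims file provides.

```r
# Toy claims table; column names and values are hypothetical.
claims <- data.frame(
    MemberID  = c(1, 1, 2, 3, 3, 3, 4),
    Condition = c("diabetes", "diabetes", "asthma", "diabetes",
                  "asthma", "fracture", "asthma"),
    stringsAsFactors = FALSE
)

# Number of claims per condition, largest first.
claim_counts <- sort(table(claims$Condition), decreasing = TRUE)
print(claim_counts)

# Share of all claims attributed to each condition.
claim_share <- prop.table(claim_counts)
print(round(claim_share, 2))

# Distinct members with at least one claim for each condition.
member_counts <- tapply(claims$MemberID, claims$Condition,
                        function(ids) length(unique(ids)))
print(member_counts)
```

Claim counts are only a proxy: one member with many claims for the same condition inflates the count, which is why the last step also counts distinct members.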
