IEEE ICDM Contest - Overview of Top Solutions, part 1

TunedIT.org, October 26, 2010, by Magdalena Pancewicz

The IEEE ICDM Contest: TomTom Traffic Prediction for Intelligent GPS Navigation has come to an end. As promised, we are publishing descriptions of the top solutions, provided by the participants. Although the reports had to be brief, the authors not only revealed a good deal of important details about their approaches, but also kept the descriptions straightforward and concise, giving all of us an unprecedented opportunity to learn the essence of data mining know-how. This is a good supplement to the full scientific articles that will be presented during the Contest Workshop at the ICDM conference in Sydney.

Today, we publish the descriptions for Task 1, "Traffic". In the next few days we'll make another post with the reports for Tasks 2 and 3 - stay tuned! We thank all the authors for their contributions.

By Alexander Groznetsky (alegro), the winner. Alex is an experienced data miner who participated (under the nick orgela) in the Netflix Prize contest in its early days - this becomes pretty clear when you look at the list of algorithms he used for ICDM - they sound very Netflix-like :) . To learn about the task, see the "Traffic" task description page.

My solution was a linear mixture of about 20 predictors, based on three types of algorithms used with different parameters:
1. Linear Least Squares (LLS),
2. Supervised SVD-like factorization (Singular Value Decomposition),
3. Restricted Boltzmann Machine (RBM) neural network.
The first one was based on a weighted linear regression model. One set of regression parameters was computed for each target value (the summary congestion at minutes 41-50 on a road segment). Known target values were used as regressands, and congestion values averaged per segment were used as regressors. Regression weights (one per design matrix row) were computed as the product of similarity and time distance from the target to the regressors' averaging intervals. Only a limited number of neighbors most similar to the predicted one was used for modeling. This predictor produced several predictions, using different averaging intervals, different numbers of selected neighbors, and neighbors either aligned or not aligned on an hour boundary.
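To make the weighted regression step more concrete, here is a minimal sketch in Python with NumPy. It is not the author's code. The function names, the small ridge term, and the way the `similarity` and `time_factor` arrays are obtained are all assumptions made for illustration; the report only states that each row weight is the product of a similarity term and a time-distance term, and that a limited number of the most similar neighbors is used.

```python
import numpy as np

def weighted_least_squares(X, y, w, ridge=1e-6):
    """Solve min_b sum_i w_i * (y_i - x_i . b)^2.

    The tiny ridge term is an assumption added only for numerical
    stability; it is not mentioned in the report.
    """
    A = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
    b = X.T @ (w * y)
    return np.linalg.solve(A, b)

def predict_target(X_hist, y_hist, x_query, similarity, time_factor, k=50):
    """Predict one target from its k most similar neighbor rows.

    X_hist      : (n, p) averaged congestions per segment (regressors)
    y_hist      : (n,)   known target values (regressands)
    x_query     : (p,)   regressors of the case being predicted
    similarity  : (n,)   hypothetical similarity of each row to the query
    time_factor : (n,)   hypothetical time-distance term for each row
    """
    # Keep only the k rows most similar to the predicted case.
    idx = np.argsort(similarity)[-k:]
    # One weight per design matrix row: similarity times time-distance term.
    w = similarity[idx] * time_factor[idx]
    beta = weighted_least_squares(X_hist[idx], y_hist[idx], w)
    return float(x_query @ beta)
```

Calling `predict_target` several times with different values of `k`, or with regressors built from different averaging intervals, would give a family of predictions of the kind the author describes feeding into the final mixture.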
