EMC Israel Data Science Challenge

The goal of this Kaggle-hosted challenge is to match source code files to the open source projects they come from.

The EMC source code classification challenge requires you to classify source code files according to the projects they belong to.

Given a set of source code files collected from various open source projects, how well can unseen source code files from the same set of projects be classified?
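
To make the task concrete, here is a minimal sketch of one possible baseline: a multinomial naive Bayes classifier over identifier-like tokens. It is not the challenge's official method or data format. It assumes raw file text is available, and the names `tokenize` and `NaiveBayesProjectClassifier`, the project labels, and the sample files are all hypothetical, made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Identifier-like tokens: a letter or underscore followed by word characters.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(source):
    """Split source text into identifier-like tokens."""
    return TOKEN.findall(source)


class NaiveBayesProjectClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, files, projects):
        self.token_counts = defaultdict(Counter)  # project -> token frequencies
        self.file_counts = Counter(projects)      # project -> number of training files
        self.vocabulary = set()
        for source, project in zip(files, projects):
            tokens = tokenize(source)
            self.token_counts[project].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, source):
        tokens = tokenize(source)
        total_files = sum(self.file_counts.values())
        vocab_size = len(self.vocabulary)
        best_project, best_score = None, -math.inf
        for project, n_files in self.file_counts.items():
            counts = self.token_counts[project]
            total_tokens = sum(counts.values())
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_files / total_files)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_project, best_score = project, score
        return best_project


# Toy training set (hypothetical files and project labels).
train_files = [
    "int kmalloc_init(struct page *page) { spin_lock(&lock); return 0; }",
    "void spin_unlock_irq(struct page *page) { kfree(page); }",
    "def render_template(request, context): return HttpResponse(context)",
    "class QuerySet: def filter(self, request): return HttpResponse(self)",
]
train_projects = ["kernel", "kernel", "webframework", "webframework"]

model = NaiveBayesProjectClassifier().fit(train_files, train_projects)
unseen = "struct page *p; spin_lock(&lock); kfree(p);"
print(model.predict(unseen))
```

The unseen file shares tokens such as `struct`, `page`, and `spin_lock` with only one of the toy projects, so the classifier assigns it there. A real entry would need far more training data and likely richer features than bare token counts.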

Possible real-world applications:

Protecting intellectual property
Data Loss Protection (DLP)
Automatic categorization of source code repositories

Participate in the challenge at

www.kaggle.com/c/emc-data-science
