The goal of this Kaggle-hosted challenge is to match source code files to the open source projects they come from.
The EMC source code classification challenge requires you to classify source code files according to the projects they belong to.
Given a set of source code files collected from various open source projects, how well can unseen source code files from the same set of projects be classified?
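To make the task concrete, here is a minimal sketch of one possible baseline, not the challenge's reference method. It assumes the files are available as raw text (the challenge's actual data format is not described here), treats identifiers as tokens, and uses a naive Bayes classifier with add-one smoothing. The project labels, snippets, and function names are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: identifiers are a useful signal, so each file is reduced to them.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def tokenize(source):
    """Return the identifier-like tokens found in a source file's text."""
    return TOKEN.findall(source)


def train(labeled_files):
    """Count tokens per project from (project_name, source_text) pairs."""
    token_counts = defaultdict(Counter)
    file_counts = Counter()
    for project, source in labeled_files:
        token_counts[project].update(tokenize(source))
        file_counts[project] += 1
    vocabulary = {tok for counts in token_counts.values() for tok in counts}
    return token_counts, file_counts, vocabulary


def classify(model, source):
    """Return the project with the highest naive Bayes log-score."""
    token_counts, file_counts, vocabulary = model
    total_files = sum(file_counts.values())
    best_project, best_score = None, -math.inf
    for project in file_counts:
        counts = token_counts[project]
        denominator = sum(counts.values()) + len(vocabulary)
        # Prior: share of training files that belong to this project.
        score = math.log(file_counts[project] / total_files)
        for token in tokenize(source):
            if token in vocabulary:  # tokens never seen in training are skipped
                score += math.log((counts[token] + 1) / denominator)
        if score > best_score:
            best_project, best_score = project, score
    return best_project


# Made-up toy training data: two hypothetical projects, two files each.
training = [
    ("project_a", "int alpha_init(alpha_conf_t *cf) { return ALPHA_OK; }"),
    ("project_a", "static alpha_int_t alpha_handler(alpha_request_t *r);"),
    ("project_b", "def render_page(request, template_name): return make_response()"),
    ("project_b", "class UserView(BaseView): def get(self, request): pass"),
]
model = train(training)

unseen = "alpha_int_t alpha_filter(alpha_request_t *r) { return ALPHA_OK; }"
prediction = classify(model, unseen)
print(prediction)  # prints project_a
```

On real data, accuracy would be measured on files held out from training, which is the "unseen files" setting the question above describes.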
Possible real-world applications:
- Protecting intellectual property
- Data Loss Prevention (DLP)
- Automatic categorization of source code repositories
Participate in the challenge at www.kaggle.com/c/emc-data-science