The EMC source code classification challenge requires you to classify source code files according to the projects they belong to.
Given a set of source code files collected from various open source projects, how accurately can unseen source code files from the same projects be classified?
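To make the task concrete, here is a minimal baseline sketch. It is an assumption for illustration, not the challenge's reference approach. Each file is treated as a bag of identifier tokens, and a multinomial Naive Bayes classifier with Laplace smoothing predicts which project an unseen file belongs to. The tokenizer regex, the class name `NaiveBayesProjectClassifier`, and the toy snippets labeled `project_a` / `project_b` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: keeps identifier-like tokens only (a deliberately crude choice).
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(source: str) -> list[str]:
    return TOKEN_RE.findall(source)


class NaiveBayesProjectClassifier:
    """Multinomial Naive Bayes over identifier tokens, with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.token_counts = defaultdict(Counter)  # project -> token frequencies
        self.doc_counts = Counter()               # project -> number of training files
        self.vocab = set()

    def fit(self, files):
        """files: iterable of (source_text, project_label) pairs."""
        for source, project in files:
            tokens = tokenize(source)
            self.token_counts[project].update(tokens)
            self.doc_counts[project] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, source: str) -> str:
        tokens = tokenize(source)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_project, best_score = None, -math.inf
        for project, counts in self.token_counts.items():
            # Log prior: fraction of training files that came from this project.
            score = math.log(self.doc_counts[project] / total_docs)
            denom = sum(counts.values()) + self.alpha * vocab_size
            # Add the smoothed log likelihood of every token (unseen tokens get alpha / denom).
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_project, best_score = project, score
        return best_project


# Toy training data (hypothetical projects and snippets).
train = [
    ("def parse_config(path): return load_yaml(path)", "project_a"),
    ("config = parse_config('settings.yaml')", "project_a"),
    ("public class Renderer { void drawFrame() {} }", "project_b"),
    ("Renderer r = new Renderer(); r.drawFrame();", "project_b"),
]
clf = NaiveBayesProjectClassifier().fit(train)

print(clf.predict("cfg = parse_config(path)"))  # expected: project_a
print(clf.predict("r.drawFrame()"))             # expected: project_b
```

Identifier tokens are a reasonable first signal because projects tend to reuse their own naming conventions and internal APIs. A real entry would likely also need to handle comments, string literals, and multiple programming languages.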
Possible real-world applications:
- Protecting intellectual property
- Data Loss Prevention (DLP)
- Automatic categorization of source code repositories