Awesome Public Datasets on GitHub
A long, categorized list of large datasets (available for public use) to try your analytics skills on. Which one would you pick?
Data challenges:
- Challenges in Machine Learning
- D4D Challenge of Orange
- DrivenData Competitions for Social Good
- ICWSM Data Challenge (since 2009)
- Kaggle Competition Data
- KDD Cup by Tencent 2012
- Localytics Data Visualization Challenge
- Netflix Prize
- Yelp Dataset Challenge
Finance:
- CBOE Futures Exchange
- Google Finance
- Google Trends
- OSU Financial data
- St Louis Federal
- Yahoo Finance
The full list also covers datasets in many other categories.
GitHub Link: https://github.com/caesar0301/awesome-public-datasets
Xia Ming is a Ph.D. candidate at Shanghai Jiao Tong University. He received a B.S. in Optical Information and Science Technology in 2010 from Xidian University, Xi'an, China. His research area is the measurement and analysis of mobile network traffic, especially updated models and characteristics of network traffic, using statistical and machine learning techniques on distributed processing platforms such as Apache Spark.
So, which dataset would you pick today? Would you like to add anything to this list?
Let us know your thoughts in the comments below.
Related:
- KDnuggets Datasets for Data Mining and Data Science
- Interesting Social Media Datasets
- Free Urban Data – What’s It Good For?