KDnuggets Home » News :: 2013 :: Dec :: News, Software :: Top Datasets on Reddit ( 14:n01 )

Top Datasets on Reddit

          


Most popular dataset posts on Reddit include NFL Game Metadata, Reddit top 2.5 Million posts, Zillow housing prices, and, of course, a database of cat pictures.

By Gregory Piatetsky, Dec 28, 2013.

Thanks to +RichGillin for a pointer to the Reddit page on datasets: www.reddit.com/r/datasets/

The top datasets for December 2013 include:

NFL Game Metadata Since 1980 (CSV file). Reddit user mapItOut explains how to link the metadata with the game results:

  • Download the schedule and results as a CSV from pro football reference for each season that you want (example: www.pro-football-reference.com/years/2007/games.htm). Add a year variable to each file.
  • Stack up all the CSV files into a single CSV.
  • Using the date variable and the year variable that you added, construct an ID variable that looks like one in the metadata file: yyyymmdd0[home team abbreviation]. You'll probably need to look through the metadata to get all the team abbreviations, but they look pretty self-explanatory ("den" for Denver, "dal" for Dallas, etc.).
  • Merge the results data onto the metadata by that ID.
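The steps above can be sketched in Python using only the standard library. This is a minimal illustration, not mapItOut's actual code: the field names `season`, `date`, `home_abbr` and `game_id` are hypothetical, the date text is assumed to look like "September 9", and games played in January or February are assumed to fall in the calendar year after the season year.

```python
from datetime import datetime


def game_id(season, date_text, home_abbr):
    """Build an ID shaped like yyyymmdd0[home team abbreviation]."""
    # Parse against a fixed leap year so only the month and day matter.
    md = datetime.strptime(f"{date_text} 2000", "%B %d %Y")
    # Assumption: January/February games belong to the next calendar year.
    year = season + 1 if md.month <= 2 else season
    return f"{year}{md.month:02d}{md.day:02d}0{home_abbr.lower()}"


def merge_results(metadata_rows, result_rows):
    """Attach result fields to each metadata row that shares the same ID."""
    by_id = {
        game_id(r["season"], r["date"], r["home_abbr"]): r
        for r in result_rows
    }
    merged = []
    for meta in metadata_rows:
        match = by_id.get(meta["game_id"], {})
        # Metadata fields win if both rows define the same key.
        merged.append({**match, **meta})
    return merged


if __name__ == "__main__":
    results = [{"season": 2007, "date": "September 9",
                "home_abbr": "DAL", "home_score": 45}]
    metadata = [{"game_id": "200709090dal", "temperature": 81}]
    print(merge_results(metadata, results))
```

Metadata rows with no matching result are kept unchanged, which makes it easy to spot abbreviations that still need mapping.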

Top 2.5 Million posts. This is a dataset of the all-time top 1,000 posts from each of the top 2,500 subreddits by subscribers, pulled from Reddit between August 15 and 20, 2013.

The 911Dataset Project: 3TB across 254,822 files.

Average wait times for emergency rooms across the country, from [ProPublica/CMMS].

The top reddit dataset posts for 2013 include:

You can haz datasets! We now have over 4M financial, economic, and social datasets available.

Our DaaS platform Quandl is a free and open index of currently over 4 million datasets that is growing daily. We also released a Python package this week to go with our R, MATLAB, and Excel ones. They allow easy API access to every single dataset. www.quandl.com/

173+ publicly available Social Network Datasets

Yelp Dataset Challenge

A generous sample of their data from the greater Phoenix, AZ metropolitan area including: 11,537 businesses - 8,282 checkin sets - 43,873 users - and 229,907 reviews.

Database of Cat Pictures (really, and not /r/pics).

The CAT dataset includes 10,000 cat images. Each image has annotations of the cat head with nine points, two for eyes, one for mouth, and six for ears.
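As a rough sketch of how one such annotation could be held in memory, the Python below groups the nine points into eyes, mouth, and ears. The class name, the function name, and the flat input ordering (eyes first, then mouth, then ears) are assumptions made for illustration; check the dataset's own documentation for the real file layout.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


@dataclass
class CatHeadAnnotation:
    eyes: List[Point]   # two points
    mouth: Point        # one point
    ears: List[Point]   # six points


def from_flat(coords: Sequence[int]) -> CatHeadAnnotation:
    """Group 18 numbers (x1, y1, ..., x9, y9) into named landmarks."""
    if len(coords) != 18:
        raise ValueError("expected 9 (x, y) pairs, i.e. 18 numbers")
    points = [(coords[i], coords[i + 1]) for i in range(0, 18, 2)]
    # Assumed order: 2 eye points, 1 mouth point, 6 ear points.
    return CatHeadAnnotation(eyes=points[0:2], mouth=points[2], ears=points[3:9])
```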

Zillow Housing Data, including house price, rental rate and sales data for 37,000 locations.

The "gilded" posts include:

Happy data mining, and check also KDnuggets







