Data Sources for Cool Data Science Projects

One of the biggest obstacles to successful projects has been getting access to interesting data. Here are some more cool public data sources you can use for your next project.



By Michael Li, CEO of The Data Incubator.

This is Part 4. Here are links to Part 1, Part 2, and Part 3.

At The Data Incubator, we run a free six-week data science fellowship to help our Fellows land industry jobs. Our hiring partners love considering Fellows who don’t mind getting their hands dirty with data. That’s why our Fellows work on cool capstone projects that showcase those skills. One of the biggest obstacles to successful projects has been getting access to interesting data. Here are some more cool public data sources you can use for your next project:

2016 Election

  1. County & Precinct-Level Results: The National Atlas of the US has downloadable datasets covering the last 12 years: 2004, 2008, 2012, or all combined. OpenElections provides the most unbiased, certified election results. The Harvard Election Data Archive lets you download precinct-level results from 2002 to 2012.
  2. State-Level Results: The Federal Election Commission has a published set of state-level results for Senate, House, and Presidential elections dating back to 1982.
  3. Global Elections: The University of Michigan’s Constituency-Level Elections Archive collects data from elections all over the world.
  4. Voter Accessibility: The US government’s Election Assistance Commission issues an Election Administration and Voting Survey to measure the ability of Americans, both overseas and at home, to register and successfully cast their ballots.
  5. Bush v. Gore Recount: Following the controversial 2000 US Presidential Election, the National Opinion Research Center reviewed all of Florida’s 175,010 ‘invalid’ votes. Their data is available in different arrangements.

Pollution

  1. Nightlights: Researchers recently published The New World Atlas of Artificial Night Sky Brightness. You can also download their atlas and view it in Google Earth. Note: you’ll have to submit a form to request access to their data.
  2. Air Quality: The WHO released their compilation of air quality data for 1100 cities back in 2011. The EEA has a detailed set of air quality data specific to Europe, and the American government has a public set for the US and its territories.
  3. Water Quality: The USGS is a great go-to for all things water; they have daily data tables, historical observations, and even current conditions. Global water data is also available through the World Resources Institute.

Health

  1. Health Behaviors: The CDC conducts a continuous (and very thorough) survey called the Behavioral Risk Factor Surveillance System that measures American adult health behaviors, from alcohol consumption to immunizations. They have published annual data sets since 1984.
  2. Healthcare Spending: The Agency for Healthcare Research and Quality has run a Medical Expenditure Panel Survey since 1996. Both the raw data from this survey and the summary data tables are available for public download.

While building your own project cannot replicate the experience of a fellowship at The Data Incubator (our Fellows get amazing access to hiring managers and to nonpublic data sources), we hope this will get you excited about working in data science. And when you are ready, you can apply to be a Fellow!

Got any more data sources?  Let us know and we’ll add them to the list!

Bio: Michael Li is the founder and CEO of The Data Incubator. He has worked as a data scientist (Foursquare), a quant (D.E. Shaw, J.P. Morgan), and a rocket scientist (NASA). He did his PhD at Princeton as a Hertz fellow and read Part III Maths at Cambridge as a Marshall scholar.

Original. Reposted by permission.
