Harvard CS109 Data Science Course, Resources Free and Online

Harvard's Fall 2013 CS109 Data Science is an excellent course, and most of its resources, including the video archive and lecture slides, are freely available online. What a fantastic way to get an Ivy League-quality education (although without a diploma).



Gregory Piatetsky, Nov 20, 2013.

The Harvard CS109 Data Science course is currently taught by two Harvard professors: Hanspeter Pfister (Computer Science) and Joe Blitzstein (Statistics).

This course introduces methods for five key aspects of data science:

  • data wrangling, cleaning, and sampling;
  • data management to be able to access big data quickly and reliably;
  • exploratory data analysis to generate hypotheses and intuition;
  • prediction based on statistical methods such as regression and classification;
  • communication of results through visualization, stories, and summaries.

The course uses Python for all programming assignments and projects.
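To give a flavor of how several of these aspects look in Python, here is a minimal sketch. It is not taken from the course materials: the data is made up for illustration, the variable names and the `predict` helper are hypothetical, and it uses only the standard library rather than whatever packages the course assignments rely on.

```python
import statistics

# Hypothetical raw records (hours studied, exam score); made up for illustration.
raw = [("1", "52"), ("2", "55"), ("3", ""), ("4", "61"), ("5", "64")]

# Wrangling and cleaning: drop records with missing fields, convert text to numbers.
clean = [(float(h), float(s)) for h, s in raw if h and s]

# Exploratory analysis: simple summary statistics.
hours = [h for h, _ in clean]
scores = [s for _, s in clean]
mean_hours = statistics.mean(hours)
mean_score = statistics.mean(scores)

# Prediction: ordinary least-squares fit of score = intercept + slope * hours.
sxy = sum((h - mean_hours) * (s - mean_score) for h, s in clean)
sxx = sum((h - mean_hours) ** 2 for h in hours)
slope = sxy / sxx
intercept = mean_score - slope * mean_hours


def predict(h):
    """Predicted score for h hours of study under the fitted line."""
    return intercept + slope * h


# Communication: a one-line summary of the result.
summary = (
    f"Each extra hour is associated with about {slope:.1f} more points; "
    f"predicted score at 3 hours: {predict(3):.1f}"
)
print(summary)
```

Real coursework would typically add a data management step and plots on top of this, but the flow from raw records to a short, readable summary is the same idea.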

IPython notebooks for CS109 are available at https://github.com/cs109/content

Amazingly, most of the resources are freely available online, including the Video Archive, where recent lectures include:

  • Guest Lecture: Yair Livne, Data Science at Quora
  • SVMs and Random Forests
  • Visual Story Telling. Messaging. Effective Presentations.
  • Network Visualization. Node-Link Graphs. Matrix Views. Gephi

and the Lecture Slides, covering:

  • Introduction
  • Statistical Graphs
  • Data Mining
  • Statistical Models
  • Bayesian Methods
  • MapReduce, and more ...

What a fantastic way to get an Ivy League-quality education by following along with the course (although you will not get a Harvard diploma).