# Open Source Data Science Masters Curriculum

A collection of open source resources for a Data Science Masters curriculum, covering Math, Algorithms, Databases, Data Mining, Machine Learning, Natural Language Processing, Data Analysis and Visualization, and Python.

Here is a list of useful, open-source resources for self-study toward a Data Science MS, assembled by data scientist Clare Corthell (**@clarecorthell**).

See also

- Harvard CS109 Data Science Course, Resources Free and Online
- Learning from Data at edX, taught by Caltech professor Yaser Abu-Mostafa
- KDnuggets FAQ: Learning Data Mining and Data Science

**Intro to Data Science**

UW / Coursera. Topics: Python NLP on Twitter API, Distributed Computing Paradigm, MapReduce/Hadoop & Pig Script, SQL/NoSQL, Relational Algebra, Experiment design, Statistics, Graphs, Amazon EC2, Visualization.

**Math**

- Linear Algebra / Levandosky Stanford / Book
- Linear Programming (Math 407) University of Washington / Course
- Statistics: Stats in a Nutshell / Book
- Forecasting: Principles and Practice Monash University / Book (uses R)
- Problem-Solving Heuristics: "How To Solve It" Polya / Book
- Coding the Matrix: Linear Algebra through Computer Science Applications Brown / Coursera
- Think Bayes Allen Downey / Book

**Computing**

- Algorithms
  - Algorithms Design & Analysis I Stanford / Coursera
  - Algorithm Design Kleinberg & Tardos / Book

- Databases
  - Introduction to Databases Stanford / Coursera
  - SQL Tutorial W3Schools / Tutorials

- Data Mining
  - Mining Massive Data Sets Stanford / Book
  - Mining The Social Web O'Reilly / Book
  - Introduction to Information Retrieval Stanford / Book

- Machine Learning
  - Machine Learning / Ng Stanford / Coursera
  - Programming Collective Intelligence O'Reilly / Book
  - Statistics: The Elements of Statistical Learning

- Probabilistic Graphical Models
  - Probabilistic Programming and Bayesian Methods for Hackers GitHub / Tutorials
  - PGMs / Koller Stanford / Coursera

- Natural Language Processing: NLP with Python O'Reilly / Book
- Data Analysis
  - Python for Data Analysis O'Reilly / Book
  - Big Data Analysis with Twitter UC Berkeley / Lectures
  - Social and Economic Networks: Models and Analysis Stanford / Coursera
  - Information Visualization: "Envisioning Information" Tufte / Book

- Learning Python: Learn Python the Hard Way, Google's Python Class
- Python (Libraries for Data Science)
  - Basic packages: Python, virtualenv, NumPy, SciPy, matplotlib, and IPython
  - Data Science in IPython Notebooks (Linear Regression, Logistic Regression, Random Forests, K-Means Clustering)
  - Bayesian inference: pymc
  - Labeled data structures, statistical functions, etc.: pandas (see: Python for Data Analysis)
  - Python wrapper for the Twitter API: twython
  - Tools for data mining and analysis: scikit-learn
  - Network modeling and visualization: networkx
  - Natural Language Toolkit: NLTK

For a final project, enter a competition; there are plenty to choose from.

For a full list and additional resources, see