A collection of open-source resources for a Data Science master's curriculum, covering Math, Algorithms, Databases, Data Mining, Machine Learning, Natural Language Processing, Data Analysis and Visualization, and Python.
Here is a list of useful open-source resources for self-study toward a Data Science MS, assembled by data scientist Clare Corthell (@clarecorthell).
Intro to Data Science
UW / Coursera. Topics: Python NLP on Twitter API, Distributed Computing Paradigm, MapReduce/Hadoop & Pig Script, SQL/NoSQL, Relational Algebra, Experiment design, Statistics, Graphs, Amazon EC2, Visualization.
Math
 Linear Algebra / Levandosky, Stanford / Book
 Linear Programming (Math 407), University of Washington / Course
 Statistics: Stats in a Nutshell / Book
 Forecasting: Principles and Practice, Monash University / Book (uses R)
 Problem-Solving Heuristics: "How To Solve It", Polya / Book
 Coding the Matrix: Linear Algebra through Computer Science Applications, Brown / Coursera
 Think Bayes, Allen Downey / Book
Computing
 Algorithms
 Algorithms: Design & Analysis I, Stanford / Coursera
 Algorithm Design, Kleinberg & Tardos / Book
 Databases
 Introduction to Databases, Stanford / Coursera
 SQL Tutorial, W3Schools / Tutorials
 Data Mining
 Mining Massive Data Sets, Stanford / Book
 Mining The Social Web, O'Reilly / Book
 Introduction to Information Retrieval, Stanford / Book
 Machine Learning
 Machine Learning / Ng, Stanford / Coursera
 Programming Collective Intelligence, O'Reilly / Book
 Statistics: The Elements of Statistical Learning
 Probabilistic Graphical Models
 Probabilistic Programming and Bayesian Methods for Hackers, GitHub / Tutorials
 PGMs / Koller, Stanford / Coursera
 Natural Language Processing: NLP with Python, O'Reilly / Book
 Data Analysis
 Python for Data Analysis, O'Reilly / Book
 Big Data Analysis with Twitter, UC Berkeley / Lectures
 Social and Economic Networks: Models and Analysis, Stanford / Coursera
 Information Visualization: "Envisioning Information", Tufte / Book
 Learning Python: Learn Python the Hard Way, Google's Python Class
 Python (Libraries for Data Science)
 Basic Packages: Python, virtualenv, NumPy, SciPy, matplotlib and IPython
 Data Science in IPython Notebooks (Linear Regression, Logistic Regression, Random Forests, K-Means Clustering)
 Bayesian Inference: pymc
 Labeled data structure objects, statistical functions, etc.: pandas (see: Python for Data Analysis)
 Python wrapper for the Twitter API: twython
 Tools for Data Mining & Analysis: scikit-learn
 Network Modeling & Viz: networkx
 Natural Language Toolkit: NLTK
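
As a small taste of the basic packages listed above, the following sketch fits a straight line by ordinary least squares using NumPy only. The data is synthetic and the variable names are made up for illustration; it is not taken from any of the listed notebooks or courses.

```python
import numpy as np

# Synthetic data (invented for illustration): y = 2x + 1 plus small noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

# Design matrix: one column for x, one column of ones for the intercept.
X = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares; lstsq returns (solution, residuals, rank, singular values).
coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = coef

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The fitted slope and intercept should land close to the values used to generate the data (2 and 1). The same fit could be done with scikit-learn or pandas once those packages are installed; NumPy alone keeps the example dependency-light.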
For a final project, do a competition; there are plenty to choose from on
For a full list and additional resources, see
