Open Source Data Science Masters Curriculum
Tags: MS in Data Science, Open Source
A collection of open-source resources for a Data Science master's curriculum, covering Math, Algorithms, Databases, Data Mining, Machine Learning, Natural Language Processing, Data Analysis and Visualization, and Python.
Here is a useful list of open-source resources for self-study toward a Data Science MS, assembled by data scientist Clare Corthell (@clarecorthell).
See also
- Harvard CS109 Data Science Course, Resources Free and Online
- Learning from Data at edX, taught by Caltech professor Yaser Abu-Mostafa
- KDnuggets Home :: FAQ :: Learning Data Mining and Data Science
Intro to Data Science
UW / Coursera. Topics: Python NLP on Twitter API, Distributed Computing Paradigm, MapReduce/Hadoop & Pig Script, SQL/NoSQL, Relational Algebra, Experiment design, Statistics, Graphs, Amazon EC2, Visualization.
Math
- Linear Algebra / Levandosky Stanford / Book
- Linear Programming (Math 407) University of Washington / Course
- Statistics Stats in a Nutshell / Book
- Forecasting: Principles and Practice Monash University / Book *uses R
- Problem-Solving Heuristics "How To Solve It" Polya / Book
- Coding the Matrix: Linear Algebra through Computer Science Applications Brown / Coursera
- Think Bayes Allen Downey / Book
Computing
- Algorithms
  - Algorithms Design & Analysis I Stanford / Coursera
  - Algorithm Design Kleinberg & Tardos / Book
- Databases
  - Introduction to Databases Stanford / Coursera
  - SQL Tutorial W3Schools / Tutorials
- Data Mining
  - Mining Massive Data Sets Stanford / Book
  - Mining The Social Web O'Reilly / Book
  - Introduction to Information Retrieval Stanford / Book
- Machine Learning
  - Machine Learning / Ng Stanford / Coursera
  - Programming Collective Intelligence O'Reilly / Book
  - Statistics The Elements of Statistical Learning
- Probabilistic Graphical Models
  - Probabilistic Programming and Bayesian Methods for Hackers Github / Tutorials
  - PGMs / Koller Stanford / Coursera
- Natural Language Processing: NLP with Python O'Reilly / Book
- Data Analysis
  - Python for Data Analysis O'Reilly / Book
  - Big Data Analysis with Twitter UC Berkeley / Lectures
  - Social and Economic Networks: Models and Analysis / Stanford / Coursera
- Information Visualization "Envisioning Information" Tufte / Book
- Learning Python: Learn Python the Hard Way, Google's Python Class
- Python (Libraries for Data Science)
  - Basic Packages Python, virtualenv, NumPy, SciPy, matplotlib and IPython
  - Data Science in iPython Notebooks (Linear Regression, Logistic Regression, Random Forests, K-Means Clustering)
  - Bayesian Inference | pymc
  - Labeled data structures objects, statistical functions, etc pandas (See: Python for Data Analysis)
  - Python wrapper for the Twitter API twython
  - Tools for Data Mining & Analysis scikit-learn
  - Network Modeling & Viz networkx
  - Natural Language Toolkit NLTK
For a final project, do a competition - plenty to choose on
For a full list and additional resources, see