Guide to Data Science Cheat Sheets
Tags: Cheat Sheet, Data Science, Python, R, SQL
Selection of the most useful Data Science cheat sheets, covering SQL, Python (including NumPy, SciPy and Pandas), R (including Regression, Time Series, Data Mining), MATLAB, and more.
By Ajay Ohri, May 2014.
Over the past few years, as the buzz around data scientists, and apparently the demand for them, have continued to grow, people are eager to learn how to join, advance and thrive in this seemingly lucrative profession. As someone who writes on analytics and occasionally teaches it, I am often asked: how do I become a data scientist?
Adding to the complexity of my answer is that data science seems to be a multi-disciplinary field, while university departments of statistics, computer science and management deal with data quite differently.
But, marketing-created jargon aside, a data scientist is simply a person who can write code in a few languages (primarily R, Python and SQL) for data querying, manipulation, aggregation, and visualization, using enough statistical knowledge to give actionable insights back to the business for making decisions.
This rather practical definition is reinforced by the wording of listings for “data scientists” on job websites, so here are some tools for learning the primary languages of data science: Python, R and SQL. A cheat sheet or reference card is a compilation of the most commonly used commands, meant to help you learn a language’s syntax faster.
The inclusion of SQL may surprise some (isn’t this the NoSQL era?), but it is there for a logical reason. Both Pig and Hive Query Language are closely associated with SQL, the original Structured Query Language. In addition, one can use just the sqldf package within R (or the less widely used python-sql or python-sqlparse libraries for Pythonic data scientists), or even the PROC SQL commands within the old champion language SAS, and do most of what a data scientist is expected to do (at least in data munging).
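As a rough illustration of how much data munging a single SQL statement can cover, here is a minimal sketch. It uses Python’s built-in sqlite3 module rather than sqldf, python-sql or PROC SQL, and the `sales` table, its columns and its values are made up for the example.

```python
import sqlite3

# In-memory database; the table name, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [
        ("east", 100.0),
        ("west", 250.0),
        ("east", 50.0),
        ("west", 25.0),
        ("north", 75.0),
    ],
)

# Query, filter and aggregate in one statement:
# keep orders of at least 50, then count and total them per region.
query = """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM sales
    WHERE amount >= 50
    GROUP BY region
    ORDER BY total DESC
"""
summary = conn.execute(query).fetchall()
conn.close()

for region, n_orders, total in summary:
    print(region, n_orders, total)
```

The same SELECT ... WHERE ... GROUP BY pattern should carry over, with small dialect differences, to the other SQL-flavoured tools mentioned above.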
For Python, this is a rather partial list, given that Python, the most general-purpose language in the data scientist’s quiver, can be used for many things. But for the data scientist, the numpy, scipy, pandas and scikit-learn packages seem the most pertinent.
Are all of the thousands of R packages of interest to the aspiring data scientist? No.
Accordingly, we chose the appropriate cheat sheets for you. Note that this is a curated list of lists. If anything can be assumed in the field of data science, it is the null hypothesis that the data scientist is intelligent enough to make their own decisions based on the data and its context. Three printouts are all it takes to speed up the aspiring data scientist’s journey.
Please add additional cheat sheets in the comments below.
Cheat Sheets for Python
- Python: www.astro.up.pt/~sousasag/Python_For_Astronomers/Python_qr.pdf
- NumPy, SciPy and Pandas: s3.amazonaws.com/quandl-static-content/Documents/Quandl+-+Pandas,+SciPy,+NumPy+Cheat+Sheet.pdf
Cheat Sheets for R
- Short Reference Card: cran.r-project.org/doc/contrib/Short-refcard.pdf
- R Functions for Regression Analysis: cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf
- Time Series: cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf
- Data Mining: cran.r-project.org/doc/contrib/YanchangZhao-refcard-data-mining.pdf
- Quandl: s3.amazonaws.com/quandl-static-content/Documents/Quandl+-+R+Cheat+Sheet.pdf
Cross Reference between R, Python (and Matlab)
Cheat Sheets for SQL
- SQL Joins: www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
- SQL and Hive: hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
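For readers who want to try the join types from the SQL Joins sheet before printing it, here is a small sketch of an inner join versus a left join. It again uses Python’s built-in sqlite3 module, and the `customers` and `orders` tables and their rows are invented for illustration.

```python
import sqlite3

# In-memory database; both tables and all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 'book'), (11, 1, 'pen'), (12, 2, 'lamp');
""")

# INNER JOIN keeps only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# LEFT JOIN keeps every customer; those without orders get NULL (None) for item.
left_rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()
conn.close()

print(inner_rows)
print(left_rows)
```

The only difference between the two results is the customer with no orders, who appears in the left join with a missing item and is absent from the inner join.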
Additional
- Cheat Sheets for Java: introcs.cs.princeton.edu/java/11cheatsheet/
- Linux Cheat Sheet: www.linuxstall.com/linux-command-line-tips-that-every-linux-user-should-know/
Ajay Ohri is a popular writer and blogger on Analytics and Data Mining and the author of the book R for Business Analytics (Springer, 2012).