50+ Data Science and Machine Learning Cheat Sheets
Get up to speed and keep Data Science & Data Mining concepts and commands handy with these cheatsheets covering R, Python, Django, MySQL, SQL, Hadoop, Apache Spark and machine learning algorithms.
There are thousands of packages and hundreds of functions out there in the data science world! An aspiring data enthusiast need not know them all. Here are the most important ones, captured in a few compact pages.
Mastering data science involves understanding statistics and mathematics, knowing how to program (especially in R, Python and SQL), and then combining all of these with business understanding and human instinct to derive the insights that drive decisions.
Here are the cheatsheets by category:
Cheat sheets for Python:
Python is a popular choice for beginners, yet still powerful enough to back some of the world’s most popular products and applications. Its design makes the programming experience feel almost as natural as writing in English. The Python basics and Python Debugger cheatsheets for beginners cover the important syntax to get started. Community-provided libraries such as NumPy, SciPy, scikit-learn and pandas are heavily relied on, and the NumPy/SciPy/Pandas Cheat Sheet provides a quick refresher on these.
 Python 2.7 Quick Reference Sheet
 Python Cheat Sheet by DaveChild
 Python Basics Reference sheet
 Python Debugger Cheatsheet
 NumPy / SciPy / Pandas Cheat Sheet
 Python OverAPI cheatsheet
 Python Decorators cheatsheet
 Python 2.4 Quick Reference Card
 Python 3 Cheat Sheet
 Python Language & Syntax Cheat Sheet
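As a taste of the kind of syntax these sheets summarize, here is a minimal sketch using only the standard library. It is not taken from any of the linked cheat sheets; the names `count_calls`, `mean`, `even_squares` and `word_lengths` are made up for illustration.

```python
from functools import wraps


def count_calls(func):
    """Decorator (hypothetical example) that records how often func is called."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# List comprehension: squares of the even numbers from 0 to 9.
even_squares = [n * n for n in range(10) if n % 2 == 0]

# Dictionary comprehension: map each library name to its length.
word_lengths = {word: len(word) for word in ["numpy", "scipy", "pandas"]}

# Calling the decorated function once bumps its call counter to 1.
average = mean(even_squares)
```

Comprehensions and decorators are two of the idioms a beginner tends to look up most often, which is why they appear on several of the sheets listed above.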
Cheat sheets for R:
The R ecosystem has expanded so much that a lot of referencing is needed. The R Reference Card covers most of the R world in a few pages. RStudio has also published a series of cheatsheets to make things easier for the R community. The data visualization with ggplot2 cheatsheet seems to be a favorite, as it helps when you are creating graphs of your results.
 R cheat sheet (Google Drive)
 R functions for Regression Analysis
 R Reference Card
 R functions for Time series Analysis
 R Reference Card for Data Mining
 R Cheat Sheet
 Data Analysis the data.table way
 Interactive Web Apps cheatsheet by RStudio
 Data Visualisation with ggplot2 cheatsheet by RStudio
 Package Development with devtools cheatsheet by RStudio
 Data Wrangling cheatsheet
 R markdown cheatsheet
 R Markdown Reference guide
 R Data Management cheatsheet
 R Cheatsheet for graphical parameters
Cheat sheets for MySQL & SQL:
For a data scientist, the basics of SQL are as important as those of any other language. Both Pig and the Hive Query Language are closely associated with SQL, the original Structured Query Language. The SQL cheatsheets provide a five-minute quick guide to learning it, after which you can explore Hive and MySQL!
 MySQL Cheatsheet by DaveChild
 SQL Cheat sheet
 SQL in one page
 MySQL Reference guide
 Visual SQL Joins
 SQL for dummies
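To make the join diagrams on these sheets concrete, here is a small sketch of an inner join next to a left join. It runs against an in-memory SQLite database through Python's built-in `sqlite3` module rather than MySQL, purely so that it is self-contained; the `authors` and `sheets` tables and their rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database (an assumption made for portability; not MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sheets  (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO sheets  VALUES (1, 'SQL basics', 1),
                               (2, 'Joins', 1),
                               (3, 'Orphan sheet', NULL);
""")

# INNER JOIN keeps only the sheets that have a matching author.
inner_rows = conn.execute("""
    SELECT s.title, a.name
    FROM sheets AS s
    INNER JOIN authors AS a ON a.id = s.author_id
    ORDER BY s.id
""").fetchall()

# LEFT JOIN keeps every author, including those with no sheets at all.
left_rows = conn.execute("""
    SELECT a.name, COUNT(s.id) AS n_sheets
    FROM authors AS a
    LEFT JOIN sheets AS s ON s.author_id = a.id
    GROUP BY a.id, a.name
    ORDER BY a.id
""").fetchall()

conn.close()
```

The inner join drops the sheet with no author, while the left join still reports the author who has written nothing, with a count of zero. The same two queries should behave the same way in MySQL.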
Cheat sheets for Spark:
Apache Spark is an engine for large-scale data processing. For certain applications, such as iterative machine learning, Spark can be up to 100x faster than Hadoop (using MapReduce). The Essential Apache Spark cheatsheet explains Spark's place in the big data ecosystem, walks through the setup and creation of a basic Spark application, and explains commonly used actions and operations.
 https://dzone.com/refcardz/apachespark
 Scala cheatsheets 1
 Scala cheatsheets 2
 Scala from DZone Reference Card
 Spark cheatsheet on github
 Scala on Spark Cheatsheet
 Essential Apache Spark cheatsheet by MapR
Cheat sheets for Hadoop & Hive:
Hadoop emerged as an unconventional tool for solving what was thought to be unsolvable, providing an open source software framework for the parallel processing of massive amounts of data. Explore the Hadoop cheatsheets to find useful commands for working with Hadoop on the command line. The combination of SQL and Hive functions is another one to check out.
 Hadoop for Dummies cheatsheet
 Getting Started Apache Hadoop Reference Card
 Hadoop Command Line cheatsheet
 Working with HDFS from the command line: Hadoop Cheat sheet
 Hive Function cheatsheet
 SQL to Hive cheatsheet
Cheat sheets for Machine learning:
We often find ourselves spending time wondering which algorithm is best, and then going back to our big books for reference! These cheat sheets ask about both the nature of your data and the problem you're working to address, and then suggest an algorithm for you to try.
 Choosing the right estimator Machine Learning cheatsheet
 Patterns for Predictive learning cheatsheet
 Machine learning algorithm cheat sheet for Microsoft Azure
 Machine Learning cheatsheet Github 1
 Machine Learning cheatsheet Github 2
 Machine Learning which algorithm performs best?
 Cheat sheet 10 machine learning algorithms R commands
 Patterns for Predictive Analytics
Cheat sheets for Django:
Django is a free and open source web application framework, written in Python. If you are new to Django, you can go over these cheatsheets to pick up the key concepts quickly and then dive into each one at a deeper level.
 Django cheat sheet v.1
 Django cheatsheet 1
 Django cheatsheet 2
 Django cheatsheet 3
 Django cheatsheet 4
 Django Reference Cheatsheet
 Django Quick start guide & Cheatsheet
 Flask Cheatsheet
Share more & Learn! Did we miss anything? Add your favorite Cheatsheet in the comments below!
Related:
 Guide to Data Science cheat sheets
 Top 20 R packages by popularity
 150 Most Influential People in Big Data & Hadoop