# Tutorials, Overviews

**DataCamp - Easiest Way to Learn Data Science**

Learning R? Take the Intro to R for Data Science Tutorial.

Learning Python? Take the Intro to Python for Data Science Tutorial.

Also check out these fantastic posts:

**R Learning Path: From beginner to expert in R in 7 steps**

**Comprehensive Guide to Learning Python for Data Science**

### Latest:

**Become a Pro at Pandas, Python’s Data Manipulation Library**- Jun 13, 2019.

Pandas is one of the most popular Python libraries for cleaning, transforming, manipulating and analyzing data. Learn how to efficiently handle large amounts of data using Pandas.

**Scalable Python Code with Pandas UDFs: A Data Science Application**- Jun 13, 2019.

There is still a gap between the corpus of libraries that developers want to apply in a scalable runtime and the set of libraries that support distributed execution. This post discusses how to bridge this gap using the functionality provided by Pandas UDFs in Spark 2.3+.

**All Models Are Wrong – What Does It Mean?**- Jun 12, 2019.

During your adventures in data science, you may have heard “all models are wrong.” Let’s unpack this famous quote to understand how we can still make models that are useful.

**Overview of Different Approaches to Deploying Machine Learning Models in Production**- Jun 12, 2019.

Learn the different methods for putting machine learning models into production and how to determine which method is best for which use case.

**How to Automate Hyperparameter Optimization**- Jun 12, 2019.

A step-by-step guide to performing hyperparameter optimization on a deep learning model with Bayesian optimization based on a Gaussian process. We used the gp_minimize function provided by the Scikit-Optimize (skopt) library to perform this task.

**3 Main Approaches to Machine Learning Models**- Jun 11, 2019.

Machine learning encompasses a vast set of conceptual approaches. We classify the three main algorithmic methods by their mathematical foundations to guide your exploration when developing models.

**If you’re a developer transitioning into data science, here are your best resources**- Jun 11, 2019.

This article provides background on the data scientist role, explains why your experience might be a good fit for data science, and offers concrete, step-by-step actions that you, as a developer, can take to ramp up on data science.

**5 Ways to Deal with the Lack of Data in Machine Learning**- Jun 10, 2019.

Effective solutions exist when you don't have enough data for your models. While there is no perfect approach, these five proven methods can help get your model to production.

**Choosing an Error Function**- Jun 10, 2019.

The error function expresses how much we care about a deviation of a certain size. The choice of error function depends entirely on how our model will be used.

**Random Forests vs Neural Networks: Which is Better, and When?**- Jun 7, 2019.

Random forests and neural networks are two widely used machine learning algorithms. What is the difference between the two approaches, and when should you use a neural network instead of a random forest?

**PyViz: Simplifying the Data Visualisation Process in Python**- Jun 6, 2019.

Some Python libraries suit basic data visualizations but not complicated ones, and others suit only complex visualizations. Is there a single library that handles both tasks efficiently? The answer is yes: PyViz.

**Jupyter Notebooks: Data Science Reporting**- Jun 6, 2019.

Jupyter lets us organize code, but many of us still end up with messy and unnecessary code chunks. Here are some ways, including a new extension, that anyone can use to start organizing the code in their notebooks.

**Mongo DB Basics**- Jun 5, 2019.

MongoDB is a document-oriented NoSQL database, unlike HBase, which is a wide-column store. The advantage of a document-oriented database over a relational one is that fields can be changed as and when required for each document, as opposed to every row sharing the same column names.

**The Whole Data Science World in Your Hands**- Jun 5, 2019.

Testing MatrixDS capabilities with different languages and tools: Python, R and Julia. If you work with data, you have to check this out.

**How to choose a visualization**- Jun 4, 2019.

During analysis, you need visualizations based on the structure of the data, which may differ from the ones you show the end user. A new guide to choosing the right visualization helps you flexibly understand the data first.

**Separating signal from noise**- Jun 4, 2019.

When we build a model, we assume that our data has two parts: signal and noise. Signal is the real pattern, the repeatable process that we hope to capture and describe. Noise is everything else that gets in the way.

**The Hitchhiker’s Guide to Feature Extraction**- Jun 3, 2019.

Check out this collection of tricks and code for Kaggle and everyday work.

**7 Steps to Mastering Intermediate Machine Learning with Python — 2019 Edition**- Jun 3, 2019.

This is the second part of this new learning path series for mastering machine learning with Python. Check out these 7 steps to help master intermediate machine learning with Python!
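The "Choosing an Error Function" entry above notes that an error function expresses how much we care about a deviation of a certain size. As a minimal sketch of that idea, assuming squared error and absolute error as the two candidates (the post itself may discuss other choices), the Python below scores two hypothetical models on the same made-up data. The helper names `squared_error`, `absolute_error` and `mean_error` are illustrative, not taken from the post.

```python
from typing import Callable, Sequence


def squared_error(y_true: float, y_pred: float) -> float:
    """Penalty grows quadratically with the size of the deviation."""
    return (y_true - y_pred) ** 2


def absolute_error(y_true: float, y_pred: float) -> float:
    """Penalty grows linearly with the size of the deviation."""
    return abs(y_true - y_pred)


def mean_error(
    error_fn: Callable[[float, float], float],
    y_true: Sequence[float],
    y_pred: Sequence[float],
) -> float:
    """Average a per-point error function over paired observations."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one observation")
    return sum(error_fn(t, p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical data: the true value is 10 for every observation.
    actual = [10.0, 10.0, 10.0, 10.0]
    # Model A: four small misses of 1 unit each.
    model_a = [11.0, 9.0, 11.0, 9.0]
    # Model B: three perfect predictions and one large miss of 4 units.
    model_b = [10.0, 10.0, 10.0, 14.0]

    for name, preds in [("A", model_a), ("B", model_b)]:
        mse = mean_error(squared_error, actual, preds)
        mae = mean_error(absolute_error, actual, preds)
        print(f"model {name}: MSE={mse:.2f}, MAE={mae:.2f}")
```

Both models have the same mean absolute error (1.00), but model B's single 4-unit miss raises its mean squared error to 4.00, versus 1.00 for model A. Squared error suits applications where occasional large misses are especially costly; absolute error weighs every unit of deviation equally.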

### May:

**Why physical storage of your database tables might matter**

**Understanding Backpropagation as Applied to LSTM**

**Who is your Golden Goose?: Cohort Analysis**

**Animations with Matplotlib**

**Becoming a Level 3.0 Data Scientist**

**Choosing Between Model Candidates**

**3 Machine Learning Books that Helped me Level Up as a Data Scientist**

**Boost Your Image Classification Model**

**Careful! Looking at your model results too much can cause information leakage**

**Analyzing Tweets with NLP in Minutes with Spark, Optimus and Twint**

**Your Guide to Natural Language Processing (NLP)**

**End-to-End Machine Learning: Making videos from images**

**When Too Likely Human Means Not Human: Detecting Automatically Generated Text**

**Extracting Knowledge from Knowledge Graphs Using Facebook’s Pytorch-BigGraph**

**Probability Mass and Density Functions**

**Building a Computer Vision Model: Approaches and datasets**

**Autoencoders: Deep Learning with TensorFlow’s Eager Execution**

**60+ useful graph visualization libraries**

We outline 60+ graph visualization libraries that allow users to build applications to display and interact with network representations of data.

**PyCharm for Data Scientists**

**7 Steps to Mastering SQL for Data Science — 2019 Edition**

Follow these updated 7 steps to go from SQL data science newbie to practitioner in a hurry. We consider only the necessary concepts and skills, and provide quality resources for each.

**A complete guide to K-means clustering algorithm**

**Large-Scale Evolution of Image Classifiers**

**Building Recommender systems with Azure Machine Learning service**

**Mathematical programming — Key Habit to Build Up for Advancing Data Science**

We show how, by simulating the random throw of a dart, you can compute the value of pi approximately. This is a small step towards building the habit of mathematical programming, which should be a key skill in the repertoire of a budding data scientist.

**Customer Churn Prediction Using Machine Learning: Main Approaches and Models**

**A Complete Exploratory Data Analysis and Visualization for Text Data: Combine Visualization and NLP to Generate Insights**

**How to fix an Unbalanced Dataset**

**Linear Programming and Discrete Optimization with Python using PuLP**

**Intro to XGBoost: Predicting Life Expectancy with Supervised Learning**

**Best US/Canada Masters in Analytics, Business Analytics, Data Science**

**How to Automate Tasks on GitHub With Machine Learning for Fun and Profit**

**Modeling Price with Regularized Linear Model & XGBoost**

**How to correctly select a sample from a huge dataset in machine learning**

We explain how choosing a small, representative dataset from a large population can improve model training reliability.

**Build Your First Chatbot Using Python & NLTK**
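The "Mathematical programming" entry above describes approximating pi by simulating random dart throws. Here is a minimal Monte Carlo sketch of that idea in Python. The function name `estimate_pi` and its parameters are illustrative assumptions, not taken from the post. Darts land uniformly in the unit square, and the fraction that falls inside the quarter circle of radius 1 approaches pi/4.

```python
import math
import random
from typing import Optional


def estimate_pi(n_throws: int, seed: Optional[int] = None) -> float:
    """Estimate pi by throwing darts uniformly at the unit square.

    A dart at (x, y) with x, y in [0, 1) lands inside the quarter circle
    of radius 1 when x**2 + y**2 <= 1. That quarter circle covers pi/4 of
    the square's area, so 4 * (hits / throws) approximates pi.
    """
    if n_throws <= 0:
        raise ValueError("n_throws must be positive")
    rng = random.Random(seed)  # local generator so results are reproducible
    hits = 0
    for _ in range(n_throws):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n_throws


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        est = estimate_pi(n, seed=42)
        print(f"{n:>9} throws: pi ~ {est:.5f} (error {abs(est - math.pi):.5f})")
```

The estimate converges slowly: its standard error shrinks roughly as 1/sqrt(n), so each extra decimal digit of accuracy costs about 100 times more throws.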