Getting Started with Deep Learning - Mar 24, 2017.
This post approaches getting started with deep learning from a framework perspective. Gain a quick overview and comparison of available tools for implementing neural networks to help choose what's right for you.
Unsupervised Investments: A Comprehensive Guide to AI Investors - Mar 24, 2017.
This article presents a list of 80 funds investing in Artificial Intelligence and Machine Learning.
What Is Data Science, and What Does a Data Scientist Do? - Mar 23, 2017.
This article is intended to help define the data scientist role, including typical skills, qualifications, education, experience, and responsibilities. The definition is somewhat loose, given that the ideal experience and skill set is relatively rare to find in one individual.
Statistical Modeling: A Primer - Mar 21, 2017.
It's critical to understand that statistical models are simplified representations of reality: they're all wrong, but some of them are useful. So why do we use statistical models?
Getting Up Close and Personal with Algorithms - Mar 21, 2017.
We've put together a brief summary of the top algorithms used in predictive analysis, which you can see just below. Read on to learn more about Linear Regression, Logistic Regression, Decision Trees, Random Forests, Gradient Boosting, and more.
Analytics 101: Comparing KPIs - Mar 20, 2017.
Different business units in the organisation have different behaviours (e.g. turnover rate) and they can’t be compared with each other. So, how can we tell whether the changes in their behaviour are reasons for concern?
The Most Underutilized Function in SQL - Mar 20, 2017.
Find out why md5() is an SQL function that's used surprisingly often, and find out how -- and why -- you can use it yourself.
Email Spam Filtering: An Implementation with Python and Scikit-learn - Mar 17, 2017.
This post is an overview of a spam filtering implementation using Python and Scikit-learn. The results of 2 classifiers are contrasted and compared: multinomial Naive Bayes and support vector machines.
Applying Machine Learning To March Madness - Mar 16, 2017.
March Madness is upon us. But before you get your brackets set, check out this overview of using machine learning to do the heavy lifting for you. A great discussion, and a timely topic.
50 Companies Leading The AI Revolution, Detailed - Mar 16, 2017.
We detail 50 companies leading the Artificial Intelligence revolution in Ad Sales, CRM, Autotech, Business Intelligence and Analytics, Commerce, Conversational AI/Bots, Core AI, Cyber-Security, Fintech, Healthcare, IoT, Vision, and other areas.
17 More Must-Know Data Science Interview Questions and Answers, Part 3 - Mar 15, 2017.
The third and final part of 17 new must-know Data Science interview questions and answers covers A/B testing, data visualization, Twitter influence evaluation, and Big Data quality.
Homebrewed Deep Learning and Do-It-Yourself Robotics - Mar 14, 2017.
Learn how to experiment with embodied robotic cognition with IBM Project Intu, a platform that extends Deep Learning and other cognitive services to new devices with minimum coding.
Open Source Toolkits for Speech Recognition - Mar 14, 2017.
This article reviews the main options for free speech recognition toolkits that use traditional Hidden Markov Models and n-gram language models.
Toward Increased k-means Clustering Efficiency with the Naive Sharding Centroid Initialization Method - Mar 13, 2017.
What if a simple, deterministic approach which did not rely on randomization could be used for centroid initialization? Naive sharding is such a method, and its time-saving and efficient results, though preliminary, are promising.
Working With Numpy Matrices: A Handy First Reference - Mar 10, 2017.
This introductory tutorial does a great job of outlining the most common Numpy array creation and manipulation functionality. A good post to keep handy while taking your first steps in Numpy, or to use as a handy reminder.
Visualizing Time-Series Change - Mar 9, 2017.
When creating time-series line charts, it’s important to consider which of the following messages you would like to communicate: Actual value of units? Change in absolute units? Percent change? Change from a specific point in time?
Beginner’s Guide to Customer Segmentation - Mar 9, 2017.
At the core of customer segmentation is being able to identify different types of customers and then figure out ways to find more of those individuals so you can... you guessed it, get more customers!
What makes a good data visualization – a Data Scientist perspective - Mar 8, 2017.
We examine principles of good data visualization, including some great and terrible examples, guidelines for human perception, focus on key variables, changes and trends, avoiding chart junk, and more.
The Challenges of Building a Predictive Churn Model - Mar 8, 2017.
Unlike other data science problems, there is no one method for predicting which customers are likely to churn in the next month. Here we review the most popular approaches.
Building Regression Models in R using Support Vector Regression - Mar 8, 2017.
The article studies the advantage of Support Vector Regression (SVR) over Simple Linear Regression (SLR) models for predicting real values, using the same basic idea as Support Vector Machines (SVM) use for classification.
Neuroscience for Data Scientists: Understanding Human Behaviour - Mar 8, 2017.
Neuroscience is a very complex and advanced study of the brain, and people often misuse the term. Here we try to explain neuroscience terminology and the use of data science in such studies.
K-Means & Other Clustering Algorithms: A Quick Intro with Python - Mar 8, 2017.
In this intro cluster analysis tutorial, we'll check out a few algorithms in Python so you can get a basic understanding of the fundamentals of clustering on a real dataset.
A Simple XGBoost Tutorial Using the Iris Dataset - Mar 7, 2017.
This is an overview of the XGBoost machine learning algorithm, which is fast and shows good results. This example uses multiclass prediction with the Iris dataset from Scikit-learn.
Bokeh Cheat Sheet: Data Visualization in Python - Mar 3, 2017.
Bokeh is the Python data visualization library that enables high-performance visual presentation of large datasets in modern web browsers. The package is flexible and offers lots of possibilities to visualize your data in a compelling way, but can be overwhelming.
Every Intro to Data Science Course on the Internet, Ranked - Mar 2, 2017.
For this guide, I spent 10+ hours trying to identify every online intro to data science course offered as of January 2017, extracting key bits of information from their syllabi and reviews, and compiling their ratings.
Building a Bot to Answer FAQs: Predicting Text Similarity - Mar 2, 2017.
In this post, learn to build a bot to answer frequently asked questions, reducing lag time for more customers and taking the load off of engineers, ensuring they can concentrate on building products!
What is Customer Churn Modeling? Why is it valuable? - Mar 1, 2017.
Getting new customers is much more expensive than retaining existing ones, so reducing churn is a top priority for many firms. Understanding why customers churn and estimating the risks are powerful components of a data-driven retention strategy.
7 More Steps to Mastering Machine Learning With Python - Mar 1, 2017.
This post is a follow-up to last year's introductory Python machine learning post, which includes a series of tutorials for extending your knowledge beyond the original.
50+ Useful Machine Learning & Prediction APIs, updated - Feb 8, 2017.
Very useful, updated list of 50+ APIs in machine learning, prediction, text analytics & classification, face recognition, language translation, and more.
- What I Learned Implementing a Classifier from Scratch in Python
An Overview of Python Deep Learning Frameworks
Read this concise overview of leading Python deep learning frameworks, including Theano, Lasagne, Blocks, TensorFlow, Keras, MXNet, and PyTorch.
- Moving from R to Python: The Libraries You Need to Know
- What is a Support Vector Machine, and Why Would I Use it?
- Introduction to Correlation
- The Gentlest Introduction to Tensorflow – Part 4
- 17 More Must-Know Data Science Interview Questions and Answers, Part 2
- The Gentlest Introduction to Tensorflow – Part 3
- Stacking Models for Improved Predictions
- Introduction to Natural Language Processing, Part 1: Lexical Units
Removing Outliers Using Standard Deviation in Python
Standard Deviation is one of the most underrated statistical tools out there. It’s an extremely useful metric that most people know how to calculate but very few know how to use effectively.
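As a rough illustration of the idea named in the title above, here is a minimal sketch that drops values lying more than a chosen number of standard deviations from the mean. The function name `remove_outliers`, the `num_sd` parameter, the default cutoff of 2, and the sample readings are all assumptions made for this example; they are not taken from the linked post, which may use a different approach.

```python
from statistics import mean, stdev


def remove_outliers(values, num_sd=2.0):
    """Keep values within num_sd sample standard deviations of the mean.

    Hypothetical helper for illustration only; the cutoff is an assumption.
    """
    if len(values) < 2:
        # stdev needs at least two points; nothing to filter
        return list(values)
    center = mean(values)
    spread = stdev(values)  # sample standard deviation (n - 1 denominator)
    lower = center - num_sd * spread
    upper = center + num_sd * spread
    return [v for v in values if lower <= v <= upper]


# Made-up readings with one obviously extreme value
readings = [10, 12, 11, 13, 12, 11, 10, 95]
cleaned = remove_outliers(readings)
print(cleaned)
```

Note that the extreme value itself inflates both the mean and the standard deviation, so on very small samples a strict cutoff such as 3 standard deviations may never flag anything.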
- Apache Arrow and Apache Parquet: Why We Needed Different Projects for Columnar Data, On Disk and In-Memory
- Natural Language Processing Key Terms, Explained
17 More Must-Know Data Science Interview Questions and Answers
17 new must-know Data Science interview questions and answers include lessons from the failure to predict the 2016 US Presidential election and the Super Bowl LI comeback, understanding bias and variance, why fewer predictors might be better, and how to make a model more robust to outliers.
- The Internet of Things vs. Related Concepts and Terms
- Web Scraping for Dataset Curation, Part 2: Tidying Craft Beer Data
- Web Scraping for Dataset Curation, Part 1: Collecting Craft Beer Data
- The Data Science of NYC Taxi Trips: An Analysis & Visualization
- Automatically Segmenting Data With Clustering
- Regression Analysis: A Primer
5 Career Paths in Big Data and Data Science, Explained
Sexiest job... massive shortage... blah blah blah. Are you looking to get a real handle on the career paths available in "Data Science" and "Big Data?" Read this article for insight on where to look to sharpen the required entry-level skills.
- Learning to Learn by Gradient Descent by Gradient Descent
- Identifying Variables That Might Be Better Predictors