2017 Feb Tutorials, Overviews
- Moving from R to Python: The Libraries You Need to Know
- Feb 24, 2017.
Are you considering making a move from R to Python? Here are the libraries you need to know, how they stack up to their R contemporaries, and why you should learn them.
- What is a Support Vector Machine, and Why Would I Use it?
- Feb 23, 2017.
Support Vector Machine has become an extremely popular algorithm. In this post I try to give a simple explanation of how it works and give a few examples using the Python Scikits libraries.
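The post itself walks through the intuition; purely as an illustration of the kind of code involved, here is a minimal scikit-learn sketch (the iris dataset and the parameter choices below are illustrative assumptions, not taken from the post):

```python
# Minimal SVM classification sketch with scikit-learn
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Load the classic iris dataset (150 samples, 4 features, 3 classes)
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit a support vector classifier with an RBF kernel (the default)
clf = svm.SVC(kernel="rbf", C=1.0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # fraction correctly classified
```

The kernel and the regularization parameter `C` are the two knobs most worth experimenting with in practice.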
- Introduction to Correlation
- Feb 22, 2017.
Correlation is one of the most widely used (and widely misunderstood) statistical concepts. We provide the definitions and intuition behind several types of correlation and illustrate how to calculate correlation using the Python pandas library.
- The Gentlest Introduction to Tensorflow – Part 4
- Feb 22, 2017.
This post is the fourth entry in a series dedicated to introducing newcomers to TensorFlow in the gentlest possible manner, and focuses on logistic regression for classifying the digits 0-9.
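The series builds the model in TensorFlow; purely as an illustration of the underlying math, here is a NumPy sketch of softmax (multi-class logistic) regression trained by batch gradient descent on toy two-class data (the toy data, learning rate, and iteration count are assumptions, not the MNIST setup the post uses):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs standing in for the digits task
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(1, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
Y = np.eye(2)[y]                      # one-hot labels

W = np.zeros((2, 2))
b = np.zeros(2)
for _ in range(200):                  # batch gradient descent
    logits = X @ W + b
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax
    grad = p - Y                      # gradient of cross-entropy wrt logits
    W -= 0.1 * X.T @ grad / len(X)
    b -= 0.1 * grad.mean(axis=0)

# Evaluate with the final weights
logits = X @ W + b
p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
accuracy = (p.argmax(axis=1) == y).mean()
```

The `p - Y` line is the whole trick: the gradient of cross-entropy loss with respect to the logits is simply predicted probabilities minus one-hot targets.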
- 17 More Must-Know Data Science Interview Questions and Answers, Part 2
- Feb 22, 2017.
In this second part of 17 new must-know Data Science Interview questions and answers, we cover overfitting, ensemble methods, feature selection, ground truth in unsupervised learning, the curse of dimensionality, and parallel algorithms.
- The Gentlest Introduction to Tensorflow – Part 3
- Feb 21, 2017.
This post is the third entry in a series dedicated to introducing newcomers to TensorFlow in the gentlest possible manner. This entry progresses to multi-feature linear regression.
- Stacking Models for Improved Predictions
- Feb 21, 2017.
This post presents an example of regression model stacking, and proceeds by using XGBoost, Neural Networks, and Support Vector Regression to predict house prices.
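The post's actual stack uses XGBoost, Neural Networks, and Support Vector Regression on house-price data; as a simplified stand-in, here is a scikit-learn `StackingRegressor` sketch with substitute base learners on synthetic data (every model and parameter choice below is an illustrative assumption):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

# Synthetic regression problem standing in for house prices
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The base learners' out-of-fold predictions become the input
# features for the final estimator, which learns how to blend them
stack = StackingRegressor(
    estimators=[
        ("gbm", GradientBoostingRegressor(random_state=0)),
        ("svr", SVR(kernel="rbf")),
    ],
    final_estimator=Ridge(),
)
stack.fit(X_train, y_train)
r2 = stack.score(X_test, y_test)  # R-squared on held-out data
```

The key design point of stacking is that the meta-model is trained on cross-validated predictions rather than in-sample ones, which keeps the blend from simply rewarding the base model that overfits hardest.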
- Introduction to Natural Language Processing, Part 1: Lexical Units
- Feb 16, 2017.
This series explores core concepts of natural language processing, starting with an introduction to the field and explaining how to identify lexical units as a part of data preprocessing.
- Removing Outliers Using Standard Deviation in Python
- Feb 16, 2017.
Standard Deviation is one of the most underrated statistical tools out there. It’s an extremely useful metric that most people know how to calculate but very few know how to use effectively.
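As a minimal sketch of the technique the post describes, here is a standard-library-only filter that drops points more than a chosen number of standard deviations from the mean (the threshold of 2 and the toy data are illustrative assumptions):

```python
import statistics

def remove_outliers(values, num_sd=2.0):
    """Keep only values within num_sd standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) <= num_sd * sd]

data = [10, 12, 11, 13, 12, 11, 95]   # 95 is an obvious outlier
cleaned = remove_outliers(data)       # -> [10, 12, 11, 13, 12, 11]
```

Note the caveat that a large outlier inflates both the mean and the standard deviation it is being tested against, so for heavily contaminated data a robust variant (e.g. median and MAD) is often preferred.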
- Apache Arrow and Apache Parquet: Why We Needed Different Projects for Columnar Data, On Disk and In-Memory
- Feb 16, 2017.
Apache Parquet and Apache Arrow both focus on improving the performance and efficiency of data analytics. The two projects optimize performance for on-disk and in-memory processing, respectively.
- Natural Language Processing Key Terms, Explained
- Feb 16, 2017.
This post provides a concise overview of 18 natural language processing terms, intended as an entry point for the beginner looking for some orientation on the topic.
- 17 More Must-Know Data Science Interview Questions and Answers
- Feb 15, 2017.
17 new must-know Data Science Interview questions and answers include lessons from the failures to predict the 2016 US Presidential election and the Super Bowl LI comeback, understanding bias and variance, why fewer predictors might be better, and how to make a model more robust to outliers.
- The Internet of Things vs. Related Concepts and Terms
- Feb 14, 2017.
This post attempts to provide some insights on the differences between IoT and the related technologies of M2M, CPS, and WoT, based on literature texts, but also the author's experience from projects and application deployments.
- Web Scraping for Dataset Curation, Part 2: Tidying Craft Beer Data
- Feb 14, 2017.
This is the second part in a 2 part series on curating data from the web. The first part focused on web scraping, while this post details the process of tidying scraped data after the fact.
- Web Scraping for Dataset Curation, Part 1: Collecting Craft Beer Data
- Feb 13, 2017.
This post is the first in a 2 part series on scraping and cleaning data from the web using Python. This first part is concerned with the scraping aspect, while the second part will focus on the cleaning. A concrete example is presented.
- The Data Science of NYC Taxi Trips: An Analysis & Visualization
- Feb 10, 2017.
This post outlines using Google BigQuery for an analysis of NYC Taxi Trips in the cloud, presenting the analysis and visualization in Tableau Public for readers to interact with.
- Automatically Segmenting Data With Clustering
- Feb 9, 2017.
In this post, we’ll walk through one such algorithm called K-Means Clustering, how to measure its efficacy, and how to choose the sets of segments you generate.
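The post walks through the full workflow; as a compact scikit-learn sketch of the core idea, here is K-Means on two synthetic blobs, with inertia (within-cluster sum of squares) as one common yardstick for comparing choices of k via the "elbow" method (the blob data and k=2 are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated 2-D blobs as a toy segmentation problem
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(50, 2)),
])

# Fit K-Means with k=2; inertia_ measures how tight the clusters are,
# and plotting it against increasing k reveals the "elbow"
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
```

Because K-Means only returns arbitrary cluster IDs, evaluating a real segmentation usually means profiling each cluster's members afterward rather than reading meaning into the label numbers themselves.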
- 50+ Useful Machine Learning & Prediction APIs, updated
- Feb 8, 2017.
Very useful, updated list of 50+ APIs in machine learning, prediction, text analytics & classification, face recognition, language translation, and more.
- Regression Analysis: A Primer
- Feb 6, 2017.
Despite its popularity, Regression is widely misunderstood. Why? The answer might surprise you: There is no such thing as Regression. Rather, there are a large number of statistical methods that are called Regression, all of which are based on a shared statistical foundation.
- 5 Career Paths in Big Data and Data Science, Explained
- Feb 6, 2017.
Sexiest job... massive shortage... blah blah blah. Are you looking to get a real handle on the career paths available in "Data Science" and "Big Data?" Read this article for insight on where to look to sharpen the required entry-level skills.
- Learning to Learn by Gradient Descent by Gradient Descent
- Feb 2, 2017.
What if instead of hand designing an optimising algorithm (function) we learn it instead? That way, by training on the class of problems we’re interested in solving, we can learn an optimum optimiser for the class!
- Identifying Variables That Might Be Better Predictors
- Feb 2, 2017.
This post expands on the approach the data science team uses to identify (and quantify) which variables and metrics are better predictors of performance.