Introduction to Natural Language Processing, Part 1: Lexical Units - Feb 16, 2017.
This series explores core concepts of natural language processing, starting with an introduction to the field and explaining how to identify lexical units as a part of data preprocessing.
Removing Outliers Using Standard Deviation in Python - Feb 16, 2017.
Standard Deviation is one of the most underrated statistical tools out there. It’s an extremely useful metric that most people know how to calculate but very few know how to use effectively.
Apache Arrow and Apache Parquet: Why We Needed Different Projects for Columnar Data, On Disk and In-Memory - Feb 16, 2017.
Apache Parquet and Apache Arrow both focus on improving the performance and efficiency of data analytics. The two projects optimize performance for on-disk and in-memory processing, respectively.
Natural Language Processing Key Terms, Explained - Feb 16, 2017.
This post provides a concise overview of 18 natural language processing terms, intended as an entry point for the beginner looking for some orientation on the topic.
17 More Must-Know Data Science Interview Questions and Answers - Feb 15, 2017.
17 new must-know Data Science Interview questions and answers, including lessons from the failure to predict the 2016 US Presidential election and the Super Bowl LI comeback, understanding bias and variance, why fewer predictors might be better, and how to make a model more robust to outliers.
The Internet of Things vs. Related Concepts and Terms - Feb 14, 2017.
This post attempts to provide some insights on the differences between IoT and the related technologies of M2M, CPS, and WoT, based on literature texts, but also the author's experience from projects and application deployments.
Web Scraping for Dataset Curation, Part 2: Tidying Craft Beer Data - Feb 14, 2017.
This is the second part in a 2-part series on curating data from the web. The first part focused on web scraping, while this post details the process of tidying the scraped data.
Web Scraping for Dataset Curation, Part 1: Collecting Craft Beer Data - Feb 13, 2017.
This post is the first in a 2-part series on scraping and cleaning data from the web using Python. This first part is concerned with the scraping aspect, while the second part will focus on the cleaning. A concrete example is presented.
The Data Science of NYC Taxi Trips: An Analysis & Visualization - Feb 10, 2017.
This post outlines using Google BigQuery for an analysis of NYC Taxi Trips in the cloud, presenting the analysis and visualization in Tableau Public for readers to interact with.
Automatically Segmenting Data With Clustering - Feb 9, 2017.
In this post, we’ll walk through one such algorithm called K-Means Clustering, how to measure its efficacy, and how to choose the sets of segments you generate.
52 Useful Machine Learning & Prediction APIs, updated - Feb 8, 2017.
Very useful, updated list of 50+ APIs in machine learning, prediction, text analytics & classification, face recognition, language translation, and more.
Regression Analysis: A Primer - Feb 6, 2017.
Despite its popularity, Regression is often misunderstood. Why? The answer might surprise you: There is no such thing as Regression. Rather, there are a large number of statistical methods that are called Regression, all of which are based on a shared statistical foundation.
5 Career Paths in Big Data and Data Science, Explained - Feb 6, 2017.
Sexiest job... massive shortage... blah blah blah. Are you looking to get a real handle on the career paths available in "Data Science" and "Big Data?" Read this article for insight on where to look to sharpen the required entry-level skills.
Learning to Learn by Gradient Descent by Gradient Descent - Feb 2, 2017.
What if instead of hand designing an optimising algorithm (function) we learn it instead? That way, by training on the class of problems we’re interested in solving, we can learn an optimum optimiser for the class!
Identifying Variables That Might Be Better Predictors - Feb 2, 2017.
This blog serves to expand on the approach that the data science team uses to identify (and quantify) which variables and metrics are better predictors of performance.
Deep Learning Research Review: Natural Language Processing
This edition of Deep Learning Research Review explains recent research papers in Natural Language Processing (NLP). If you don't have the time to read the top papers yourself, or need an overview of NLP with Deep Learning, this post is for you.
- Internet of Things Tutorial: IoT Devices and the Semantic Sensor Web
Pandas Cheat Sheet: Data Science and Data Wrangling in Python
The Pandas library can seem very elaborate, and it might be hard to find a single point of entry to the material. With other learning materials each focusing on different aspects of the library, a reference sheet can help you get the hang of it.
- Artificial Intelligence and Speech Recognition for Chatbots: A Primer
- Why the Data Scientist and Data Engineer Need to Understand Virtualization in the Cloud
- Great Collection of Minimal and Clean Implementations of Machine Learning Algorithms
- Creating Curious Machines: Building Information-seeking Agents
- The Top Predictive Analytics Pitfalls to Avoid
Chatbots on Steroids: 10 Key Machine Learning Capabilities to Fuel Your Chatbot
As chatbots become commonplace, the need for smarter bots arises. Empowering your bot with machine learning capabilities can really differentiate it from the rest. Check out these 10 capabilities to help fuel your chatbot.
- Going to War with the Giants: Automated Machine Learning with MLJAR
- The big data ecosystem for science: X-ray crystallography
- The Current State of Automated Machine Learning
Time Series Analysis: A Primer
Time series analysis is a complex subject but, in short, when we use our usual cross-sectional techniques such as regression on time series data, variables can appear "more significant" than they really are and we are not taking advantage of the information the serial correlation in the data provides.
- 90 Active Blogs on Analytics, Big Data, Data Mining, Data Science, Machine Learning (updated)
- Introduction to Forecasting with ARIMA in R
- A Concise Overview of Recent Advances in Chatbot Technologies
- A Concise Overview of Recent Advances in the Internet of Things (IoT)
- A Concise Overview of Recent Advances in Vehicle Technologies
- Internet of Things Tutorial: WSN and RFID – The Forerunners
- Sound Data Science: Avoiding the Most Pernicious Prediction Pitfall
- Creating Data Visualization in Matplotlib
- Tidying Data in Python
- Generative Adversarial Networks – Hot Topic in Machine Learning
- 3 methods to deal with outliers
Machine Learning and Cyber Security Resources
An overview of useful resources about applications of machine learning and data mining in cyber security, including important websites, papers, books, tutorials, courses, and more.