2017 May Tutorials, Overviews
- Data Science for Newbies: An Introductory Tutorial Series for Software Engineers
- May 31, 2017.
This post summarizes and links to the individual tutorials which make up this introductory look at data science for newbies, mainly focusing on the tools, with a practical bent, written by a software engineer from the perspective of a software engineering approach.
- Data preprocessing for deep learning with nuts-ml
- May 30, 2017.
Nuts-ml is a new data pre-processing library in Python for GPU-based deep learning in vision. It provides common pre-processing functions as independent, reusable units. These so-called ‘nuts’ can be freely arranged to build data flows that are efficient, easy to read and modify.
- Qualitative Research Methods for Data Science?
- May 30, 2017.
Why on Earth would a data scientist need to know about qualitative research? There are plenty of reasons. Here are a few.
- Must-Know: How to determine the influence of a Twitter user?
- May 30, 2017.
The influence of a Twitter user goes beyond the simple number of followers. We also want to examine how effective tweets are: how likely they are to be retweeted or favorited, or to have the links inside them clicked. What exactly an influential user is depends on the definition.
- Machine Learning Workflows in Python from Scratch Part 1: Data Preparation
- May 29, 2017.
This post is the first in a series of tutorials for implementing machine learning workflows in Python from scratch, covering the coding of algorithms and related tools from the ground up. The end result will be a handcrafted ML toolkit. This post starts things off with data preparation.
- What is an Ontology? The simplest definition you’ll find… or your money back*
- May 26, 2017.
This post takes the concept of an ontology and presents it in a clear and simple manner, devoid of the complexities that often surround such explanations.
- An Introduction to the MXNet Python API
- May 26, 2017.
This post outlines an entire 6-part tutorial series on the MXNet deep learning library and its Python API. In-depth and descriptive, this is a great guide for anyone looking to start leveraging this powerful neural network library.
- Unsupervised Investments (II): A Guide to AI Accelerators and Incubators
- May 25, 2017.
A meticulously compiled list, as extensive as possible, of every accelerator, incubator or program the author has read about or bumped into over the past months. It looks like there are at least 29 of them. An interesting read for a wide variety of potentially interested parties, far beyond only the investor.
- Text Mining 101: Mining Information From A Resume
- May 24, 2017.
We show a framework for mining relevant entities from a text resume, and how to separate parsing logic from entity specification.
- Machine Learning Crash Course: Part 1
- May 24, 2017.
This post, the first in a series of ML tutorials, aims to make machine learning accessible to anyone willing to learn. We’ve designed it to give you a solid understanding of how ML algorithms work as well as provide you the knowledge to harness it in your projects.
- Must-Know: Key issues and problems with A/B testing
- May 22, 2017.
A look at 2 topics in A/B testing: Ensuring that bucket assignment is truly random, and conducting an A/B test on an opt-in feature.
- Simplifying Decision Tree Interpretability with Python & Scikit-learn
- May 19, 2017.
This post will look at a few different ways of attempting to simplify decision tree representation and, ultimately, interpretability. All code is in Python, with Scikit-learn being used for the decision tree modeling.
- The Best Python Packages for Data Science
- May 19, 2017.
This report is the second in a series analyzing data science related topics. This time around, specifically, we rank 15 top Python data science packages, hopefully with results of use to the data science community.
- Descriptive Statistics Key Terms, Explained
- May 18, 2017.
This is a collection of 15 basic descriptive statistics key terms, explained in easy to understand language, along with an example and some Python code for computing simple descriptive statistics.
- Top Recent Big Data videos on YouTube
- May 17, 2017.
Top viewed videos on Big Data since 2015 include Big Data use cases in psychographics, sports, politics and data monetisation.
- Propensity Scores: A Primer
- May 16, 2017.
Propensity scores are used in quasi-experimental and non-experimental research when the researcher must make causal inferences, for example, that exposure to a chemical increases the risk of cancer.
- Must-Know: What are common data quality issues for Big Data and how to handle them?
- May 16, 2017.
Let's have a look at common quality issues facing Big Data in terms of the key characteristics of Big Data – Volume, Velocity, Variety, Veracity, and Value.
- HDFS vs. HBase : All you need to know
- May 15, 2017.
Hadoop Distributed File System (HDFS) and HBase (Hadoop database) are key components of the Big Data ecosystem. This blog explains the difference between HDFS and HBase with real-life use cases where each is the best fit.
- The Two Phases of Gradient Descent in Deep Learning
- May 12, 2017.
In short, you reach different resting places with different SGD algorithms. That is, different SGDs just give you differing convergence rates due to different strategies, but we do expect that they all end up at the same results!
- Introducing Dask-SearchCV: Distributed hyperparameter optimization with Scikit-Learn
- May 12, 2017.
We introduce a new library for doing distributed hyperparameter optimization with Scikit-Learn estimators. We compare it to the existing Scikit-Learn implementations, and discuss when it may be useful compared to other approaches.
- The Internet of Things in the Cloud
- May 11, 2017.
Cloud computing is the next evolutionary step in Internet-based computing, which provides the means for delivering ICT resources as a service. Internet-of-Things can benefit from the scalability, performance and pay-as-you-go nature of cloud computing infrastructures.
- The Guerrilla Guide to Machine Learning with R
- May 11, 2017.
This post is a lean look at learning machine learning with R. It is a complete, if very short, course for the quick study hacker with no time (or patience) to spare.
- Top 10 Recent AI videos on YouTube
- May 10, 2017.
Top viewed videos on artificial intelligence since 2016 include great talks and lecture series from MIT and Caltech, and Google Tech Talks on AI.
- 5 Machine Learning Projects You Can No Longer Overlook, May
- May 10, 2017.
In this month's installment of Machine Learning Projects You Can No Longer Overlook, we find some data preparation and exploration tools, a (the?) reinforcement learning "framework," a new automated machine learning library, and yet another distributed deep learning library.
- Using Deep Learning To Extract Knowledge From Job Descriptions
- May 9, 2017.
We present a deep learning approach to extract knowledge from a large amount of data from the recruitment space. A learning to rank approach is followed to train a convolutional neural network to generate job title and job description embeddings.
- Must-Know: How to determine the most useful number of clusters?
- May 9, 2017.
Without knowing the ground truth of a dataset, how do we know what the optimal number of data clusters is? We will have a look at 2 particularly popular methods for attempting to answer this question: the elbow method and the silhouette method.
- Building, Training, and Improving on Existing Recurrent Neural Networks
- May 8, 2017.
In this post, we’ll provide a short tutorial for training an RNN for speech recognition, including code snippets throughout.
- Deep Learning in Minutes with this Pre-configured Python VM Image
- May 5, 2017.
Check out this Python deep learning virtual machine image, built on top of Ubuntu, which includes a number of machine learning tools and libraries, along with several projects to get up and running with right away.
- Do We Need Balanced Sampling?
- May 4, 2017.
Resampling is a solution which is very popular in dealing with class imbalance. Our research on churn prediction shows that balanced sampling is unnecessary.
- Top 10 Machine Learning Videos on YouTube, updated
- May 3, 2017.
The top machine learning videos on YouTube include lecture series from Stanford and Caltech, Google Tech Talks on deep learning, using machine learning to play Mario and Hearthstone, and detecting NHL goals from live streams.
- Must-Know: What is the idea behind ensemble learning?
- May 2, 2017.
In ensemble methods, the more diverse the models used, the more robust the ultimate result will be.
- How Not To Program the TensorFlow Graph
- May 1, 2017.
Using TensorFlow from Python is like using Python to program another computer. Being thoughtful about the graphs you construct can help you avoid confusion and costly performance problems.
- How to Learn Machine Learning in 10 Days
- May 1, 2017.
10 days may not seem like a lot of time, but with proper self-discipline and time-management, 10 days can provide enough time to gain a survey of the basics of machine learning, and even allow a new practitioner to apply some of these skills to their own project.
- The Guerrilla Guide to Machine Learning with Python
- May 1, 2017.
Here is a bare bones take on learning machine learning with Python, a complete course for the quick study hacker with no time (or patience) to spare.