2017 May

Data Science for Newbies: An Introductory Tutorial Series for Software Engineers

This post summarizes and links to the individual tutorials that make up this introductory look at data science for newbies. The series focuses mainly on the tools, with a practical bent, and is written by a software engineer from a software engineering perspective.

on May 31, 2017 in Apache Spark, Data Science, Jupyter, Machine Learning, Pandas, Python, Reddit, Scala, SQL
Data preprocessing for deep learning with nuts-ml

Nuts-ml is a new data pre-processing library in Python for GPU-based deep learning in vision. It provides common pre-processing functions as independent, reusable units. These so-called ‘nuts’ can be freely arranged to build data flows that are efficient, easy to read and modify.

on May 30, 2017 in Data Preparation, Deep Learning, IBM, Image Recognition, Python
Qualitative Research Methods for Data Science?

Why on Earth would a data scientist need to know about qualitative research? There are plenty of reasons. Here are a few.

on May 30, 2017 in Data Science, Qualitative Analytics, Qualitative Research
Must-Know: How to determine the influence of a Twitter user?

The influence of a Twitter user goes beyond the simple number of followers. We also want to examine how effective their tweets are: how likely they are to be retweeted or favorited, or to have the links inside clicked. What exactly counts as an influential user depends on the definition.

on May 30, 2017 in Influencers, Twitter
Top Stories, May 22-28: Analytics, Data Science, Machine Learning Software Poll Results; Machine Learning Crash Course

New Leader, Trends, and Surprises in Analytics, Data Science, Machine Learning Software Poll; Machine Learning Crash Course: Part 1; Text Mining 101: Mining Information From A Resume; Data science platforms are on the rise and IBM is leading the way; An Introduction to the MXNet Python API

on May 29, 2017 in Top stories
Machine Learning Workflows in Python from Scratch Part 1: Data Preparation

This post is the first in a series of tutorials for implementing machine learning workflows in Python from scratch, covering the coding of algorithms and related tools from the ground up. The end result will be a handcrafted ML toolkit. This post starts things off with data preparation.

on May 29, 2017 in Data Preparation, Machine Learning, Python, Workflow
What is an Ontology? The simplest definition you’ll find… or your money back*

This post takes the concept of an ontology and presents it in a clear and simple manner, devoid of the complexities that often surround such explanations.

on May 26, 2017 in GRAKN.AI, Graph, Ontology
Machine Learning Anomaly Detection: The Ultimate Design Guide

Considering building a machine learning anomaly detection system for your high-velocity business? Learn how with Anodot's ultimate three-part guide.

on May 25, 2017 in Anodot, Anomaly Detection, Machine Learning, Real-time
Data science platforms are on the rise and IBM is leading the way

Download the 2017 Gartner Magic Quadrant for Data Science Platforms today to learn why IBM is named a leader in data science and to find out why data science, analytics, and machine learning are the engines of the future.

on May 25, 2017 in Data Science Platform, Gartner, IBM, IBM SPSS Modeler
How A Data Scientist Can Improve Productivity

Data science projects involve iterative processes and may need changes in data at every iteration. Data versioning, data pipelines, and data workflows make a data scientist's life easier. Let's see how.

on May 25, 2017 in CRISP-DM, Data Scientist, Data Workflow, DVC, GitHub, Version Control
Will Data Science Eliminate Data Science?

There are elements of what we do which are AI complete. Eventually, Artificial General Intelligence will eliminate the data scientist, but it’s not around the corner.

on May 25, 2017 in Automation, Data Science, Data Scientist
DataScience.com Releases Python Package for Interpreting the Decision-Making Processes of Predictive Models

DataScience.com's new Python library, Skater, uses a combination of model interpretation algorithms to identify how models leverage data to make predictions.

on May 24, 2017 in Datascience.com, GitHub, Interpretability, Python
Text Mining 101: Mining Information From A Resume

We show a framework for mining relevant entities from a text resume, and how to separate parsing logic from entity specification.

on May 24, 2017 in Career, Natural Language Processing, NLP, Resume, Text Analytics, Text Mining
Machine Learning Crash Course: Part 1

This post, the first in a series of ML tutorials, aims to make machine learning accessible to anyone willing to learn. We’ve designed it to give you a solid understanding of how ML algorithms work, as well as the knowledge to harness them in your projects.

on May 24, 2017 in Classification, Cost Function, Gradient Descent, Machine Learning, Regression
Natural Language Generation overview – is NLG worth a thousand pictures?

NLG tools automate the analysis and enhance traditional BI platforms by explaining in plain English the significance of visualizations and findings – here is an overview of the market.

on May 23, 2017 in AI, Arria, BI, Narrative Science, Natural Language Generation, Yseop
Why Java is the Language of Choice for the Internet of Things (IoT)

What has caused this Java revival and why is Java so useful in the Internet of Things? Better yet, what is the Internet of Things?

on May 23, 2017 in Developers, Internet of Things, IoT, Java
New Leader, Trends, and Surprises in Analytics, Data Science, Machine Learning Software Poll

Python caught up with R and (barely) overtook it; Deep Learning usage surges to 32%; RapidMiner remains top general Data Science platform; Five languages of Data Science.

on May 22, 2017 in Anaconda, Data Mining Software, Poll, Python, R, RapidMiner, Spark, TensorFlow
Must-Know: Key issues and problems with A/B testing

A look at 2 topics in A/B testing: Ensuring that bucket assignment is truly random, and conducting an A/B test on an opt-in feature.

on May 22, 2017 in A/B Testing, Interview Questions
The Path To Learning Artificial Intelligence

Learn how to easily build real-world AI for booming tech, business, pioneering careers and game-level fun.

on May 19, 2017 in AI, Artificial Intelligence, Deep Learning, Learning Path, Machine Learning, Online Education, Python
Simplifying Data Pipelines in Hadoop: Overcoming the Growing Pains

Moving to Hadoop is not without its challenges—there are so many options, from tools to approaches, that can have a significant impact on the future success of a business’ strategy. Data management and data pipelining can be particularly difficult.

on May 18, 2017 in Data Management, Data Platform, Hadoop, SVDS
Teaching the Data Science Process

Understanding the process requires not only a wide technical background in machine learning but also basic notions of business administration; here I will share my experience teaching the data science process.

on May 17, 2017 in Data Science, Methodology, Process, Teaching
Propensity Scores: A Primer

Propensity scores are used in quasi-experimental and non-experimental research when the researcher must make causal inferences, for example, that exposure to a chemical increases the risk of cancer.

on May 16, 2017 in Customer Experience, Statistics
Must-Know: What are common data quality issues for Big Data and how to handle them?

Let's have a look at common quality issues facing Big Data in terms of the key characteristics of Big Data – Volume, Velocity, Variety, Veracity, and Value.

on May 16, 2017 in 3Vs of Big Data, Big Data, Data Quality, Interview Questions
HDFS vs. HBase: All you need to know

Hadoop Distributed File System (HDFS) and HBase (Hadoop database) are key components of the Big Data ecosystem. This blog explains the difference between HDFS and HBase, with real-life use cases where each fits best.

on May 15, 2017 in Big Data, Hadoop, HBase, HDFS
Cartoon: Mother Of All Data.

We revisit KDnuggets Mother's Day Cartoon. Enjoy and don't forget the mothers in your life - Big Data predicted that 67.53% of you would remember.

on May 14, 2017 in Cartoon, Humor
Guarantee yourself a data science career

The Data Science Career Track is the first online bootcamp to guarantee you a data science job or your money back. The application process is selective - start it now.

on May 12, 2017 in Career, Data Science Education, Springboard
The Two Phases of Gradient Descent in Deep Learning

In short, you reach different resting places with different SGD algorithms. That is, different SGDs just give you differing convergence rates due to different strategies, but we do expect that they all end up at the same results!

on May 12, 2017 in Deep Learning, ICLR, Neural Networks
Introducing Dask-SearchCV: Distributed hyperparameter optimization with Scikit-Learn

We introduce a new library for doing distributed hyperparameter optimization with Scikit-Learn estimators. We compare it to the existing Scikit-Learn implementations, and discuss when it may be useful compared to other approaches.

on May 12, 2017 in Dask, Distributed Computing, Distributed Systems, Machine Learning, Optimization, scikit-learn
Data Version Control: iterative machine learning

ML modeling is an iterative process, and it is extremely important to keep track of all the steps and dependencies between code and data. A new open-source tool helps you do that.

on May 11, 2017 in CRISP-DM, DVC, GitHub, Machine Learning, Open Source, Reproducibility, Version Control
The Internet of Things in the Cloud

Cloud computing is the next evolutionary step in Internet-based computing, which provides the means for delivering ICT resources as a service. Internet-of-Things can benefit from the scalability, performance and pay-as-you-go nature of cloud computing infrastructures.

on May 11, 2017 in Cloud, Cloud Computing, Internet of Things, IoT, Scalability
The Guerrilla Guide to Machine Learning with R

This post is a lean look at learning machine learning with R. It is a complete, if very short, course for the quick study hacker with no time (or patience) to spare.

on May 11, 2017 in Data Analysis, Machine Learning, R
Top 10 Recent AI videos on YouTube

Top viewed videos on artificial intelligence since 2016 include great talks and lecture series from MIT and Caltech, and Google Tech Talks on AI.

on May 10, 2017 in AI, Google, Machine Learning, MIT, Neural Networks, NVIDIA, Robots, Youtube
The Quant Crunch: The demand for data science skills

This report, created by analyzing millions of job postings using advanced technology, divides Data Science and Analytics roles into 6 broad categories, and answers many questions, including which cities, industries, and job roles show the most growth.

on May 10, 2017 in Data Science Skills, Hiring, IBM
5 Machine Learning Projects You Can No Longer Overlook, May

In this month's installment of Machine Learning Projects You Can No Longer Overlook, we find some data preparation and exploration tools, a (the?) reinforcement learning "framework," a new automated machine learning library, and yet another distributed deep learning library.

on May 10, 2017 in Automated Machine Learning, Data Exploration, Deep Learning, Distributed Systems, Machine Learning, Overlook, Pandas, Reinforcement Learning
A Data Analyst guide to A/B testing

A/B testing is key to improving results in any marketing campaign. We examine the issues involved in its 3 main components: message variants, user group selection, and choosing the winning version.

on May 9, 2017 in A/B Testing, CleverTap, Marketing Analytics
Using Deep Learning To Extract Knowledge From Job Descriptions

We present a deep learning approach to extract knowledge from a large amount of data from the recruitment space. A learning to rank approach is followed to train a convolutional neural network to generate job title and job description embeddings.

on May 9, 2017 in Convolutional Neural Networks, Deep Learning, Natural Language Processing, Neural Networks, NLP, Text Mining
Must-Know: How to determine the most useful number of clusters?

Without knowing the ground truth of a dataset, how do we know what the optimal number of data clusters is? We will have a look at 2 particularly popular methods for attempting to answer this question: the elbow method and the silhouette method.

on May 9, 2017 in Clustering, Interview Questions
Sales forecasting using Machine Learning

SpringML is inviting business and sales leaders to its Man vs Machine Forecasting Duel - give them a day with your data and they will provide an algorithm-based, unbiased forecast.

on May 8, 2017 in Forecasting, Machine Learning, Sales, SpringML
Data Science & Machine Learning Platforms for the Enterprise

A resilient Data Science Platform is a necessity for every centralized data science team within a large corporation. It helps them centralize, reuse, and productionize their models at petascale.

on May 8, 2017 in Algorithmia, Data Science Platform, Enterprise, Machine Learning
Building, Training, and Improving on Existing Recurrent Neural Networks

In this post, we’ll provide a short tutorial for training an RNN for speech recognition, including code snippets throughout.

on May 8, 2017 in Deep Learning, Neural Networks, Recurrent Neural Networks, SVDS
New Poll: What software you used for Analytics, Data Mining, Data Science, Machine Learning projects in the past 12 months?

Vote in KDnuggets 18th Annual Poll: What software you used for Analytics, Data Mining, Data Science, Machine Learning projects in the past 12 months? We will clean, analyze, visualize, and publish the results.

on May 5, 2017 in Data Mining Software, Data Science Platform, Deep Learning, Poll
Deep Learning in Minutes with this Pre-configured Python VM Image

Check out this Python deep learning virtual machine image, built on top of Ubuntu, which includes a number of machine learning tools and libraries, along with several projects to get up and running with right away.

on May 5, 2017 in Deep Learning, Machine Learning, Python
Machine Learning overtaking Big Data?

Is Machine Learning overtaking Big Data? We also examine trends for several more related and popular buzzwords, and see how BD, ML, Artificial Intelligence, Data Science, and Deep Learning rank.

on May 4, 2017 in Big Data, Big Data Hype, Gartner, Google Trends, Machine Learning
42 Essential Quotes by Data Science Thought Leaders

42 illuminating quotes you need to read if you’re a data scientist or considering a career in the field – insights from industry experts tackling the tough questions that every data scientist faces.

on May 4, 2017 in Career, Data Science, Data Science Skills, Data Scientist, DJ Patil, Hilary Mason, Kirk D. Borne
Do We Need Balanced Sampling?

Resampling is a solution which is very popular in dealing with class imbalance. Our research on churn prediction shows that balanced sampling is unnecessary.

on May 4, 2017 in Customer Analytics, Data Mining, Data Science
How to Fail with Artificial Intelligence: 9 creative ways to make your AI startup fail

This post summarizes nine creative ways to condemn almost any AI startup to bankruptcy. I focus on data science and machine learning startups, but the lessons on what to avoid can easily be applied to other industries.

on May 4, 2017 in AI, Artificial Intelligence, Failure, Startup
Top 10 Machine Learning Videos on YouTube, updated

The top machine learning videos on YouTube include lecture series from Stanford and Caltech, Google Tech Talks on deep learning, using machine learning to play Mario and Hearthstone, and detecting NHL goals from live streams.

on May 3, 2017 in Andrew Ng, Computer Vision, Deep Learning, Geoff Hinton, Google, Machine Learning, Neural Networks, Robots, Video Games, Yaser Abu-Mostafa, Youtube
Did you know cavemen were already dealing with “Big Data” issues?

We think of Big Data & Analytics as new, cutting-edge technologies, but humans actually started using data & analytics techniques 5000 years ago. Let’s take a look.

on May 3, 2017 in Big Data, Big Data Analytics, Data Analysis, Data Science, History
Deep Learning – Past, Present, and Future

There is a lot of buzz around deep learning technology. First developed in the 1940s, deep learning was meant to simulate neural networks found in brains, but in the last decade 3 key developments have unleashed its potential.

on May 2, 2017 in Andrew Ng, Big Data, Deep Learning, Geoff Hinton, Google, GPU, History, Neural Networks, NVIDIA
What Do Frameworks Offer Data Scientists that Programming Languages Lack?

While programming languages will never be completely obsolete, a growing number of programmers (and data scientists) prefer working with frameworks and view them as the more modern and cutting-edge option for a number of reasons.

on May 2, 2017 in Big Data, Data Science, Programming Languages
The 2017 Data Scientist Report is now available

For the third year in a row, CrowdFlower surveyed data scientists (nearly 200 this year) from all manner of organizations and compiled the results into one free report, which you can download now. This year, lots of insights into the world of AI are included.

on May 1, 2017 in CrowdFlower, Data Science, Report
How Not To Program the TensorFlow Graph

Using TensorFlow from Python is like using Python to program another computer. Being thoughtful about the graphs you construct can help you avoid confusion and costly performance problems.

on May 1, 2017 in Deep Learning, Programming, Python, TensorFlow
How to Learn Machine Learning in 10 Days

10 days may not seem like a lot of time, but with proper self-discipline and time management, 10 days can provide enough time to survey the basics of machine learning, and even allow a new practitioner to apply some of these skills to their own project.

on May 1, 2017 in Machine Learning, Sebastian Raschka
The Guerrilla Guide to Machine Learning with Python

Here is a bare bones take on learning machine learning with Python, a complete course for the quick study hacker with no time (or patience) to spare.

on May 1, 2017 in Deep Learning, Machine Learning, Pandas, Python, scikit-learn, Sebastian Raschka
