2017 Jan

New e-learning course: Fraud Analytics using Descriptive, Predictive and Social Network Analytics

This online course teaches how to find fraud patterns in historical data using descriptive analytics and social network learning.

on Jan 31, 2017 in Bart Baesens, Fraud analytics, Fraud Detection, SAS, Social Media Analytics
Deep Learning Research Review: Natural Language Processing

This edition of Deep Learning Research Review explains recent research papers in Natural Language Processing (NLP). If you don't have the time to read the top papers yourself, or need an overview of NLP with Deep Learning, this post is for you.

on Jan 31, 2017 in Deep Learning, Natural Language Processing, Neural Networks, NLP
Data Scientist – best job in America, again

Glassdoor again ranked Data Scientist as the No. 1 job in the USA, and 5 of the top 10 US jobs are related to Analytics, Big Data, and Data Science.

on Jan 30, 2017 in Data Engineer, Data Scientist, Glassdoor, USA
Pandas Cheat Sheet: Data Science and Data Wrangling in Python

The Pandas library can seem very elaborate and it might be hard to find a single point of entry to the material: with other learning materials focusing on different aspects of this library, you can definitely use a reference sheet to help you get the hang of it.

on Jan 27, 2017 in Cheat Sheet, Data Preparation, DataCamp, Pandas, Python
Bad Data + Good Models = Bad Results

No matter how advanced your Machine Learning algorithm is, the results will be bad if the input data is bad. We examine one popular IMDB dataset and discuss how an analyst can deal with such data.

on Jan 26, 2017 in Data Quality, Face Recognition, IMDb, Kaggle, Movies
Artificial Intelligence and Speech Recognition for Chatbots: A Primer

Bots, bots, bots... Read this overview of how artificial intelligence and natural language processing are contributing to chatbot development, and where it all goes from here.

on Jan 26, 2017 in AI, Artificial Intelligence, Chatbot, Speech Recognition
6 areas of AI and Machine Learning to watch closely

Artificial Intelligence is a generic term, and many fields of science overlap when it comes to building an AI application. Here is an explanation of AI and the 6 major areas to focus on going forward.

on Jan 25, 2017 in AI, Deep Neural Network, Generative Adversarial Network, Machine Learning, Reinforcement Learning
Why the Data Scientist and Data Engineer Need to Understand Virtualization in the Cloud

This article covers the value of understanding virtualization constructs for data scientists and data engineers as they deploy their analyses onto all kinds of cloud platforms. Virtualization is a key enabling layer of software that these data workers should be aware of in order to achieve optimal results.

on Jan 25, 2017 in Cloud, Data Engineer, Data Engineering, Data Science, Data Scientist, Virtualization
Great Collection of Minimal and Clean Implementations of Machine Learning Algorithms

Interested in learning machine learning algorithms by implementing them from scratch? Need a good set of examples to work from? Check out this post with links to minimal and clean implementations of various algorithms.

on Jan 25, 2017 in Algorithms, Machine Learning, Programming, Python
Bringing Business Clarity To CRISP-DM

Many analytic projects fail to understand the business problem they are trying to solve. Correctly applying decision modeling in the Business Understanding phase of CRISP-DM brings clarity to the business problem.

on Jan 24, 2017 in CRISP-DM, Data Mining, Data Science, Decision Modeling, Methodology, Predictive Analytics
The Top Predictive Analytics Pitfalls to Avoid

Predictive modelling and machine learning are significantly contributing to business, but they can be very sensitive to data and changes in it, which makes it very important to use proper techniques and avoid pitfalls in building data science models.

on Jan 23, 2017 in Bias, Machine Learning, Model Performance, Predictive Analytics, Regularization, Statistics
Learn how to Develop and Deploy a Gradient Boosting Machine Model

GBM is one of the hottest machine learning methods. Learn how to create a GBM using scikit-learn and Python, and understand the steps required to transform features, train, and deploy a GBM.

on Jan 20, 2017 in Gradient Boosting, Open Data Group, Python, scikit-learn
Eat Melon: A Deep Q Reinforcement Learning Demo in your browser

Check out the "Eat Melon" demo, a fun way to gain familiarity with the Deep Q Learning algorithm right in your browser.

on Jan 20, 2017 in Atari, Deep Learning, OpenAI, Reinforcement Learning
The Data Science Puzzle, Revisited

The data science puzzle is re-examined through the relationships between several key concepts in the field, incorporating important updates and observations from the past year. The result is a modified explanatory graphic and rationale.

on Jan 20, 2017 in AI, Big Data, Data Mining, Data Science, Deep Learning, Machine Learning
Data Science of Sales Calls: 3 Actionable Findings

How does AI help sales and marketing teams in an organisation? Learn the dos and don’ts of sales calls from an analysis of more than 70,000 B2B SaaS sales calls.

on Jan 19, 2017 in AI, Gong.io, Machine Learning, Sales, Speech Recognition
The big data ecosystem for science: X-ray crystallography

Diffract-and-destroy experiments to accurately determine three-dimensional structures of nano-scale systems can produce 150 TB of data per sample. We review how such Big Data is processed.

on Jan 19, 2017 in Big Data, Science, Strata, X-ray crystallography
Four Problems in Using CRISP-DM and How To Fix Them

CRISP-DM is the leading approach for managing data mining, predictive analytics, and data science projects. CRISP-DM is effective, but many analytic projects neglect key elements of the approach.

on Jan 18, 2017 in CRISP-DM, Data Mining, Methodology
The Current State of Automated Machine Learning

What is automated machine learning (AutoML)? Why do we need it? What are some of the AutoML tools that are available? What does its future hold? Read this article for answers to these and other AutoML questions.

on Jan 18, 2017 in Automated, Automated Data Science, Automated Machine Learning, Hyperparameter, Machine Learning
More Data or Better Algorithms: The Sweet Spot

We examine the sweet spot for data-driven Machine Learning companies, where it is neither too easy nor too hard to collect the needed data.

on Jan 17, 2017 in Algorithms, Big Data, Data, Datasets, Machine Learning
Time Series Analysis: A Primer

Time series analysis is a complex subject but, in short, when we use our usual cross-sectional techniques such as regression on time series data, variables can appear "more significant" than they really are and we are not taking advantage of the information the serial correlation in the data provides.

on Jan 17, 2017 in Data Analysis, Time Series
90 Active Blogs on Analytics, Big Data, Data Mining, Data Science, Machine Learning (updated)

Stay up to date in data science with active blogs. This is a list of 90 recently active blogs on Big Data, Data Science, Data Mining, Machine Learning, and Artificial Intelligence.

on Jan 17, 2017 in Big Data, Blogs, Data Mining, Data Science, Machine Learning
Introduction to Forecasting with ARIMA in R

ARIMA models are a popular and flexible class of forecasting models that use historical information to make predictions. In this tutorial, we walk through an example of examining time series for demand at a bike-sharing service, fitting an ARIMA model, and creating a basic forecast.

on Jan 16, 2017 in ARIMA, Datascience.com, Forecasting, R, Stationarity, Time Series
Deep Learning Can be Applied to Natural Language Processing

This post is a rebuttal to a recent article suggesting that neural networks cannot be applied to natural language, given that language is not produced as the result of a continuous function. The post delves into some additional points on deep learning as well.

on Jan 16, 2017 in Deep Learning, Natural Language Processing, Neural Networks, NLP
6 Steps to Effective Data Preparation for Quality Conclusions

Data preparation is usually the most time-consuming part of a data analysis project. To get good results, follow the six steps here, starting with Understand the Business Needs, Get to Know the Data, and Wrangle, Munge, and Mash Up.

on Jan 12, 2017 in Data Preparation, Sisense
Doctor of Business Administration/Data Analytics, Online at Grand Canyon University

Offered in a convenient online format, this doctoral program empowers expert data analysts to spark new industry-wide innovation.

By GCU on Jan 12, 2017 in Business Analytics, Data Analytics, Online Education
Top KDnuggets tweets, Jan 04-10: Cartoon: When Self-Driving Car takes you too far; A massive collection of free programming books

Also AI #DataScience #MachineLearning: Main Developments 2016, Key Trends 2017; Scikit-Learn Cheat Sheet: #Python #MachineLearning

on Jan 11, 2017 in 2017 Predictions, Free ebook, Programming, scikit-learn, Self-Driving Car
The Most Popular Language For Machine Learning and Data Science Is …

When it comes to choosing a programming language for Data Analytics projects or job prospects, people have different opinions depending on their career backgrounds and the domains they have worked in. Here is an analysis of data from indeed.com on the choice of programming language for machine learning and data science.

on Jan 11, 2017 in Data Science, Machine Learning, Programming Languages, Python, R, Scala
Text Mining Amazon Mobile Phone Reviews: Interesting Insights

We analyzed more than 400 thousand reviews of unlocked mobile phones sold on Amazon.com to uncover insights about reviews, ratings, price, and the relationships among them.

on Jan 10, 2017 in Amazon, Analytics, Product reviews, Sentiment Analysis, Text Analytics, Text Mining
Social Media for Marketing and Healthcare: Focus on Adverse Side Effects

Social media platforms like Twitter and Facebook are very important sources of big data on the internet, and text mining can extract valuable insights about a product or service to help marketing teams. Let's see how healthcare companies are using big data and text mining to improve their marketing strategies.

on Jan 9, 2017 in Healthcare, NLP, Social Media, Text Analytics, Text Mining, Twitter
The Surprising Ethics of Humans and Self-Driving Cars

The surprising finding is that people are much more willing to ride in a self-driving car that might kill them to save several pedestrians than in a car that would save them but kill pedestrians. Asian respondents had significantly different preferences from those in the US and Europe.

on Jan 9, 2017 in Ethics, Humans vs Machines, Poll, Self-Driving Car
A Tasty approach to data science

Data scientists at Foodpairing help brands cut down on the fuzzy front end of product development. The so-called Consumer Flavor Intelligence combines internet data and food science to create timely flavor line extensions.

on Jan 7, 2017 in Coffee, Consumer Analytics, Data Science, Food
Sound Data Science: Avoiding the Most Pernicious Prediction Pitfall

Data science and predictive analytics can provide huge value, but they can mislead and backfire if not used with fail-safe measures. The author gives examples of such problems and provides guidelines to avoid them.

on Jan 5, 2017 in Advice, Data Science, Model Performance, Overfitting, Predictive Analytics, Statistical Modeling
Cartoon: When Self-Driving Car + Machine Learning takes you too far …

What can happen in the not-too-distant future when advanced technologies like a Self-Driving Car and a Machine Learning Recommendation Engine are combined ...

on Jan 4, 2017 in Cartoon, Food, Recommendation Engine, Self-Driving Car
How To Stay Competitive In Machine Learning Business

To stay competitive in the machine learning business, you have to be better than your rivals, not the best possible, says one of the leading machine learning experts. Simple rules are defined here to make that happen. Let’s see how.

on Jan 4, 2017 in Business, Business Analytics, Data Management, Machine Learning, Research
Tidying Data in Python

This post summarizes some of the tidying examples Hadley Wickham used in his 2014 paper on Tidy Data in R, and demonstrates how to do the same using the Python pandas library.

on Jan 4, 2017 in Data Cleaning, Data Preparation, Pandas, Python
Revenue per Employee: golden ratio, or red herring?

There is growing support for revenue per employee as one of the most underrated metrics available for assessing business performance in a crowded marketplace.

on Jan 4, 2017 in Apple, Facebook, Google, Hiring, Workforce Analytics
Generative Adversarial Networks – Hot Topic in Machine Learning

What are Generative Adversarial Networks (GANs)? An illustrative explanation of GANs is presented here with simple examples such as predicting the next frame in a video sequence or predicting the next word while typing in Google search.

on Jan 3, 2017 in Deep Learning, Generative Adversarial Network, Machine Learning, Neural Networks, NIPS
3 methods to deal with outliers

In both statistics and machine learning, outlier detection is important for building an accurate model and getting good results. Here, three methods are discussed for detecting outliers or anomalous data instances.

By Alberto Quesada on Jan 3, 2017 in Machine Learning, Outliers, Statistics
Ten Myths About Machine Learning, by Pedro Domingos

Myths on artificial intelligence and machine learning abound. Noted expert Pedro Domingos identifies and refutes a number of these myths, of both the pessimistic and optimistic variety.

on Jan 3, 2017 in Machine Learning, Myths, Pedro Domingos
Machine Learning and Cyber Security Resources

An overview of useful resources about applications of machine learning and data mining in cyber security, including important websites, papers, books, tutorials, courses, and more.

on Jan 2, 2017 in Cybersecurity, Machine Learning, Security
5 Machine Learning Projects You Can No Longer Overlook, January

There are a lot of popular machine learning projects out there, but many more that are not. Which of these are actively developed and worth checking out? Here is an offering of 5 such projects, the most recent in an ongoing series.

on Jan 2, 2017 in Boosting, C++, Data Preparation, Decision Trees, Machine Learning, Neural Networks, Optimization, Overlook, Pandas, Python, scikit-learn
