2017 Mar


PLOTCON, Largest Data Visualization Event of its kind, Oakland, May 2-5

For data scientists, journalists, and business analysts, PLOTCON is THE opportunity to meet the creators of the tools you use every day, ask questions, hear where the future is heading, and be part of the conversation. Use code KDNUGGETS to save.

on Mar 31, 2017 in CA, Data Visualization, Oakland, Plotly
What makes a great data scientist?

Here are 3 key traits that distinguish a great data scientist from a data scientist, starting with this: a great data scientist is obsessed with solving problems, not new tools.

on Mar 30, 2017 in Data Science Skills, Data Scientist, Skills
A Beginner’s Guide to Tweet Analytics with Pandas

Unlike a lot of other tutorials which often pull from the real-time Twitter API, we will be using the downloadable Twitter Analytics data, and most of what we do will be done in Pandas.

on Mar 29, 2017 in Pandas, Python, Twitter
What is Structural Equation Modeling?

Structural Equation Modeling (SEM) is an extremely broad and flexible framework for data analysis, perhaps better thought of as a family of related methods rather than as a single technique. What is its relevance to Marketing Research?

on Mar 27, 2017 in Data Analysis, Market Research, Modeling, Psychology
Getting Started with Deep Learning

This post approaches getting started with deep learning from a framework perspective. Gain a quick overview and comparison of available tools for implementing neural networks to help choose what's right for you.

on Mar 24, 2017 in Caffe, CNTK, Deep Learning, Keras, SVDS, TensorFlow, Theano, Torch
Key Takeaways from Strata + Hadoop World 2017 San Jose, Day 1

The focus is increasingly shifting from storing and processing Big Data in an efficient way, to applying traditional and new machine learning techniques to drive higher value from the data at hand.

on Mar 24, 2017 in CA, Cloudera, Coursera, Hadoop, MapR, Pinterest, San Jose, Strata
Unsupervised Investments: A Comprehensive Guide to AI Investors

This article presents a list of 80 funds investing in Artificial Intelligence and Machine Learning.

on Mar 24, 2017 in AI, Finance, Investment
How to think like a data scientist to become one

The author went from securities analyst to Head of Data Science at Amazon. He describes what he learned in his journey and gives 4 useful rules based on his experience.

on Mar 23, 2017 in Amazon, Data Science Skills, Data Scientist, SQL, Statistics
What Is Data Science, and What Does a Data Scientist Do?

This article is intended to help define the data scientist role, including typical skills, qualifications, education, experience, and responsibilities. This definition is somewhat loose, given that the ideal experience and skill set are relatively rare to find in one individual.

on Mar 23, 2017 in Career, Data Science, Data Scientist
What Top Firms Ask: 100+ Data Science Interview Questions

Check this out: a topic-wise collection of 100+ data science interview questions from top companies.

on Mar 22, 2017 in Algorithms, Data Science, Google, Hadoop, Interview Questions, Machine Learning, Microsoft, Statistics, Uber
Getting Up Close and Personal with Algorithms

We've put together a brief summary of the top algorithms used in predictive analysis, which you can see just below. Read on to learn more about Linear Regression, Logistic Regression, Decision Trees, Random Forests, Gradient Boosting, and more.

on Mar 21, 2017 in Algorithms, Dataiku, Decision Trees, Gradient Boosting, Linear Regression, Logistic Regression, random forests algorithm
Analytics 101: Comparing KPIs

Different business units in the organisation have different behaviours (e.g. turnover rate) and they can’t be compared with each other. So, how can we tell whether the changes in their behaviour are reasons for concern?

on Mar 20, 2017 in KPI, Metrics, Statistics
The Most Underutilized Function in SQL

Find out why md5() is an SQL function that's used surprisingly often, and find out how -- and why -- you can use it yourself.

on Mar 20, 2017 in Data Science, SQL
Email Spam Filtering: An Implementation with Python and Scikit-learn

This post is an overview of a spam filtering implementation using Python and Scikit-learn. The results of 2 classifiers are contrasted and compared: multinomial Naive Bayes and support vector machines.

on Mar 17, 2017 in Machine Learning, Python, scikit-learn
Proxy Indicators: beware of spurious claims

Beware of online and market research studies which can lead to false or spurious claims. We examine several notable examples including Google Street View and Argentina inflation.

on Mar 16, 2017 in Argentina, Fake News, Google, Market Research, Overfitting
Applying Machine Learning To March Madness

March Madness is upon us. But before you get your brackets set, check out this overview of using machine learning to do the heavy lifting for you. A great discussion, and a timely topic.

on Mar 16, 2017 in Basketball, Machine Learning, March Madness
50 Companies Leading The AI Revolution, Detailed

We detail 50 companies leading the Artificial Intelligence revolution in Ad Sales, CRM, Autotech, Business Intelligence and Analytics, Commerce, Conversational AI/Bots, Core AI, Cyber-Security, Fintech, Healthcare, IoT, Vision, and other areas.

on Mar 16, 2017 in AI, Business Analytics, Cybersecurity, Data Science, Healthcare, IoT, Machine Learning
7 Types of Data Scientist Job Profiles

There is no one profile for the Data Scientist, but I tried to make a few generic job profiles that can somewhat fit job descriptions of different companies. I think there is way too much variety, but I had to narrow down on a set of profiles. Check out the list.

on Mar 15, 2017 in Career, Data Science, Data Scientist
17 More Must-Know Data Science Interview Questions and Answers, Part 3

The third and final part of 17 new must-know Data Science interview questions and answers covers A/B testing, data visualization, Twitter influence evaluation, and Big Data quality.

on Mar 15, 2017 in 3Vs of Big Data, A/B Testing, Big Data, Data Quality, Data Science, Data Visualization, Influencers, Interview Questions, Statistics, Twitter
Homebrewed Deep Learning and Do-It-Yourself Robotics

Learn how to experiment with embodied robotic cognition with IBM Project Intu, a platform that extends Deep Learning and other cognitive services to new devices with minimum coding.

on Mar 14, 2017 in Cognitive Computing, Deep Learning, IBM, Robots
Open Source Toolkits for Speech Recognition

This article reviews the main options for free speech recognition toolkits that use traditional Hidden Markov Models and n-gram language models.

on Mar 14, 2017 in C++, Java, Open Source, Python, Speech Recognition, SVDS
Cartoon: What Happens When AI Masters the March Madness

The March Madness college basketball phenomenon is underway. A new KDnuggets cartoon looks at what happens when AI masters March Madness.

on Mar 14, 2017 in AI, Basketball, Cartoon, March Madness, Sports
Text Analytics: A Primer

Marketing scientist Kevin Gray asks Professor Bing Liu to give us a quick snapshot of text analytics in this informative interview.

on Mar 14, 2017 in Bing Liu, Natural Language Processing, NLP, Text Analytics, Text Mining
6 Business Concepts you need to become a Data Science Unicorn

Are you a data science professional who wants to advance your career as a Data Science Unicorn? Here we provide the important business concepts and guidelines a data science techie needs to become a Unicorn.

on Mar 13, 2017 in Bernard Marr, Business Intelligence, Business Strategy, Data Science, Unicorn, Youtube
Toward Increased k-means Clustering Efficiency with the Naive Sharding Centroid Initialization Method

What if a simple, deterministic approach which did not rely on randomization could be used for centroid initialization? Naive sharding is such a method, and its time-saving and efficient results, though preliminary, are promising.

on Mar 13, 2017 in Algorithms, Clustering, Dataset, K-means
Working With Numpy Matrices: A Handy First Reference

This introductory tutorial does a great job of outlining the most common Numpy array creation and manipulation functionality. A good post to keep handy while taking your first steps in Numpy, or to use as a handy reminder.

on Mar 10, 2017 in numpy, Python
Visualizing Time-Series Change

When creating time-series line charts, it’s important to consider which of the following messages you would like to communicate: Actual value of units? Change in absolute units? Percent change? Change from a specific point in time?

on Mar 9, 2017 in Data Visualization, Time Series
Beginner’s Guide to Customer Segmentation

At the core of customer segmentation is being able to identify different types of customers and then figure out ways to find more of those individuals so you can... you guessed it, get more customers!

on Mar 9, 2017 in Clustering, Customer Analytics, Histogram, K-means, Yhat
What makes a good data visualization – a Data Scientist perspective

We examine principles of good data visualization, including some great and terrible examples, guidelines for human perception, focus on key variables, changes and trends, avoiding chart junk, and more.

on Mar 8, 2017 in Data Visualization, Edward Tufte, Elections
The Challenges of Building a Predictive Churn Model

Unlike other data science problems, there is no one method for predicting which customers are likely to churn in the next month. Here we review the most popular approaches.

on Mar 8, 2017 in Churn, Customer Analytics, Datascience.com, Survival Analysis
Building Regression Models in R using Support Vector Regression

The article studies the advantage of Support Vector Regression (SVR) over Simple Linear Regression (SLR) models for predicting real values, using the same basic idea as Support Vector Machines (SVM) use for classification.

on Mar 8, 2017 in R, Regression, Support Vector Machines
Neuroscience for Data Scientists: Understanding Human Behaviour

Neuroscience is a very complex and advanced study of the brain, and people often misuse the term. Here we try to explain neuroscience terminology and the use of data science in such studies.

on Mar 8, 2017 in Consumer Analytics, Data Science, Neuroscience
K-Means & Other Clustering Algorithms: A Quick Intro with Python

In this intro cluster analysis tutorial, we'll check out a few algorithms in Python so you can get a basic understanding of the fundamentals of clustering on a real dataset.

on Mar 8, 2017 in Clustering, K-means, Python, scikit-learn
How to Get a Data Science Job: A Ridiculously Specific Guide

Job hunting is a challenging and sometimes frustrating task, and we all experience it in our careers. Here we provide a very specific and practical guide to getting your dream job in the Data Science world.

on Mar 7, 2017 in Advice, Data Science, Data Science Skills, Glassdoor, Hiring
A Simple XGBoost Tutorial Using the Iris Dataset

This is an overview of the XGBoost machine learning algorithm, which is fast and shows good results. This example uses multiclass prediction with the Iris dataset from Scikit-learn.

on Mar 7, 2017 in Python, scikit-learn, XGBoost
Big Data Desperately Needs Transparency

If Big Data is to realize its potential, people need to understand what it is capable of, what information is out there and where every piece of data comes from. Without such transparency and understanding, it will be difficult to persuade people to rely on the findings.

on Mar 6, 2017 in Big Data, Interpretability, Transparency, Trust
Software Engineering vs Machine Learning Concepts

Not all core concepts from software engineering translate into the machine learning universe. Here are some differences I've noticed.

on Mar 6, 2017 in Machine Learning, Software Engineering
Bokeh Cheat Sheet: Data Visualization in Python

Bokeh is the Python data visualization library that enables high-performance visual presentation of large datasets in modern web browsers. The package is flexible and offers lots of possibilities to visualize your data in a compelling way, but can be overwhelming.

on Mar 3, 2017 in Bokeh, Cheat Sheet, Data Visualization, DataCamp, Python
Gartner Data Science Platforms – A Deeper Look

Thomas Dinsmore's critical examination of the Gartner 2017 MQ of Data Science Platforms, including vendors who are out, in, or have big changes, Hadoop and Spark integration, open source software, and what Data Scientists actually use.

on Mar 3, 2017 in Apache Spark, Data Science Platform, Gartner, IBM, Python, R, SAS, Thomas Dinsmore
Greed, Fear, Game Theory and Deep Learning

The most advanced kind of Deep Learning system will involve multiple neural networks that either cooperate or compete to solve problems. The core problem of a multi-agent approach is how to control its behavior.

on Mar 3, 2017 in AI, Deep Learning, Reinforcement Learning
Every Intro to Data Science Course on the Internet, Ranked

For this guide, I spent 10+ hours trying to identify every online intro to data science course offered as of January 2017, extracting key bits of information from their syllabi and reviews, and compiling their ratings.

on Mar 2, 2017 in Coursera, Data Science, MOOC, Online Education, Ranking, Udacity, Udemy
Building a Bot to Answer FAQs: Predicting Text Similarity

In this post, learn to build a bot to answer frequently asked questions, reducing lag time for more customers and taking the load off of engineers, ensuring they can concentrate on building products!

on Mar 2, 2017 in Chatbot, Python, Similarity
What is Customer Churn Modeling? Why is it valuable?

Getting new customers is much more expensive than retaining existing ones, so reducing churn is a top priority for many firms. Understanding why customers churn and estimating the risks are powerful components of a data-driven retention strategy.

on Mar 1, 2017 in Churn, Customer Analytics, Datascience.com
The Data Science Project Playbook

Keep your development team from getting mired in high-complexity, low-return projects by following this practical playbook.

on Mar 1, 2017 in Data Science, Data Science Team
7 More Steps to Mastering Machine Learning With Python

This post is a follow-up to last year's introductory Python machine learning post, and includes a series of tutorials for extending your knowledge beyond the original.

on Mar 1, 2017 in 7 Steps, Classification, Clustering, Deep Learning, Ensemble Methods, Gradient Boosting, Machine Learning, Python, scikit-learn, Sebastian Raschka
