2019 Aug

Emoji Analytics

Emoji is becoming a global language understandable by anyone who expresses... emotion. With the pervasiveness of these little Unicode blocks, we can perform analytics on their use throughout social media to gain insight into sentiments around the world.

on Aug 30, 2019 in Analytics, Emoji, Social Network Analysis, Twitter
R Users’ Salaries from the 2019 Stack Overflow Survey

Let’s take a look at what R users are saying about their salaries. Note that the following results could be biased because of unrepresentative and, in some cases, small samples.

on Aug 30, 2019 in R, Salary, StackOverflow, Survey
Object-oriented programming for data scientists: Build your ML estimator

Implement some of the core OOP principles in a machine learning context by building your own Scikit-learn-like estimator, and making it better.

on Aug 30, 2019 in Data Scientist, Machine Learning, Programming, Python
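Here is a minimal sketch of a scikit-learn-style estimator written from scratch. It assumes a one-feature linear regression as the model. The class name `SimpleLinearRegression` is hypothetical, and nothing is imported from scikit-learn. The sketch only mirrors two of its conventions: `fit` returns `self`, and learned attributes end in an underscore.

```python
class SimpleLinearRegression:
    """Hypothetical scikit-learn-style estimator: ordinary least squares on one feature."""

    def fit(self, X, y):
        # X is a list of single-feature rows, e.g. [[1.0], [2.0]], mirroring sklearn's 2-D input
        xs = [row[0] for row in X]
        n = len(xs)
        x_mean = sum(xs) / n
        y_mean = sum(y) / n
        sxy = sum((x - x_mean) * (t - y_mean) for x, t in zip(xs, y))
        sxx = sum((x - x_mean) ** 2 for x in xs)
        # Learned attributes end in an underscore, following scikit-learn's convention
        self.coef_ = sxy / sxx
        self.intercept_ = y_mean - self.coef_ * x_mean
        return self  # returning self allows chaining: model.fit(X, y).predict(X)

    def predict(self, X):
        return [self.intercept_ + self.coef_ * row[0] for row in X]


# Points on the line y = 2x + 1
model = SimpleLinearRegression().fit([[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0])
print(model.coef_, model.intercept_)  # 2.0 1.0
print(model.predict([[10.0]]))        # [21.0]
```

Because `fit` returns `self`, the constructor, training, and attribute access fit in one chained line. The sketch omits input validation and multi-feature inputs.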
Deep Learning Next Step: Transformers and Attention Mechanism

With the pervasive importance of NLP in so many of today's applications of deep learning, find out how advanced translation techniques can be further enhanced by transformers and attention mechanisms.

on Aug 29, 2019 in Attention, Deep Learning, NLP, Transformer
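The core operation in Transformers is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. Below is a minimal plain-Python illustration of that formula. It makes simplifying assumptions: a single head, no masking, and no batching. The toy `Q`, `K`, `V` values are made up for the example, and this is not code from the post.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed row by row on plain Python lists."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row: attention-weighted sum of the value vectors
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return output


Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(scaled_dot_product_attention(Q, K, V))
```

The query matches the first key more closely, so the output leans toward the first value vector. Real implementations use batched tensor operations and multiple heads.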
4 Tips for Advanced Feature Engineering and Preprocessing

Techniques for creating new features, detecting outliers, handling imbalanced data, and imputing missing values.

on Aug 29, 2019 in Data Preprocessing, Feature Engineering, Python, Tips
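Two of the techniques named above can be sketched with the Python standard library alone. The sketch assumes Tukey's 1.5 × IQR rule for outliers and median imputation for missing values. These are common defaults, and the post may use different methods.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


print(iqr_outliers([1, 2, 3, 4, 5, 100]))  # [100]
print(impute_median([1, None, 3, 10]))     # [1, 3, 3, 10]
```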
Types of Bias in Machine Learning

The sample data used for training has to represent the real scenario as closely as possible. Many factors can bias a sample from the beginning, and those factors differ from domain to domain (e.g., business, security, medical, education).

on Aug 29, 2019 in Bias, Data Science, Data Scientist, Machine Learning
New Poll: Data Science Skills

New KDnuggets poll asks 1) which Data Science/Machine Learning-related skills you currently have, and 2) which skills you want to add or improve. If you are human, please vote and we will analyze and publish the results.

on Aug 28, 2019 in Data Science Skills, Poll, Skills
A 2019 Guide to Human Pose Estimation

Human pose estimation refers to the process of inferring poses in an image. Essentially, it entails predicting the positions of a person’s joints in an image or video. This problem is also sometimes referred to as the localization of human joints.

on Aug 28, 2019 in AI, Computer Vision, Image Recognition, Video recognition
TensorFlow 2.0: Dynamic, Readable, and Highly Extended

With substantial changes coming with TensorFlow 2.0, and the release candidate version now available, learn more in this guide about the major updates and how to get started on the machine learning platform.

on Aug 27, 2019 in Deep Learning, Deployment, Exxact, TensorFlow
Introducing AI Explainability 360: A New Toolkit to Help You Understand what Machine Learning Models are Doing

Recently, AI researchers from IBM open sourced AI Explainability 360, a new toolkit of state-of-the-art algorithms that support the interpretability and explainability of machine learning models.

on Aug 27, 2019 in AI, Explainability, Machine Learning, Modeling
Why Data Visualization Is The Most Important Skill in a Data Analyst’s Arsenal

Visually displayed data is much more accessible, which is critical for promptly identifying the weaknesses of an organization, accurately forecasting trading volumes and sale prices, and making the right business choices.

on Aug 26, 2019 in Data Analyst, Data Visualization, Simpliv
How to count Big Data: Probabilistic data structures and algorithms

Learn how probabilistic data structures and algorithms can be used for cardinality estimation in Big Data streams.

on Aug 26, 2019 in Algorithms, Big Data, Probability
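One of the simplest probabilistic structures for cardinality estimation is linear counting. This page does not say whether the post covers it or focuses on others such as HyperLogLog. Below is a minimal sketch that assumes SHA-1 as the hash function and a bitmap size `m` chosen by hand. The class name `LinearCounter` is hypothetical.

```python
import hashlib
import math


class LinearCounter:
    """Linear counting: estimate the number of distinct items with an m-slot bitmap."""

    def __init__(self, m=4096):
        self.m = m
        self.bits = bytearray(m)  # one byte per slot, for simplicity

    def add(self, item):
        # Hash the item to a bitmap position; duplicates always hit the same slot
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        self.bits[h % self.m] = 1

    def estimate(self):
        zeros = self.m - sum(self.bits)
        if zeros == 0:
            raise ValueError("bitmap is full; use a larger m")
        # n is approximately -m * ln(V), where V is the fraction of empty slots
        return -self.m * math.log(zeros / self.m)


counter = LinearCounter(m=4096)
for i in range(5000):
    counter.add(i % 1000)  # 5000 insertions, but only 1000 distinct values
print(round(counter.estimate()))
```

Memory stays fixed at `m` slots no matter how many items stream through. The estimate is only approximate, and it degrades as the bitmap fills up.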
Artificial Intelligence vs. Machine Learning vs. Deep Learning: What is the Difference?

Over the past few years, artificial intelligence has remained one of the hottest topics. To work with it effectively, you need to understand its constituent parts.

on Aug 26, 2019 in AI, Deep Learning, Machine Learning
How to Sell Your Boss on the Need for Data Analytics

Here are some ways you can make the case to your boss that analytics investments are smart for your company to pursue.

on Aug 26, 2019 in Career Advice, Data Analytics
Nothing but NumPy: Understanding & Creating Neural Networks with Computational Graphs from Scratch

Entirely implemented with NumPy, this extensive tutorial provides a detailed review of neural networks followed by guided code for creating one from scratch with computational graphs.

on Aug 23, 2019 in Backpropagation, Neural Networks, numpy, Python
Top Handy SQL Features for Data Scientists

Whenever we hear "data," the first thing that comes to mind is SQL! SQL comes with easy, quick-to-learn features to organize and retrieve data, as well as to perform actions on it in order to gain useful insights.

on Aug 23, 2019 in Data Science, Data Scientist, SQL
Order Matters: Alibaba’s Transformer-based Recommender System

Alibaba, the largest e-commerce platform in China, is a powerhouse not only when it comes to e-commerce, but also when it comes to recommender systems research. Their latest paper, Behaviour Sequence Transformer for E-commerce Recommendation in Alibaba, is yet another publication that pushes the state of the art in recommender systems.

on Aug 23, 2019 in Alibaba, Recommendation Engine, Recommender Systems, Transformer
Proptech and the proper use of technology for house sales prediction

Using the ATTOM dataset, we extracted data on sales transactions in the USA, loans, and estimated property values. We developed an optimal prediction model from correlations with the time and status of ownership, as well as with seasonal fluctuations in sales.

on Aug 22, 2019 in Feature Selection, Predictive Analytics, Real Estate
How LinkedIn, Uber, Lyft, Airbnb and Netflix are Solving Data Management and Discovery for Machine Learning Solutions

As machine learning evolves, tools and platforms that automate the lifecycle management of training and testing datasets are becoming increasingly important. Fast-growing technology companies like Uber or LinkedIn have been forced to build their own in-house data lifecycle management solutions to power different groups of machine learning models.

on Aug 22, 2019 in AirBnB, Data Management, LinkedIn, Machine Learning, Netflix, Uber
Comparing Decision Tree Algorithms: Random Forest® vs. XGBoost

Check out this tutorial walking you through a comparison of XGBoost and Random Forest. You'll learn how to create a decision tree, how to do tree bagging, and how to do tree boosting.

on Aug 21, 2019 in ActiveState, Decision Trees, Python, random forests algorithm, XGBoost
Gender Diversity in AI Research

Through an analysis of 1.5M papers from arXiv, this study reviews the evolution of gender diversity across disciplines, countries, and institutions as well as the semantic differences between AI papers with and without female co-authors.

on Aug 21, 2019 in AI, Diversity, Research, Women
Understanding Decision Trees for Classification in Python

This tutorial covers decision trees for classification, also known as classification trees, including the anatomy of classification trees, how classification trees make predictions, using scikit-learn to make classification trees, and hyperparameter tuning.

on Aug 21, 2019 in Classification, Decision Trees, Python, scikit-learn
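The tutorial itself uses scikit-learn. The sketch below is a from-scratch illustration of how a classification tree picks a split on one feature. It assumes the Gini impurity criterion, which is the default `criterion` of scikit-learn's `DecisionTreeClassifier`. The function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(xs, ys):
    """Return (weighted impurity, threshold) for the best split x <= t on one feature."""
    best = (float("inf"), None)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must send samples both ways
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[0]:
            best = (score, t)
    return best


print(gini(["a", "a", "b", "b"]))  # 0.5
print(best_threshold([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))  # (0.0, 3)
```

A full tree applies this search recursively to each child node, over every feature, until a stopping rule such as `max_depth` is reached.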
Automate Stacking In Python: How to Boost Your Performance While Saving Time

Stacking (stacked generalization) is a very hot topic when it comes to pushing your machine learning algorithm to new heights. For instance, most, if not all, winning Kaggle submissions nowadays make use of some form of stacking or a variation of it.

on Aug 21, 2019 in Algorithms, Big Data, Data Science, Python
Detecting stationarity in time series data

Explore how to determine if your time series data is generated by a stationary process and how to handle the necessary assumptions and potential interpretations of your result.

on Aug 20, 2019 in Forecasting, Stationarity, Time Series
Is Kaggle Learn a “Faster Data Science Education?”

Kaggle Learn is "Faster Data Science Education," featuring micro-courses covering an array of data skills for immediate application. The courses may be made with newcomers in mind, but the platform and its content are proving useful as a review for more seasoned practitioners as well.

on Aug 20, 2019 in Data Science, Data Science Education, Kaggle, Online Education
An Overview of Python’s Datatable package

Modern machine learning applications need to process a humongous amount of data and generate multiple features. Python’s datatable module was created to address this issue. It is a toolkit for performing big data (up to 100GB) operations on a single-node machine, at the maximum possible speed.

on Aug 20, 2019 in Big Data, Data Science, Python
Crafting an Elevator Pitch for your Data Science Startup

If you are launching a data science startup, these tips will give you a head start as you seek capital for seed funding or your next level of growth.

on Aug 19, 2019 in Data Science, Startup, Startups, VC
Deep Learning for NLP: Creating a Chatbot with Keras!

Learn how to use Keras to build a Recurrent Neural Network and create a Chatbot! Who doesn’t like a friendly robotic personal assistant?

on Aug 19, 2019 in Chatbot, Deep Learning, Keras, NLP, Python
Manual Coding or Automated Data Integration – What’s the Best Way to Integrate Your Enterprise Data?

What’s the best way to execute your data integration tasks: writing manual code or using an ETL tool? Find out which approach best fits your organization’s needs and the factors that influence the choice.

on Aug 19, 2019 in Advice, Data Integration, Data Management, Data Science, Data Science Platform, ETL
How to Become More Marketable as a Data Scientist

As a data scientist, you are in high demand. So, how can you increase your marketability even more? Check out these current trends in skills most desired by employers in 2019.

on Aug 16, 2019 in Advice, Andriy Burkov, Carla Gentry, Data Science Skills, Data Scientist
Understanding Cancer using Machine Learning

The use of machine learning (ML) in medicine is becoming increasingly important. One example application is cancer detection and analysis.

on Aug 16, 2019 in Cancer Detection, Healthcare, Machine Learning, Medical
PyTorch Lightning vs PyTorch Ignite vs Fast.ai

Here, I will attempt an objective comparison between all three frameworks. This comparison comes from laying out similarities and differences objectively found in tutorials and documentation of all three frameworks.

on Aug 16, 2019 in fast.ai, Neural Networks, Python, PyTorch, PyTorch Lightning
Data Driven Government – Speakers Highlights

The lineup of experienced, thought-leading speakers at Data Driven Government, Sep 25 in Washington, DC, will explain how to use data and analytics to more effectively accomplish your mission, increase efficiency, and improve evidence-based policymaking.

on Aug 15, 2019 in DC, Government, PAW, Predictive Analytics World, Washington
How Concerned Should You be About Predictor Collinearity? It Depends…

Predictor collinearity (also known as multicollinearity) can be problematic for your regression models. Check out these rules of thumb about when, and when not, to be concerned.

on Aug 15, 2019 in Collinearity, Correlation, Linear Regression, Prediction
Command Line Basics Every Data Scientist Should Know

Check out this introductory guide to completing simple tasks with the command line.

on Aug 15, 2019 in Data Science, Data Science Tools
Top KDnuggets tweets, Aug 07-13: Deep Learning Cheat Sheets; 12 NLP Researchers, Practitioners To Follow

Deep Learning Cheat Sheets; 12 NLP Researchers, Practitioners & Innovators You Should Be Following; Knowing Your Neighbours: Machine Learning on Graphs.

on Aug 14, 2019 in Deep Learning, Graph Mining, NLP, Top tweets
Statistical Modelling vs Machine Learning

At times it may seem that machine learning can be done these days without a sound statistical background, but people who think so are missing important nuances. Code written to make it easier does not negate the need for an in-depth understanding of the problem.

on Aug 14, 2019 in Advice, Data Science, Machine Learning, Statistics
What is Poisson Distribution?

A solid overview of the Poisson distribution, starting from why it is needed, how it compares with the binomial distribution, deriving its formula mathematically, and more.

on Aug 14, 2019 in Distribution, Poisson Distribution, Probability, Statistics
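The Poisson formula P(X = k) = λ^k e^(-λ) / k! and its relationship to the binomial distribution can be checked numerically. The short sketch below compares binomial probabilities with the Poisson approximation using λ = np. The values n = 1000 and p = 0.002 are an arbitrary example.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with rate lam: lam^k * e^-lam / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# With many trials and a small success probability, Binomial(n, p) is close to Poisson(n * p)
n, p = 1000, 0.002
for k in range(5):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, n * p), 4))
```

The two columns agree to about three decimal places, which is why the Poisson distribution is used for counts of rare events over many opportunities.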
Learn how to use PySpark in under 5 minutes (Installation + Tutorial)

Apache Spark is one of the hottest and largest open source projects among data processing frameworks, with rich high-level APIs for programming languages like Scala, Python, Java and R. It realizes the potential of bringing together Big Data and machine learning.

on Aug 13, 2019 in Apache Spark, Big Data, Data Science, Python
6 Key Concepts in Andrew Ng’s “Machine Learning Yearning”

If you are diving into AI and machine learning, Andrew Ng's book is a great place to start. Learn about six important concepts covered to better understand how to use these tools from one of the field's best practitioners and teachers.

on Aug 12, 2019 in AI, Andrew Ng, Best Practices, Deployment, Machine Learning, Metrics, Training Data
A 2019 Guide to Semantic Segmentation

Semantic segmentation refers to the process of linking each pixel in an image to a class label. These labels could include a person, car, flower, piece of furniture, etc., just to mention a few. We’ll now look at a number of research papers covering state-of-the-art approaches to building semantic segmentation models.

on Aug 12, 2019 in Image Classification, Image Recognition, Python, Segmentation
12 NLP Researchers, Practitioners & Innovators You Should Be Following

Check out this list of NLP researchers, practitioners and innovators you should be following, including academics, practitioners, developers, entrepreneurs, and more.

on Aug 12, 2019 in Influencers, Jeremy Howard, NLP, Rachel Thomas, Research, Richard Socher
Keras Callbacks Explained In Three Minutes

A gentle introduction to callbacks in Keras. Learn about EarlyStopping, ModelCheckpoint, and other callback functions with code examples.

on Aug 9, 2019 in Explained, Keras, Neural Networks, Python
Introduction to Image Segmentation with K-Means clustering

Image segmentation is the classification of an image into different groups. Much research has been done in the area of image segmentation using clustering. In this article, we will explore using the K-Means clustering algorithm to read an image and cluster different regions of the image.

By Nagesh Singh Chauhan on Aug 9, 2019 in Clustering, Computer Vision, Image Recognition, K-means, Python, Segmentation
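The approach described above clusters pixel colors and recolors each pixel with its cluster's centroid. It can be sketched in plain Python with Lloyd's k-means algorithm. The sketch assumes RGB pixels are given as a flat list of tuples. It uses a deterministic farthest-point initialization, a simplification of k-means++, instead of random starts. The function names and the tiny six-pixel "image" are hypothetical. A library implementation would normally be used on real images.

```python
def _dist2(a, b):
    # Squared Euclidean distance between two RGB tuples
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_segment(pixels, k, max_iter=20):
    """Cluster RGB pixels with Lloyd's k-means; return (labels, centroids)."""
    # Farthest-point initialization: start from the first pixel, then repeatedly
    # add the pixel farthest from all centroids chosen so far
    centroids = [pixels[0]]
    while len(centroids) < k:
        centroids.append(max(pixels, key=lambda p: min(_dist2(p, c) for c in centroids)))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each pixel joins its nearest centroid
        labels = [min(range(k), key=lambda j: _dist2(p, centroids[j])) for p in pixels]
        # Update step: move each centroid to the mean color of its pixels
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(ch) / len(members) for ch in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# A tiny 2x3 image flattened to RGB pixels: three dark, three bright
pixels = [(10, 10, 10), (12, 9, 11), (11, 12, 10), (240, 238, 242), (245, 250, 239), (250, 241, 244)]
labels, centroids = kmeans_segment(pixels, k=2)
segmented = [centroids[lab] for lab in labels]  # each pixel recolored by its cluster
print(labels)  # [0, 0, 0, 1, 1, 1]
```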
9 Tips For Training Lightning-Fast Neural Networks In PyTorch

Who is this guide for? Anyone working on non-trivial deep learning models in PyTorch, such as industrial researchers, Ph.D. students, academics, etc. The models we're talking about here might take you multiple days, or even weeks or months, to train.

on Aug 9, 2019 in Neural Networks, Performance, PyTorch, PyTorch Lightning, Tips
Knowing Your Neighbours: Machine Learning on Graphs

Graph Machine Learning uses the network structure of the underlying data to improve predictive outcomes. Learn how to use this modern machine learning method to solve challenges with connected data.

on Aug 8, 2019 in Convolutional Neural Networks, Graph Analytics, Graph Mining, Machine Learning
Inside Pluribus: Facebook’s New AI That Just Mastered the World’s Most Difficult Poker Game

The reasons why Pluribus represents a major breakthrough in AI systems might be confusing to many readers. After all, in recent years AI researchers have made tremendous progress across different complex games. However, six-player, no-limit Texas Hold’em still remains one of the most elusive challenges for AI systems.

on Aug 8, 2019 in AI, Facebook, Poker
Exploratory Data Analysis Using Python

In this tutorial, you’ll use Python and Pandas to explore a dataset and create visual distributions, identify and eliminate outliers, and uncover correlations between two datasets.

on Aug 7, 2019 in ActiveState, Data Analysis, Data Exploration, Pandas, Python
What is Benford’s Law and why is it important for data science?

Benford’s law is a little-known gem for data analytics. Learn about how this can be used for anomaly or fraud detection in scientific or technical publications.

on Aug 7, 2019 in Anomaly Detection, Benford's Law, Fraud Detection
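Benford's law predicts that the first digit d occurs with probability log10(1 + 1/d). A Benford-based anomaly check compares that expected distribution with observed first-digit frequencies. The sketch below shows that comparison. The powers-of-two dataset is only an illustrative stand-in for real figures. A real fraud check would add a goodness-of-fit test, which is not shown here.

```python
import math
from collections import Counter


def benford_expected():
    """Benford's law: P(first digit = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    # Drop the sign, leading zeros and the decimal point to reach the first significant digit
    s = str(abs(x)).lstrip("0.")
    return int(s[0])


def first_digit_frequencies(values):
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


# Powers of two are a classic example of data that follows Benford's law
observed = first_digit_frequencies([2 ** n for n in range(1, 1001)])
expected = benford_expected()
for d in range(1, 10):
    print(d, round(observed[d], 3), round(expected[d], 3))
```

Large, persistent gaps between the two columns in real accounting or measurement data are what flag a dataset for closer inspection.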
Deep Learning for NLP: ANNs, RNNs and LSTMs explained!

Learn about Artificial Neural Networks, Deep Learning, Recurrent Neural Networks and LSTMs like never before and use NLP to build a Chatbot!

on Aug 7, 2019 in Deep Learning, Explained, LSTM, Neural Networks, NLP, Recurrent Neural Networks
Coding Random Forests® in 100 lines of code*

There are dozens of machine learning algorithms out there. It is impossible to learn all their mechanics; however, many algorithms sprout from the most established algorithms, e.g. ordinary least squares, gradient boosting, support vector machines, tree-based algorithms and neural networks.

on Aug 7, 2019 in Algorithms, Machine Learning, Multicollinearity, R, random forests algorithm
Feature selection by random search in Python

Feature selection is one of the most important tasks in machine learning. Learn how to use a simple random search in Python to get good results in less time.

on Aug 6, 2019 in Collinearity, Cross-validation, Feature Selection, Python, Random
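Below is a minimal sketch of random search over feature subsets. It assumes a user-supplied scoring function where higher is better. The names `random_search_features`, `score_fn` and `toy_score` are hypothetical. In practice `score_fn` would wrap model training and cross-validation, and the toy scorer below only stands in for that.

```python
import random


def random_search_features(features, score_fn, n_iter=50, seed=0):
    """Evaluate random feature subsets and keep the best-scoring one."""
    rng = random.Random(seed)
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iter):
        k = rng.randint(1, len(features))          # random subset size
        subset = sorted(rng.sample(features, k))   # random subset of that size
        score = score_fn(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


features = ["age", "income", "zip", "noise1", "noise2"]
useful = {"age", "income"}


def toy_score(subset):
    # Stand-in for a cross-validated metric: reward useful features, penalize extras
    return len(useful & set(subset)) - 0.1 * len(subset)


print(random_search_features(features, toy_score, n_iter=1000))
```

Unlike exhaustive search, the cost is fixed by `n_iter` rather than growing as 2^n with the number of features. The trade-off is that the best subset is found only with high probability, not with certainty.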
25 Tricks for Pandas

Check out this video (and Jupyter notebook) which outlines a number of Pandas tricks for working with and manipulating data, covering topics such as string manipulations, splitting and filtering DataFrames, combining and aggregating data, and more.

on Aug 6, 2019 in Pandas, Python, Tips
Lagrange multipliers with visualizations and code

In this story, we’re going to take an aerial tour of optimization with Lagrange multipliers. When do we need them? Whenever we have an optimization problem with constraints.

on Aug 6, 2019 in Analytics, Mathematics, Optimization, Python
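Here is a small worked example of the method. The objective and constraint are chosen here for illustration and are not taken from the post: maximize f(x, y) = xy subject to g(x, y) = x + y - 10 = 0. At the optimum, the gradient of f is parallel to the gradient of g, which is expressed by setting the partial derivatives of the Lagrangian to zero:

```latex
\begin{aligned}
\mathcal{L}(x, y, \lambda) &= xy - \lambda\,(x + y - 10) \\
\frac{\partial \mathcal{L}}{\partial x} &= y - \lambda = 0, \qquad
\frac{\partial \mathcal{L}}{\partial y} = x - \lambda = 0, \qquad
\frac{\partial \mathcal{L}}{\partial \lambda} = -(x + y - 10) = 0 \\
&\Rightarrow\; x = y = \lambda = 5, \qquad f(5, 5) = 25
\end{aligned}
```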
Machine Learning is Happening Now: A Survey of Organizational Adoption, Implementation, and Investment

This is an excerpt from a survey that sought to evaluate the relevance of machine learning in operations today, assess the current state of machine learning adoption, and identify tools used for machine learning. A link to the full report is inside.

on Aug 5, 2019 in Machine Learning, Report, Survey
Pytorch Cheat Sheet for Beginners and Udacity Deep Learning Nanodegree

This cheat sheet should be easier to digest than the official documentation and should serve as a transitional tool to help students and beginners get started reading the documentation.

on Aug 2, 2019 in Beginners, Cheat Sheet, Deep Learning, Google Colab, Python, PyTorch, Udacity
Easily Deploy Deep Learning Models in Production

Getting trained neural networks to be deployed in applications and services can pose challenges for infrastructure managers. Challenges like multiple frameworks, underutilized infrastructure and lack of standard implementations can even cause AI projects to fail. This blog explores how to navigate these challenges.

on Aug 1, 2019 in Deep Learning, Deployment, GPU, Inference, NVIDIA
Opening Black Boxes: How to leverage Explainable Machine Learning

A machine learning model that predicts some outcome provides value. One that explains why it made the prediction creates even more value for your stakeholders. Learn how Interpretable and Explainable ML technologies can help while developing your model.

on Aug 1, 2019 in Explainable AI, Feature Selection, LIME, Machine Learning, SHAP, XAI
How a simple mix of object-oriented programming can sharpen your deep learning prototype

By mixing in simple concepts of object-oriented programming, like functionalization and class inheritance, you can add immense value to deep learning prototype code.

on Aug 1, 2019 in Deep Learning, Keras, Programming, Python
A 2019 Guide to Object Detection

Object detection has been applied widely in video surveillance, self-driving cars, and object/people tracking. In this piece, we’ll look at the basics of object detection and review some of the most commonly-used algorithms and a few brand new approaches, as well.

on Aug 1, 2019 in Computer Vision, Image Recognition, Object Detection
