2019 Jul

All (64) | News (1) | Opinions (20) | Tutorials, Overviews (43)

Can we trust AutoML to go on full autopilot?

We put an AutoML tool to the test on a real-world problem, and the results are surprising. Even with automatic machine learning, you still need expert data scientists.

on Jul 31, 2019 in Automated Machine Learning, AutoML, Overfitting, Time Series
Five Command Line Tools for Data Science

You can do more data science than you think from the terminal.

on Jul 31, 2019 in Data Exploration, Data Science, Data Science Tools
Ten more random useful things in R you may not know about

I had a feeling that R has developed as a language to such a degree that many of us are now using it in completely different ways. This means that there are likely to be numerous tricks, packages, functions, etc. that each of us uses, but that others are completely unaware of and would find useful if they knew about them.

on Jul 31, 2019 in Advice, Analytics, Data Science, R
Understanding Tensor Processing Units

The Tensor Processing Unit (TPU) is Google's custom tool to accelerate machine learning workloads using the TensorFlow framework. Learn more about what TPUs do and how they can work for you.

on Jul 30, 2019 in Google, Sciforce, TensorFlow, TPU
P-values Explained By Data Scientist

This article is designed to give you a full picture, from constructing a hypothesis test to understanding the p-value and using it to guide the decision-making process.

on Jul 30, 2019 in Data Science, Data Scientist, Hypothesis Testing, P-value, Statistics
Here’s how you can accelerate your Data Science on GPU

Data Scientists need computing power. Whether you’re processing a big dataset with Pandas or running some computation on a massive matrix with NumPy, you’ll need a powerful machine to get the job done in a reasonable amount of time.

on Jul 30, 2019 in Big Data, Data Science, DBSCAN, Deep Learning, GPU, NVIDIA, Python
Top 10 Best Podcasts on AI, Analytics, Data Science, Machine Learning

Check out our latest Top 10 Most Popular Data Science and Machine Learning podcasts available on iTunes. Stay up to date in the field with these recent episodes and join in with the current data conversations.

on Jul 29, 2019 in AI, Analytics, Data Science, Machine Learning, Podcast
7 Tips for Dealing With Small Data

At my workplace, we produce a lot of functional prototypes for our clients. Because of this, I often need to make Small Data go a long way. In this article, I’ll share 7 tips to improve your results when prototyping with small datasets.

on Jul 29, 2019 in Cross-validation, Data Models, Ensemble Methods, Modeling, Tips, Transfer Learning
Decentralized and Collaborative AI: How Microsoft Research is Using Blockchains to Build More Transparent Machine Learning Models

Recently, AI researchers from Microsoft open sourced the Decentralized & Collaborative AI on Blockchain project that enables the implementation of decentralized machine learning models based on blockchain technologies.

on Jul 29, 2019 in AI, Blockchain, Machine Learning, Microsoft, Transparency
Convolutional Neural Networks: A Python Tutorial Using TensorFlow and Keras

Different neural network architectures excel in different tasks. This particular article focuses on crafting convolutional neural networks in Python using TensorFlow and Keras.

on Jul 26, 2019 in Convolutional Neural Networks, Keras, Neural Networks, Python, TensorFlow
Top 13 Skills To Become a Rockstar Data Scientist

Education, coding, SQL, big data platforms, storytelling and more. These are the 13 skills you need to master to become a rockstar data scientist.

on Jul 26, 2019 in Career Advice, Data Science, Data Science Skills, Data Scientist, Skills
Fantastic Four of Data Science Project Preparation

This article takes a closer look at the four fantastic things we should keep in mind when approaching every new data science project.

on Jul 26, 2019 in Comic, Data Exploration, Data Preparation, Data Science, Domain Knowledge
High-Quality AI And Machine Learning Data Labeling At Scale: A Brief Research Report

Analyst firm Cognilytica estimates that as much as 80% of machine learning project time is spent on aggregating, cleaning, labeling, and augmenting machine learning model data. So, how do innovative machine learning teams prepare data in such a way that they can trust its quality, cost of preparation, and the speed with which it’s delivered?

on Jul 25, 2019 in AI, Cloudfactory, Data Labeling, Machine Learning, Report, Research
A Gentle Introduction to Noise Contrastive Estimation

Find out how to use randomness to learn your data by using Noise Contrastive Estimation with this guide that works through the particulars of its implementation.

on Jul 25, 2019 in Deep Learning, Logistic Regression, Neural Networks, Noise, Random, Sampling, word2vec
Top Certificates and Certifications in Analytics, Data Science, Machine Learning and AI

Here are the top certificates and certifications in Analytics, AI, Data Science, Machine Learning and related areas.

on Jul 25, 2019 in Business Analytics, Certificate, Certification, Data Science Certificate, Education, Machine Learning, Online Education, SAS Certification
Is SQL needed to be a data scientist?

As long as there is ‘data’ in data scientist, Structured Query Language (or see-quel as we call it) will remain an important part of it. In this blog, let us explore data science and its relationship with SQL.

on Jul 25, 2019 in Data Science, Relational Databases, SQL
Neural Code Search: How Facebook Uses Neural Networks to Help Developers Search for Code Snippets

Developers are always searching for answers to questions about their code. But how do they ask the right questions? Facebook is creating new NLP neural networks to help search code repositories that may advance information retrieval algorithms.

on Jul 24, 2019 in Facebook, Information Retrieval, Natural Language Processing, Neural Networks, NLP, Programming
This New Google Technique Help Us Understand How Neural Networks are Thinking

Recently, researchers from the Google Brain team published a paper proposing a new method called Concept Activation Vectors (CAVs) that takes a new angle to the interpretability of deep learning models.

on Jul 24, 2019 in Accuracy, Deep Learning, Google, Interpretability, Neural Networks
Easy, One-Click Jupyter Notebooks

All of the setup for software, networking, security, and libraries is automatically taken care of by the Saturn Cloud system. Data Scientists can then focus on the actual Data Science and not the tedious infrastructure work that surrounds it.

on Jul 24, 2019 in Big Data, Cloud, Data Science, Data Scientist, DevOps, Jupyter, Python, Saturn Cloud
12 Things I Learned During My First Year as a Machine Learning Engineer

Learn about the day-in-the-life of one machine learning engineer and the important lessons learned for being successful in that role.

on Jul 23, 2019 in Advice, Best Practices, Communication, Machine Learning Engineer, Skills
Kaggle Kernels Guide for Beginners: A Step by Step Tutorial

This is an attempt to take a complete beginner by the hand and walk them through the world of Kaggle Kernels so that they can get started.

on Jul 23, 2019 in Kaggle, Python, R
Is Bias in Machine Learning all Bad?

We have been taught over our years of predictive model building that bias will harm our model. Bias control needs to be in the hands of someone who can differentiate between the right kind and wrong kind of bias.

on Jul 23, 2019 in Bias, Data Science, Machine Learning
The title CDO started out as a joke

How did the role of Chief Data Officer come to drive data literacy at companies around the world? Find out how it all began in this interview with the first person to hold the title, at Yahoo!

on Jul 22, 2019 in Africa, Barclays, Chief Data Officer, IADSS, Kate Strachnyi, Usama Fayyad, Yahoo
What’s the Best Data Strategy for Enterprises: Build, buy, partner or acquire?

Every large organization is investing heavily in building data solutions and tools. They are building data solutions from scratch when they could be taking advantage of readily available tools and solutions. Many organizations are re-inventing the wheel and wasting resources.

on Jul 22, 2019 in Acquisitions, Enterprise, Implementation, Open Source, Strategy
From Data Pre-processing to Optimizing a Regression Model Performance

All you need to know about data pre-processing, and how to build and optimize a regression model using the Backward Elimination method in Python.

on Jul 19, 2019 in Model Performance, Modeling, Optimization, Regression
Bayesian deep learning and near-term quantum computers: A cautionary tale in quantum machine learning

This blog post is an overview of quantum machine learning written by the author of the paper Bayesian deep learning on a quantum computer. In it, we explore the application of machine learning in the quantum computing space. The authors of this paper hope that the results of the experiment help influence the future development of quantum machine learning.

on Jul 19, 2019 in Bayesian, Machine Learning, Quantum Computing
The Evolution of a ggplot

A step-by-step tutorial showing how to turn a default ggplot into an appealing and easily understandable data visualization in R.

on Jul 18, 2019 in Data Visualization, ggplot2, R
Big Data for Insurance

The insurance industry has always been quite conservative; however, the adoption of new technologies is not just a modern trend but a necessity to maintain the competitive pace. In the modern digital era, Big Data technologies help to process vast amounts of information, increase workflow efficiency, and reduce operational costs. Learn more about the benefits of Big Data for insurance from our material.

on Jul 18, 2019 in Analytics, Big Data, Insurance, Predictive Analytics
Adapters: A Compact and Extensible Transfer Learning Method for NLP

Adapters obtain comparable results to BERT on several NLP tasks while achieving parameter efficiency.

on Jul 18, 2019 in BERT, NLP, Transfer Learning, Transformer
A Summary of DeepMind’s Protein Folding Upset at CASP13

Learn how DeepMind dominated the last CASP competition for advancing protein folding models. Their approach using gradient descent is today's state of the art for predicting the 3D structure of a protein given only its constituent amino acids.

on Jul 17, 2019 in Bioinformatics, Deep Learning, DeepMind, Exxact, Generative Adversarial Network, Gradient Descent, Protein
How to Make Stunning 3D Plots for Better Storytelling

3D Plots built in the right way for the right purpose are always stunning. In this article, we’ll see how to make stunning 3D plots with R using ggplot2 and rayshader.

on Jul 17, 2019 in Data Visualization, ggplot2, R, Storytelling
Computer Vision for Beginners: Part 1

Image processing means performing operations on images to achieve an intended manipulation. Think about what we do when we start a new data analysis. We do some data preprocessing and feature engineering. It’s the same with image processing.

on Jul 17, 2019 in Computer Vision, Deep Learning, Image Processing, Python
Things I Have Learned About Data Science

Read this collection of 38 things the author has learned along his travels, and has opted to share for the benefit of the reader.

on Jul 16, 2019 in Data Science, Tips
Dealing with categorical features in machine learning

Many machine learning algorithms require that their input is numerical and therefore categorical features must be transformed into numerical features before we can use any of these algorithms.

on Jul 16, 2019 in Data Cleaning, Data Preprocessing, Feature Engineering, Machine Learning, Python
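
As a rough illustration of the transformation that blurb describes, here is a minimal sketch of one-hot encoding in plain Python. It is not taken from the linked article; the function name `one_hot_encode` and the sample values are made up for this example, and real projects would more likely rely on a library encoder.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 indicator vector."""
    # Sorted unique categories give a stable, reproducible column order.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}

    encoded = []
    for value in values:
        row = [0] * len(categories)  # one column per category
        row[index[value]] = 1        # mark the column for this value
        encoded.append(row)
    return categories, encoded


# Hypothetical sample data, chosen only for illustration.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Each row has exactly one 1, so no artificial ordering is imposed on the categories, which is the usual reason to prefer this over mapping categories to arbitrary integers.
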
Scaling a Massive State-of-the-art Deep Learning Model in Production

A new NLP text writing app based on OpenAI's GPT-2 aims to write with you, whenever you ask. Find out from an engineer on the team how the developers set up and deployed their model to production.

on Jul 15, 2019 in Deep Learning, Deployment, NLP, OpenAI, Scalability, Transformer
Secrets to a Successful Data Science Interview

Are you puzzled as to what to prepare for data science interviews? That you are reading this document reflects how serious you are about becoming a successful data scientist.

on Jul 15, 2019 in Career Advice, Data Science, Interview
The Hackathon Guide for Aspiring Data Scientists

This article is an overview of how to prepare for a hackathon as an aspiring data scientist, highlighting the 4 reasons why you should take part in one, along with a series of tips for participation.

on Jul 15, 2019 in Data Science, Flask, Hackathon, Mobile, Product
Introducing Gen: MIT’s New Language That Wants to be the TensorFlow of Programmable Inference

Researchers from MIT recently unveiled a new probabilistic programming language named Gen, which allows researchers to write models and algorithms from multiple fields where AI techniques are applied without having to deal with equations or manually write high-performance code.

on Jul 12, 2019 in Inference, Julia, MIT, Programming Languages
Pre-training, Transformers, and Bi-directionality

Bidirectional Encoder Representations from Transformers (BERT; Devlin et al., 2018) is a language representation model that combines the power of pre-training with the bi-directionality of the Transformer’s encoder (Vaswani et al., 2017). BERT improves the state-of-the-art performance on a wide array of downstream NLP tasks with minimal additional task-specific training.

on Jul 12, 2019 in AISC, BERT, NLP, Training, Transformer
Top 10 Data Science Leaders You Should Follow

If you’re in the data science field, I strongly encourage you to follow these giants, listed in the section below, and to be a part of our data science community to learn from the best and share your experience and knowledge.

on Jul 12, 2019 in Data Science, Experts, Influencers, Social Media
The Death of Big Data and the Emergence of the Multi-Cloud Era

The Era of Big Data is coming to an end as the focus shifts from how we collect data to processing that data in real-time. Big Data is now a business asset supporting the next eras of multi-cloud support, machine learning, and real-time analytics.

on Jul 11, 2019 in Big Data, Cloudera, Hadoop, Multi-cloud, Realtime Analytics
Training a Neural Network to Write Like Lovecraft

In this post, the author attempts to train a neural network to generate Lovecraft-esque prose, known to be awkward and irregular at best. Did it end in success? If not, any suggestions on how it might have? Read on to find out.

on Jul 11, 2019 in Keras, LSTM, Natural Language Generation, Neural Networks, Python, TensorFlow
10 Simple Hacks to Speed up Your Data Analysis in Python

This article lists some curated tips for working with Python and Jupyter Notebooks, covering topics such as easily profiling data, formatting code and output, debugging, and more. Hopefully you can find something useful within.

on Jul 11, 2019 in Data Analysis, Jupyter, Pandas, Python, Tips
How to Showcase the Impact of Your Data Science Work

Whether you're a Data Scientist or preparing to land your first job, it is essential to communicate your work to others, especially employers, so that they understand your impact. These five tips will help others appreciate your data science.

on Jul 10, 2019 in Advice, Data Science, Industry
A Gentle Guide to Starting Your NLP Project with AllenNLP

For those who aren’t familiar with AllenNLP, I will give a brief overview of the library and let you know the advantages of integrating it into your project.

on Jul 10, 2019 in Allen Institute, NLP, Python, Sentiment Analysis
What’s wrong with the approach to Data Science?

The job ‘Data Scientist’ has been around for decades; it was just not called “Data Scientist”. Statisticians have applied machine learning techniques such as Logistic Regression and Random Forest for prediction and insights for longer than people realize.

on Jul 10, 2019 in Advice, Data Science, Data Science Education, Data Scientist
Why you’re not a job-ready data scientist (yet)

Trying to snag a dream Data Science job, but can't seem to land one? Check out these four skills that companies really want and be prepared for your next interview.

on Jul 9, 2019 in Advice, Career, Data Scientist, Mentorship
Practical Speech Recognition with Python: The Basics

Do you fear implementing speech recognition in your Python apps? Read this tutorial for a simple approach to getting practical with speech recognition using open source Python libraries.

on Jul 9, 2019 in Google, NLP, Python, Speech Recognition
Annotated Heatmaps of a Correlation Matrix in 5 Simple Steps

A heatmap is a graphical representation of data in which values are represented as colors. That is, it uses color to communicate a value to the reader. This is a great tool for guiding the audience toward the areas that matter most when you have a large volume of data.

on Jul 9, 2019 in Data Visualization, Python, Statistics
Collaborative Evolutionary Reinforcement Learning

Intel Researchers created a new approach to RL via Collaborative Evolutionary Reinforcement Learning (CERL) that combines policy gradient and evolution methods to optimize, exploit, and explore challenges.

on Jul 8, 2019 in Evolutionary Algorithm, Intel, Reinforcement Learning
XGBoost and Random Forest® with Bayesian Optimisation

This article will explain how to use XGBoost and Random Forest with Bayesian Optimisation, and will discuss the main pros and cons of these methods.

on Jul 8, 2019 in Bayesian, Optimization, Python, random forests algorithm, XGBoost
Classifying Heart Disease Using K-Nearest Neighbors

I have written this post for developers, and it assumes no background in statistics or mathematics. The focus is mainly on how the k-NN algorithm works and how to use it for predictive modeling problems.

on Jul 8, 2019 in Healthcare, K-nearest neighbors, Machine Learning, Medical, Python
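
For readers who want a feel for the algorithm before opening that post, the following is a minimal sketch of k-nearest-neighbors classification using only the Python standard library. It is an illustration under stated assumptions, not the article's code: the function name `knn_predict`, the choice of Euclidean distance, and the toy points and labels are all made up here and have nothing to do with the heart disease data.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, (1.1, 1.0)))  # low
print(knn_predict(points, labels, (5.1, 5.0)))  # high
```

There is no training step beyond storing the data; all the work happens at prediction time, which is why the choice of k and of the distance measure matters so much in practice.
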
How Data Science Is Used Within the Film Industry

As Data Science is becoming pervasive across so many industries, Hollywood is certainly not being left behind. Learn about how Big Data, analytics, and AI are now core drivers of the movies we watch and how we watch them.

on Jul 5, 2019 in Data Science, Industry, Marketing, Movies, Predictive Analytics, Recommender Systems
State of AI Report 2019

This year's "State of AI Report" has been released. Read it to find out about the latest in AI research, talent, industry, and politics from the past 12 months.

on Jul 5, 2019 in AI, Report
Top 8 Data Science Use Cases in Construction

This article considers several of the most efficient and productive data science use cases in the construction industry.

on Jul 5, 2019 in Optimization, Predictive Analytics, Product Analytics, Risk Analytics, Use Cases
Cartoon: AI + Self-Driving + BBQ = ?

KDnuggets Cartoon looks at what happens when AI and self-driving technology collide with the traditional summer pastime of grilling.

on Jul 4, 2019 in Adversarial, Cartoon, Deep Learning, Self-Driving Car
5 Probability Distributions Every Data Scientist Should Know

Having an understanding of probability distributions should be a priority for data scientists. Make sure you know what you should by reviewing this post on the subject.

on Jul 4, 2019 in Data Science, Data Scientist, Distribution, Normal Distribution, Probability
NLP vs. NLU: from Understanding a Language to Its Processing

As AI progresses and the technology becomes more sophisticated, we expect existing techniques to evolve. With these changes, will the well-founded natural language processing give way to natural language understanding? Or are the two concepts distinct enough to each hold their own niche in AI?

on Jul 3, 2019 in AI, NLP, NLU, Sciforce
Building a Recommender System, Part 2

This post explores a technique for collaborative filtering which uses latent factor models, and which naturally generalizes to deep learning approaches. Our approach will be implemented using TensorFlow and Keras.

on Jul 3, 2019 in Movies, Python, Recommendation Engine, Recommender Systems
4 Most Popular Alternative Data Sources Explained

Alternative data is the new game changer. People starting out with alternative data may wonder where they can get hold of data that gives such a competitive advantage. This post details 4 alternative data sources that you can exploit to the fullest.

on Jul 2, 2019 in Explained, Sensors, Social Networks, Traffic, Transactions, Weather
Seven Key Dimensions to Help You Understand Artificial Intelligence Environments

Understanding an AI environment is an incredibly complex task but there are several key dimensions that provide clarity on that reasoning.

on Jul 2, 2019 in AI, Environment
How do you check the quality of your regression model in Python?

Linear regression is rooted strongly in the field of statistical learning and therefore the model must be checked for the ‘goodness of fit’. This article shows you the essential steps of this task in a Python ecosystem.

on Jul 2, 2019 in Data Science, Multicollinearity, Python, Regression, Statistics
A Data Scientist’s Path to Understanding Market Simulation

Made possible by recent advances in computing power and machine learning, market simulation employs agent-based modeling, behavioral science and network science to recreate the complex dynamics and rules of how a population of people in a given market behave, influence each other and make decisions.

on Jul 1, 2019 in Forecasting, Market Analytics, Market Forecast, Simulation
XLNet Outperforms BERT on Several NLP Tasks

XLNet is a new pretraining method for NLP that achieves state-of-the-art results on several NLP tasks.

on Jul 1, 2019 in BERT, NLP, Performance
