2019 Nov
- Two Years In The Life of AI, Machine Learning, Deep Learning and Java - Nov 29, 2019.
Where does Java stand in the world of artificial intelligence, machine learning, and deep learning? Learn more about how to do these things in Java, and the libraries and frameworks to use.
- Markov Chains: How to Train Text Generation to Write Like George R. R. Martin - Nov 29, 2019.
Read this article on training Markov chains to generate George R. R. Martin style text.
- Lit BERT: NLP Transfer Learning In 3 Steps - Nov 29, 2019.
PyTorch Lightning is a lightweight framework which allows anyone using PyTorch to scale deep learning code easily while making it reproducible. In this tutorial we’ll use Huggingface's implementation of BERT to do a finetuning task in Lightning.
- Open Source Projects by Google, Uber and Facebook for Data Science and AI - Nov 28, 2019.
Open source is becoming the standard for sharing and improving technology. Some of the largest organizations in the world, namely Google, Facebook and Uber, are opening up to the public the technologies they use in their own workflows.
- Cartoon: Thanksgiving, Big Data, and Turkey Data Science… - Nov 28, 2019.
A classic KDnuggets Thanksgiving cartoon examines the predicament of one group of fowl Data Scientists.
- A Doomed Marriage of Machine Learning and Agile - Nov 28, 2019.
Sebastian Thrun, the founder of Udacity, ruined my machine learning project and wedding.
- Getting Started with Automated Text Summarization - Nov 28, 2019.
This article will walk through an extractive text summarization process, using a simple word frequency approach, implemented in Python.
- Top KDnuggets tweets, Nov 20-26: How to Speed up Pandas by 4x with one line of code - Nov 27, 2019.
Also: Deep Learning for Image Classification with Less Data; How to Speed up Pandas by 4x with one line of code; 25 Useful #Python Snippets to Help in Your Day-to-Day Work; Automated Machine Learning Project Implementation Complexities
- The Future of Careers in Data Science & Analysis - Nov 27, 2019.
As the fields of data science and analysis continue to expand, the next crop of bright minds is always needed. Learn more about the nuances of these jobs and find where you can fit in for a rewarding and interesting career.
- Spark NLP 101: LightPipeline - Nov 27, 2019.
A Pipeline is specified as a sequence of stages, and each stage is either a Transformer or an Estimator. These stages are run in order, and the input DataFrame is transformed as it passes through each stage. Now let’s see how this can be done in Spark NLP using Annotators and Transformers.
- Task-based effectiveness of basic visualizations - Nov 27, 2019.
This is a summary of a recent paper on an age-old topic: what visualisation should I use? No prizes for guessing “it depends!” Is this the paper to finally settle the age-old debate surrounding pie charts?
- AXA, State Auto and Hippo on how to fuse innovative tech to your company - Nov 26, 2019.
Join this live webinar: Fast, Agile, Service-Driven Insurance: Fuse Innovative Tech to Your Company DNA - AI, Chatbots, Automation and More, Dec 11 at 10:00am EST, to get actionable insight to develop your strategy.
- Machine Learning 101: The What, Why, and How of Weighting - Nov 26, 2019.
Weighting is a technique for improving models. In this article, learn more about what weighting is, why you should (and shouldn’t) use it, and how to choose optimal weights to minimize business costs.
- Content-based Recommender Using Natural Language Processing (NLP) - Nov 26, 2019.
A guide to building a content-based movie recommender model based on NLP.
- Probability Learning: Naive Bayes - Nov 26, 2019.
This post will describe various simplifications of Bayes' Theorem that make it more practical and applicable to real-world problems; these simplifications are known by the name of Naive Bayes. To make everything clear, we will also walk through an illustrative example of how Naive Bayes can be applied to classification.
- Would you buy insights from this guy? (How to assess and manage a Data Science vendor) - Nov 25, 2019.
With all the hype from data science vendors selling "actionable insights" to boost your company's bottom line, selecting your analytics partner should follow the same careful process as any traditional business endeavor. Use these questions and best practices to assess and manage your vendor accordingly.
- Top 8 Data Science Use Cases in Marketing - Nov 25, 2019.
In this article, we want to highlight some key data science use cases in marketing. We concentrate on several cases that are of particular interest and have proved their effectiveness over time.
- Top Stories, Nov 18-24: How to Speed up Pandas by 4x with one line of code; Python, Selenium & Google for Geocoding Automation: Free and Paid - Nov 25, 2019.
Also: Automated Machine Learning Project Implementation Complexities; Text Encoding: A Review; The Notebook Anti-Pattern; Data Science for Managers: Programming Languages; 10 Free Must-read Books on AI
- Can Neural Networks Develop Attention? Google Thinks they Can - Nov 25, 2019.
Google recently published some work about modeling attention mechanisms in deep neural networks.
- Advice for New and Junior Data Scientists - Nov 22, 2019.
If you are a new Data Scientist early in your professional journey, and you’re a bit confused and lost, then follow this advice to figure out how to best contribute to your company.
- Automated Machine Learning Project Implementation Complexities - Nov 22, 2019.
To demonstrate the implementation complexity differences along the AutoML highway, let's have a look at how 3 specific software projects approach the implementation of just such an AutoML "solution," namely Keras Tuner, AutoKeras, and automl-gs.
- Text Encoding: A Review - Nov 22, 2019.
This review focuses on exactly the part of the analysis that transforms words into numbers and texts into number vectors: text encoding.
- Three Methods of Data Pre-Processing for Text Classification - Nov 21, 2019.
This blog shows how text data representations can be used to build a classifier to predict a developer’s deep learning framework of choice based on the code that they wrote, via examples of TensorFlow and PyTorch projects.
- Python, Selenium & Google for Geocoding Automation: Free and Paid - Nov 21, 2019.
This tutorial will take you through two options for automating the geocoding process using Python, Selenium and the Google Geocoding API.
- Neural Networks 201: All About Autoencoders - Nov 21, 2019.
Autoencoders can be a very powerful tool for leveraging unlabeled data to solve a variety of problems, such as learning a "feature extractor" that helps build powerful classifiers, finding anomalies, or performing missing value imputation.
- Top KDnuggets tweets, Nov 13-19: A whole lot of Data Science Cheatsheets - Nov 21, 2019.
Also: Bring the scientific rigor of reproducibility to your Data Science projects; Neutrinos Lead to Unexpected Discovery in Basic Math; The media gets really excited about AI. Maybe a bit too excited
- The Notebook Anti-Pattern - Nov 21, 2019.
This article aims to explain why the drive towards using notebooks in production is an anti-pattern, offering some suggestions along the way.
- Python Tuples and Tuple Methods - Nov 21, 2019.
Brush up on your Python basics with this post on creating, using, and manipulating tuples.
- The Semiconductor Imperative for Driving Meaningful Innovation - Nov 20, 2019.
The fundamental fact is that more information than ever will need to be analyzed on millions of devices. And that’s where 5G will make accessing data dramatically faster and more efficient. At Samsung, we’re excited about what 5G can truly enable and to be a central player in the new 5G world.
- Pro Tips: How to deal with Class Imbalance and Missing Labels - Nov 20, 2019.
Your spectacularly-performing machine learning model could be subject to the common culprits of class imbalance and missing labels. Learn how to handle these challenges with techniques that remain open areas of new research for addressing real-world machine learning problems.
- Deep Learning for Image Classification with Less Data - Nov 20, 2019.
In this blog, I will demonstrate how deep learning can be applied even if we don't have enough data.
- Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead - Nov 20, 2019.
The two main takeaways from this paper: firstly, a sharpening of my understanding of the difference between explainability and interpretability, and why the former may be problematic; and secondly some great pointers to techniques for creating truly interpretable models.
- Why write for KDnuggets? Calling for original blogs and new authors - Nov 19, 2019.
KDnuggets is calling for original blogs and contributions from new authors on AI, Data Science, Machine Learning, and related topics. The authors of the most popular such blogs in December will be profiled in KDnuggets.
- How to apply machine learning and deep learning methods to audio analysis - Nov 19, 2019.
Find out how data scientists and AI practitioners can use a machine learning experimentation platform like Comet.ml to apply machine learning and deep learning methods in the domain of audio analysis.
- Reproducibility, Replicability, and Data Science - Nov 19, 2019.
As cornerstones of scientific processes, reproducibility and replicability ensure results can be verified and trusted. These two concepts are also crucial in data science, and as a data scientist, you must follow the same rigor and standards in your projects.
- The Math Behind Bayes - Nov 19, 2019.
This post will be dedicated to explaining the maths behind Bayes' Theorem, when its application makes sense, and how it differs from Maximum Likelihood.
- Data Science for Managers: Programming Languages - Nov 19, 2019.
In this article, we are going to talk about popular languages for Data Science and briefly describe each of them.
- Top Stories, Nov 11-17: How to Speed up Pandas by 4x with one line of code - Nov 18, 2019.
Also: The Complete Data Science LinkedIn Profile Guide; How Data Analytics Can Assist in Fraud Detection; Research Guide for Depth Estimation with Deep Learning; 10 Free Must-read Books on AI; Beginners Guide to the Three Types of Machine Learning
- Live Webinar: Continual Learning with Human-in-the-loop - Nov 18, 2019.
Join this live webinar from cnvrg, Continual Learning with Human-in-the-loop, Nov 26 @ 12 PM EST, and learn the role of human-in-the-loop in your ML pipeline, how to close the loop in your pipeline, and much more.
- Generalization in Neural Networks - Nov 18, 2019.
When training a neural network in deep learning, its performance on processing new data is key. Improving the model's ability to generalize relies on preventing overfitting using these important methods.
- The Reinforcement-Learning Methods that Allow AlphaStar to Outcompete Almost All Human Players at StarCraft II - Nov 18, 2019.
The new AlphaStar achieved Grandmaster level at StarCraft II, overcoming some of the limitations of the previous version. How did it do it?
- GitHub Repo Raider and the Automation of Machine Learning - Nov 18, 2019.
Since X never, ever marks the spot, this article raids the GitHub repos in search of quality automated machine learning resources. Read on for projects and papers to help understand and implement AutoML.
- On the sensationalism of artificial intelligence news - Nov 15, 2019.
With artificial intelligence and machine learning now a mainstay of our daily awareness, news organizations have been seen to overstate the reality behind progress in the field. Learn more about recent examples of media hyperbole and explore why this may be happening.
- Tips for a cost-effective machine learning project - Nov 15, 2019.
Spoiler: you don’t need a VM running 24/7 to handle 16 requests a day.
- Python Lists and List Manipulation - Nov 15, 2019.
In Python, lists store an ordered collection of items which can be of different types. This post is an overview of lists and their manipulation.
- AI ROI: The Questions You Need To Be Asking - Nov 14, 2019.
During this free Metis Corporate Training webinar, Dec 5 @ 12pm ET, Kerstin Frailey, Senior Data Scientist and Head of Executive Corporate Training at Metis, will walk through what you need to ask before, during, and after the lifetime of a data science project to accurately assess its impact on the business.
- How to Visualize Data in Python (and R) - Nov 14, 2019.
Producing accessible data visualizations is a key data science skill. The following guidelines will help you create the best representations of your data using R and Python's Pandas library.
- Topics Extraction and Classification of Online Chats - Nov 14, 2019.
This article covers how to automatically identify the topics within a corpus of textual data using unsupervised topic modelling, and then apply a supervised classification algorithm that assigns a topic label to each textual document, using the result of the previous step as target labels.
- Testing Your Machine Learning Pipelines - Nov 14, 2019.
Let’s take a look at traditional testing methodologies and how we can apply these to our data/ML pipelines.
- Top KDnuggets tweets, Nov 06-12: 10 FREE must-read ebooks on AI. Things just keep getting more interesting in the field, so use these resources to stay up to speed. - Nov 13, 2019.
Also: It's time to make your Data Science LinkedIn profile ready for recruiters.; Python Libraries for Interpretable Machine Learning - KDnuggets; Process your data with Pandas up to 4x faster with this new Python library.; How to Extract Google Maps Coordinates
- Python Workout / Practices of a Python Pro / Classic Computer Science Problems in Python - Nov 13, 2019.
Whether you’re a beginner or an expert, there are always new ways to improve your Python coding. Save 40% on this trio of Manning Python books today! Just enter the code nlpropython40 at checkout when you buy from manning.com.
- Transfer Learning Made Easy: Coding a Powerful Technique - Nov 13, 2019.
While the revolution of deep learning now impacts our daily lives, these networks are expensive. Approaches in transfer learning promise to ease this burden by enabling the re-use of trained models -- and this hands-on tutorial will walk you through a transfer learning technique you can run on your laptop.
- Beginners Guide to the Three Types of Machine Learning - Nov 13, 2019.
The following article is an introduction to classification and regression (known as supervised learning) and to unsupervised learning (which, in the context of machine learning applications, often refers to clustering), and includes a walkthrough in the popular Python library scikit-learn.
- How I Got Better at Machine Learning - Nov 13, 2019.
Check out this collection of tips and tricks the author learned over the years to get better at Machine Learning.
- How to Speed up Pandas by 4x with one line of code, by George Seif - Nov 12, 2019.
While Pandas is the library for data processing in Python, it isn't really built for speed. Learn more about the new library, Modin, developed to distribute Pandas' computation to speed up your data prep.
- Understanding NLP and Topic Modeling Part 1 - Nov 12, 2019.
In this post, we seek to understand why topic modeling is important and how it helps us as data scientists.
- MLOps for production-level machine learning [Nov 14 Webinar] - Nov 12, 2019.
This live webinar, Nov 14 @ 12pm EST, on MLOps for production-level machine learning, will detail MLOps, a compound of “machine learning” and “operations”, a practice for collaboration and communication between data scientists and operations professionals to help manage the production machine learning lifecycle. Register now.
- Research Guide for Depth Estimation with Deep Learning - Nov 12, 2019.
In this guide, we’ll look at papers aimed at solving the problems of depth estimation using deep learning.
- How to Extract Google Maps Coordinates - Nov 11, 2019.
In this article, I will show you how to quickly extract Google Maps coordinates with a simple and easy method.
- The Complete Data Science LinkedIn Profile Guide - Nov 11, 2019.
With so many Data Scientists showing up on LinkedIn, it's time to make sure your profile is top-notch because your talent is still highly sought after. Recruitment specialists want to find you fast, and this guide will help you create the best profile to feature your expertise.
- Top Stories, Nov 4-10: 10 Free Must-read Books on AI - Nov 11, 2019.
Also: Understanding Boxplots; Probability Learning: Maximum Likelihood; Designing Your Neural Networks; Facebook Has Been Quietly Open Sourcing Some Amazing Deep Learning Capabilities for PyTorch; 5 Statistical Traps Data Scientists Should Avoid
- How Data Analytics Can Assist in Fraud Detection - Nov 11, 2019.
A primary advantage of data analytics tools is that they can handle massive quantities of information at once. These solutions typically learn what's normal within a collection of information and how to spot anomalies.
- Facebook Adds This New Framework to Its Reinforcement Learning Arsenal - Nov 11, 2019.
ReAgent is a new framework that streamlines the implementation of reasoning systems.
- Top October Stories: How to Become a (Good) Data Scientist; Everything a Data Scientist Should Know About Data Management; The Last SQL Guide for Data Analysis - Nov 8, 2019.
Also: A European Approach to Master's Degrees in Data Science; How YouTube is Recommending Your Next Video
- What is Data Science? - Nov 8, 2019.
Data Science is pitched as a modern and exciting job offering high satisfaction. Does its reality really live up to the hype? Here, we show what it's really like to work as a Data Scientist.
- Understanding Boxplots - Nov 8, 2019.
A boxplot can tell you about your outliers and what their values are. It can also tell you if your data is symmetrical, how tightly your data is grouped, and if and how your data is skewed.
- Orchestrating Dynamic Reports in Python and R with Rmd Files - Nov 8, 2019.
Do you want to extract CSV files with Python and visualize them in R? How does preparing everything in R and drawing conclusions with Python sound? Both are possible if you know the right libraries and techniques. Here, we’ll walk through a use case combining both languages in one analysis.
- 3 Reasons to attend Data Natives, 25-26 November, Berlin - Nov 8, 2019.
Data Natives is an outstanding conference that lets you meet many talented Data Scientists and Data Professionals. Find your dream company or your dream employee and level up for 2020. Use code DN19_KDNuggets_50 to save.
- Monitoring Models at Scale - Nov 7, 2019.
Catch this Domino webinar on monitoring models at scale, Dec 11 @ 10am PT, covering detecting changes in the patterns of real-world data your models are seeing in production, tracking how model accuracy and other quality metrics are changing over time, and getting alerted when health checks fail so that resolution workflows can be triggered.
- Data Cleaning and Preprocessing for Beginners - Nov 7, 2019.
Careful preprocessing of data for your machine learning project is crucial. This overview describes the process of data cleaning and dealing with noise and missing data.
- Set Operations Applied to Pandas DataFrames - Nov 7, 2019.
In this tutorial, we show how to apply mathematical set operations (union, intersection, and difference) to Pandas DataFrames with the goal of easing the task of comparing the rows of two datasets.
- How to Create a Vocabulary for NLP Tasks in Python - Nov 7, 2019.
This post will walk through a Python implementation of a vocabulary class for storing processed text data and related metadata in a manner useful for subsequently performing NLP tasks.
- Top KDnuggets tweets, Oct 30 – Nov 05: Everything a Data Scientist Should Know About Data Management - Nov 6, 2019.
Which Data Science Skills are core and which are hot/emerging ones?; The 4 Quadrants of Data Science Skills and 7 Principles for Creating a Viral DataViz; Microsoft open sources #SandDance, a visual data exploration tool.
- Meet Neebo: The Virtual Analytics Hub - Nov 6, 2019.
Neebo is a SaaS solution that enables analytics teams to connect to, find, combine and collaborate on trusted data assets in hybrid cloud landscapes, and provides a unified access point where they can more effectively leverage all their analytics assets and knowledge. In this blog, we will highlight some of the features of Neebo and how they can completely transform the way analytics teams operate.
- An Eight-Step Checklist for An Analytics Project - Nov 6, 2019.
Follow these eight headings of an audit sheet that business analysts should address before submitting the results of their analytics project. One recommended approach is to rewrite each step as a question, answer it, and then attach it to your project.
- Research Guide: Advanced Loss Functions for Machine Learning Models - Nov 6, 2019.
This guide explores research centered on a variety of advanced loss functions for machine learning models.
- The Last Defense Against Another AI Winter - Nov 6, 2019.
My short answer is this: Yes, another AI Winter will be here if you don’t deploy more ML solutions. You and your Data Science teams are the last line of defense against the AI Winter. You need to solve five key challenges to keep the momentum up.
- 10 Free Must-read Books on AI - Nov 5, 2019.
Artificial Intelligence continues to fill the media headlines while scientists and engineers rapidly expand its capabilities and applications. With such explosive growth in the field, there is a great deal to learn. Dive into these 10 free books that are must-reads to support your AI study and work.
- Probability Learning: Maximum Likelihood - Nov 5, 2019.
The maths behind Bayes will be better understood if we first cover the theory and maths underlying another fundamental method of probabilistic machine learning: Maximum Likelihood. This post will be dedicated to explaining it.
- How to Become a Successful Healthcare Data Analyst - Nov 5, 2019.
Are you interested in starting your career in the data analysis domain? Read this informative blog on how to get your career off the ground.
- Top Stories, Oct 28 – Nov 3: 5 Statistical Traps Data Scientists Should Avoid; Top Machine Learning Software Tools for Developers - Nov 4, 2019.
Also: Why is Machine Learning Deployment Hard?; Data Sources 101; 5 Statistical Traps Data Scientists Should Avoid; Everything a Data Scientist Should Know About Data Management; How to Become a (Good) Data Scientist — Beginner Guide
- Practical Computer Vision Course with Real-Life Cases, Nov 18, Washington, DC - Nov 4, 2019.
This course, Practical Computer Vision Course with Real-Life Cases, Nov 18 in Washington, DC, will take you to the next step, providing you with practical means of solving business-specific tasks. Reserve your seat now.
- Designing Your Neural Networks - Nov 4, 2019.
Check out this step-by-step walkthrough of some of the more confusing aspects of neural nets to guide you in making smart decisions about your neural network architecture.
- Customer Segmentation Using K Means Clustering - Nov 4, 2019.
Customer Segmentation can be a powerful means to identify unsatisfied customer needs. This technique can be used by companies to outperform the competition by developing uniquely appealing products and services.
- Facebook Has Been Quietly Open Sourcing Some Amazing Deep Learning Capabilities for PyTorch - Nov 4, 2019.
The new release of PyTorch includes some impressive open source projects for deep learning researchers and developers.
- Top Machine Learning Software Tools for Developers - Nov 1, 2019.
If you are a developer excited about leveraging machine learning for faster and more effective development, these software tools are worth trying out.
- Build an Artificial Neural Network From Scratch: Part 1 - Nov 1, 2019.
This article focuses on building an Artificial Neural Network using the NumPy Python library.
- MLOps for production-level machine learning - Nov 1, 2019.
This live webinar, Nov 14 @ 12pm EST, on MLOps for production-level machine learning, will detail MLOps, a compound of “machine learning” and “operations”, a practice for collaboration and communication between data scientists and operations professionals to help manage the production machine learning lifecycle. Register now.
- What is Machine Learning on Code? - Nov 1, 2019.
Not only can MLonCode help companies streamline their codebase and software delivery processes, but it also helps organizations better understand and manage their engineering talents.