2019 Jul
- Top KDnuggets tweets, Jul 24-30: Nothing but NumPy: Understanding and Creating Neural Nets w. Computational Graphs from Scratch; How Netflix works - Jul 31, 2019.
How Netflix works: the (hugely simplified) complex stuff that happens every time; Top Certificates and Certifications in Analytics, Data Science, ML; Nothing but NumPy: Understanding & Creating Neural Networks with Computation.
- Are We Ready to Partner With Machines? Data Science Salon Miami, September 10-11 - Jul 31, 2019.
When it comes to AI, there’s plenty of talk of the future of machines. But it’s the people behind AI development who have the insights needed to shape that future. Register now to catch all of our speakers at the Data Science Salon Miami, Sep 10-11, 2019.
- Can we trust AutoML to go on full autopilot? - Jul 31, 2019.
We put an AutoML tool to the test on a real-world problem, and the results are surprising. Even with automatic machine learning, you still need expert data scientists.
- Five Command Line Tools for Data Science - Jul 31, 2019.
You can do more data science than you think from the terminal.
- Ten more random useful things in R you may not know about - Jul 31, 2019.
I had a feeling that R has developed as a language to such a degree that many of us are using it now in completely different ways. This means that there are likely to be numerous tricks, packages, functions, etc. that each of us uses, but that others are completely unaware of, and would find useful if they knew about them.
- A Data Science Playbook for explainable ML/xAI - Jul 30, 2019.
This technical webinar on Aug 14 discusses traditional and modern approaches for interpreting black box models. Additionally, we will review cutting edge research coming out of UCSF, CMU, and industry.
- Understanding Tensor Processing Units - Jul 30, 2019.
The Tensor Processing Unit (TPU) is Google's custom tool to accelerate machine learning workloads using the TensorFlow framework. Learn more about what TPUs do and how they can work for you.
- P-values Explained By Data Scientist - Jul 30, 2019.
This article is designed to give you a full picture, from constructing a hypothesis test to understanding the p-value and using it to guide our decision-making process.
- Here’s how you can accelerate your Data Science on GPU - Jul 30, 2019.
Data Scientists need computing power. Whether you’re processing a big dataset with Pandas or running some computation on a massive matrix with NumPy, you’ll need a powerful machine to get the job done in a reasonable amount of time.
- Exploring Python Basics. - Jul 29, 2019.
This free ebook is a great resource for data science beginners, providing a good introduction to Python, coding with Raspberry Pi, and using Python to build predictive models.
- Top 10 Best Podcasts on AI, Analytics, Data Science, Machine Learning - Jul 29, 2019.
Check out our latest Top 10 Most Popular Data Science and Machine Learning podcasts available on iTunes. Stay up to date in the field with these recent episodes and join in with the current data conversations.
- 7 Tips for Dealing With Small Data - Jul 29, 2019.
At my workplace, we produce a lot of functional prototypes for our clients. Because of this, I often need to make Small Data go a long way. In this article, I’ll share 7 tips to improve your results when prototyping with small datasets.
- Decentralized and Collaborative AI: How Microsoft Research is Using Blockchains to Build More Transparent Machine Learning Models - Jul 29, 2019.
Recently, AI researchers from Microsoft open sourced the Decentralized & Collaborative AI on Blockchain project that enables the implementation of decentralized machine learning models based on blockchain technologies.
- Top Stories, Jul 22-28: Top 13 Skills To Become a Rockstar Data Scientist; This New Google Technique Help Us Understand How Neural Networks are Thinking - Jul 29, 2019.
Also: Convolutional Neural Networks: A Python Tutorial Using TensorFlow and Keras; Fantastic Four of Data Science Project Preparation; The Death of Big Data and the Emergence of the Multi-Cloud Era; The title CDO started out as a joke
- Convolutional Neural Networks: A Python Tutorial Using TensorFlow and Keras - Jul 26, 2019.
Different neural network architectures excel in different tasks. This particular article focuses on crafting convolutional neural networks in Python using TensorFlow and Keras.
- Top 13 Skills To Become a Rockstar Data Scientist - Jul 26, 2019.
Education, coding, SQL, big data platforms, storytelling and more. These are the 13 skills you need to master to become a rockstar data scientist.
- Fantastic Four of Data Science Project Preparation - Jul 26, 2019.
This article takes a closer look at the four fantastic things we should keep in mind when approaching every new data science project.
- 50% ends Friday – Research Frontiers, AI Kick-start, BootCamp, and Career Expo - Jul 25, 2019.
ODSC focuses on research at its conferences and invites the experts pushing the boundaries of AI to speak. Between the two upcoming conferences, researchers from more than 20 of the top research institutes in the country (OpenAI, NASA’s JPL, Google, MIT CSAIL, BAIR, The Turing Institute, Max Planck, and more) will deliver talks and lead trainings at ODSC West 2019.
- High-Quality AI And Machine Learning Data Labeling At Scale: A Brief Research Report - Jul 25, 2019.
Analyst firm Cognilytica estimates that as much as 80% of machine learning project time is spent on aggregating, cleaning, labeling, and augmenting machine learning model data. So, how do innovative machine learning teams prepare data in such a way that they can trust its quality, cost of preparation, and the speed with which it’s delivered?
- A Gentle Introduction to Noise Contrastive Estimation - Jul 25, 2019.
Find out how to use randomness to learn your data by using Noise Contrastive Estimation with this guide that works through the particulars of its implementation.
- Top Certificates and Certifications in Analytics, Data Science, Machine Learning and AI - Jul 25, 2019.
Here are the top certificates and certifications in Analytics, AI, Data Science, Machine Learning and related areas.
- Is SQL needed to be a data scientist? - Jul 25, 2019.
As long as there is ‘data’ in data scientist, Structured Query Language (or see-quel as we call it) will remain an important part of it. In this blog, let us explore data science and its relationship with SQL.
- Top KDnuggets tweets, Jul 17-23: Papers with Code: A Fantastic GitHub Resource for Machine Learning - Jul 24, 2019.
Also: Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly, R Use Double SAS; The Hundred-Page Machine Learning Book Book Review; The Evolution of a ggplot; Notes on Feature Preprocessing: The What, the Why, and the How
- How to Share Data Science Secrets Without Sacrificing Security - Jul 24, 2019.
Learn how to incorporate security into your practices without slowing down your project. Read this ActiveState blog post to learn more.
- Neural Code Search: How Facebook Uses Neural Networks to Help Developers Search for Code Snippets - Jul 24, 2019.
Developers are always searching for answers to questions about their code. But how do they ask the right questions? Facebook is creating new NLP neural networks to help search code repositories that may advance information retrieval algorithms.
- This New Google Technique Help Us Understand How Neural Networks are Thinking - Jul 24, 2019.
Recently, researchers from the Google Brain team published a paper proposing a new method called Concept Activation Vectors (CAVs) that takes a new angle to the interpretability of deep learning models.
- Easy, One-Click Jupyter Notebooks - Jul 24, 2019.
All of the setup for software, networking, security, and libraries is automatically taken care of by the Saturn Cloud system. Data Scientists can then focus on the actual Data Science and not the tedious infrastructure work that surrounds it.
- 12 Things I Learned During My First Year as a Machine Learning Engineer - Jul 23, 2019.
Learn about the day-in-the-life of one machine learning engineer and the important lessons learned for being successful in that role.
- Kaggle Kernels Guide for Beginners: A Step by Step Tutorial - Jul 23, 2019.
This is an attempt to hold the hands of a complete beginner and walk them through the world of Kaggle Kernels — for them to get started.
- Is Bias in Machine Learning all Bad? - Jul 23, 2019.
We have been taught over our years of predictive model building that bias will harm our model. Bias control needs to be in the hands of someone who can differentiate between the right kind and wrong kind of bias.
- The title CDO started out as a joke - Jul 22, 2019.
How did the role of Chief Data Officer come to drive data literacy at companies around the world? Find out how it all began in this interview with the first person to hold the title, at Yahoo!
- What’s the Best Data Strategy for Enterprises: Build, buy, partner or acquire? - Jul 22, 2019.
Every large organization is investing heavily in building data solutions and tools. They are building data solutions from scratch when they could be taking advantage of readily available tools and solutions. Many organizations are re-inventing the wheel and wasting resources.
- Things I Learned From the SciPy 2019 Lightning Talks - Jul 22, 2019.
This post summarizes the interesting aspects of Day One of the SciPy 2019 lightning talks, a flash round of a dozen ~3 minute talks covering a wide variety of topics.
- Top Stories, Jul 15-21: The Death of Big Data and the Emergence of the Multi-Cloud Era; Bayesian deep learning and near-term quantum computers - Jul 22, 2019.
Also: Dealing with categorical features in machine learning; Computer Vision for Beginners: Part 1; Big Data for Insurance; A Summary of DeepMind's Protein Folding Upset at CASP13; The Hackathon Guide for Aspiring Data Scientists
- From Data Pre-processing to Optimizing a Regression Model Performance - Jul 19, 2019.
All you need to know about data pre-processing, and how to build and optimize a regression model using the Backward Elimination method in Python.
- Bayesian deep learning and near-term quantum computers: A cautionary tale in quantum machine learning - Jul 19, 2019.
This blog post is an overview of quantum machine learning written by the author of the paper Bayesian deep learning on a quantum computer. In it, we explore the application of machine learning in the quantum computing space. The authors of this paper hope that the results of the experiment help influence the future development of quantum machine learning.
- Rethinking Mentoring In Data Science - Jul 19, 2019.
In recent years, I have heard the conversation of “find a mentor, you need a mentor to advance your career.” I received numerous requests from readers around the world to be their mentor. These requests encouraged me to think more closely about mentorship and the general expectations in the data science community.
- The Evolution of a ggplot - Jul 18, 2019.
A step-by-step tutorial showing how to turn a default ggplot into an appealing and easily understandable data visualization in R.
- Big Data for Insurance - Jul 18, 2019.
The insurance industry has always been quite conservative; however, the adoption of new technologies is not just a modern trend but a necessity to maintain the competitive pace. In the modern digital era, Big Data technologies help to process vast amounts of information, increase workflow efficiency, and reduce operational costs. Learn more about the benefits of Big Data for insurance from our material.
- Adapters: A Compact and Extensible Transfer Learning Method for NLP - Jul 18, 2019.
Adapters obtain comparable results to BERT on several NLP tasks while achieving parameter efficiency.
- Top KDnuggets tweets, Jul 10-16: Intuitive Visualization of Outlier Detection Methods; What’s wrong with the approach to Data Science? - Jul 17, 2019.
What's wrong with the approach to Data Science?; Intuitive Visualization of Outlier Detection Methods; The Death of Big Data and the Emergence of the Multi-Cloud Era
- Online Workshop: How to set up Kubernetes for all your machine learning workflows - Jul 17, 2019.
Join this free live online workshop, Jul 31 @12 PM ET, to learn how to set up your Kubernetes cluster, so you can run Spark, TensorFlow, and any ML framework instantly, touching on the entire machine learning pipeline from model training to model deployment.
- A Summary of DeepMind’s Protein Folding Upset at CASP13 - Jul 17, 2019.
Learn how DeepMind dominated the last CASP competition for advancing protein folding models. Their approach using gradient descent is today's state of the art for predicting the 3D structure of a protein knowing only its comprising amino acid compounds.
- How to Make Stunning 3D Plots for Better Storytelling - Jul 17, 2019.
3D Plots built in the right way for the right purpose are always stunning. In this article, we’ll see how to make stunning 3D plots with R using ggplot2 and rayshader.
- Computer Vision for Beginners: Part 1 - Jul 17, 2019.
Image processing is performing some operations on images to get an intended manipulation. Think about what we do when we start a new data analysis. We do some data preprocessing and feature engineering. It’s the same with image processing.
- Demystifying Data Science: Free Online Conference July 30-31 - Jul 16, 2019.
On Jul 30-31, join 22 speakers giving 16 talks and 6 workshops during Demystifying Data Science, a FREE two-day live online conference hosted by Metis, a leader in data science education.
- How to Build Disruptive Data Science Teams: 10 Best Practices - Jul 16, 2019.
Building a data science team from the ground up isn't easy. This strategic roadmap will help hiring managers with tactical advice and how to properly support a data science team once established.
- Things I Have Learned About Data Science - Jul 16, 2019.
Read this collection of 38 things the author has learned along his travels, and has opted to share for the benefit of the reader.
- Dealing with categorical features in machine learning - Jul 16, 2019.
Many machine learning algorithms require that their input is numerical and therefore categorical features must be transformed into numerical features before we can use any of these algorithms.
- First hand experience from Uber, Microsoft, Delivery Hero & more – at PAW London - Jul 15, 2019.
Top practitioners are coming to PAW London, 16-17 Oct 2019, to describe the design, deployment and business impact of their machine learning projects. Register using code KDNUGGETS for a 15% discount on your Predictive Analytics World ticket.
- Scaling a Massive State-of-the-art Deep Learning Model in Production - Jul 15, 2019.
A new NLP text writing app based on OpenAI's GPT-2 aims to write with you -- whenever you ask. Find out how the developers set up and deployed their model into production from an engineer working on the team.
- Secrets to a Successful Data Science Interview - Jul 15, 2019.
Are you puzzled as to what to prepare for data science interviews? That you are reading this document is a reflection of your seriousness in being a successful data scientist.
- The Hackathon Guide for Aspiring Data Scientists - Jul 15, 2019.
This article is an overview of how to prepare for a hackathon as an aspiring data scientist, highlighting the 4 reasons why you should take part in one, along with a series of tips for participation.
- Top Stories, Jul 8-14: The Death of Big Data and the Emergence of the Multi-Cloud Era; Training a Neural Network to Write Like Lovecraft - Jul 15, 2019.
Also: What's wrong with the approach to Data Science?; Introducing Gen: MIT's New Language That Wants to be the TensorFlow of Programmable Inference; Top 10 Data Science Leaders You Should Follow; 5 Probability Distributions Every Data Scientist Should Know; XGBoost and Random Forest with Bayesian Optimisation
- Introducing Gen: MIT’s New Language That Wants to be the TensorFlow of Programmable Inference - Jul 12, 2019.
Researchers from MIT recently unveiled a new probabilistic programming language named Gen, a language which allows researchers to write models and algorithms from multiple fields where AI techniques are applied without having to deal with equations or manually write high-performance code.
- Pre-training, Transformers, and Bi-directionality - Jul 12, 2019.
BERT (Bidirectional Encoder Representations from Transformers; Devlin et al., 2018) is a language representation model that combines the power of pre-training with the bi-directionality of the Transformer’s encoder (Vaswani et al., 2017). BERT improves the state-of-the-art performance on a wide array of downstream NLP tasks with minimal additional task-specific training.
- Top June Stories: 5 Useful Statistics Data Scientists Need to Know; 7 Steps to Mastering Intermediate Machine Learning with Python – 2019 Edition - Jul 12, 2019.
Also: Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly, R Use Double SAS; If you're a developer transitioning into data science, here are your best resources.
- Top 10 Data Science Leaders You Should Follow - Jul 12, 2019.
If you’re in the data science field, I strongly encourage you to follow these giants (which I’ll list in the section below) and be a part of our data science community to learn from the best and share your experience and knowledge.
- Do more data science, do less ops with self-service infrastructure & tools - Jul 11, 2019.
Do more data science, do less ops with self-service infrastructure & tools!
- The Death of Big Data and the Emergence of the Multi-Cloud Era - Jul 11, 2019.
The Era of Big Data is coming to an end as the focus shifts from how we collect data to processing that data in real-time. Big Data is now a business asset supporting the next eras of multi-cloud support, machine learning, and real-time analytics.
- Training a Neural Network to Write Like Lovecraft - Jul 11, 2019.
In this post, the author attempts to train a neural network to generate Lovecraft-esque prose, known to be awkward and irregular at best. Did it end in success? If not, any suggestions on how it might have? Read on to find out.
- 10 Simple Hacks to Speed up Your Data Analysis in Python - Jul 11, 2019.
This article lists some curated tips for working with Python and Jupyter Notebooks, covering topics such as easily profiling data, formatting code and output, debugging, and more. Hopefully you can find something useful within.
- Top KDnuggets tweets, Jul 03-09: How to choose a visualization; Data Science Jobs Report 2019 - Jul 10, 2019.
Also: How do you check the quality of your regression model in Python?; NLP vs. NLU: from Understanding a Language to Its Processing; 5 Probability Distributions Every Data Scientist Should Know; 10 More Free Must-Read Books for Machine Learning and Data Science
- How to Learn Python without First Needing to Learn Python - Jul 10, 2019.
Learn how data scientists and anyone coding with Python can set up a made-to-order runtime in minutes - not days. Read the 3-minute blog post.
- How to Showcase the Impact of Your Data Science Work - Jul 10, 2019.
Whether you're a Data Scientist or preparing to land your first job, it is essential to communicate your work to others, especially employers, so they understand your impact. These five tips will help you help others appreciate your data science.
- A Gentle Guide to Starting Your NLP Project with AllenNLP - Jul 10, 2019.
For those who aren’t familiar with AllenNLP, I will give a brief overview of the library and let you know the advantages of integrating it into your project.
- What’s wrong with the approach to Data Science? - Jul 10, 2019.
The job ‘Data Scientist’ has been around for decades; it was just not called “Data Scientist”. Statisticians have used their knowledge and skills using machine learning techniques such as Logistic Regression and Random Forest for prediction and insights for longer than people actually realize.
- Math for Machine Learning - Jul 9, 2019.
This ebook explains the math involved and introduces you directly to the foundational topics in machine learning.
- Why you’re not a job-ready data scientist (yet) - Jul 9, 2019.
Trying to snag a dream Data Science job, but can't seem to land one? Check out these four skills that companies really want and be prepared for your next interview.
- Practical Speech Recognition with Python: The Basics - Jul 9, 2019.
Do you fear implementing speech recognition in your Python apps? Read this tutorial for a simple approach to getting practical with speech recognition using open source Python libraries.
- Annotated Heatmaps of a Correlation Matrix in 5 Simple Steps - Jul 9, 2019.
A heatmap is a graphical representation of data in which data values are represented as colors. That is, it uses color in order to communicate a value to the reader. This is a great tool to guide the audience towards the areas that matter the most when you have a large volume of data.
- Collaborative Evolutionary Reinforcement Learning - Jul 8, 2019.
Intel Researchers created a new approach to RL via Collaborative Evolutionary Reinforcement Learning (CERL) that combines policy gradient and evolution methods to optimize, exploit, and explore challenges.
- XGBoost and Random Forest® with Bayesian Optimisation - Jul 8, 2019.
This article will explain how to use XGBoost and Random Forest with Bayesian Optimisation, and will discuss the main pros and cons of these methods.
- Top Stories, Jul 1-7: 5 Probability Distributions Every Data Scientist Should Know; NLP vs. NLU: from Understanding a Language to Its Processing - Jul 8, 2019.
Also: XLNet Outperforms BERT on Several NLP Tasks; How do you check the quality of your regression model in Python?; How Data Science Is Used Within the Film Industry; What's the Machine Learning Engineering Job Like; 7 Steps to Mastering Data Preparation for Machine Learning with Python – 2019 Edition
- DSGO19 Announces Speakers and Training Sessions - Jul 8, 2019.
Tickets now on sale for DataScienceGO, the #1 career-focused data science conference, coming to San Diego, CA, Sep 27-29. Use promo code KD-Nuggets for 20% off.
- Classifying Heart Disease Using K-Nearest Neighbors - Jul 8, 2019.
I have written this post for developers; it assumes no background in statistics or mathematics. The focus is mainly on how the k-NN algorithm works and how to use it for predictive modeling problems.
- How Data Science Is Used Within the Film Industry - Jul 5, 2019.
As Data Science is becoming pervasive across so many industries, Hollywood is certainly not being left behind. Learn about how Big Data, analytics, and AI are now core drivers of the movies we watch and how we watch them.
- State of AI Report 2019 - Jul 5, 2019.
This year's "State of AI Report" has been released. Read it to find out about the latest in AI research, talent, industry, and politics from the past 12 months.
- Top 8 Data Science Use Cases in Construction - Jul 5, 2019.
This article considers several of the most efficient and productive data science use cases in the construction industry.
- Top KDnuggets tweets, Jun 26 – Jul 02: An End-to-End Project on Time Series Analysis and Forecasting with #Python; The biggest mistake while learning #Python for #datascience - Jul 4, 2019.
Approaches to Text Summarization: An Overview; How to Learn Python for Data Science the Right Way; The biggest mistake you can make while learning #Python for #datascience; Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly,
- Cartoon: AI + Self-Driving + BBQ = ? - Jul 4, 2019.
KDnuggets Cartoon looks at what happens when AI and self-driving technology collide with the traditional summer pastime of grilling.
- 5 Probability Distributions Every Data Scientist Should Know - Jul 4, 2019.
Having an understanding of probability distributions should be a priority for data scientists. Make sure you know what you should by reviewing this post on the subject.
- NLP vs. NLU: from Understanding a Language to Its Processing - Jul 3, 2019.
As AI progresses and the technology becomes more sophisticated, we expect existing techniques to evolve. With these changes, will the well-founded natural language processing give way to natural language understanding? Or are the two concepts distinct enough to each hold their own niche in AI?
- Building a Recommender System, Part 2 - Jul 3, 2019.
This post explores a technique for collaborative filtering which uses latent factor models, and which naturally generalizes to deep learning approaches. Our approach will be implemented using TensorFlow and Keras.
- Nvidia’s New Data Science Workstation — a Review and Benchmark - Jul 3, 2019.
Nvidia has recently released their Data Science Workstation, a PC that puts together all the Data Science hardware and software into one nice package. The workstation is a total powerhouse machine, packed with all the computing power — and software — that’s great for plowing through data.
- Build your own AutoML computer vision pipeline, July 16 webinar - Jul 2, 2019.
This webinar will present a step-by-step use case so you can build your own AutoML computer vision pipelines, and will go through the essentials for research, deployment and training using Keras, PyTorch and TensorFlow.
- Examining the Transformer Architecture – Part 2: A Brief Description of How Transformers Work - Jul 2, 2019.
As the Transformer may become the new NLP standard, this review explores its architecture along with a comparison to existing RNN-based approaches.
- 4 Most Popular Alternative Data Sources Explained - Jul 2, 2019.
Alternative data is the new game changer. Those starting out may wonder where to get hold of alternative data that can give such a competitive advantage. This post details 4 alternative data sources that you can exploit to the fullest.
- Seven Key Dimensions to Help You Understand Artificial Intelligence Environments - Jul 2, 2019.
Understanding an AI environment is an incredibly complex task but there are several key dimensions that provide clarity on that reasoning.
- How do you check the quality of your regression model in Python? - Jul 2, 2019.
Linear regression is rooted strongly in the field of statistical learning and therefore the model must be checked for the ‘goodness of fit’. This article shows you the essential steps of this task in a Python ecosystem.
- What’s the Machine Learning Engineering Job Like - Jul 1, 2019.
As a relatively new position, the day in the life of a machine learning engineer or data scientist is still a bit fluid. Find out what it is like from people working today at Airbnb, SurveyMonkey, and Instagram.
- A Data Scientist’s Path to Understanding Market Simulation - Jul 1, 2019.
Made possible by recent advances in computing power and machine learning, market simulation employs agent-based modeling, behavioral science and network science to recreate the complex dynamics and rules of how a population of people in a given market behave, influence each other and make decisions.
- XLNet Outperforms BERT on Several NLP Tasks - Jul 1, 2019.
XLNet is a new pretraining method for NLP that achieves state-of-the-art results on several NLP tasks.
- Top Stories, Jun 24-30: Understanding Cloud Data Services; 7 Steps to Mastering Data Preparation for Machine Learning with Python – 2019 Edition - Jul 1, 2019.
Also: How To Get Funding For AI Startups; Optimization with Python: How to make the most amount of money with the least amount of risk?; 5 Useful Statistics Data Scientists Need to Know; Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly, R Use Double SAS