2021 Apr

All (56) | Opinions (13) | Products, Services (4) | Top Stories (1) | Tutorials, Overviews (38)

Gradient Boosted Decision Trees – A Conceptual Explanation

Gradient boosted decision trees involve implementing several models and aggregating their results. These boosted models have become popular thanks to their performance in machine learning competitions on Kaggle. In this article, we’ll see what gradient boosted decision trees are all about.

By Derrick Mwiti on Apr 30, 2021 in CatBoost, Decision Trees, Gradient Boosting, Machine Learning, Python, scikit-learn, XGBoost
FluDemic – using AI and Machine Learning to get ahead of disease

We are amidst a healthcare data explosion. AI/ML will be more vital than ever in the prevention and handling of future pandemics. Here, we walk you through the different facets of modeling infectious diseases, focusing on influenza and COVID-19.

By DataDriven Health on Apr 30, 2021 in AI, COVID-19, Healthcare, Machine Learning
Learn Neural Networks for Natural Language Processing Now

Still haven't come across enough quality contemporary natural language processing resources? Here is yet another freely-accessible offering from a top-notch university that might help quench your thirst for learning materials.

By Matthew Mayo on Apr 30, 2021 in CMU, Courses, Neural Networks, NLP
Feature Engineering of DateTime Variables for Data Science, Machine Learning

Learn how to make more meaningful features from DateTime type variables to be used by Machine Learning Models.

By Samarth Agrawal on Apr 29, 2021 in Data Science, Feature Engineering, Machine Learning, Python
Introducing The NLP Index

The NLP Index is a brand new resource for NLP code discovery, combining and indexing more than 3,000 paper and code pairs at launch. If you are interested in NLP research and in locating the code and papers needed to understand and implement the latest research, you should check it out.

By Matthew Mayo on Apr 29, 2021 in Datasets, NLP, Research
How to Build an Impressive Data Science Resume

Every one of us needs a resume to showcase our skills and experience, but how much effort are we putting into making it impactful? It is undeniable that resumes play a key role in our job application process. This article will explore some simple strategies to significantly improve the presentation as well as the content of data science resumes.

By Sharan Kumar Ravindran on Apr 28, 2021 in Career Advice, Data Science, Resume
Best Podcasts for Machine Learning

Podcasts, especially those featuring interviews, are great for learning about the subfields and tools of AI, as well as the rock stars and superheroes of the AI world. Here, we highlight some of the best podcasts today that are perfect for both those learning about machine learning and seasoned practitioners.

By Ritobrata Ghosh on Apr 28, 2021 in AI, Data Science, Machine Learning, Podcast
Using Data Science to Predict and Prevent Real World Problems

Do you have an interest in data science but lack an understanding of what, exactly, it can be used to accomplish in the real world? Read this article for a few examples of just how helpful data science can be for predicting and preventing real world problems.

By Devin Partida on Apr 28, 2021 in Data Science, Prediction
Why You Should Consider Being a Data Engineer Instead of a Data Scientist

A new king of the jungle has emerged.

By Terence Shin on Apr 27, 2021 in Career Advice, Data Engineer, Data Engineering, Data Science, Data Scientist
Multiple Time Series Forecasting with PyCaret

A step-by-step tutorial to forecast multiple time series with PyCaret.

By Moez Ali on Apr 27, 2021 in Forecasting, Machine Learning, PyCaret, Python, Time Series
Getting Started with Reinforcement Learning

Demystifying some of the main concepts and terminologies associated with Reinforcement Learning and their association with other fields of AI.

By Pier Paolo Ippolito on Apr 26, 2021 in AI, Beginners, Reinforcement Learning
Top 3 Challenges for Data & Analytics Leaders

The author shares the 3 top challenges faced as they led and established a data & analytics function, as well as ways in which these challenges were addressed. How have you solved the one challenge which has remained elusive to the author?

By Minoo Agarwal on Apr 26, 2021 in Analytics, Challenges, Data Analytics, Data Leadership
Data careers are NOT one-size fits all! Tips for uncovering your ideal role in the data space

Thriving as a data professional is about more than just making good money! It’s about FULFILLMENT & IMPACT. In this article, I will help you discover the BEST data role for you given your unique skill sets, personality & goals.

By Lillian Pierson on Apr 23, 2021 in Career Advice, Careers, Data Engineering, Data Science
Improving model performance through human participation

Certain industries, such as medicine and finance, are sensitive to false positives. Using human input in the model inference loop can increase the final precision and recall. Here, we describe how to incorporate human feedback at inference time, so that Machines + Humans = Higher Precision & Recall.

By Preetam Joshi on Apr 23, 2021 in Data Science Platform, Humans, Machine Learning, Model Performance, Precision, Recall
Data Science Books You Should Start Reading in 2021

Check out this curated list of the best data science books for any level.

By Przemek Chojecki on Apr 23, 2021 in Books, Data Science, Data Scientist, Deep Learning, Machine Learning
The Three Edge Case Culprits: Bias, Variance, and Unpredictability

Edge cases occur for three basic reasons: Bias – the ML system is too ‘simple’; Variance – the ML system is too ‘inexperienced’; Unpredictability – the ML system operates in an environment full of surprises. How do we recognize these edge case situations, and what can we do about them?

By iMerit on Apr 22, 2021 in Bias, iMerit, Machine Learning, Variance
What is Adversarial Neural Cryptography?

The novel approach combines GANs and cryptography in a single, powerful security method.

By Jesus Rodriguez on Apr 22, 2021 in Adversarial, AI, Cryptography, GANs, Security
How to ace A/B Testing Data Science Interviews

Understanding the process of A/B testing and knowing how to discuss this approach during data science job interviews can give you a leg up over other candidates. This mock interview provides a step-by-step guide through how to demonstrate your mastery of the key concepts and logical considerations.

By Preeti Semwal on Apr 22, 2021 in A/B Testing, Data Science, Interview Questions
Top 10 Must-Know Machine Learning Algorithms for Data Scientists – Part 1

New to data science? Interested in the must-know machine learning algorithms in the field? Check out the first part of our list and introductory descriptions of the top 10 algorithms for data scientists to know.

By Matthew Mayo on Apr 22, 2021 in Algorithms, Bagging, Data Science, Data Scientist, Decision Trees, Linear Regression, Machine Learning, SVM, Top 10
Production-Ready Machine Learning NLP API with FastAPI and spaCy

Learn how to implement an API based on FastAPI and spaCy for Named Entity Recognition (NER), and see why the author used FastAPI to quickly build a fast and robust machine learning API.

By Julien Salinas on Apr 21, 2021 in API, FastAPI, NLP, Production, Python, spaCy
10 Must-Know Statistical Concepts for Data Scientists

Statistics is a building block of data science. If you are working or plan to work in this field, then you will encounter the fundamental concepts reviewed for you here. Certainly, there is much more to learn in statistics, but once you understand these basics, then you can steadily build your way up to advanced topics.

By Soner Yildirim on Apr 21, 2021 in Bayes Theorem, Correlation, Normal Distribution, P-value, Sampling, Statistics, Variance
Time Series Forecasting with PyCaret Regression Module

PyCaret is an alternative low-code library that can replace hundreds of lines of code with only a few. See how to use PyCaret's Regression Module for Time Series Forecasting.

By Moez Ali on Apr 21, 2021 in Machine Learning, PyCaret, Python, Regression, Time Series
Data Analysis Using Tableau

Read this overview of using Tableau for sales data analysis, and see how visualization can help tell the business story.

By Juhi Sharma on Apr 20, 2021 in Business, Data Analysis, Ecommerce, Python, Sales, Tableau
Data Science 101: Normalization, Standardization, and Regularization

Normalization, standardization, and regularization all sound similar. However, each plays a unique role in your data preparation and model building process, so you must know when and how to use these important procedures.

By Susan Sivek on Apr 20, 2021 in Data Preprocessing, Feature Engineering, Normalization, Regression, Regularization, Statistics
Want To Get Good At Time Series Forecasting? Predict The Weather

This article is designed to help the reader understand the components of a time series.

By Michael Grogan on Apr 20, 2021 in Forecasting, Prediction, Time Series, Weather
Build an Effective Data Analytics Team and Project Ecosystem for Success

Apply these techniques to create a data analytics program that delivers solutions that delight end-users and meet their needs.

By Randy Runtsch on Apr 19, 2021 in Analytics Team, Career Advice, Data Science Team, Excel, Programming, SQL, Success
How to organize your data science project in 2021

Maintaining proper organization of all your data science projects will increase your productivity, minimize errors, and increase your development efficiency. This tutorial will guide you through a framework on how to keep everything in order on your local machine and in the cloud.

By Benjamin Obi Tayo on Apr 19, 2021 in Advice, Data Science, GitHub, Project
Free From Stanford: Machine Learning with Graphs

Check out the freely available Stanford course Machine Learning with Graphs, taught by Jure Leskovec, and see how a world-renowned researcher teaches their topic of expertise. Accessible materials include slides, videos, and more.

By Matthew Mayo on Apr 19, 2021 in Courses, Free, Graphs, Jure Leskovec, Machine Learning, Stanford
Data Profession Job Satisfaction: Beware Of The Drop

Latest KDnuggets Poll results: Job satisfaction has declined for ML Engineers, Data Scientists, and Data Analysts, but remained the same for Data Engineers and Managers/Directors. Data Scientist job satisfaction shows an alarming drop in mid-career. Finally, which regions have the highest and lowest job satisfaction?

By Gregory Piatetsky on Apr 16, 2021 in Career, Data Analyst, Data Scientist, Jobs, Machine Learning Engineer, Poll
What makes a song popular? Analyzing Top Songs on Spotify

With so many great (and not-so-great) songs out there, it can be hard to find those that match your musical preferences. Follow along with this ML model-building project to explore the extensive song data available on Spotify and design a recommendation engine that could help you discover your next favorite artist!

By Sunku Sowmya Sree on Apr 16, 2021 in Beatles, Data Analysis, Data Exploration, Feature Selection, Music, Spotify
6 Mistakes To Avoid While Training Your Machine Learning Model

Training an AI model involves multi-stage activities designed to make the best use of the training data, so that the outcomes are satisfactory. Here are 6 common mistakes you need to understand to make sure your AI model is successful.

By Cogito Tech on Apr 15, 2021 in Computer Vision, Data Labeling, Machine Learning, Mistakes
Top 3 Statistical Paradoxes in Data Science

Observation bias and sub-group differences generate statistical paradoxes.

By Francesco Casalegno on Apr 15, 2021 in Bias, Data Science, Simpson's Paradox, Statistics
The Most In-Demand Skills for Data Scientists in 2021

If you are preparing to make a career as a Data Scientist or are looking for opportunities to skill-up in your current role, this analysis of in-demand skills for 2021, based on over 15,000 Data Scientist job postings, should offer you a good idea as to which programming languages and software tools are increasing and decreasing in importance.

By Terence Shin on Apr 15, 2021 in AWS, Data Science Skills, Python, PyTorch, R, scikit-learn, SQL, TensorFlow
ETL in the Cloud: Transforming Big Data Analytics with Data Warehouse Automation

Today, organizations are increasingly implementing cloud ETL tools to handle large data sets. With data sets becoming larger by the day, unified ETL tools have become crucial for the data integration needs of enterprises.

By Nitin Kumar on Apr 15, 2021 in Automation, Big Data, Big Data Analytics, Cloud, Data Analytics, Data Warehouse, ETL
Continuous Training for Machine Learning – a Framework for a Successful Strategy

Anyone who builds machine learning models appreciates that a model is not useful without useful data. This doesn't change after a model is deployed to production. Effectively monitoring and retraining models with updated data is key to maintaining valuable ML solutions, and can be accomplished with effective approaches to production-level continuous training that are guided by the data.

By Or Itzary on Apr 14, 2021 in Machine Learning, MLOps, Model Performance, Production, Real-time, Training Data
Top March Stories: Are You Still Using Pandas to Process Big Data in 2021? Here are two better options; How To Overcome The Fear of Math and Learn Math For Data Science

Also: Top YouTube Channels for Data Science; More Data Science Cheatsheets; Top 10 Python Libraries Data Scientists should know in 2021.

By Gregory Piatetsky on Apr 13, 2021 in Top stories
7 Must-Haves in your Data Science CV

If you are looking for a new role as a Data Scientist -- either as a first job fresh out of school, a career change, or a shift to another organization -- then check off as many of these critical points as possible to stand out in the crowd and pass the hiring manager's initial CV screen.

By Elad Cohen on Apr 13, 2021 in Business, Career Advice, Data Scientist, Machine Learning
How to Apply Transformers to Any Length of Text

Read on to find out how to restore the power of NLP for long sequences.

By James Briggs on Apr 12, 2021 in BERT, NLP, Python, Text Analytics, Transformer
Interpretable Machine Learning: The Free eBook

Interested in learning more about interpretability in machine learning? Check out this free eBook to learn about the basics, simple interpretable models, and strategies for interpreting more complex black box models.

By Matthew Mayo on Apr 9, 2021 in AI, Explainability, Explainable AI, Free ebook, Interpretability
Deep Learning Recommendation Models (DLRM): A Deep Dive

The currency in the 21st century is no longer just data. It's the attention of people. This deep dive article presents the architecture and deployment issues experienced with the deep learning recommendation model, DLRM, which was open-sourced by Facebook in March 2019.

By Nishant Kumar on Apr 9, 2021 in Deep Learning, Recommendations, Recommender Systems
Deepfakes are now mainstream. What’s next?

Deepfakes have become mainstream. Here we take a closer look at recent news about deepfakes, and what it all might mean for the future.

By Dan Abdinoor on Apr 9, 2021 in AI, Deepfakes, Video
A/B Testing: 7 Common Questions and Answers in Data Science Interviews, Part 2

In the second article in this series, we’ll continue to take an interview-driven approach by linking some of the most commonly asked interview questions to different components of A/B testing, including selecting ideas for testing, designing A/B tests, evaluating test results, and making ship or no ship decisions.

By Emma Ding on Apr 8, 2021 in A/B Testing, Career Advice, Data Science, Data Scientist, Interview Questions
Start a Career in a Growing Field with Google’s Data Analytics Professional Certificate

Google's recently launched Data Analytics Professional Certificate on Coursera is great for anyone, regardless of background or experience. The program is completely online, self-paced, and costs $39 per month. Interested in preparing for a new career in a high-growth field?

By Coursera on Apr 7, 2021 in Certificate, Coursera, Data Analytics, Google
E-commerce Data Analysis for Sales Strategy Using Python

Check out this informative and concise case study applying data analysis using Python to a well-defined e-commerce scenario.

By Juhi Sharma on Apr 7, 2021 in Business, Data Analysis, Ecommerce, Python, Sales
Working With Time Series Using SQL

This article is an overview of using SQL to manipulate time series data.

By Michael Grogan on Apr 6, 2021 in SQL, Time Series
How Noisy Labels Impact Machine Learning Models

Not all training data labeling errors have the same impact on the performance of the Machine Learning system. The structure of the labeling errors makes a difference. Read iMerit’s latest blog to learn how to minimize the impact of labeling errors.

By iMerit on Apr 6, 2021 in Data Labeling, Data Preparation, iMerit, Machine Learning
KDnuggets Top Blogs Reward Program

To encourage more high-quality and especially original contributions to KDnuggets, we announce the KDnuggets Top Blogs Reward Program, where we will pay the authors of top blogs published each month, starting with blogs published in May 2021.

By Gregory Piatetsky on Apr 6, 2021 in About KDnuggets, Blog Rewards
How to Dockerize Any Machine Learning Application

How can you -- an awesome Data Scientist -- also be known as an awesome software engineer? Docker. And these 3 simple steps to use it for your solutions over and over again.

By Arunn Thevapalan on Apr 6, 2021 in Advice, Applications, Containers, Deployment, Docker, Machine Learning
Automated Text Classification with EvalML

Learn how EvalML leverages Woodwork, Featuretools and the nlp-primitives library to process text data and create a machine learning model that can detect spam text messages.

By Angela Lin on Apr 6, 2021 in Automated Machine Learning, AutoML, NLP, Python, Text Analytics, Text Classification
How to deploy Machine Learning/Deep Learning models to the web

The full value of your deep learning models comes from enabling others to use them. Learn how to deploy your model to the web and access it as a REST API, and begin to share the power of your machine learning development with the world.

By Ahmad Anis on Apr 5, 2021 in Deep Learning, Deployment, Machine Learning, RESTful API
Awesome Tricks And Best Practices From Kaggle

Easily learn what would otherwise take hours of search and exploration to discover.

By Bex T. on Apr 5, 2021 in Data Science, Kaggle, Machine Learning, Tips
Shapash: Making Machine Learning Models Understandable

Establishing an expectation for trust around AI technologies may soon become one of the most important skills provided by Data Scientists. Significant research investments are underway in this area, and new tools are being developed, such as Shapash, an open-source Python library that helps Data Scientists make machine learning models more transparent and understandable.

By Yann Golhen on Apr 2, 2021 in Explainability, Machine Learning, Python, SHAP
What’s ETL?

Discover what ETL is, and see in what ways it’s critical for data science.

By Omer Mahmood on Apr 2, 2021 in Data Processing, Data Science, ETL
Easy AutoML in Python

We’re excited to announce that a new open-source project has joined the Alteryx open-source ecosystem. EvalML is a library for automated machine learning (AutoML) and model understanding, written in Python.

By Dylan Sherry on Apr 1, 2021 in Automated Machine Learning, AutoML, Machine Learning, Open Source, Python
The 8 Most Common Data Scientists

Admit it all you wanna-be, newbie, and old-old-school Data Scientists on the planet, whether you like it or not, you've probably behaved like one of these types. Or two. Or all eight.

By JABDE on Apr 1, 2021 in Data Scientist, Humor
A/B Testing: 7 Common Questions and Answers in Data Science Interviews, Part 1

In this article, we’ll take an interview-driven approach by linking some of the most commonly asked interview questions to different components of A/B testing, including selecting ideas for testing, designing A/B tests, evaluating test results, and making ship or no ship decisions.

By Emma Ding on Apr 1, 2021 in A/B Testing, Career Advice, Data Science, Data Scientist, Interview Questions
