- Gradient Boosted Decision Trees – A Conceptual Explanation, by Derrick Mwiti - Apr 30, 2021.
Gradient boosted decision trees involve implementing several models and aggregating their results. These boosted models have become popular thanks to their performance in machine learning competitions on Kaggle. In this article, we’ll see what gradient boosted decision trees are all about.
- FluDemic – using AI and Machine Learning to get ahead of disease, by DataDriven Health - Apr 30, 2021.
We are amidst a healthcare data explosion. AI/ML will be more vital than ever in the prevention and handling of future pandemics. Here, we walk you through the different facets of modeling infectious diseases, focusing on influenza and COVID-19.
- Learn Neural Networks for Natural Language Processing Now, by Matthew Mayo - Apr 30, 2021.
Still haven't come across enough quality contemporary natural language processing resources? Here is yet another freely-accessible offering from a top-notch university that might help quench your thirst for learning materials.
- Feature Engineering of DateTime Variables for Data Science, Machine Learning, by Samarth Agrawal - Apr 29, 2021.
Learn how to make more meaningful features from DateTime type variables to be used by Machine Learning Models.
- The secret to analysing large, complex datasets quickly and productively?, by Thomas Richardson - Apr 29, 2021.
Data is beautiful, and lots of data is simply sublime, but be wary of the pitfalls. Sometimes you have so much data you can waste hours exploring without answering the important questions. These 5 tips will show you how to analyse large complex datasets productively by constraining yourself.
- Introducing The NLP Index, by Matthew Mayo - Apr 29, 2021.
The NLP Index is a brand new resource for NLP code discovery, combining and indexing more than 3,000 paper and code pairs at launch. If you are interested in NLP research and locating the code and papers needed to understand and implement the latest research, you should check it out.
- How to Build an Impressive Data Science Resume, by Sharan Kumar Ravindran - Apr 28, 2021.
Every one of us needs a resume to showcase our skills and experience, but how much effort are we putting into making it impactful? It is undeniable that resumes play a key role in our job application process. This article will explore some simple strategies to significantly improve the presentation as well as the content of data science resumes.
- Best Podcasts for Machine Learning, by Ritobrata Ghosh - Apr 28, 2021.
Podcasts, especially those featuring interviews, are great for learning about the subfields and tools of AI, as well as the rock stars and superheroes of the AI world. Here, we highlight some of the best podcasts today that are perfect for both those learning about machine learning and seasoned practitioners.
- Using Data Science to Predict and Prevent Real World Problems, by Devin Partida - Apr 28, 2021.
Do you have an interest in data science but lack an understanding of what, exactly, it can be used to accomplish in the real world? Read this article for a few examples of just how helpful data science can be for predicting and preventing real world problems.
- KDD-2021, The premier Data Science Conference, Aug 14-18, Virtual, by KDD 2021 - Apr 27, 2021.
KDD 2021, the Association for Computing Machinery (ACM) Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) flagship conference, will take place virtually Aug 14-18.
- Why You Should Consider Being a Data Engineer Instead of a Data Scientist, by Terence Shin - Apr 27, 2021.
A new king of the jungle has emerged.
- Data Scientist vs Machine Learning Engineer – what are their skills?, by Matthew Przybyla - Apr 27, 2021.
As two very popular tech roles for 2021, the Data Scientist and Machine Learning Engineer can overlap or be entirely distinct, depending on the organization you work for. However, general differences between these positions require certain skill sets that you must be prepared for when applying for jobs.
- Multiple Time Series Forecasting with PyCaret, by Moez Ali - Apr 27, 2021.
A step-by-step tutorial to forecast multiple time series with PyCaret.
- Top Stories, Apr 19-25: How to organize your data science project in 2021; Data Science Books You Should Start Reading in 2021, by KDnuggets - Apr 26, 2021.
Also: How to ace A/B Testing Data Science Interviews; Top 10 Must-Know Machine Learning Algorithms for Data Scientists – Part 1; The Most In-Demand Skills for Data Scientists in 2021; Free From Stanford: Machine Learning with Graphs
- Learn how to integrate third-party location data with AWS Data Exchange, by AWS - Apr 26, 2021.
Join this webinar, May 6 @ 2PM ET, to discover how Yum! Brands and other organizations are leveraging location-based data to boost in-app location accuracy, increase in-store foot traffic, and expand ecommerce business.
- Getting Started with Reinforcement Learning, by Pier Paolo Ippolito - Apr 26, 2021.
Demystifying some of the main concepts and terminologies associated with Reinforcement Learning and their association with other fields of AI.
- Data science is not about data – applying Dijkstra principle to data science, by Mehmet Suzen - Apr 26, 2021.
What is Data Science really about? Is it the data, or the algorithms, or something else? Similar foundational philosophical struggles exist with other scientific fields, including computer science, and maybe we can look to these resolutions to better understand the true 'meaning' of data science.
- Top 3 Challenges for Data & Analytics Leaders, by Minoo Agarwal - Apr 26, 2021.
The author shares the 3 top challenges faced as they led and established a data & analytics function, as well as ways in which these challenges were addressed. How have you solved the one challenge which has remained elusive to the author?
- Data careers are NOT one-size fits all! Tips for uncovering your ideal role in the data space, by Lillian Pierson - Apr 23, 2021.
Thriving as a data professional is about more than just making good money! It’s about FULFILLMENT & IMPACT. In this article, I will help you discover the BEST data role for you given your unique skill sets, personality & goals.
- Improving model performance through human participation, by Preetam Joshi - Apr 23, 2021.
Certain industries, such as medicine and finance, are sensitive to false positives. Using human input in the model inference loop can increase the final precision and recall. Here, we describe how to incorporate human feedback at inference time, so that Machines + Humans = Higher Precision & Recall.
- Data Science Books You Should Start Reading in 2021, by Przemek Chojecki - Apr 23, 2021.
Check out this curated list of the best data science books for any level.
- The Three Edge Case Culprits: Bias, Variance, and Unpredictability, by iMerit - Apr 22, 2021.
Edge cases occur for three basic reasons: Bias – the ML system is too ‘simple’; Variance – the ML system is too ‘inexperienced’; Unpredictability – the ML system operates in an environment full of surprises. How do we recognize these edge case situations, and what can we do about them?
- What is Adversarial Neural Cryptography?, by Jesus Rodriguez - Apr 22, 2021.
The novel approach combines GANs and cryptography in a single, powerful security method.
- How to ace A/B Testing Data Science Interviews, by Preeti Semwal - Apr 22, 2021.
Understanding the process of A/B testing and knowing how to discuss this approach during data science job interviews can give you a leg up over other candidates. This mock interview provides a step-by-step guide through how to demonstrate your mastery of the key concepts and logical considerations.
- Top 10 Must-Know Machine Learning Algorithms for Data Scientists – Part 1, by Matthew Mayo - Apr 22, 2021.
New to data science? Interested in the must-know machine learning algorithms in the field? Check out the first part of our list and introductory descriptions of the top 10 algorithms for data scientists to know.
- How Uber manages Machine Learning Experiments with Comet.ml, by Comet.ml - Apr 21, 2021.
At Uber, where ML is fundamental to most products, a mechanism to manage offline experiments easily is needed to improve developer velocity. To solve for this, Uber AI was looking for a solution that will potentially complement and extend its in-house experiment management and collaboration capabilities.
- Production-Ready Machine Learning NLP API with FastAPI and spaCy, by Julien Salinas - Apr 21, 2021.
Learn how to implement an API based on FastAPI and spaCy for Named Entity Recognition (NER), and see why the author used FastAPI to quickly build a fast and robust machine learning API.
- 10 Must-Know Statistical Concepts for Data Scientists, by Soner Yildirim - Apr 21, 2021.
Statistics is a building block of data science. If you are working or plan to work in this field, then you will encounter the fundamental concepts reviewed for you here. Certainly, there is much more to learn in statistics, but once you understand these basics, then you can steadily build your way up to advanced topics.
- Time Series Forecasting with PyCaret Regression Module, by Moez Ali - Apr 21, 2021.
PyCaret is an alternate low-code library that can be used to replace hundreds of lines of code with only a few lines. See how to use PyCaret's Regression Module for Time Series Forecasting.
- Top 10 Data Science Courses to Take in 2021, by Coursera - Apr 20, 2021.
Whether you are getting started with Data Science / Machine Learning or are an experienced professional looking to learn something new, check out these top 10 data science courses for 2021.
- Data Analysis Using Tableau, by Juhi Sharma - Apr 20, 2021.
Read this overview of using Tableau for sales data analysis, and see how visualization can help tell the business story.
- Data Science 101: Normalization, Standardization, and Regularization, by Susan Sivek - Apr 20, 2021.
Normalization, standardization, and regularization all sound similar. However, each plays a unique role in your data preparation and model building process, so you must know when and how to use these important procedures.
- Want To Get Good At Time Series Forecasting? Predict The Weather, by Michael Grogan - Apr 20, 2021.
This article is designed to help the reader understand the components of a time series.
- Top Stories, Apr 12-18: The Most In-Demand Skills for Data Scientists in 2021, by KDnuggets - Apr 19, 2021.
Also: Top 3 Statistical Paradoxes in Data Science; A/B Testing: 7 Common Questions and Answers in Data Science Interviews, Part 2; ETL in the Cloud: Transforming Big Data Analytics with Data Warehouse Automation; Essential Math for Data Science: Linear Transformation with Matrices
- Knowledge Graph Conference, covering tools, techniques, case studies and more, by experts. May 3-6, Virtual, by Knowledge Graph Conference - Apr 19, 2021.
"A force to be reckoned with" - the who’s who of knowledge graphs will convene at The Knowledge Graph Conference in May.
- Build an Effective Data Analytics Team and Project Ecosystem for Success, by Randy Runtsch - Apr 19, 2021.
Apply these techniques to create a data analytics program that delivers solutions that delight end-users and meet their needs.
- How to organize your data science project in 2021, by Benjamin Obi Tayo - Apr 19, 2021.
Maintaining proper organization of all your data science projects will increase your productivity, minimize errors, and increase your development efficiency. This tutorial will guide you through a framework on how to keep everything in order on your local machine and in the cloud.
- Free From Stanford: Machine Learning with Graphs, by Matthew Mayo - Apr 19, 2021.
Check out the freely-available Stanford course Machine Learning with Graphs, taught by Jure Leskovec, and see how a world renowned researcher teaches their topic of expertise. Accessible materials include slides, videos, and more.
- Data Profession Job Satisfaction: Beware Of The Drop, by Gregory Piatetsky - Apr 16, 2021.
Latest KDnuggets Poll results: Job satisfaction has declined for ML Engineers, Data Scientists, and Data Analysts, but remained the same for Data Engineers and Managers/Directors. Data Scientist job satisfaction has an alarming drop in mid-career. Finally, which regions have the highest and lowest job satisfaction?
- What makes a song popular? Analyzing Top Songs on Spotify, by Sunku Sowmya Sree - Apr 16, 2021.
With so many great (and not-so-great) songs out there, it can be hard to find those that match your musical preferences. Follow along with this ML model building project to explore the extensive song data available on Spotify and design a recommendation engine that could help you discover your next favorite artist!
- Essential Math for Data Science: Linear Transformation with Matrices, by Hadrien Jean - Apr 16, 2021.
You’ll start seeing matrices, not only as operations on numbers, but also as a way to transform vector spaces. This conception will give you the foundations needed to understand more complex linear algebra concepts like matrix decomposition.
- 6 Mistakes To Avoid While Training Your Machine Learning Model, by Cogito Tech - Apr 15, 2021.
While training an AI model, multi-stage activities are performed to make the best use of the training data, so that the outcomes are satisfactory. Here are the 6 common mistakes you need to understand to make sure your AI model is successful.
- Top 3 Statistical Paradoxes in Data Science, by Francesco Casalegno - Apr 15, 2021.
Observation bias and sub-group differences generate statistical paradoxes.
- The Most In-Demand Skills for Data Scientists in 2021, by Terence Shin - Apr 15, 2021.
If you are preparing to make a career as a Data Scientist or are looking for opportunities to skill-up in your current role, this analysis of in-demand skills for 2021, based on over 15,000 Data Scientist job postings, should offer you a good idea as to which programming languages and software tools are increasing and decreasing in importance.
- ETL in the Cloud: Transforming Big Data Analytics with Data Warehouse Automation, by Nitin Kumar - Apr 15, 2021.
Today, organizations are increasingly implementing cloud ETL tools to handle large data sets. With data sets becoming larger by the day, unified ETL tools have become crucial for data integration needs of enterprises.
- Is Your Model Overtrained?, by Charles Martin - Apr 14, 2021.
WeightWatcher is based on theoretical research (done jointly with UC Berkeley) into Why Deep Learning Works, based on our Theory of Heavy Tailed Self-Regularization (HT-SR). It uses ideas from Random Matrix Theory (RMT), Statistical Mechanics, and Strongly Correlated Systems.
- Continuous Training for Machine Learning – a Framework for a Successful Strategy, by Or Itzary - Apr 14, 2021.
Anyone who builds machine learning models appreciates that a model is not useful without useful data. This doesn't change after a model is deployed to production. Effectively monitoring and retraining models with updated data is key to maintaining valuable ML solutions, and can be accomplished with effective approaches to production-level continuous training that is guided by the data.
- Models of Data Science teams: Chess vs Checkers, by Marco Santoni - Apr 14, 2021.
Should we still consider data scientists and data engineers as separate roles? When should a team grow with full-stack data developers? Introducing the Checkers-like data team.
- Top March Stories: Are You Still Using Pandas to Process Big Data in 2021? Here are two better options; How To Overcome The Fear of Math and Learn Math For Data Science, by Gregory Piatetsky - Apr 13, 2021.
Also: Top YouTube Channels for Data Science; More Data Science Cheatsheets; Top 10 Python Libraries Data Scientists should know in 2021.
- Shaping the new digital age – with SAS and Microsoft, by SAS - Apr 13, 2021.
Join technology experts, partners and analysts in the industry for this webinar series to see how SAS Viya can help you make the most of AI, analytics and the cloud for faster decisions and trusted results.
- Automated Anomaly Detection Using PyCaret, by Ekta Sharma - Apr 13, 2021.
Learn to automate anomaly detection using the open source machine learning library PyCaret.
- 7 Must-Haves in your Data Science CV, by Elad Cohen - Apr 13, 2021.
If you are looking for a new role as a Data Scientist -- either as a first job fresh out of school, a career change, or a shift to another organization -- then check off as many of these critical points as possible to stand out in the crowd and pass the hiring manager's initial CV screen.
- Why Automated Feature Selection Has Its Risks, by Michael Grogan - Apr 13, 2021.
Theoretical relevance of features must not be ignored.
- 10 Real-Life Applications of Reinforcement Learning, by Derrick Mwiti - Apr 12, 2021.
In this article, we’ll look at some of the real-world applications of reinforcement learning.
- Zero-Shot Learning: Can you classify an object without seeing it before?, by Nagesh Chauhan - Apr 12, 2021.
Developing machine learning models that can perform predictive functions on data they have never seen before has become an important research area called zero-shot learning. We tend to be pretty great at recognizing things in the world we never saw before, and zero-shot learning offers a possible path toward mimicking this powerful human capability.
- Top Stories, Apr 5-11: Awesome Tricks And Best Practices From Kaggle; How to deploy Machine Learning/Deep Learning models to the web, by KDnuggets - Apr 12, 2021.
Also: Shapash: Making Machine Learning Models Understandable; A/B Testing: 7 Common Questions and Answers in Data Science Interviews, Part 2; How to deploy Machine Learning/Deep Learning models to the web; Working With Time Series Using SQL
- How to Apply Transformers to Any Length of Text, by James Briggs - Apr 12, 2021.
Read on to find how to restore the power of NLP for long sequences.
- Interpretable Machine Learning: The Free eBook, by Matthew Mayo - Apr 9, 2021.
Interested in learning more about interpretability in machine learning? Check out this free eBook to learn about the basics, simple interpretable models, and strategies for interpreting more complex black box models.
- Deep Learning Recommendation Models (DLRM): A Deep Dive, by Nishant Kumar - Apr 9, 2021.
The currency in the 21st century is no longer just data. It's the attention of people. This deep dive article presents the architecture and deployment issues experienced with the deep learning recommendation model, DLRM, which was open-sourced by Facebook in March 2019.
- Deepfakes are now mainstream. What’s next?, by Dan Abdinoor - Apr 9, 2021.
Deepfakes have become mainstream. Here we take a closer look at recent news about deepfakes, and what it all might mean for the future.
- Can Robots and Humans Combat Extinction Together? Find Out April 17, by DataYap - Apr 8, 2021.
Get ready to trade that “Zoom fatigue” for Zoom euphoria at the DataYap Virtual Conference, Apr 17, where you’ll have your pick of 15 panels on some of the hottest topics in the data and technology space led by some of the top names in data science.
- Key-Value Databases, Explained, by Alex Williams - Apr 8, 2021.
Among the four big NoSQL database types, key-value stores are probably the most popular ones due to their simplicity and fast performance. Let’s further explore how key-value stores work and what their practical uses are.
- Why machine learning struggles with causality, by Ben Dickson - Apr 8, 2021.
If there's one thing people know how to do, it's guess what caused something else to happen. Usually these guesses are good, especially when making a visual observation of something in the physical world. AI continues to wrestle with such inference of causality, and fundamental challenges must be overcome before we can have "intuitive" machine learning.
- A/B Testing: 7 Common Questions and Answers in Data Science Interviews, Part 2, by Emma Ding - Apr 8, 2021.
In this second article in this series, we’ll continue to take an interview-driven approach by linking some of the most commonly asked interview questions to different components of A/B testing, including selecting ideas for testing, designing A/B tests, evaluating test results, and making ship or no ship decisions.
- Start a Career in a Growing Field with Google’s Data Analytics Professional Certificate, by Coursera - Apr 7, 2021.
Google's recently launched Data Analytics Professional Certificate on Coursera is great for anyone, regardless of background or experience. The program is completely online, self-paced, and costs $39 per month. Interested in preparing for a new career in a high-growth field?
- E-commerce Data Analysis for Sales Strategy Using Python, by Juhi Sharma - Apr 7, 2021.
Check out this informative and concise case study applying data analysis using Python to a well-defined e-commerce scenario.
- How to Make Sure Your Analysis Actually Gets Used, by Taylor Count - Apr 7, 2021.
Few things are as demoralizing as seeing your data analysis tossed aside. Learn from these tips -- assembled from experience, academic research, and industry best practice -- on how to make sure your hard work receives the credit it deserves and delivers the value to your organization that you expect.
- Microsoft Research Trains Neural Networks to Understand What They Read, by Jesus Rodriguez - Apr 7, 2021.
The new models make inroads in a new area of deep learning known as machine reading comprehension.
- Working With Time Series Using SQL, by Michael Grogan - Apr 6, 2021.
This article is an overview of using SQL to manipulate time series data.
- How Noisy Labels Impact Machine Learning Models, by iMerit - Apr 6, 2021.
Not all training data labeling errors have the same impact on the performance of the Machine Learning system. The structure of the labeling errors makes a difference. Read iMerit’s latest blog to learn how to minimize the impact of labeling errors.
- KDnuggets Top Blogs Reward Program, by Gregory Piatetsky - Apr 6, 2021.
To encourage more high-quality and especially original contributions to KDnuggets, we announce the KDnuggets Top Blogs Reward program, where we will pay the authors of top blogs published each month, starting with blogs published in May 2021.
- How to Dockerize Any Machine Learning Application, by Arunn Thevapalan - Apr 6, 2021.
How can you -- an awesome Data Scientist -- also be known as an awesome software engineer? Docker. And these 3 simple steps to use it for your solutions over and over again.
- Automated Text Classification with EvalML, by Angela Lin - Apr 6, 2021.
Learn how EvalML leverages Woodwork, Featuretools and the nlp-primitives library to process text data and create a machine learning model that can detect spam text messages.
- Top Stories, Mar 29 – Apr 4: Top 10 Python Libraries Data Scientists should know in 2021; Shapash: Making Machine Learning Models Understandable, by KDnuggets - Apr 5, 2021.
Also: The 8 Most Common Data Scientists; Easy AutoML in Python; How to Succeed in Becoming a Freelance Data Scientist
- The Best Machine Learning Frameworks & Extensions for TensorFlow, by Derrick Mwiti - Apr 5, 2021.
Check out this curated list of useful frameworks and extensions for TensorFlow.
- How to deploy Machine Learning/Deep Learning models to the web, by Ahmad Anis - Apr 5, 2021.
The full value of your deep learning models comes from enabling others to use them. Learn how to deploy your model to the web and access it as a REST API, and begin to share the power of your machine learning development with the world.
- Awesome Tricks And Best Practices From Kaggle, by Bex T. - Apr 5, 2021.
Easily learn what is only learned by hours of search and exploration.
- One Million KDnuggets Visitors in March. Wow., by Gregory Piatetsky - Apr 3, 2021.
KDnuggets has reached an amazing milestone of one million unique visitors in March 2021. We review how we got here.
- What did COVID do to all our models?, by Heather Fyson - Apr 2, 2021.
An interview with Dean Abbott and John Elder about change management, complexity, interpretability, and the risk of AI taking over humanity.
- Shapash: Making Machine Learning Models Understandable, by Yann Golhen - Apr 2, 2021.
Establishing an expectation for trust around AI technologies may soon become one of the most important skills provided by Data Scientists. Significant research investments are underway in this area, and new tools are being developed, such as Shapash, an open-source Python library that helps Data Scientists make machine learning models more transparent and understandable.
- What’s ETL?, by Omer Mahmood - Apr 2, 2021.
Discover what ETL is, and see in what ways it’s critical for data science.
- Easy AutoML in Python, by Dylan Sherry - Apr 1, 2021.
We’re excited to announce that a new open-source project has joined the Alteryx open-source ecosystem. EvalML is a library for automated machine learning (AutoML) and model understanding, written in Python.
- The 8 Most Common Data Scientists, by JABDE - Apr 1, 2021.
Admit it all you wanna-be, newbie, and old-old-school Data Scientists on the planet, whether you like it or not, you've probably behaved like one of these types. Or two. Or all eight.
- A/B Testing: 7 Common Questions and Answers in Data Science Interviews, Part 1, by Emma Ding - Apr 1, 2021.
In this article, we’ll take an interview-driven approach by linking some of the most commonly asked interview questions to different components of A/B testing, including selecting ideas for testing, designing A/B tests, evaluating test results, and making ship or no ship decisions.