- Gradient Boosted Decision Trees – A Conceptual Explanation, by Derrick Mwiti - Apr 30, 2021.
Gradient boosted decision trees involve implementing several models and aggregating their results. These boosted models have become popular thanks to their performance in machine learning competitions on Kaggle. In this article, we’ll see what gradient boosted decision trees are all about.
- FluDemic – using AI and Machine Learning to get ahead of disease, by DataDriven Health - Apr 30, 2021.
We are amidst a healthcare data explosion. AI/ML will be more vital than ever in the prevention and handling of future pandemics. Here, we walk you through the different facets of modeling infectious diseases, focusing on influenza and COVID-19.
- Learn Neural Networks for Natural Language Processing Now, by Matthew Mayo - Apr 30, 2021.
Still haven't come across enough quality contemporary natural language processing resources? Here is yet another freely-accessible offering from a top-notch university that might help quench your thirst for learning materials.
- Feature Engineering of DateTime Variables for Data Science, Machine Learning, by Samarth Agrawal - Apr 29, 2021.
Learn how to make more meaningful features from DateTime type variables to be used by Machine Learning Models.
- Introducing The NLP Index, by Matthew Mayo - Apr 29, 2021.
The NLP Index is a brand new resource for NLP code discovery, combining and indexing more than 3,000 paper and code pairs at launch. If you are interested in NLP research and locating the code and papers needed to understand and implement the latest research, you should check it out.
- Data Scientist vs Machine Learning Engineer – what are their skills?, by Matthew Przybyla - Apr 27, 2021.
As two very popular tech roles for 2021, the Data Scientist and Machine Learning Engineer can overlap or be entirely distinct, depending on the organization you work for. However, general differences between these positions require certain skill sets that you must be prepared for when applying for jobs.
- Multiple Time Series Forecasting with PyCaret, by Moez Ali - Apr 27, 2021.
A step-by-step tutorial to forecast multiple time series with PyCaret.
- Getting Started with Reinforcement Learning, by Pier Paolo Ippolito - Apr 26, 2021.
Demystifying some of the main concepts and terminologies associated with Reinforcement Learning and their association with other fields of AI.
- Improving model performance through human participation, by Preetam Joshi - Apr 23, 2021.
Certain industries, such as medicine and finance, are sensitive to false positives. Using human input in the model inference loop can increase the final precision and recall. Here, we describe how to incorporate human feedback at inference time, so that Machines + Humans = Higher Precision & Recall.
- Data Science Books You Should Start Reading in 2021, by Przemek Chojecki - Apr 23, 2021.
Check out this curated list of the best data science books for any level.
- What is Adversarial Neural Cryptography?, by Jesus Rodriguez - Apr 22, 2021.
The novel approach combines GANs and cryptography in a single, powerful security method.
- How to ace A/B Testing Data Science Interviews, by Preeti Semwal - Apr 22, 2021.
Understanding the process of A/B testing and knowing how to discuss this approach during data science job interviews can give you a leg up over other candidates. This mock interview provides a step-by-step guide through how to demonstrate your mastery of the key concepts and logical considerations.
- Top 10 Must-Know Machine Learning Algorithms for Data Scientists – Part 1, by Matthew Mayo - Apr 22, 2021.
New to data science? Interested in the must-know machine learning algorithms in the field? Check out the first part of our list and introductory descriptions of the top 10 algorithms for data scientists to know.
- Production-Ready Machine Learning NLP API with FastAPI and spaCy, by Julien Salinas - Apr 21, 2021.
Learn how to implement an API based on FastAPI and spaCy for Named Entity Recognition (NER), and see why the author used FastAPI to quickly build a fast and robust machine learning API.
- 10 Must-Know Statistical Concepts for Data Scientists, by Soner Yildirim - Apr 21, 2021.
Statistics is a building block of data science. If you are working or plan to work in this field, then you will encounter the fundamental concepts reviewed for you here. Certainly, there is much more to learn in statistics, but once you understand these basics, then you can steadily build your way up to advanced topics.
- Time Series Forecasting with PyCaret Regression Module, by Moez Ali - Apr 21, 2021.
PyCaret is an alternate low-code library that can replace hundreds of lines of code with just a few. See how to use PyCaret's Regression Module for Time Series Forecasting.
- Data Analysis Using Tableau, by Juhi Sharma - Apr 20, 2021.
Read this overview of using Tableau for sales data analysis, and see how visualization can help tell the business story.
- Data Science 101: Normalization, Standardization, and Regularization, by Susan Sivek - Apr 20, 2021.
Normalization, standardization, and regularization all sound similar. However, each plays a unique role in your data preparation and model building process, so you must know when and how to use these important procedures.
- Want To Get Good At Time Series Forecasting? Predict The Weather, by Michael Grogan - Apr 20, 2021.
This article is designed to help the reader understand the components of a time series.
- How to organize your data science project in 2021, by Benjamin Obi Tayo - Apr 19, 2021.
Maintaining proper organization of all your data science projects will increase your productivity, minimize errors, and increase your development efficiency. This tutorial will guide you through a framework on how to keep everything in order on your local machine and in the cloud.
- Free From Stanford: Machine Learning with Graphs, by Matthew Mayo - Apr 19, 2021.
Check out the freely-available Stanford course Machine Learning with Graphs, taught by Jure Leskovec, and see how a world-renowned researcher teaches their topic of expertise. Accessible materials include slides, videos, and more.
- What makes a song popular? Analyzing Top Songs on Spotify, by Sunku Sowmya Sree - Apr 16, 2021.
With so many great (and not-so-great) songs out there, it can be hard to find those that match your musical preferences. Follow along with this ML model building project to explore the extensive song data available on Spotify and design a recommendation engine that could help you discover your next favorite artist!
- Essential Math for Data Science: Linear Transformation with Matrices, by Hadrien Jean - Apr 16, 2021.
You’ll start seeing matrices, not only as operations on numbers, but also as a way to transform vector spaces. This conception will give you the foundations needed to understand more complex linear algebra concepts like matrix decomposition.
- Top 3 Statistical Paradoxes in Data Science, by Francesco Casalegno - Apr 15, 2021.
Observation bias and sub-group differences generate statistical paradoxes.
- ETL in the Cloud: Transforming Big Data Analytics with Data Warehouse Automation, by Nitin Kumar - Apr 15, 2021.
Today, organizations are increasingly implementing cloud ETL tools to handle large data sets. With data sets becoming larger by the day, unified ETL tools have become crucial for data integration needs of enterprises.
- Is Your Model Overtrained?, by Charles Martin - Apr 14, 2021.
WeightWatcher is based on theoretical research (done jointly with UC Berkeley) into Why Deep Learning Works, and builds on our Theory of Heavy Tailed Self-Regularization (HT-SR). It uses ideas from Random Matrix Theory (RMT), Statistical Mechanics, and Strongly Correlated Systems.
- Continuous Training for Machine Learning – a Framework for a Successful Strategy, by Or Itzary - Apr 14, 2021.
Anyone who builds machine learning models understands that a model is not useful without useful data. This doesn't change after a model is deployed to production. Effectively monitoring and retraining models with updated data is key to maintaining valuable ML solutions, and can be accomplished with effective approaches to production-level continuous training that is guided by the data.
- Automated Anomaly Detection Using PyCaret, by Ekta Sharma - Apr 13, 2021.
Learn to automate anomaly detection using the open source machine learning library PyCaret.
- 10 Real-Life Applications of Reinforcement Learning, by Derrick Mwiti - Apr 12, 2021.
In this article, we’ll look at some of the real-world applications of reinforcement learning.
- Zero-Shot Learning: Can you classify an object without seeing it before?, by Nagesh Chauhan - Apr 12, 2021.
Developing machine learning models that can perform predictive functions on data they have never seen before has become an important research area called zero-shot learning. We tend to be pretty great at recognizing things in the world we never saw before, and zero-shot learning offers a possible path toward mimicking this powerful human capability.
- How to Apply Transformers to Any Length of Text, by James Briggs - Apr 12, 2021.
Read on to find out how to restore the power of NLP for long sequences.
- Interpretable Machine Learning: The Free eBook, by Matthew Mayo - Apr 9, 2021.
Interested in learning more about interpretability in machine learning? Check out this free eBook to learn about the basics, simple interpretable models, and strategies for interpreting more complex black box models.
- Deep Learning Recommendation Models (DLRM): A Deep Dive, by Nishant Kumar - Apr 9, 2021.
The currency in the 21st century is no longer just data. It's the attention of people. This deep dive article presents the architecture and deployment issues experienced with the deep learning recommendation model, DLRM, which was open-sourced by Facebook in March 2019.
- Key-Value Databases, Explained, by Alex Williams - Apr 8, 2021.
Among the four big NoSQL database types, key-value stores are probably the most popular ones due to their simplicity and fast performance. Let’s further explore how key-value stores work and what their practical uses are.
- A/B Testing: 7 Common Questions and Answers in Data Science Interviews, Part 2, by Emma Ding - Apr 8, 2021.
In this second article in this series, we’ll continue to take an interview-driven approach by linking some of the most commonly asked interview questions to different components of A/B testing, including selecting ideas for testing, designing A/B tests, evaluating test results, and making ship or no ship decisions.
- E-commerce Data Analysis for Sales Strategy Using Python, by Juhi Sharma - Apr 7, 2021.
Check out this informative and concise case study applying data analysis using Python to a well-defined e-commerce scenario.
- Microsoft Research Trains Neural Networks to Understand What They Read, by Jesus Rodriguez - Apr 7, 2021.
The new models make inroads in a new area of deep learning known as machine reading comprehension.
- Working With Time Series Using SQL, by Michael Grogan - Apr 6, 2021.
This article is an overview of using SQL to manipulate time series data.
- How to Dockerize Any Machine Learning Application, by Arunn Thevapalan - Apr 6, 2021.
How can you -- an awesome Data Scientist -- also be known as an awesome software engineer? Docker. And these 3 simple steps to use it for your solutions over and over again.
- Automated Text Classification with EvalML, by Angela Lin - Apr 6, 2021.
Learn how EvalML leverages Woodwork, Featuretools and the nlp-primitives library to process text data and create a machine learning model that can detect spam text messages.
- The Best Machine Learning Frameworks & Extensions for TensorFlow, by Derrick Mwiti - Apr 5, 2021.
Check out this curated list of useful frameworks and extensions for TensorFlow.
- How to deploy Machine Learning/Deep Learning models to the web, by Ahmad Anis - Apr 5, 2021.
The full value of your deep learning models comes from enabling others to use them. Learn how to deploy your model to the web and access it as a REST API, and begin to share the power of your machine learning development with the world.
- Awesome Tricks And Best Practices From Kaggle, by Bex T. - Apr 5, 2021.
Easily learn what is only learned by hours of search and exploration.
- Shapash: Making Machine Learning Models Understandable, by Yann Golhen - Apr 2, 2021.
Establishing an expectation for trust around AI technologies may soon become one of the most important skills provided by Data Scientists. Significant research investments are underway in this area, and new tools are being developed, such as Shapash, an open-source Python library that helps Data Scientists make machine learning models more transparent and understandable.
- What’s ETL?, by Omer Mahmood - Apr 2, 2021.
Discover what ETL is, and see in what ways it’s critical for data science.
- Easy AutoML in Python, by Dylan Sherry - Apr 1, 2021.
We’re excited to announce that a new open-source project has joined the Alteryx open-source ecosystem. EvalML is a library for automated machine learning (AutoML) and model understanding, written in Python.
- A/B Testing: 7 Common Questions and Answers in Data Science Interviews, Part 1, by Emma Ding - Apr 1, 2021.
In this article, we’ll take an interview-driven approach by linking some of the most commonly asked interview questions to different components of A/B testing, including selecting ideas for testing, designing A/B tests, evaluating test results, and making ship or no ship decisions.