- Data Monetization 101, by John Farrall - Jul 30, 2021.
The evolving data marketplace now includes many firms that serve a variety of needs for organizations looking to grow with data. This listing of the key players, categorized by target market, provides an interesting picture of this exciting industry sector.
- 10 Machine Learning Model Training Mistakes, by Sandeep Uttamchandani, Ph.D. - Jul 30, 2021.
These common ML model training mistakes are easy to overlook but costly to remedy.
- GitHub Copilot Open Source Alternatives, by Matthew Mayo - Jul 29, 2021.
GitHub's Copilot code generation tool is currently available only by approved request. Here are 4 Copilot alternatives that you can use in your programming today.
- MLOps Best Practices, by Siddharth Kashiramka - Jul 29, 2021.
Many technical challenges must be overcome to successfully deliver machine learning solutions at scale. This article shares best practices we encountered while architecting and applying a model deployment platform within a large organization, including required functionality, a recommended scalable deployment pattern, and techniques for testing and performance-tuning models to maximize platform throughput.
- A Brief Introduction to the Concept of Data, by Angelica Lo Duca - Jul 29, 2021.
Every aspiring data scientist must know the concept of data and the kinds of analysis that can be run on it. This article introduces the concept of data (quantitative and qualitative) and the types of analysis.
- dbt for Data Transformation – Hands-on Tutorial, by Essi Alizadeh - Jul 28, 2021.
The data build tool (dbt) is gaining in popularity and use, and this hands-on tutorial covers creating complex models, using variables and functions, running tests, generating docs, and many more features.
- Building Machine Learning Pipelines using Snowflake and Dask, by Daniel Foley - Jul 28, 2021.
In this post, I want to share some of the tools I have been exploring recently, show you how I use them, and explain how they have helped improve the efficiency of my workflow. The two I will talk about in particular are Snowflake and Dask. They are two very different tools, but they complement each other well, especially as part of the ML lifecycle.
- Python Data Structures Compared, by Matthew Mayo - Jul 27, 2021.
Let's take a look at 5 different Python data structures and see how they could be used to store data we might be processing in our everyday tasks, as well as how much memory each uses for storage and how long each takes to create and access.
- Machine Learning Skills – Update Yours This Summer, by Ahmad Anis - Jul 27, 2021.
The process of mastering new knowledge often requires multiple passes to ensure the information is deeply understood. If you already began your journey into machine learning and data science, then you are likely ready for a refresher on topics you previously covered. This eight-week self-learning path will help you recapture the foundations and prepare you for future success in applying these skills.
- Facebook Open Sources a Chatbot That Can Discuss Any Topic, by Jesus Rodriguez - Jul 27, 2021.
The new version expands the capabilities of its predecessor, creating a much more natural conversational experience.
- Not Only for Deep Learning: How GPUs Accelerate Data Science & Data Analytics, by Kevin Vu - Jul 26, 2021.
The success of modern AI/ML systems has depended critically on their ability to process massive amounts of raw data in parallel using task-optimized hardware. Can we leverage the power of GPUs and distributed computing for regular data processing jobs too?
- Top Python Data Science Interview Questions, by Nate Rosidi - Jul 23, 2021.
Six must-know technical concepts and two types of questions to test them.
- Full cross-validation and generating learning curves for time-series models, by Mehmet Suzen - Jul 23, 2021.
Standard cross-validation is not possible on time series data because the data is sequential, which does not lend itself well to splitting into statistically useful training and validation sets. However, a new approach called Reconstructive Cross-validation may pave the way toward performing this important type of analysis for predictive models built on temporal datasets.
- How to Use Kafka Connect to Create an Open Source Data Pipeline for Processing Real-Time Data, by Paul Brebner - Jul 23, 2021.
This article shows you how to create a real-time data pipeline using only open source technologies, including Kafka Connect, Apache Kafka, Kibana, and more.
- Overview of Albumentations: Open-source library for advanced image augmentations, by Olga Chernytska - Jul 22, 2021.
Includes code snippets for augmentations and for integration with PyTorch and TensorFlow pipelines.
- The Lost Art of Decile Analysis, by Venkat Raman - Jul 22, 2021.
Classification is a primary and widely used application of machine learning algorithms. However, if the subtleties in the results of even an apparently straightforward binary classifier are not carefully examined through additional analysis, the deeper meaning of your predictions may be obscured.
- ColabCode: Deploying Machine Learning Models From Google Colab, by Kaustubh Gupta - Jul 22, 2021.
New to ColabCode? Learn how to use it to start a VS Code Server, Jupyter Lab, or FastAPI.
- The Best SOTA NLP Course is Free!, by Matthew Mayo - Jul 21, 2021.
Hugging Face has recently released a course on using its libraries and ecosystem for practical NLP, and it appears to be very comprehensive. Have a look for yourself.
- WHT: A Simpler Version of the fast Fourier Transform (FFT) you should know, by Sean O'Connor - Jul 21, 2021.
The fast Walsh-Hadamard transform is a simple and useful algorithm for machine learning that was popular in the 1960s and early 1970s. Its efficiency deserves wider appreciation and application.
- When to Retrain a Machine Learning Model? Run these 5 checks to decide on the schedule, by Dral & Samuylov - Jul 20, 2021.
Machine learning models degrade over time and need to be updated regularly. In this article, we suggest how to approach retraining and plan for it in advance.
- 11 Important Probability Distributions Explained, by Terence Shin - Jul 20, 2021.
There are many distribution functions used in statistics and machine learning, and they can seem daunting to understand at first. Many are actually closely related, and with these intuitive explanations of the most important probability distributions, you can begin to appreciate what each distribution says about the observed data.
- Understanding BERT with Hugging Face, by Kevin Vu - Jul 20, 2021.
We don’t really understand something until we implement it ourselves. So in this post, we will implement a Question Answering neural network using BERT and a Hugging Face library.
- How Much Memory is your Machine Learning Code Consuming?, by Tirthajyoti Sarkar - Jul 19, 2021.
Learn how to quickly check the memory footprint of your machine learning function or module with a single line of code. Generate a nice report too.
- Advice for Learning Data Science from Google’s Director of Research, by Benjamin Obi Tayo - Jul 19, 2021.
Riding the professional career wave in data science is an attractive prospect for many people looking to get their start in the field. The digital revolution continues to create many exciting new opportunities. But jumping in too fast without fully establishing your foundational skills can be detrimental to your success, as this advice for data science newcomers from Peter Norvig, Director of Research at Google, suggests.
- How to Create Unbiased Machine Learning Models, by Philip Tannor - Jul 16, 2021.
In this post, we discuss the concepts of bias and fairness in the machine learning world and show how ML biases often reflect existing biases in society. Additionally, we discuss various methods for testing and enforcing fairness in ML models.
- High-Performance Deep Learning: How to train smaller, faster, and better models – Part 5, by Gaurav Menghani - Jul 16, 2021.
Training efficient deep learning models with any software tool is futile without robust, performant compute infrastructure. Here, we review the current software and hardware ecosystems you might consider in your development when you need the highest possible performance.
- Pushing No-Code Machine Learning to the Edge, by Devin Partida - Jul 16, 2021.
Discover the power of no-code machine learning, and what it can accomplish when pushed to edge devices.
- 7 Open Source Libraries for Deep Learning Graphs, by Kevin Vu - Jul 15, 2021.
In this article we’ll go through 7 up-and-coming open source libraries for graph deep learning, ranked in order of increasing popularity.
- Top 6 Data Science Online Courses in 2021, by Natassha Selvaraj - Jul 15, 2021.
As an aspiring data scientist, it is easy to get overwhelmed by the abundance of resources available on the Internet. With these 6 online courses, you can progress from novice to experienced practitioner in less than a year and gain the skills necessary to land a job in data science.
- Date Processing and Feature Engineering in Python, by Matthew Mayo - Jul 15, 2021.
Have a look at some code to streamline the parsing and processing of dates in Python, including the engineering of some useful and common features.
- Shareable data analyses using templates, by Cedric Dussud - Jul 14, 2021.
We've been using shared data analyses in production for three years. Here's how you can create reusable templates for common metrics and analyses.
- Geometric foundations of Deep Learning, by Michael Bronstein, Joan Bruna, Taco Cohen, and PV - Jul 14, 2021.
Geometric Deep Learning is an attempt at the geometric unification of a broad class of machine learning problems from the perspectives of symmetry and invariance. These principles not only underlie the breakthrough performance of convolutional neural networks and the recent success of graph neural networks but also provide a principled way to construct new types of problem-specific inductive biases.
- SQL, Syllogisms, and Explanations, by Adrian Walker - Jul 14, 2021.
Check out the Executable English Platform, for self-explaining applications written in English that you can run in your browser.
- Streamlit Tips, Tricks, and Hacks for Data Scientists, by Kaveh Bakhtiyari - Jul 13, 2021.
Today, I am going to share a few tips I have learned over more than a year of using Streamlit, which you can also use to unleash your powerful DS/AI/ML (whatever they may be) applications.
- Become an Analytics Engineer in 90 Days, by Tuan Nguyen - Jul 12, 2021.
The new role of Analytics Engineer is an exciting opportunity that spans the skill sets of a Data Analyst and a Data Engineer. Here, we describe how this position can evolve within an organization and recommend self-learning resources you can use to prepare for its multifaceted responsibilities.
- How to Tell if You Have Trained Your Model with Enough Data, by Charles Martin - Jul 12, 2021.
WeightWatcher is an open-source diagnostic tool for evaluating the performance of (pre)-trained and fine-tuned Deep Neural Networks. It is based on state-of-the-art research into Why Deep Learning Works.
- Exploring the SwAV Method, by Antonio Ferraioli - Jul 9, 2021.
This post discusses the SwAV (Swapping Assignments between multiple Views of the same image) method from the paper “Unsupervised Learning of Visual Features by Contrasting Cluster Assignments” by M. Caron et al.
- High-Performance Deep Learning: How to train smaller, faster, and better models – Part 4, by Gaurav Menghani - Jul 9, 2021.
With the right software, hardware, and techniques at your fingertips, your capability to effectively develop high-performing models now hinges on leveraging automation to expedite the experimental process and building with the most efficient model architectures for your data.
- 5 Python Data Processing Tips & Code Snippets, by Matthew Mayo - Jul 9, 2021.
This is a small collection of Python code snippets that a beginner might find useful for data processing.
- A Lightning Fast Look at Single Line Exploratory Data Analysis, by Harsha Mandala - Jul 8, 2021.
Here's a very quick look at how you can perform EDA with a single line of code using D-Tale.
- Pandas not enough? Here are a few good alternatives to processing larger and faster data in Python, by DaurEd - Jul 8, 2021.
While the Pandas library remains a crucial workhorse in data processing and management for data science, it has limitations that can hurt efficiency, especially with very large data sets. Here, a few interesting alternatives to Pandas are introduced to improve your large data handling performance.
- MLOps is an Engineering Discipline: A Beginner’s Overview, by Angad Gupta - Jul 8, 2021.
MLOps = ML + DEV + OPS. MLOps is the idea of combining the long-established practice of DevOps with the emerging field of Machine Learning.
- How to Get Practical Data Science Experience to be Career-Ready, by Terence Shin - Jul 7, 2021.
Becoming a professional in the field of data science takes more than just book smarts. You need experience with real-world data sets and frequently used tools, as well as an intuition for solutions that you can only gain from hands-on experience. These resources will jump-start the development of your practical skills.
- How to Build An Image Classifier in Few Lines of Code with Flash, by Irfan Alghani Khalid - Jul 7, 2021.
Introducing Flash: The high-level deep learning framework for beginners.
- ROC Curve Explained, by Zolzaya Luvsandorj - Jul 6, 2021.
Learn to visualise a ROC curve in Python.
- A Learning Path To Becoming a Data Scientist, by Sara Metwalli - Jul 6, 2021.
Becoming a professional data scientist may not be as easy as "1... 2... 3...", but these 10 steps can be your self-learning roadmap to kickstarting your future in the exciting and ever-expanding field of data science.
- GitHub Copilot: Your AI pair programmer – what is all the fuss about?, by Matthew Mayo - Jul 5, 2021.
GitHub just released Copilot, a code completion tool on steroids dubbed your "AI pair programmer." Read more about it, and see what all the fuss is about.
- Predict Customer Churn (the right way) using PyCaret, by Moez Ali - Jul 5, 2021.
A step-by-step guide to predicting customer churn the right way with PyCaret, in a way that actually optimizes the business objective and improves ROI.
- Semantic Search: Measuring Meaning From Jaccard to Bert, by James Briggs - Jul 2, 2021.
In this article, we’ll cover a few of the most interesting and powerful techniques for measuring meaning, focusing specifically on semantic search. We’ll learn how they work, what they’re good at, and how we can implement them ourselves.
- High-Performance Deep Learning: How to train smaller, faster, and better models – Part 3, by Gaurav Menghani - Jul 2, 2021.
Now that you are ready to efficiently build advanced deep learning models with the right software and hardware tools, the techniques involved in implementing such efforts must be explored to improve model quality and obtain the performance that your organization desires.
- Prepare Behavioral Questions for Data Science Interviews, by Zijing Zhu - Jul 2, 2021.
This is part 5 of a series by the author which helps readers nail the data science interviews with confidence.
- How to Use NVIDIA GPU Accelerated Libraries, by Kevin Vu - Jul 1, 2021.
If you are wondering how you can take advantage of NVIDIA GPU accelerated libraries for your AI projects, this guide will help answer questions and get you started on the right path.
- Learning Data Science Through Social Media, by Susan Sivek - Jul 1, 2021.
Want your social media algorithms to show you actual algorithms? Spare a moment during your social media scrolling to learn a bit of data science. Here are suggestions for at-a-glance access to good ideas and tips on your favorite platforms.