- Managing Your Reusable Python Code as a Data Scientist, by Matthew Mayo - Feb 11, 2022.
Here are a few approaches that I have settled on for managing my own reusable Python code as a data scientist, presented from most to least general code use, and aimed at beginners.
Python
- From Scratch: Permutation Feature Importance for ML Interpretability, by Seth Billiau - Jun 30, 2021.
Use permutation feature importance to discover which features in your dataset are useful for prediction — implemented from scratch in Python.
Feature Selection, Interpretability, Machine Learning, Python
- Computational Complexity of Deep Learning: Solution Approaches, by Dr. Vijay Srinivas Agneeswaran - Jun 29, 2021.
Why has deep learning been so successful? What is the fundamental reason that deep learning can learn from big data? Why cannot traditional ML learn from the large data sets that are now available for different tasks as efficiently as deep learning can?
Complexity, Deep Learning, Neural Networks
- 10 Mistakes You Should Avoid as a Data Science Beginner, by Isabelle Flückiger - Jun 29, 2021.
Read this article on how to gain a competitive advantage in the data science job market.
Beginners, Career Advice, Data Science
- Add A New Dimension To Your Photos Using Python, by Dylan Roy - Jun 28, 2021.
Read this to learn how to breathe new life into your photos with a 3D Ken Burns Effect.
Google Colab, Image Generation, Image Processing, Python
- How to Train a Joint Entities and Relation Extraction Classifier using BERT Transformer with spaCy 3, by Walid Amamou - Jun 28, 2021.
A step-by-step guide on how to train a relation extraction classifier using Transformer and spaCy3.
BERT, NLP, Python, spaCy, Text Analytics, Transformer
- Applied Language Technology: A No-Nonsense Approach, by Matthew Mayo - Jun 25, 2021.
Here is a free entry-level applied natural language processing course that can fit into any beginner's roadmap to understanding NLP. Check it out.
NLP, Python, Text Analytics
- High-Performance Deep Learning: How to train smaller, faster, and better models – Part 2, by Gaurav Menghani - Jun 25, 2021.
As your organization begins to consider building advanced deep learning models with efficiency in mind to improve the power delivered through your solutions, the software and hardware tools required for these implementations are foundational to achieving high performance.
Deep Learning, Efficiency, Machine Learning, Scalability
- How to create an interactive 3D chart and share it easily with anyone, by Olga Chernytska - Jun 25, 2021.
This is a short tutorial on a great Plotly feature.
Data Visualization, Graph, Python
- 10 Python Code Snippets We Should All Know, by Pralabh Saxena - Jun 24, 2021.
Check out these Python code snippets and start using them to solve everyday problems.
Programming, Python
- Create and Deploy Dashboards using Voila and Saturn Cloud, by Dhrumil Patel - Jun 23, 2021.
Working with and training large datasets, maintaining them all in one place, and deploying them to production is a challenging job. In this article, we covered what Saturn Cloud is and how it can speed up your end-to-end pipeline, how to create dashboards using Voila and Python and publish them to production in just a few easy steps.
Analytics, Cloud, Dashboard, Data Science, Machine Learning, Python
- Data Careers in Demand: Crowd Solutions Architect Explained, by Daria Baidakova - Jun 23, 2021.
How can crowdsourcing support the applications of data teams at an organization? With an ever-increasing demand for more and higher quality data, a new role of the Crowd Solutions Architect (CSA) can leverage the potential of the masses to bring an advantage to a business's capability to deliver effective AI-driven solutions.
Careers, Crowdsourcing, Data Architect, Explained, Toloka
- Fine-Tuning Transformer Model for Invoice Recognition, by Walid Amamou - Jun 23, 2021.
The author presents a step-by-step guide from annotation to training.
Business Analytics, Image Classification, NLP, Python, Transformer
- Amazing Low-Code Machine Learning Capabilities with New Ludwig Update, by Jesus Rodriguez - Jun 22, 2021.
Integrations with Ray, MLflow and TabNet are among the top features of this release.
Low-Code, Machine Learning, Open Source, Uber
- What is Segmentation?, by Kevin Gray - Jun 22, 2021.
Segmentation refers to many things, and is one of the most frequently used words in marketing. This article looks at segmentation from a somewhat different-than-usual perspective.
Analytics, Marketing Analytics, Segmentation
- Overview of AutoNLP from Hugging Face with Example Project, by Kevin Vu - Jun 21, 2021.
AutoNLP is a beta project from Hugging Face that builds on the company’s work with its Transformer project. With AutoNLP you can get a working model with just a few simple terminal commands.
Automated Machine Learning, AutoML, Hugging Face, NLP
- Pandas vs SQL: When Data Scientists Should Use Each Tool, by Matthew Przybyla - Jun 21, 2021.
Exploring data sets and understanding their structure, content, and relationships is a routine and core process for any Data Scientist. Multiple tools exist for performing such analysis, and we take a deep dive into the benefits and different approaches of two important tools, SQL and Pandas.
Data Scientist, Pandas, SQL
- How to troubleshoot memory problems in Python, by Freddy Boulton - Jun 21, 2021.
Memory problems are hard to diagnose and fix in Python. This post goes through a step-by-step process for how to pinpoint and fix memory leaks using popular open source Python packages.
Programming, Python
- High Performance Deep Learning, Part 1, by Gaurav Menghani - Jun 18, 2021.
Advancing deep learning techniques continue to demonstrate incredible potential to deliver exciting new AI-enhanced software and systems. But, training the most powerful models is expensive--financially, computationally, and environmentally. Increasing the efficiency of such models will have profound impacts in many ways, so developing future models with this intention in mind will only help to further expand the reach, applicability, and value of what deep learning has to offer.
Deep Learning, Efficiency, History, Machine Learning
- Dashboards for Interpreting & Comparing Machine Learning Models, by Himanshu Sharma - Jun 17, 2021.
This article discusses using Interpret to create dashboards for machine learning models.
Interpretability, Machine Learning, Modeling, Python
- The Best Way to Learn Practical NLP?, by Matthew Mayo - Jun 16, 2021.
Hugging Face has just released a course on using its libraries and ecosystem for practical NLP, and it appears to be very comprehensive. Have a look for yourself.
Courses, Hugging Face, NLP
- An introduction to Explainable AI (XAI) and Explainable Boosting Machines (EBM), by Chaitanya Krishna Kasaraneni - Jun 16, 2021.
Understanding why your AI-based models make the decisions they do is crucial for deploying practical solutions in the real-world. Here, we review some techniques in the field of Explainable AI (XAI), why explainability is important, example models of explainable AI using LIME and SHAP, and demonstrate how Explainable Boosting Machines (EBMs) can make explainability even easier.
AI, Deep Learning, Explainability, Gradient Boosting, Interpretability, LIME, Machine Learning, SHAP
- A Graph-based Text Similarity Method with Named Entity Information in NLP, by Prakhar Mishra - Jun 16, 2021.
In this article, the author summarizes the 2017 paper "A Graph-based Text Similarity Measure That Employs Named Entity Information" as per their understanding. Better understand the concepts by reading along.
Graphs, NLP, Similarity, Text Analytics
- 7 Data Security Best Practices for 2021, by Devin Partida - Jun 15, 2021.
Here are seven data security best practices to adopt this year.
Cybersecurity, Data Science, Security
- Beginners Guide to Debugging TensorFlow Models, by Ahmad Anis - Jun 15, 2021.
If you are new to working with a deep learning framework, such as TensorFlow, there are a variety of typical errors beginners face when building and training models. Here, we explore and solve some of the most common errors to help you develop a better intuition for debugging in TensorFlow.
Beginners, Deep Learning, TensorFlow
- Facebook Launches One of the Toughest Reinforcement Learning Challenges in History, by Jesus Rodriguez - Jun 15, 2021.
The FAIR team just launched the NetHack Challenge as part of the upcoming NeurIPS 2021 competition. The objective is to test new RL ideas using one of the toughest game environments in the world.
Challenge, Facebook, Reinforcement Learning
- Get Interactive Plots Directly With Pandas, by Parul Pandey - Jun 14, 2021.
Telling a story with data is a core function for any Data Scientist, and creating data visualizations that are simultaneously illuminating and appealing can be challenging. This tutorial reviews how to create Plotly and Bokeh plots directly through Pandas plotting syntax, which will help you convert static visualizations into interactive counterparts -- and take your analysis to the next level.
Bokeh, Data Visualization, Pandas, Plotly, Python
- Building a Knowledge Graph for Job Search Using BERT, by Walid Amamou - Jun 14, 2021.
A guide on how to create knowledge graphs using NER and Relation Extraction.
BERT, Careers, Data Science Skills, Knowledge Graph, NLP, Python, Search, Transformer
- Top 10 Data Science Projects for Beginners, by Natassha Selvaraj - Jun 11, 2021.
Check out these projects for ideas to strengthen your skills and build a portfolio that stands out.
Beginners, Data Science, Portfolio, Project
- Five types of thinking for a high performing data scientist, by Anand Rao - Jun 11, 2021.
The way you think about a problem and the conceptual process you go through to find a solution may be guided by your personal skills or the type of problem at hand. Many mental models exist representing a variety of thinking patterns -- and as a Data Scientist, appreciating different approaches can help you more effectively model data in the business world and communicate your results to the decision-makers.
Advice, Data Science Skills
- 9 Deadly Sins of Machine Learning Dataset Selection, by Sandeep Uttamchandani - Jun 11, 2021.
Avoid endless pain in model debugging by focusing on datasets upfront.
Datasets, Machine Learning
- The Essential Guide to Transformers, the Key to Modern SOTA AI, by Matthew Mayo - Jun 10, 2021.
You likely know Transformers from their recent spate of success stories in natural language processing, computer vision, and other areas of artificial intelligence, but are you familiar with all of the X-formers? More importantly, do you know the differences, and why you might use one over another?
AI, Computer Vision, Deep Learning, NLP, Transformer
- Feature Selection – All You Ever Wanted To Know, by Danny Butvinik - Jun 10, 2021.
Although your data set may contain a lot of information about many different features, selecting only the "best" of these to be considered by a machine learning model can mean the difference between a model that performs well--with better performance, higher accuracy, and more computational efficiency--and one that falls flat. The process of feature selection guides you toward working with only the data that may be the most meaningful, and to accomplish this, a variety of feature selection types, methodologies, and techniques exist for you to explore.
Feature Engineering, Feature Selection, Machine Learning
- How to Generate Automated PDF Documents with Python, by Mohammad Khorasani - Jun 10, 2021.
Discover how to leverage automation to create dazzling PDF documents effortlessly.
Data Visualization, PDF, Programming, Python
- The 7 Best Open Source AI Libraries You May Not Have Heard Of, by Kevin Vu - Jun 9, 2021.
AI researchers today have many exciting options for working with specialized tools. Although starting original projects from scratch is often not necessary, knowing which existing library to leverage remains a challenge. This list of generally unknown yet awesome, open-source libraries offers an interesting collection to consider for state-of-the-art research that spans from automatic machine learning to differentiable quantum circuits.
AI, Hyperparameter, Julia, Open Source, Probability, Quantum Computing
- This Data Visualization is the First Step for Effective Feature Selection, by Benjamin Obi Tayo - Jun 8, 2021.
Understanding the most important features to use is crucial for developing a model that performs well. Knowing which features to consider requires experimentation, and proper visualization of your data can help clarify your initial selections. The scatter pairplot is a great place to start.
Data Visualization, Feature Selection, Statistics, Stocks
- The only Jupyter Notebooks extension you truly need, by Olga Chernytska - Jun 8, 2021.
Now you don’t need to restart the kernel after editing the code in your custom imports.
Deployment, Jupyter, Machine Learning, Python
- 5 Data Science Open-source Projects You Should Consider Contributing to, by Sara Metwalli - Jun 7, 2021.
As you prepare to interview for a position in data science or are looking to jump to the next level, now is the time to enhance your skills and your resume by working on real, open-source projects. Here, we suggest a great selection of projects you can contribute to and help build something awesome, so all you need to do is choose one and tackle it head on.
Caffe, Data Science, Data Science Skills, GitHub, Google, Machine Learning, Open Source
- How to Fine-Tune BERT Transformer with spaCy 3, by Walid Amamou - Jun 7, 2021.
A step-by-step guide on how to create a knowledge graph using NER and Relation Extraction.
BERT, Knowledge Graph, NLP, Python, spaCy, Transformer
- PyCaret 101: An introduction for beginners, by Moez Ali - Jun 7, 2021.
This article is a great overview of how to get started with PyCaret for all your machine learning projects.
Machine Learning, PyCaret, Python
- 5 Tasks To Automate With Python, by Dylan Roy - Jun 4, 2021.
Here are 5 tasks you can automate with Python, and how to do it.
Automation, Programming, Python
- Beyond Brainless AI with a Feature Store, by Jim Dowling - Jun 4, 2021.
AI-powered products that are limited to the data available within their application are like jellyfish: their autonomic systems make them functional, but they lack a brain. However, you can evolve your models with data-enriched "brains" through the help of a feature store.
AI, Data Engineering, Feature Store, Machine Learning
- 10 Deadly Sins of Machine Learning Model Training, by Sandeep Uttamchandani, Ph.D. - Jun 4, 2021.
These mistakes are easy to overlook but costly to redeem.
Machine Learning, Modeling, Training
- How a Data Scientist Should Communicate with Stakeholders, by Nate Rosidi - Jun 3, 2021.
Effective and collaborative communication with stakeholders is a skill that can help you survive in your role as a Data Scientist at your organization. Learn how to master this interaction, and you will perform your job better, see improved outcomes from your projects, and grow in your capabilities and career.
Advice, Communication, Data Science Skills, Data Scientist
- Machine Learning Model Interpretation, by Himanshu Sharma - Jun 2, 2021.
Read this overview of using Skater to build machine learning visualizations.
Explainability, Interpretability, Machine Learning, Python
- Stop (and Start) Hiring Data Scientists, by Ian Xiao - Jun 2, 2021.
Large companies are losing many data scientists to smaller companies, so what should executives and managers do? These three “stop & start” tactics can improve talent retention, and help define a new way of recruiting and working for the Data Science field.
Attrition, Career, Data Scientist, Hiring
- How to Make Python Code Run Incredibly Fast, by Pralabh Saxena - Jun 2, 2021.
In this article, I have explained some tips and tricks to optimize and speed up Python code.
Optimization, Performance, Programming, Python
- How to Create and Deploy a Simple Sentiment Analysis App via API, by Matthew Mayo - Jun 1, 2021.
In this article we will create a simple sentiment analysis app using the HuggingFace Transformers library, and deploy it using FastAPI.
FastAPI, Hugging Face, NLP, Python, Sentiment Analysis, Transformer