- Make Pandas 3 Times Faster with PyPolars, by Satyam Kumar - May 31, 2021.
Learn how to speed up your Pandas workflow using the PyPolars library.
- Top 4 Data Extraction Tools, by Zoltan Bettenbuk - May 31, 2021.
Data extraction tools give you the boost you need for gathering information from a multitude of data sources. These four data extraction tools will help liberate you from manual data entry, understand complex documents, and simplify the data extraction process.
- Top Stories, May 24-30: A Guide On How To Become A Data Scientist (Step By Step Approach), by KDnuggets - May 31, 2021.
Also: Top Programming Languages and Their Uses; Data Scientist, Data Engineer & Other Data Careers, Explained; Vaex: Pandas but 1000x faster; Choosing the Right BI Tool for Your Business
- Supercharge Your Machine Learning Experiments with PyCaret and Gradio, by Moez Ali - May 31, 2021.
A step-by-step tutorial to develop and interact with machine learning pipelines rapidly.
- State of Mathematical Optimization Report, 2021, by Gurobi - May 28, 2021.
Download your copy of Gurobi's first-ever "State of Mathematical Optimization Report," which is based on data from a survey of commercial mathematical optimization users. Get yours now.
- Essential Math for Data Science: Basis and Change of Basis, by Hadrien Jean - May 28, 2021.
In this article, you will learn what the basis of a vector space is, see that any vector of the space is a linear combination of the basis vectors, and see how to change the basis using change of basis matrices.
- 4 Tips for Dataset Curation for NLP Projects, by Paul Barba - May 28, 2021.
You have heard it before, and you will hear it again. It's all about the data. Curating the right data matters far more than just curating any data. When dealing with text data, many hard-earned lessons have been learned by others over the years, and here are four data curation tips that you should be sure to follow during your next NLP project.
- Choosing the Right BI Tool for Your Business, by Angshuman Guha - May 28, 2021.
Here are six questions to ask as you search for the best BI tool for your specific needs.
- AIRSIDE LIVE Is Where Big Data, Data Security and Data Governance Converge, by Okera - May 27, 2021.
Free virtual summit on June 3rd offers sessions from data industry leaders and practitioners on challenges and solutions in an ever-changing, data-driven landscape.
- Great New Resource for Natural Language Processing Research and Applications, by Matthew Mayo - May 27, 2021.
The NLP Index is a brand new resource for NLP code discovery, combining and indexing more than 3,000 paper and code pairs at launch. If you are interested in NLP research and locating the code and papers needed to understand and implement the latest research, you should check it out.
- AI Books you should read in 2021, by Przemek Chojecki - May 27, 2021.
As of late, every year seems to be a "break-out" year for AI. So, it's time for you to get ready for the future in the age of automation. This collection of books will help you prepare for the many opportunities to come, many of which may not have yet been imagined.
- Top Data and Analytics Trends, by Sigmoid - May 27, 2021.
Experts and enthusiasts have already started pondering the data and analytics trends that are expected to take center stage going forward. The following is a list of top trends that will dominate the market this year.
- Budgeting For Your AI Training Data: Consider These 3 Factors, by Shaip - May 26, 2021.
Before you even plan to procure the data, one of the most important considerations is determining how much you should spend on your AI training data. In this article, we will give you insights to develop an effective budget for AI training data.
- Topic Modeling with Streamlit, by Bryan Patrick Wood - May 26, 2021.
What does it take to create and deploy a topic modeling web application quickly? Read this post to see how the author uses Python NLP packages for topic modeling, Streamlit for the web application framework, and Streamlit Sharing for deployment.
- The Rise of Vector Data, by Pinecone - May 25, 2021.
Embedding models convert raw data such as text, images, audio, logs, and videos into vector embeddings (“vectors”) to be used for predictions, comparisons, and other cognitive-like functions.
- Where Did You Apply Analytics, Data Science, Machine Learning in 2020/2021?, by Matthew Mayo - May 25, 2021.
Take part in the latest KDnuggets survey, and let us know where you have been applying Analytics, Data Science, Machine Learning in 2020/2021.
- These Soft Skills Can Make or Break Your Data Science Career, by Stefan Maraj - May 25, 2021.
In an industry long ruled by hard skills, the future career success of tomorrow’s data scientists might well depend on their ability to deploy a variety of soft skills into the workplace.
- Write and train your own custom machine learning models using PyCaret, by Moez Ali - May 25, 2021.
A step-by-step, beginner-friendly tutorial on how to write and train custom machine learning models in PyCaret.
- Top Stories, May 17-23: Data Scientist, Data Engineer & Other Data Careers, Explained, by KDnuggets - May 24, 2021.
Also: Vaex: Pandas but 1000x faster; A checklist to track your Data Science progress; How to Determine if Your Machine Learning Model is Overtrained; The Most In Demand Skills for Data Engineers in 2021
- How to Deal with Categorical Data for Machine Learning, by Shelvi Garg - May 24, 2021.
Check out this guide to implementing different types of encoding for categorical data, including a cheat sheet on when to use what type.
- Data Validation in Machine Learning is Imperative, Not Optional, by Aggarwal & Bose - May 24, 2021.
Before we reach model training in the pipeline, there are various components like data ingestion, data versioning, data validation, and data pre-processing that need to be executed. In this article, we will discuss data validation, why it is important, its challenges, and more.
- 6 Business Trends Benefiting Data Scientists, by Devin Partida - May 21, 2021.
Here are six business trends making data scientists even more in-demand.
- How to pitch to VCs, explained: The Deck We Used to Raise Capital For Our Open-Source ELT Platform, by John Lafleur - May 21, 2021.
Winning seed funding from venture capitalists is a daunting task, and the pitch is key. Learn how one effective slide deck resulted in a successful early funding round for an open-source start-up, Airbyte.
- Building RESTful APIs using Flask, by Mahadev Easwar - May 21, 2021.
Learn about using the lightweight web framework in Python from this article.
- DataOps: 5 things that you need to know, by Sigmoid - May 20, 2021.
DataOps (Data Operations) has assumed a critical role in the age of big data to drive definitive impact on business outcomes. This process-oriented and agile methodology synergizes the components of DevOps and the capabilities of data engineers and data scientists to support data-focused workloads in enterprises. Here is a detailed look at DataOps.
- Awesome list of datasets in 100+ categories, by Etienne D. Noumen - May 20, 2021.
With an estimated 44 zettabytes of data in existence in our digital world today and approximately 2.5 quintillion bytes of new data generated daily, there is a lot of data out there you could tap into for your data science projects. It's pretty hard to curate through such a massive universe of data, but this collection is a great start. Here, you can find data from cancer genomes to UFO reports, as well as years of air quality data to 200,000 jokes. Dive into this ocean of data to explore as you learn how to apply data science techniques or leverage your expertise to discover something new.
- How to Determine if Your Machine Learning Model is Overtrained, by Charles Martin - May 20, 2021.
WeightWatcher is based on theoretical research (done jointly with UC Berkeley) into Why Deep Learning Works, drawing on our Theory of Heavy Tailed Self-Regularization (HT-SR). It uses ideas from Random Matrix Theory (RMT), Statistical Mechanics, and Strongly Correlated Systems.
- DataOps Summit 2021 CFP Is Now Open!, by DataOps Summit - May 19, 2021.
Calling all Conductors of Chaos: Tell us how you tamed your data at DataOps Summit 2021. The CFP is open through May 31st.
- Differentiable Programming from Scratch, by Guillaume Saupin - May 19, 2021.
In this article, we are going to explain what Differentiable Programming is by developing from scratch all the tools needed for this exciting new kind of programming.
- A checklist to track your Data Science progress, by Pascal Janetzky - May 19, 2021.
Whether you are just starting out in data science or already a gainfully-employed professional, always learning more to advance through state-of-the-art techniques is part of the adventure. But it can be challenging to keep track of your progress and keep an eye on what's next. Follow this checklist to help you scale your expertise from entry-level to advanced.
- Data Practitioner Survey: Want to know what you’re worth?, by Informa - May 18, 2021.
Want to know what you’re worth? The AI Summit is compiling a 2021 Data Practitioner Salary Report. We would love your input if you are involved with machine learning in a business context, whether as a software architect, data scientist, engineer, developer, modeller, administrator, or analyst.
- Animated Bar Chart Races in Python, by Shelvi Garg - May 18, 2021.
A quick, step-by-step beginner's project to create an animated bar graph for an amazing Covid dataset.
- The Most In Demand Skills for Data Engineers in 2021, by Terence Shin - May 18, 2021.
If you are preparing to make a career in data or are looking for opportunities to skill-up in your current data-centric role, then this analysis of in-demand skills for 2021, based on over 17,000 Data Engineer job postings, should offer you a good idea as to which programming languages and software tools are increasing and decreasing in importance.
- Easy MLOps with PyCaret + MLflow, by Moez Ali - May 18, 2021.
A beginner-friendly, step-by-step tutorial on integrating MLOps in your Machine Learning experiments using PyCaret.
- Machine Translation in a Nutshell, by Kevin Gray and Dr. Anna Farzindar - May 17, 2021.
Marketing scientist Kevin Gray asks Dr. Anna Farzindar of the University of Southern California for a snapshot of machine translation. Dr. Farzindar also provided the original art for this article.
- Top Stories, May 10-16: Essential Linear Algebra for Data Science and Machine Learning, by KDnuggets - May 17, 2021.
Also: Data Preparation in SQL, with Cheat Sheet!; Best Python Books for Beginners and Advanced Programmers; Similarity Metrics in NLP; The NoSQL Know-It-All Compendium
- Vaex: Pandas but 1000x faster, by Ahmad Anis - May 17, 2021.
If you are working with big data, especially on your local machine, then learning the basics of Vaex, a Python library that enables the fast processing of large datasets, will provide you with a productive alternative to Pandas.
- Binary Classification with Automated Machine Learning, by Derrick Mwiti - May 17, 2021.
Check out how to use the open-source MLJAR auto-ML to build accurate models faster.
- Best Python Books for Beginners and Advanced Programmers, by Claire D. Costa - May 14, 2021.
Let's take a look at nine of the best Python books for both beginners and advanced programmers, covering topics such as data science, machine learning, deep learning, NLP, and more.
- The next-generation of AutoML frameworks, by Aleksandra Plonska and Piotr Plonski - May 14, 2021.
AutoML frameworks are getting better every day, and can provide high-performing ML pipelines, unique data insights, and ML explanations. No longer black-boxes, these powerful tools offer self-documenting capabilities and native Python notebook support.
- DeepMind Wants to Reimagine One of the Most Important Algorithms in Machine Learning, by Jesus Rodriguez - May 14, 2021.
In one of the most important papers this year, DeepMind proposed a multi-agent structure to redefine PCA.
- The NoSQL Know-It-All Compendium, by Alex Williams - May 13, 2021.
Are you a NoSQL beginner, but want to become a NoSQL Know-It-All? Well, this is the place for you. Get up to speed on NoSQL technologies from a beginner's point of view, with this collection of related progressive posts on the subject. NoSQL? No problem!
- 6 side hustles for an aspiring data scientist, by Ahmad Bin Shafiq - May 13, 2021.
As an aspiring data scientist or an employed professional, many opportunities exist for you to offer your skills to a broader audience through side gigs. While the difficulty and risk vary, experiences from applying your data science practice to areas outside your immediate career path can increase your expertise while even increasing your bank account.
- The Explainable Boosting Machine, by Dr. Robert Kübler - May 13, 2021.
As accurate as gradient boosting, as interpretable as linear regression.
- Super Charge Python with Pandas on GPUs Using Saturn Cloud, by Tyler Folkman - May 12, 2021.
Saturn Cloud is a tool that gives you 10 hours of GPU computing and 3 hours of Dask cluster computing a month for free. In this tutorial, you will learn how to use these free resources to process data using Pandas on a GPU. The experiments show that Pandas is over 1,000,000% slower on a CPU than on a Dask cluster of GPUs.
- How to become an online data science tutor, by Iliya Valchanov - May 12, 2021.
Your expertise in data science may be serving you well in your day job or you are on track to land that next dream position to do what you love. There are many others aspiring to attain your level of skill, and maybe you could consider helping them out... through a side gig of teaching.
- Top April Stories: The Most In-Demand Skills for Data Scientists in 2021, by Gregory Piatetsky - May 11, 2021.
The Most In-Demand Skills for Data Scientists in 2021; Data Science Books You Should Start Reading in 2021; How to organize your data science project; Shapash: Making Machine Learning Models Understandable.
- Make Connections With SAS Live Web Learning, by SAS - May 11, 2021.
Through a year of uncertainty, the demand for analytics skills and the desire to continue skills development remained consistent. Take this opportunity to join SAS expert instructors and learn the latest skills in a Live Web class.
- Confidence Intervals for XGBoost, by Guillaume Saupin - May 11, 2021.
Read this article about building a regularized Quantile Regression objective.
- Must-have Chrome Extensions For Machine Learning Engineers And Data Scientists, by Himanshu Ragtah - May 11, 2021.
Browser extensions are a productivity secret weapon for hackers and developers. Many machine learning practitioners use Chrome, and this list features must-have Chrome extensions for machine learning engineers and data scientists that you should check out today.
- What Makes AI Trustworthy?, by Ronel Sylvester - May 11, 2021.
This blog covers why AI needs to be trustworthy and what makes it trustworthy. AI predictions and suggestions should not just be taken at face value, but rather examined at a deeper level. We need to understand how an AI system makes its predictions to put our trust in it. Trust should not be built on prediction accuracy alone.
- Top Stories, May 3-9: Charticulator: Microsoft Research open-sourced a game-changing Data Visualization platform; Data Preparation in SQL, with Cheat Sheet!, by KDnuggets - May 10, 2021.
Also: Rebuilding My 7 Python Projects; Applying Python’s Explode Function to Pandas DataFrames; Data Scientist vs Machine Learning Engineer – what are their skills?; XGBoost Explained: DIY XGBoost Library in Less Than 200 Lines of Python
- Similarity Metrics in NLP, by James Briggs - May 10, 2021.
This post covers the use of Euclidean distance, dot product, and cosine similarity as NLP similarity metrics.
- Essential Linear Algebra for Data Science and Machine Learning, by Benjamin Obi Tayo - May 10, 2021.
Linear algebra is foundational in data science and machine learning. Beginners starting out along their learning journey in data science--as well as established practitioners--must develop a strong familiarity with the essential concepts in linear algebra.
- Ensemble Methods Explained in Plain English: Bagging, by Claudia Ng - May 10, 2021.
Understand the intuition behind bagging with examples in Python.
- Applying Python’s Explode Function to Pandas DataFrames, by Michael Mosesov - May 7, 2021.
Read about this applied Python method for accessing a column by date/year using the Pandas library, lambda expressions, and the functions list(), map(), and explode().
- We Don’t Need Data Engineers, We Need Better Tools for Data Scientists, by Devin Petersohn - May 7, 2021.
In today's data science jobs landscape, a variety of roles are being filled from specialized engineering positions to the more generalized data scientist. However, is it possible that some of these job types are duplicative or misdirected, such as that of the Data Engineer, which might exist as we know it because of a lack of adequate tooling for Data Scientists?
- A Comprehensive Guide to Ensemble Learning – Exactly What You Need to Know, by Derrick Mwiti - May 6, 2021.
This article covers ensemble learning methods, and exactly what you need to know in order to understand and implement them.
- Feature stores – how to avoid feeling that every day is Groundhog Day, by Monte Zweben - May 6, 2021.
Feature stores stop the duplication of each task in the ML lifecycle. You can reuse features and pipelines for different models, monitor models consistently, and sidestep data leakage with this MLOps technology that everyone is talking about.
- What is Neural Search?, by Pradeep Sharma - May 6, 2021.
And how to get started with it with no prior experience in Machine Learning.
- Rebuilding My 7 Python Projects, by Kaustubh Gupta - May 5, 2021.
This is how I rebuilt my Python projects: Data Science, Web Development & Android Apps.
- What makes a winning entry in a Machine Learning competition?, by Harald Carlens - May 5, 2021.
So you want to show your grit in a Kaggle-style competition? Many, many others have the same idea, including domain experts and non-experts, and academic and corporate teams. What does it take for your bright ideas and skills to come out on top of thousands of competitors?
- The Machine Learning Research Championed by the Biggest AI Labs in the World, by Jesus Rodriguez - May 5, 2021.
How Google, Microsoft, Facebook, DeepMind, OpenAI, Amazon and IBM think about the future of AI.
- How to get started managing data quality with SQL and scale, by Soda.io - May 4, 2021.
Silent data quality issues are the biggest problem facing data teams today, many of which are flying blind with no systems or processes in place to monitor and detect bad data before it has a downstream impact.
- Deploy a Dockerized FastAPI App to Google Cloud Platform, by Krueger & Franklin - May 4, 2021.
A short guide to deploying a Dockerized Python app to Google Cloud Platform using Cloud Run and a SQL instance.
- Disentangling AI, Machine Learning, and Deep Learning, by Kevin Vu - May 4, 2021.
The field of Artificial Intelligence is extremely broad and captures a winding history through the evolution of various sub-fields that experienced many ups and downs over the years. Appreciating AI within its historical contexts will enhance your communication with the public, colleagues, and potential hiring managers, as well as guide your thinking as you progress in the application and study of state-of-the-art techniques.
- A simple static visualization can often be the best approach, by Kai Wong - May 4, 2021.
How I overengineered a worse solution by making an interactive visualization.
- Top Stories, Apr 26 – May 2: Data Scientist vs Machine Learning Engineer – what are their skills?, by KDnuggets - May 3, 2021.
Also: Data Science Books You Should Start Reading in 2021; Data science is not about data – applying Dijkstra principle to data science; How to ace A/B Testing Data Science Interviews; Top 10 Must-Know Machine Learning Algorithms for Data Scientists – Part 1
- Cloud Based Web Scraping for Big Data Applications, by Octoparse - May 3, 2021.
As the need to store and access big data increases, web scraping and web crawling technologies are becoming more and more useful. Today, companies use web scraping technology for myriad reasons. Read on to find the uses of cloud-based web scraping for big data apps.
- How To Generate Meaningful Sentences Using a T5 Transformer, by Vatsal Saglani - May 3, 2021.
Read this article to see how to develop a text generation API using the T5 transformer.
- Charticulator: Microsoft Research open-sourced a game-changing Data Visualization platform, by Josh Taylor - May 3, 2021.
Creating grand charts and graphs from your data analysis is supported by many powerful tools. However, how to make these visualizations meaningful can remain a mystery. To address this challenge, Microsoft Research has quietly open-sourced a game-changing visualization platform.
- XGBoost Explained: DIY XGBoost Library in Less Than 200 Lines of Python, by Guillaume Saupin - May 3, 2021.
Understand how XGBoost works through fewer than 200 lines of simple code that implement gradient boosting for decision trees.
- Hilarious Data Science Humor, by Yi Li - May 2, 2021.
Data scientists and developers share a goofy sense of humor. Here are some puns that we data scientists and programmers can definitely relate to.