2021 May

All (51) | Opinions (10) | Products, Services (5) | Tutorials, Overviews (36)

Make Pandas 3 Times Faster with PyPolars

Learn how to speed up your Pandas workflow using the PyPolars library.

By Satyam Kumar on May 31, 2021 in Pandas, Performance, Python
Top 4 Data Extraction Tools

Data extraction tools give you the boost you need for gathering information from a multitude of data sources. These four data extraction tools will help liberate you from manual data entry, understand complex documents, and simplify the data extraction process.

By Zoltan Bettenbuk on May 31, 2021 in Data Preparation, import.io, Web Scraping
Supercharge Your Machine Learning Experiments with PyCaret and Gradio

A step-by-step tutorial to develop and interact with machine learning pipelines rapidly.

By Moez Ali on May 31, 2021 in Deployment, Machine Learning, Pipeline, PyCaret, Python
State of Mathematical Optimization Report, 2021

Download your copy of Gurobi's first-ever "State of Mathematical Optimization Report," which is based on data from a survey of commercial mathematical optimization users. Get yours now.

By Gurobi on May 28, 2021 in Gurobi, Optimization, Report
Essential Math for Data Science: Basis and Change of Basis

In this article, you will learn what the basis of a vector space is, see that any vector of the space is a linear combination of the basis vectors, and see how to change the basis using change of basis matrices.

By Hadrien Jean on May 28, 2021 in Data Science, Linear Algebra, Mathematics
4 Tips for Dataset Curation for NLP Projects

You have heard it before, and you will hear it again. It's all about the data. Curating the right data is far more important than just curating any data. When dealing with text data, many hard-earned lessons have been learned by others over the years, and here are four data curation tips that you should be sure to follow during your next NLP project.

By Paul Barba on May 28, 2021 in Data Preparation, Lexalytics, NLP, Project
Choosing the Right BI Tool for Your Business

Here are six questions to ask as you search for the best BI tool for your specific needs.

By Angshuman Guha on May 28, 2021 in BI, Business, Business Intelligence, Tools
Great New Resource for Natural Language Processing Research and Applications

The NLP Index is a brand new resource for NLP code discovery, combining and indexing more than 3,000 paper and code pairs at launch. If you are interested in NLP research and locating the code and papers needed to understand and implement the latest research, you should check it out.

By Matthew Mayo on May 27, 2021 in Datasets, NLP, Research
AI Books you should read in 2021

As of late, every year seems to be a "break-out" year for AI. So, it's time for you to get ready for the future in the age of automation. This collection of books will help you prepare for the many opportunities to come, many of which may not have yet been imagined.

By Przemek Chojecki on May 27, 2021 in AGI, AI, Books, Business, China, Singularity
Budgeting For Your AI Training Data: Consider These 3 Factors

Before you even plan to procure the data, one of the most important considerations is determining how much you should spend on your AI training data. In this article, we will give you insights to develop an effective budget for AI training data.

By Shaip on May 26, 2021 in AI, Data Preparation, Training Data
Topic Modeling with Streamlit

What does it take to create and deploy a topic modeling web application quickly? Read this post to see how the author uses Python NLP packages for topic modeling, Streamlit for the web application framework, and Streamlit Sharing for deployment.

By Bryan Patrick Wood on May 26, 2021 in Deployment, NLP, Python, spaCy, Streamlit, Text Analytics, Topic Modeling
The Rise of Vector Data

Embedding models convert raw data such as text, images, audio, logs, and videos into vector embeddings (“vectors”) to be used for predictions, comparisons, and other cognitive-like functions.

By Pinecone on May 25, 2021 in Distributed Representation, Pinecone, Representation
Where Did You Apply Analytics, Data Science, Machine Learning in 2020/2021?

Take part in the latest KDnuggets survey, and let us know where you have been applying Analytics, Data Science, Machine Learning in 2020/2021.

By Matthew Mayo on May 25, 2021 in Analytics, Data Science, Machine Learning, Poll, Survey
These Soft Skills Can Make or Break Your Data Science Career

In an industry long ruled by hard skills, the future career success of tomorrow’s data scientists might well depend on their ability to deploy a variety of soft skills into the workplace.

By Stefan Maraj on May 25, 2021 in Career Advice, Communication, Data Science, Data Science Skills, Decision Making
Write and train your own custom machine learning models using PyCaret

A step-by-step, beginner-friendly tutorial on how to write and train custom machine learning models in PyCaret.

By Moez Ali on May 25, 2021 in Machine Learning, Modeling, PyCaret, Python, Training
Data Validation in Machine Learning is Imperative, Not Optional

Before we reach model training in the pipeline, there are various components like data ingestion, data versioning, data validation, and data pre-processing that need to be executed. In this article, we will discuss data validation, why it is important, its challenges, and more.

By Aggarwal & Bose on May 24, 2021 in Data Quality, Machine Learning, Production, Validation
6 Business Trends Benefiting Data Scientists

Here are six business trends making data scientists even more in-demand.

By Devin Partida on May 21, 2021 in Business, Data Science, Data Scientist, Trends
How to pitch to VCs, explained: The Deck We Used to Raise Capital For Our Open-Source ELT Platform

Winning seed funding from venture capitalists is a daunting task, and the pitch is key. Learn how one effective slide deck resulted in a successful early funding round for an open-source start-up, Airbyte.

By John Lafleur on May 21, 2021 in Data Preparation, ELT, ETL, Startup, VC
Building RESTful APIs using Flask

Learn about using the lightweight web framework in Python from this article.

By Mahadev Easwar on May 21, 2021 in API, Flask, Python, RESTful API
DataOps: 5 things that you need to know

DataOps (Data Operations) has assumed a critical role in the age of big data to drive definitive impact on business outcomes. This process-oriented and agile methodology synergizes the components of DevOps and the capabilities of data engineers and data scientists to support data-focused workloads in enterprises. Here is a detailed look at DataOps.

By Sigmoid on May 20, 2021 in Data Engineer, Data Engineering, DataOps
Awesome list of datasets in 100+ categories

With an estimated 44 zettabytes of data in existence in our digital world today and approximately 2.5 quintillion bytes of new data generated daily, there is a lot of data out there you could tap into for your data science projects. It's pretty hard to sift through such a massive universe of data, but this collection is a great start. Here, you can find data ranging from cancer genomes to UFO reports, and from years of air quality data to 200,000 jokes. Dive into this ocean of data to explore as you learn how to apply data science techniques or leverage your expertise to discover something new.

By Etienne D. Noumen on May 20, 2021 in Big Data, Data Science, Datasets
How to Determine if Your Machine Learning Model is Overtrained

WeightWatcher is based on theoretical research (done jointly with UC Berkeley) into Why Deep Learning Works, drawing on our Theory of Heavy-Tailed Self-Regularization (HT-SR). It uses ideas from Random Matrix Theory (RMT), Statistical Mechanics, and Strongly Correlated Systems.

By Charles Martin on May 20, 2021 in Learning, Modeling, Python, Training
A checklist to track your Data Science progress

Whether you are just starting out in data science or are already a gainfully employed professional, always learning more to advance through state-of-the-art techniques is part of the adventure. But it can be challenging to keep track of your progress and keep an eye on what's next. Follow this checklist to help you scale your expertise from entry-level to advanced.

By Pascal Janetzky on May 19, 2021 in Advice, Beginners, Data Preparation, Data Science, Deep Learning
Animated Bar Chart Races in Python

A quick, step-by-step beginner's project to create an animated bar graph for an amazing Covid dataset.

By Shelvi Garg on May 18, 2021 in COVID-19, Data Science, Data Visualization, Pandas, Python, Visualization
The Most In Demand Skills for Data Engineers in 2021

If you are preparing to make a career in data or are looking for opportunities to skill-up in your current data-centric role, then this analysis of in-demand skills for 2021, based on over 17,000 Data Engineer job postings, should offer you a good idea as to which programming languages and software tools are increasing and decreasing in importance.

By Terence Shin on May 18, 2021 in Apache Spark, AWS, Data Engineer, Data Science Skills, Data Scientist, Python, Skills, SQL
Easy MLOps with PyCaret + MLflow

A beginner-friendly, step-by-step tutorial on integrating MLOps in your Machine Learning experiments using PyCaret.

By Moez Ali on May 18, 2021 in Machine Learning, MLflow, MLOps, PyCaret, Python
Machine Translation in a Nutshell

Marketing scientist Kevin Gray asks Dr. Anna Farzindar of the University of Southern California for a snapshot of machine translation. Dr. Farzindar also provided the original art for this article.

By Kevin Gray and Dr. Anna Farzindar on May 17, 2021 in Machine Translation, Neural Networks, NLP, Text Analytics
Vaex: Pandas but 1000x faster

If you are working with big data, especially on your local machine, then learning the basics of Vaex, a Python library that enables the fast processing of large datasets, will provide you with a productive alternative to Pandas.

By Ahmad Anis on May 17, 2021 in Big Data, Data Preprocessing, Pandas, Scalability, Vaex
Binary Classification with Automated Machine Learning

Check out how to use the open-source MLJAR auto-ML to build accurate models faster.

By Derrick Mwiti on May 17, 2021 in Automated Machine Learning, AutoML, Classification, Open Source
Best Python Books for Beginners and Advanced Programmers

Let's take a look at nine of the best Python books for both beginners and advanced programmers, covering topics such as data science, machine learning, deep learning, NLP, and more.

By Claire D. Costa on May 14, 2021 in Analytics, Books, Data Science, Deep Learning, Machine Learning, Python
The NoSQL Know-It-All Compendium

Are you a NoSQL beginner, but want to become a NoSQL Know-It-All? Well, this is the place for you. Get up to speed on NoSQL technologies from a beginner's point of view, with this collection of related progressive posts on the subject. NoSQL? No problem!

By Alex Williams on May 13, 2021 in Beginners, Databases, NoSQL, SQL
6 side hustles for an aspiring data scientist

As an aspiring data scientist or an employed professional, many opportunities exist for you to offer your skills to a broader audience through side gigs. While the difficulty and risk vary, experiences from applying your data science practice to areas outside your immediate career path can increase your expertise while also growing your bank account.

By Ahmad Bin Shafiq on May 13, 2021 in Career Advice, Data Scientist, Kaggle, Online Education, Youtube
The Explainable Boosting Machine

As accurate as gradient boosting, as interpretable as linear regression.

By Dr. Robert Kübler on May 13, 2021 in Decision Trees, Explainability, Gradient Boosting, Interpretability, Machine Learning
Super Charge Python with Pandas on GPUs Using Saturn Cloud

Saturn Cloud is a tool that gives you 10 hours of GPU computing and 3 hours of Dask cluster computing a month for free. In this tutorial, you will learn how to use these free resources to process data using Pandas on a GPU. The experiments show that Pandas is over 1,000,000% slower on a CPU compared to running Pandas on a Dask cluster of GPUs.

By Tyler Folkman on May 12, 2021 in Cloud, GPU, Pandas, Python
How to become an online data science tutor

Your expertise in data science may be serving you well in your day job, or you may be on track to land that next dream position to do what you love. There are many others aspiring to attain your level of skill, and maybe you could consider helping them out... through a side gig of teaching.

By Iliya Valchanov on May 12, 2021 in Career Advice, Data Science Education, Online Education
Make Connections With SAS Live Web Learning

Through a year of uncertainty, the demand for analytics skills and the desire to continue skills development remained consistent. Take this opportunity to join SAS expert instructors and learn the latest skills in a Live Web class.

By SAS on May 11, 2021 in Analytics, Credit Risk, Data Science Education, Online Education, SAS
What Makes AI Trustworthy?

This post explains why AI needs to be trustworthy and what makes it trustworthy. AI predictions and suggestions should not just be taken at face value, but rather examined at a deeper level. We need to understand how an AI system makes its predictions to put our trust in it. Trust should not be built on prediction accuracy alone.

By Ronel Sylvester on May 11, 2021 in AI, Bias, Explainable AI, Trust
Similarity Metrics in NLP

This post covers the use of Euclidean distance, dot product, and cosine similarity as NLP similarity metrics.

By James Briggs on May 10, 2021 in Metrics, NLP, Similarity
Essential Linear Algebra for Data Science and Machine Learning

Linear algebra is foundational in data science and machine learning. Beginners starting out along their learning journey in data science--as well as established practitioners--must develop a strong familiarity with the essential concepts in linear algebra.

By Benjamin Obi Tayo on May 10, 2021 in Data Science Education, Data Visualization, Linear Algebra, Linear Regression, Mathematics, Python
Ensemble Methods Explained in Plain English: Bagging

Understand the intuition behind bagging with examples in Python.

By Claudia Ng on May 10, 2021 in Algorithms, Bagging, Ensemble Methods, Python
Applying Python’s Explode Function to Pandas DataFrames

Read about this applied Python method for solving the issue of accessing a column by date/year using the Pandas library, lambda expressions, and the functions list(), map(), and explode().

By Michael Mosesov on May 7, 2021 in Data Analysis, Pandas, Programming, Python
A Comprehensive Guide to Ensemble Learning – Exactly What You Need to Know

This article covers ensemble learning methods, and exactly what you need to know in order to understand and implement them.

By Derrick Mwiti on May 6, 2021 in CatBoost, Ensemble Methods, Machine Learning, Python, random forests algorithm, scikit-learn, XGBoost
Feature stores – how to avoid feeling that every day is Groundhog Day

Feature stores stop the duplication of each task in the ML lifecycle. You can reuse features and pipelines for different models, monitor models consistently, and sidestep data leakage with this MLOps technology that everyone is talking about.

By Monte Zweben on May 6, 2021 in Data Preparation, Feature Store, Machine Learning, MLOps
What is Neural Search?

And how to get started with it with no prior experience in Machine Learning.

By Pradeep Sharma on May 6, 2021 in Neural Networks, NLP, Search, Search Engine
Rebuilding My 7 Python Projects

This is how I rebuilt my Python projects: Data Science, Web Development & Android Apps.

By Kaustubh Gupta on May 5, 2021 in Data Science, Programming, Project, Python
What makes a winning entry in a Machine Learning competition?

So you want to show your grit in a Kaggle-style competition? Many, many others have the same idea, including domain experts and non-experts, and academic and corporate teams. What does it take for your bright ideas and skills to come out on top of thousands of competitors?

By Harald Carlens on May 5, 2021 in Challenge, Competition, Kaggle, Machine Learning, PyTorch, TensorFlow
How to get started managing data quality with SQL and scale

Silent data quality issues are the biggest problem facing today's data teams, who are flying blind with no systems or processes in place to monitor and detect bad data before it has a downstream impact.

By Soda.io on May 4, 2021 in Data Preparation, Data Quality, Scalability, SQL
Deploy a Dockerized FastAPI App to Google Cloud Platform

A short guide to deploying a Dockerized Python app to Google Cloud Platform using Cloud Run and a SQL instance.

By Krueger & Franklin on May 4, 2021 in API, Deployment, Docker, FastAPI, Google Cloud
How To Generate Meaningful Sentences Using a T5 Transformer

Read this article to see how to develop a text generation API using the T5 transformer.

By Vatsal Saglani on May 3, 2021 in API, Hugging Face, Natural Language Generation, NLP, Python, Transformer
Charticulator: Microsoft Research open-sourced a game-changing Data Visualization platform

Creating grand charts and graphs from your data analysis is supported by many powerful tools. However, how to make these visualizations meaningful can remain a mystery. To address this challenge, Microsoft Research has quietly open-sourced a game-changing visualization platform.

By Josh Taylor on May 3, 2021 in Data Visualization, Microsoft
XGBoost Explained: DIY XGBoost Library in Less Than 200 Lines of Python

Understand how XGBoost works with a simple 200 lines of code that implement gradient boosting for decision trees.

By Guillaume Saupin on May 3, 2021 in Algorithms, Machine Learning, Python, XGBoost
