2021 Jun

All (66) | Opinions (17) | Products, Services (7) | Tutorials, Overviews (42)

5 Tasks To Automate With Python

Here are 5 tasks you can automate with Python, and how to do it.

By Dylan Roy on Dec 27, 2022 in Python
Managing Your Reusable Python Code as a Data Scientist

Here are a few approaches that I have settled on for managing my own reusable Python code as a data scientist, presented from most to least general code use, and aimed at beginners.

By Matthew Mayo on Feb 11, 2022 in Python
Ethics, Fairness, and Bias in AI

As more AI-enhanced applications seep into our daily lives and expand their reach to larger swaths of populations around the world, we must clearly understand the vulnerabilities trained machine learning models can exhibit based on the data used during development. Such issues can negatively impact select groups of people, so addressing the ethical decisions made by AI--possibly unknowingly--is important to the long-term fairness and success of this new technology.

By Aditya Aggarwal on Jun 30, 2021 in AI, Algorithms, Bias, Ethics
From Scratch: Permutation Feature Importance for ML Interpretability

Use permutation feature importance to discover which features in your dataset are useful for prediction — implemented from scratch in Python.

By Seth Billiau on Jun 30, 2021 in Feature Selection, Interpretability, Machine Learning, Python
Computational Complexity of Deep Learning: Solution Approaches

Why has deep learning been so successful? What is the fundamental reason that deep learning can learn from big data? Why cannot traditional ML learn from the large data sets that are now available for different tasks as efficiently as deep learning can?

By Dr. Vijay Srinivas Agneeswaran on Jun 29, 2021 in Complexity, Deep Learning, Neural Networks
Unleashing the Power of MLOps and DataOps in Data Science

Organizations trying to move forward with analytics and data science initiatives -- while floating in an ocean of data -- must enhance their overall approach and culture to embrace a foundation of DataOps and MLOps. Leveraging these operational frameworks is necessary to enable the data to generate real business value.

By Yash Mehta on Jun 29, 2021 in Best Practices, Data Science, DataOps, MLOps
10 Mistakes You Should Avoid as a Data Science Beginner

Read this article on how to gain a competitive advantage in the data science job market.

By Isabelle Flückiger on Jun 29, 2021 in Beginners, Career Advice, Data Science
Add A New Dimension To Your Photos Using Python

Read this to learn how to breathe new life into your photos with a 3D Ken Burns Effect.

By Dylan Roy on Jun 28, 2021 in Google Colab, Image Generation, Image Processing, Python
Data Scientists are from Mars and Software Developers are from Venus

Within the broad universe of IT in the business world, the approaches for deploying solutions by traditional software engineers and trendy, new data scientists couldn't be more different. However, appreciating these differences is incredibly important because great business value can be gained by integrating both worlds of development to drive more efficiency and effectiveness into an organization.

By Anand Rao on Jun 28, 2021 in Data Scientist, Software Developer
How to Train a Joint Entities and Relation Extraction Classifier using BERT Transformer with spaCy 3

A step-by-step guide on how to train a relation extraction classifier using Transformer and spaCy 3.

By Walid Amamou on Jun 28, 2021 in BERT, NLP, Python, spaCy, Text Analytics, Transformer
High-Performance Deep Learning: How to train smaller, faster, and better models – Part 2

As your organization begins to consider building advanced deep learning models with efficiency in mind to improve the power delivered through your solutions, the software and hardware tools required for these implementations are foundational to achieving high performance.

By Gaurav Menghani on Jun 25, 2021 in Deep Learning, Efficiency, Machine Learning, Scalability
How to create an interactive 3D chart and share it easily with anyone

This is a short tutorial on a great Plotly feature.

By Olga Chernytska on Jun 25, 2021 in Data Visualization, Graph, Python
What will the demand for Data Scientists be in 10 years? Will Data Scientists be extinct?

Participate in the latest KDnuggets survey and share your opinion: what does the next decade have in store for data scientist demand?

By Matthew Mayo on Jun 24, 2021 in Data Science, Data Scientist, Poll, Survey, Trends
In-Warehouse Machine Learning and the Modern Data Science Stack

As your organization matures its data science portfolio and capabilities, establishing a modern data stack is vital to enabling such growth. Here, we overview various in-data warehouse machine learning services, and discuss each of their benefits and requirements.

By Nick Acosta on Jun 24, 2021 in Amazon Redshift, Analytics, BigQuery, Cloud, Data Science, Data Warehouse, Machine Learning, Modern Data Stack
10 Python Code Snippets We Should All Know

Check out these Python code snippets and start using them to solve everyday problems.

By Pralabh Saxena on Jun 24, 2021 in Programming, Python
Workflow Orchestration with Prefect and Coiled

Coiled helps data scientists use Python for ambitious problems, scaling to the cloud for computing power, ease, and speed—all tuned for the needs of teams and enterprises. In this demo example, see how to spin up a Coiled cluster to execute Prefect jobs during runtime.

By Coiled.io on Jun 23, 2021 in Coiled.io, Modern Data Stack, Orchestration, Prefect, Python, Workflow
Create and Deploy Dashboards using Voila and Saturn Cloud

Working with and training large datasets, maintaining them all in one place, and deploying them to production is a challenging job. In this article, we cover what Saturn Cloud is, how it can speed up your end-to-end pipeline, and how to create dashboards using Voila and Python and publish them to production in just a few easy steps.

By Dhrumil Patel on Jun 23, 2021 in Analytics, Cloud, Dashboard, Data Science, Machine Learning, Python
Data Careers in Demand: Crowd Solutions Architect Explained

How can crowdsourcing support the applications of data teams at an organization? With an ever-increasing demand for more and higher quality data, a new role of the Crowd Solutions Architect (CSA) can leverage the potential of the masses to bring an advantage to a business's capability to deliver effective AI-driven solutions.

By Daria Baidakova on Jun 23, 2021 in Careers, Crowdsourcing, Data Architect, Explained, Toloka
Fine-Tuning Transformer Model for Invoice Recognition

The author presents a step-by-step guide from annotation to training.

By Walid Amamou on Jun 23, 2021 in Business Analytics, Image Classification, NLP, Python, Transformer
The Word “WORD” Has 13 Meanings

Thoughts around Knowledge Graphs, the semantic nature of language, and the two main types of word ambiguity.

By Expert.ai on Jun 22, 2021 in Expert.ai, Knowledge Graph, NLP, Text Analytics
Amazing Low-Code Machine Learning Capabilities with New Ludwig Update

Integrations with Ray, MLflow, and TabNet are among the top features of this release.

By Jesus Rodriguez on Jun 22, 2021 in Low-Code, Machine Learning, Open Source, Uber
Analytics Engineering Everywhere

Many new roles have appeared in the data world ever since the rise of the Data Scientist took the spotlight several years ago. Now, there is a new core player ready to take center stage, and within five years we may see nearly every organization with an Analytics Engineering team.

By Jason Ganz on Jun 22, 2021 in Analytics, Analytics Engineering, Data Engineering, dbt
What is Segmentation?

Segmentation refers to many things, and is one of the most frequently used words in marketing. This article looks at segmentation from a somewhat different-than-usual perspective.

By Kevin Gray on Jun 22, 2021 in Analytics, Marketing Analytics, Segmentation
Overview of AutoNLP from Hugging Face with Example Project

AutoNLP is a beta project from Hugging Face that builds on the company’s work with its Transformer project. With AutoNLP you can get a working model with just a few simple terminal commands.

By Kevin Vu on Jun 21, 2021 in Automated Machine Learning, AutoML, Hugging Face, NLP
Pandas vs SQL: When Data Scientists Should Use Each Tool

Exploring data sets and understanding their structure, content, and relationships is a routine and core process for any Data Scientist. Multiple tools exist for performing such analysis, and we take a deep dive into the benefits and different approaches of two important tools, SQL and Pandas.

By Matthew Przybyla on Jun 21, 2021 in Data Scientist, Pandas, SQL
How to troubleshoot memory problems in Python

Memory problems are hard to diagnose and fix in Python. This post goes through a step-by-step process for how to pinpoint and fix memory leaks using popular open-source Python packages.

By Freddy Boulton on Jun 21, 2021 in Programming, Python
Major changes: Where Analytics, Data Science, Machine Learning were applied in 2020/21

Our latest poll shows major change in where AI, Data Science, Machine Learning are being applied, with decline in interest in traditional fields like CRM/Consumer Analytics, and growth in applications to Computer Vision, COVID, Agriculture, and Education.

By Gregory Piatetsky on Jun 18, 2021 in Agriculture, Computer Vision, Consumer Analytics, Education, Finance, Industry, Poll
High Performance Deep Learning, Part 1

Advancing deep learning techniques continue to demonstrate incredible potential to deliver exciting new AI-enhanced software and systems. But training the most powerful models is expensive--financially, computationally, and environmentally. Increasing the efficiency of such models will have profound impacts in many ways, so developing future models with this intention in mind will only help to further expand the reach, applicability, and value of what deep learning has to offer.

By Gaurav Menghani on Jun 18, 2021 in Deep Learning, Efficiency, History, Machine Learning
Data Science is Not Becoming Extinct in 10 Years, Your Skills Might

4 reasons why data science is here to stay and what you need to do to ensure that your skillset stays in demand.

By Ahmar Shah, PhD on Jun 18, 2021 in Career Advice, Data Science, Data Science Skills, Data Scientist
How to Land a Data Analytics Job in 6 Months

Go from zero to hero in under six months. Data science has a very high barrier to entry. It is a very competitive field that people from many different educational backgrounds are looking to get into. Here is useful advice on how to proceed.

By Natassha Selvaraj on Jun 17, 2021 in Career Advice, Careers, Data Analyst, Data Analytics
Data storytelling: brains are built for visuals, but hearts turn on stories

Today, we need much more than just numbers about our organization to understand, gain insights, and take relevant actions. While visualizations of the data are important, making an emotional connection with the stories behind the data is key. If you want to sell a story, send a missile to the heart.

By Hrvoje Smolic on Jun 17, 2021 in Business Analytics, Communication, Data Visualization, Storytelling
How a Polytechnic Helps You Make the Tech-Business Connection

WPI welcomes professionals of all levels to its 100% online MS in Business Analytics — no GRE or GMAT required. Get started here.

By Worcester Polytechnic Institute on Jun 16, 2021 in Business, MS in Business Analytics, Online Education, WPI
The Best Way to Learn Practical NLP?

Hugging Face has just released a course on using its libraries and ecosystem for practical NLP, and it appears to be very comprehensive. Have a look for yourself.

By Matthew Mayo on Jun 16, 2021 in Courses, Hugging Face, NLP
An introduction to Explainable AI (XAI) and Explainable Boosting Machines (EBM)

Understanding why your AI-based models make the decisions they do is crucial for deploying practical solutions in the real world. Here, we review some techniques in the field of Explainable AI (XAI), why explainability is important, example models of explainable AI using LIME and SHAP, and demonstrate how Explainable Boosting Machines (EBMs) can make explainability even easier.

By Chaitanya Krishna Kasaraneni on Jun 16, 2021 in AI, Deep Learning, Explainability, Gradient Boosting, Interpretability, LIME, Machine Learning, SHAP
A Graph-based Text Similarity Method with Named Entity Information in NLP

In this article, the author summarizes the 2017 paper "A Graph-based Text Similarity Measure That Employs Named Entity Information" as per their understanding. Better understand the concepts by reading along.

By Prakhar Mishra on Jun 16, 2021 in Graphs, NLP, Similarity, Text Analytics
The Data Matters: Choosing the right data to analyze can make or break your analysis

We started Nomad Data to help data scientists and business analysts quickly find the right commercial datasets to match their specific use case. We catalog use cases of data and use machine learning and AI to match analysis goals with datasets.

By Nomad Data on Jun 15, 2021 in Consumer Analytics, Datasets, Geospatial
7 Data Security Best Practices for 2021

Here are seven data security best practices to adopt this year.

By Devin Partida on Jun 15, 2021 in Cybersecurity, Data Science, Security
Beginners Guide to Debugging TensorFlow Models

If you are new to working with a deep learning framework, such as TensorFlow, there are a variety of typical errors beginners face when building and training models. Here, we explore and solve some of the most common errors to help you develop a better intuition for debugging in TensorFlow.

By Ahmad Anis on Jun 15, 2021 in Beginners, Deep Learning, TensorFlow
Facebook Launches One of the Toughest Reinforcement Learning Challenges in History

The FAIR team just launched the NetHack Challenge as part of the upcoming NeurIPS 2021 competition. The objective is to test new RL ideas using one of the toughest game environments in the world.

By Jesus Rodriguez on Jun 15, 2021 in Challenge, Facebook, Reinforcement Learning
Data Scientists Will be Extinct in 10 Years

And why it’s not a bad thing.

By Mikhail Mew on Jun 14, 2021 in Career Advice, Data Science, Data Science Skills, Data Scientist
Get Interactive Plots Directly With Pandas

Telling a story with data is a core function for any Data Scientist, and creating data visualizations that are simultaneously illuminating and appealing can be challenging. This tutorial reviews how to create Plotly and Bokeh plots directly through Pandas plotting syntax, which will help you convert static visualizations into interactive counterparts -- and take your analysis to the next level.

By Parul Pandey on Jun 14, 2021 in Bokeh, Data Visualization, Pandas, Plotly, Python
Building a Knowledge Graph for Job Search Using BERT

A guide on how to create knowledge graphs using NER and Relation Extraction.

By Walid Amamou on Jun 14, 2021 in BERT, Careers, Data Science Skills, Knowledge Graph, NLP, Python, Search, Transformer
Top 10 Data Science Projects for Beginners

Check out these projects for ideas to strengthen your skills and build a portfolio that stands out.

By Natassha Selvaraj on Jun 11, 2021 in Beginners, Data Science, Portfolio, Project
Five types of thinking for a high performing data scientist

The way you think about a problem and the conceptual process you go through to find a solution may be guided by your personal skills or the type of problem at hand. Many mental models exist representing a variety of thinking patterns -- and as a Data Scientist, appreciating different approaches can help you more effectively model data in the business world and communicate your results to the decision-makers.

By Anand Rao on Jun 11, 2021 in Advice, Data Science Skills
The Essential Guide to Transformers, the Key to Modern SOTA AI

You likely know Transformers from their recent spate of success stories in natural language processing, computer vision, and other areas of artificial intelligence, but are you familiar with all of the X-formers? More importantly, do you know the differences, and why you might use one over another?

By Matthew Mayo on Jun 10, 2021 in AI, Computer Vision, Deep Learning, NLP, Transformer
Feature Selection – All You Ever Wanted To Know

Although your data set may contain a lot of information about many different features, selecting only the "best" of these to be considered by a machine learning model can mean the difference between a model that performs well--with better performance, higher accuracy, and more computational efficiency--and one that falls flat. The process of feature selection guides you toward working with only the data that may be the most meaningful, and to accomplish this, a variety of feature selection types, methodologies, and techniques exist for you to explore.

By Danny Butvinik on Jun 10, 2021 in Feature Engineering, Feature Selection, Machine Learning
How to Generate Automated PDF Documents with Python

Discover how to leverage automation to create dazzling PDF documents effortlessly.

By Mohammad Khorasani on Jun 10, 2021 in Data Visualization, PDF, Programming, Python
How to speed up a Deep Learning Language model by almost 50X at half the cost

In this blog post, we show how to accelerate fine-tuning the ALBERT language model while also reducing costs by using Determined’s built-in support for distributed training with AWS spot instances.

By Determined AI on Jun 9, 2021 in AWS, Deep Learning, Distributed Computing, Hugging Face, NLP
Data Scientists, You Need to Know How to Code

You need to know how to code — and not just code, but write good code.

By Tyler Folkman on Jun 9, 2021 in Career Advice, Data Science, Data Scientist, Programming
The 7 Best Open Source AI Libraries You May Not Have Heard Of

AI researchers today have many exciting options for working with specialized tools. Although starting original projects from scratch is often not necessary, knowing which existing library to leverage remains a challenge. This list of generally unknown yet awesome, open-source libraries offers an interesting collection to consider for state-of-the-art research that spans from automatic machine learning to differentiable quantum circuits.

By Kevin Vu on Jun 9, 2021 in AI, Hyperparameter, Julia, Open Source, Probability, Quantum Computing
How a Single Mistake Wasted 3 Years of My Data Science Journey

Self-paced courses are just sleeping pills; industry experts are the right choice.

By Pranjal Saxena on Jun 9, 2021 in Courses, Data Science, Experts, Mistakes
SAS® Visual Data Science Decisioning powered by SAS® Viya®: Free Trial

SAS® Visual Data Science Decisioning provides the ultimate analytics experience. Start your free trial and get access to the latest in data visualization, machine learning, forecasting, model deployment and more.

By SAS on Jun 8, 2021 in Analytics, Data Science, Data Visualization, Decision Support, SAS, Viya
This Data Visualization is the First Step for Effective Feature Selection

Understanding the most important features to use is crucial for developing a model that performs well. Knowing which features to consider requires experimentation, and proper visualization of your data can help clarify your initial selections. The scatter pairplot is a great place to start.

By Benjamin Obi Tayo on Jun 8, 2021 in Data Visualization, Feature Selection, Statistics, Stocks
The only Jupyter Notebooks extension you truly need

Now you don’t need to restart the kernel after editing the code in your custom imports.

By Olga Chernytska on Jun 8, 2021 in Deployment, Jupyter, Machine Learning, Python
5 Tips for Picking an Edge AI Platform

Edge Analytics isn’t just coding and tools. The different environment outside the datacenter or cloud means a purpose-built platform is the best way to deliver consistent results. We discuss 5 different considerations for an edge platform to support your training and deployment.

By Erik Ottem-Cachengo on Jun 8, 2021 in AI, Analytics, Platform
5 Data Science Open-source Projects You Should Consider Contributing to

As you prepare to interview for a position in data science or are looking to jump to the next level, now is the time to enhance your skills and your resume by working on real, open-source projects. Here, we suggest a great selection of projects you can contribute to and help build something awesome, so all you need to do is choose one and tackle it head on.

By Sara Metwalli on Jun 7, 2021 in Caffe, Data Science, Data Science Skills, GitHub, Google, Machine Learning, Open Source
How to Fine-Tune BERT Transformer with spaCy 3

A step-by-step guide on how to create a knowledge graph using NER and Relation Extraction.

By Walid Amamou on Jun 7, 2021 in BERT, Knowledge Graph, NLP, Python, spaCy, Transformer
PyCaret 101: An introduction for beginners

This article is a great overview of how to get started with PyCaret for all your machine learning projects.

By Moez Ali on Jun 7, 2021 in Machine Learning, PyCaret, Python
BigQuery vs Snowflake: A Comparison of Data Warehouse Giants

In this article we are going to compare the two topmost data warehouses: BigQuery and Snowflake.

By Anji Velagana on Jun 3, 2021 in BigQuery, Data Warehouse, Snowflake
How a Data Scientist Should Communicate with Stakeholders

Effective and collaborative communication with stakeholders is a skill that can help you survive in your role as a Data Scientist at your organization. Learn how to master this interaction, and you will perform your job better, see improved outcomes from your projects, and grow in your capabilities and career.

By Nate Rosidi on Jun 3, 2021 in Advice, Communication, Data Science Skills, Data Scientist
Will There Be a Shortage of Data Science Jobs in the Next 5 Years?

The data science workflow is getting automated day by day.

By Pranjal Saxena on Jun 3, 2021 in Automation, Career Advice, Data Science, Data Scientist
Similarity Search: Euclid of Alexandria goes shoe shopping

Many applications can be improved with similarity search. Similarity search can provide more relevant results and therefore improve business outcomes such as conversion rates, engagement rates, detected threats, data quality, and customer satisfaction.

By Pinecone on Jun 2, 2021 in Neural Networks, Pinecone, Recommender Systems, Search
Machine Learning Model Interpretation

Read this overview of using Skater to build machine learning visualizations.

By Himanshu Sharma on Jun 2, 2021 in Explainability, Interpretability, Machine Learning, Python
Stop (and Start) Hiring Data Scientists

Large companies are losing many data scientists to smaller companies, so what should executives and managers do? These three “stop & start” tactics can improve talent retention, and help define a new way of recruiting and working for the Data Science field.

By Ian Xiao on Jun 2, 2021 in Attrition, Career, Data Scientist, Hiring
How to Create and Deploy a Simple Sentiment Analysis App via API

In this article we will create a simple sentiment analysis app using the HuggingFace Transformers library, and deploy it using FastAPI.

By Matthew Mayo on Jun 1, 2021 in FastAPI, Hugging Face, NLP, Python, Sentiment Analysis, Transformer
How I Doubled My Income with Data Science and Machine Learning

Many career opportunities exist in the ever-expanding domain of data. Finding your place -- and finding your salary -- is largely up to your dedication, focus, and drive to learn. If you are an aspiring Data Scientist or have already started your professional journey, there are multiple strategies for maximizing your earning potential.

By Terence Shin on Jun 1, 2021 in Career Advice, Data Science, Data Science Skills, Machine Learning, Salary
