- NLP Insights for the Penguin Café Orchestra, by Expert.ai - Aug 31, 2021.
We give an example of how to use Expert.ai and Python to investigate favorite music albums.
Expert.ai, Music, NLP, Python
- CSV Files for Storage? No Thanks. There’s a Better Option, by Dario Radečić - Aug 31, 2021.
Saving data to CSVs is costing you both money and disk space. It’s time to end it.
Data Management, Pandas, Parquet, Python
- Multilabel Document Categorization, step by step example, by Saurabh Sharma - Aug 31, 2021.
This detailed guide explores a two-stage approach, combining unsupervised learning with LDA and supervised learning with BERT, to develop a domain-specific document categorizer on unlabeled documents.
BERT, Data Labeling, Document Classification, LDA, NLP, Topic Modeling
- A Python Data Processing Script Template, by Matthew Mayo - Aug 31, 2021.
Here's a skeleton general purpose template for getting a Python command line script fleshed out as quickly as possible.
Programming, Python
- Top Stories, Aug 23-29: Automate Microsoft Excel and Word Using Python, by KDnuggets - Aug 30, 2021.
Also: Django's 9 Most Common Applications; Learning Data Science and Machine Learning: First Steps after the Roadmap; The Significance of Data-centric AI
Top stories
- Beacon North America, the latest and greatest in data analytics, Sep 14-15, by Google - Aug 30, 2021.
On Sep 14-15, Looker (Google Cloud) will host BEACON, its bi-annual virtual event series on data thought leadership. Find original content at the forefront of data and analytics trends, future predictions, and best practices. Sign up now; tickets are free.
Google Cloud, Looker, Meetings
- Introducing Packed BERT for 2x Training Speed-up in Natural Language Processing, by Krell & Kosec - Aug 30, 2021.
Check out this new BERT packing algorithm for more efficient training.
BERT, NLP, Python, Training
- Data Science Project Infrastructure: How To Create It, by Nate Rosidi - Aug 30, 2021.
The intention of most data science projects is to build something that people use. Creating something purposeful requires solid infrastructure and processes that keep problem-solving front-and-center for your audience.
Data Science, Infrastructure, Project
- The Top Industries Hiring Data Scientists in 2021, by Devin Partida - Aug 30, 2021.
People realize that effective uses of data can increase competitiveness, even in a challenging marketplace. Here are six industries hiring data scientists now that will likely continue doing so for the foreseeable future.
Career Advice, Finance, Insurance, Life Science, Telecom
- 3 Data Acquisition, Annotation, and Augmentation Tools, by Matthew Mayo - Aug 27, 2021.
Check out these 3 projects found around GitHub that can help with your data acquisition, annotation, and augmentation tasks.
Computer Vision, Data Annotation, Data Labeling, Datasets, GitHub, NLP, Synthetic Data
- How causal inference lifts augmented analytics beyond flatland, by Michael Klaput - Aug 27, 2021.
In our quest to better understand and predict business outcomes, traditional predictive modeling tends to fall flat. However, causal inference techniques along with business analytics approaches can unravel what truly changes your KPIs.
Analytics, Causality, Data Science, Python, Regression
- Automated Data Labeling with Machine Learning, by Watchful - Aug 26, 2021.
Labeling training data is the one step in the data pipeline that has resisted automation. It’s time to change that.
Data Labeling, Data Preparation, Machine Learning
- Coding Ethics for AI & AIOps: Designing Responsible AI Systems, by Manisha Singh - Aug 26, 2021.
AIOps has taken human-machine collaboration to the next level, where humans and machines are not just coexisting but collaborating and working together like team members.
AI, Bias, DevOps, Ethics, ModelOps, Responsible AI
- 11 Best Data Science Education Platforms, by Zulie Rane - Aug 26, 2021.
We cover the 11 best data science education platforms for 11 different use cases, ranging from specific languages to hands-on learners to the best free option.
Data Science Education, Data Scientist, Online Education, Programming
- The Most Important Tool for Data Engineers, by Leo Godin - Aug 26, 2021.
And it has nothing to do with Python or SQL.
Career Advice, Data Engineer, Data Engineering
- Florida Hacks with IBM, by BeMyApp - Aug 25, 2021.
Join the Florida Hacks with IBM virtual hackathon and create a project to tackle sustainability challenges. IBM will provide mentorship and data sets to help bring your ideas to life.
Florida, Hackathon, IBM
- 15 Python Snippets to Optimize your Data Science Pipeline, by Lucas Soares - Aug 25, 2021.
Quick Python solutions to help your data science cycle.
Data Science, Optimization, Pipeline, Python
- What is Noise?, by Vasant Dhar - Aug 25, 2021.
We might have a reasonable sense of "noise" as some statistically random phenomenon that occurs in nature. But how can this same characteristic be defined, and understood, within the context of making judgements, such as in human behavior, corporate decision-making, medicine, the law, and AI systems?
Bias, Book, Daniel Kahneman, Statistics, Variance, Vasant Dhar
- Essential Features of An Efficient Data Integration Solution, by Rabia Hatim - Aug 24, 2021.
This blog highlights the essential features of a data integration solution that help an organization generate consistent and accurate data to keep the business running smoothly.
Big Data, Data Analytics, Data Integration, Data Processing
- Learning Data Science and Machine Learning: First Steps After The Roadmap, by Harshit Tyagi - Aug 24, 2021.
Just getting into learning data science may seem as daunting as (if not more than) trying to land your first job in the field. With so many options and resources online and in traditional academia to consider, these pre-requisites and pre-work are recommended before diving deep into data science and AI/ML.
Data Science, Machine Learning, Mathematics, Python, Roadmap, Statistics
- Top Stories, Aug 16-22: The Difference Between Data Scientists and ML Engineers; Prefect: How to Write and Schedule Your First ETL Pipeline with Python, by KDnuggets - Aug 23, 2021.
Also: Open Source Datasets for Computer Vision; Prefect: How to Write and Schedule Your First ETL Pipeline with Python; Most Common Data Science Interview Questions and Answers; How to Select an Initial Model for your Data Science Problem.
Top stories
- Jurassic-1 Language Models and AI21 Studio, by AI21 - Aug 23, 2021.
AI21 Labs’ new developer platform offers instant access to our 178B-parameter language model, to help you build sophisticated text-based AI applications at scale.
AI, GPT-3, NLP
- Django’s 9 Most Common Applications, by Aakash Bijwe - Aug 23, 2021.
Django is a Python web application framework enjoying widespread adoption in the data science community. But what else can you use Django for? Read this article for 9 use cases where you can put Django to work.
Django, Programming, Python
- 7 reasons you should get a formal degree in Data Science, by Purvanshi Mehta - Aug 23, 2021.
So many options are now available online to learn in the field of data science. There are several factors to consider to determine if these options or a traditional degree from an academic institution is the best approach for your personal learning style and career aspirations.
Data Science Education
- 5 Things That Make My Job as a Data Scientist Easier, by Shree Vandana - Aug 23, 2021.
After working as a Data Scientist for a year, I am here to share some things I learnt along the way that I feel are helpful and have increased my efficiency. Hopefully some of these tips can help you in your journey :)
Data Science, Data Scientist, Metrics, Pandas, Plotly, Python, Time Series, Visualization
- Stack Overflow Survey Data Science Highlights, by Matthew Mayo - Aug 20, 2021.
The results of the 2021 Stack Overflow Developer Survey were recently released, which is a fascinating snapshot of today's developers and the tools they are using. Have a look at some selections from the report, particularly those which may be of interest to data professionals.
Cloud, Data Science, Databases, Developers, Programming, Programming Languages, StackOverflow, Survey
- Demystifying AI: The prejudices of Artificial Intelligence (and human beings), by Manjesh Gupta - Aug 20, 2021.
AI models are necessarily trained on historical data from the real-world--data that is generated from the daily goings on of society. If social-based biases are inherent in the training data, then will the AI predictions highlight these same biases? If so, what should we do (or not do) about making AI fair?
AI, Bias, Ethics, Humans vs Machines
- How to Select an Initial Model for your Data Science Problem, by Zachary Warnes - Aug 20, 2021.
Save yourself some time and headaches and start simple.
Data Science, Linear Regression, Logistic Regression, Modeling
- Speeding up data understanding by interactive exploration, by Visplore - Aug 19, 2021.
A key success factor of data science projects is understanding the data well. This blog explains why coding can be an inefficient way to do this and how you can improve on it.
Communication, Data Exploration, Data Visualization, Visplore
- 5 Data Science Career Mistakes To Avoid, by Tessa Xie - Aug 19, 2021.
Everyone makes mistakes, which can be a good thing when they lead to learning and improvements over time. But, we can also try to first learn from others to expedite our personal growth. To get started, consider these lessons learned the hard way, so you don’t have to.
Career Advice, Data Science, Mistakes
- Enhancing Machine Learning Personalization through Variety, by Raghavan Kirthivasan - Aug 19, 2021.
Personalization drives growth and is a touchstone of good customer experience. Personalization driven by machine learning can enable companies to improve this experience while improving ROI for marketing campaigns. However, these techniques face challenges in deciding when personalization makes sense, and how and when specific options should be recommended.
Machine Learning, Personalization, Recommender Systems
- 15 Things I Look for in Data Science Candidates, by Mathias Gruber - Aug 19, 2021.
This article presents advice for anyone looking or hiring for data science jobs, written by someone with practical and useful insight.
Career Advice, Data Science, Data Science Skills, Data Scientist
- Amazon Web Services Webinar: Accelerating clinical trial and biomedical development processes with healthcare data, by Roidna - Aug 18, 2021.
Join this webinar on August 27 to learn how to leverage external healthcare datasets to make faster decisions with greater accuracy – accelerating biomedical development and improving patient welfare.
AWS, Healthcare, IBM
- When Correlation is Better than Causation, by Brittany Davis - Aug 18, 2021.
Identifying causality in an analysis isn't always practical. We show a heuristic approach for using correlations to inform decisions.
Causation, Correlation, Data Science
- Open Source Datasets for Computer Vision, by Kevin Vu - Aug 18, 2021.
Access to high-quality, noise-free, large-scale datasets is crucial for training complex deep neural network models for computer vision applications. Many open-source datasets are developed for use in image classification, pose estimation, image captioning, autonomous driving, and object segmentation. These datasets must be paired with the appropriate hardware and benchmarking strategies to optimize performance.
Computer Vision, Datasets, Open Source
- Data Scientist’s Guide to Efficient Coding in Python, by Dr. Varshita Sher - Aug 18, 2021.
Read this fantastic collection of tips and tricks the author uses for writing clean code on a day-to-day basis.
Programming, Python, Tips
- Top July Stories: Data Scientists and ML Engineers Are Luxury Employees, by Gregory Piatetsky - Aug 17, 2021.
Also: Top 6 Data Science Online Courses in 2021; Advice for Learning Data Science from Google's Director of Research; 5 Lessons McKinsey Taught Me That Will Make You a Better Data Scientist
Top stories
- Leaders at Allstate, eBay & Red Bull Agree: Don’t Miss the Rev 3 Enterprise MLOps Summit, by Domino - Aug 17, 2021.
Join data science and MLOps leaders in-person in Chicago this November.
Data Science, Domino, MLOps, Summit
- Linear Algebra for Natural Language Processing, by Taaniya Arora - Aug 17, 2021.
Learn about representing word semantics in vector space.
Linear Algebra, Mathematics, NLP, Python
- Model Drift in Machine Learning – How To Handle It In Big Data, by Sai Geetha - Aug 17, 2021.
Rendezvous Architecture helps you run a Champion model and many Challenger models in parallel, and choose among their outputs, without much overhead. The original approach works well for smaller data sets, so how can this idea be adapted to big data pipelines?
Big Data, Data Engineering, Data Preparation, Machine Learning, Model Drift
- Top Stories, Aug 9-15: The Difference Between Data Scientists and ML Engineers, by KDnuggets - Aug 16, 2021.
Also: Most Common Data Science Interview Questions and Answers; 3 Reasons Why You Should Use Linear Regression Models Instead of Neural Networks; How My Learning Path Changed After Becoming a Data Scientist; MLOps And Machine Learning Roadmap
Top stories
- KDnuggets Top Blogs Rewards for July 2021, by Gregory Piatetsky - Aug 16, 2021.
These top blogs were winners of KDnuggets Top Blog Rewards Program for July: Data Scientists and ML Engineers Are Luxury Employees; Top 6 Data Science Online Courses in 2021; Advice for Learning Data Science from Google's Director of Research; Pandas not enough? Here are a few good alternatives; A Learning Path To Becoming a Data Scientist; 5 Lessons McKinsey Taught Me That Will Make You a Better Data Scientist
Blog Rewards, Top stories
- Prefect: How to Write and Schedule Your First ETL Pipeline with Python, by Dario Radečić - Aug 16, 2021.
Workflow management systems made easy — both locally and in the cloud.
Cloud, ETL, Pipeline, Python
- Agile Data Labeling: What it is and why you need it, by Jennifer Prendki - Aug 16, 2021.
The notion of Agile in software development has made waves across industries by revolutionizing productivity. Can the same benefits be applied to the often arduous task of annotating data sets for machine learning?
Agile, Data Labeling, Machine Learning, Tesla
- Writing Your First Distributed Python Application with Ray, by Michael Galarnyk - Aug 16, 2021.
Using Ray, you can take Python code that runs sequentially and transform it into a distributed application with minimal code changes. Read on to find out why you should use Ray, and how to get started.
Distributed Computing, Parallelism, Python, Workflow
- How to Train a BERT Model From Scratch, by James Briggs - Aug 13, 2021.
Meet BERT’s Italian cousin, FiliBERTo.
BERT, Hugging Face, NLP, Python, Training
- Querying the Most Granular Demographics Dataset, by Matti Grotheer - Aug 13, 2021.
Having access to broad and detailed population data can potentially offer enormous value to any organization looking to interact with specific demographics. However, access alone is not sufficient without being able to leverage advanced techniques to explore and visualize the data.
Big Data, Data Visualization, Geolocation, Neo4j, Open Source
- Introduction to Statistical Learning Second Edition, by Matthew Mayo - Aug 13, 2021.
The second edition of the classic "An Introduction to Statistical Learning, with Applications in R" was published very recently, and is now freely-available via PDF on the book's website.
Books, Data Science, Machine Learning, R, Statistical Learning, Statistics
- MLOps And Machine Learning Roadmap, by Ben Rogojan - Aug 12, 2021.
A 16–20 week roadmap to review machine learning and learn MLOps.
Courses, DataRobot, Deployment, DevOps, Kubeflow, Kubernetes, Machine Learning, Microsoft Azure, MLOps
- 3 mindset changes to become a better analyst, by Bobby Pinero - Aug 12, 2021.
Many people, fresh out of school and ready to burst into an organization as new hires with newly developed skills and knowledge, learn that things tend to be a little different in the "real world" than at university. A few shifts in your approach to continued learning and building your confidence might help you reach a little further, faster, professionally.
Advice, Career Advice, Data Analyst
- How to Detect and Overcome Model Drift in MLOps, by Bhaskar Ammu - Aug 12, 2021.
This article has a look at model drift, and how to detect and overcome it in production MLOps.
Machine Learning, MLOps, Production
- 2021 State of Production Machine Learning Survey, by Anyscale - Aug 11, 2021.
We invite you to take the 2021 State of Production Machine Learning survey and help shed light on the latest trends in the adoption of machine learning (ML) in the industry.
Anyscale, Machine Learning, Production, Survey
- The Difference Between Data Scientists and ML Engineers, by Kurtis Pykes - Aug 11, 2021.
What's the difference? Responsibilities, expertise, and salary expectations.
Career Advice, Data Scientist, Machine Learning Engineer
- For SQL, or why I’m so over-protective of my data people, by Pedram Navid - Aug 11, 2021.
For decades, SQL has been the foundation for how humans interact with data. Alternate approaches seem to continually attempt to replace this powerful language. However, while much progress remains in the techniques and tools for the curation and management of data, the skilled craftspeople who work with data -- through the lens of SQL -- are likely to be around for decades more.
SQL
- DeepMind’s New Super Model: Perceiver IO is a Transformer that can Handle Any Dataset, by Jesus Rodriguez - Aug 11, 2021.
The new transformer-based architecture can process audio, video and images using a single model.
DeepMind, Modeling, Transformer
- AI in Real Life, by SAS - Aug 10, 2021.
What do you need to get started on your AI journey? Putting together a combination of the right project, people and infrastructure is no easy task. SAS and MIT SMR have collaborated to provide a comprehensive set of resources to guide you from conception to implementation. Learn from experts that successfully launched AI projects.
AI, SAS, Success, Use Cases
- How My Learning Path Changed After Becoming a Data Scientist, by Soner Yildrim - Aug 10, 2021.
I keep learning but in a different way.
Career Advice, Data Science, Data Scientist, Learning
- Practising SQL without your own database, by Hui XiangChua - Aug 10, 2021.
SQL is a very important skill for data analysts and data scientists. However, when you are just starting out learning in the field, how can you practice querying with SQL if you don’t have any data stored in a database?
Beginners, Data.world, SQL
- Visualizing Bias-Variance, by Theodore Tsitsimis - Aug 10, 2021.
In this article, we'll explore some different perspectives of what the bias-variance trade-off really means with the help of visualizations.
Bias, Machine Learning, Variance, Visualization
- 5 Tips for Writing Clean R Code, by Marcin Dubel - Aug 9, 2021.
This article summarizes the most common mistakes to avoid and outlines best practices to follow in programming in general. Follow these tips to speed up the code review iteration process and be a rockstar developer in your reviewer’s eyes!
Programming, R
- Top Stories, Aug 2-8: 3 Reasons Why You Should Use Linear Regression Models Instead of Neural Networks; Bootstrap a Modern Data Stack in 5 minutes with Terraform, by KDnuggets - Aug 9, 2021.
Also: Most Common Data Science Interview Questions and Answers; How Visualization is Transforming Exploratory Data Analysis; GitHub Copilot Open Source Alternatives; How To Become A Freelance Data Scientist – 4 Practical Tips
Top stories
- Including ModelOps in your AI strategy, by Giuliano Liguori - Aug 9, 2021.
The strategic power of AI has been established thoroughly across many industries and companies, leading to surges in model creation. Investments in the people, processes, and tools for operationalizing models, referred to as ModelOps, lag. This function of operationalizing, integrating, and deploying AI models in line with business value expectations is growing into a core business capability as global use of AI matures.
AI, ModelOps, Strategy
- How to Query Your Pandas Dataframe, by Matthew Przybyla - Aug 9, 2021.
A Data Scientist’s perspective on SQL-like Python functions.
Data Preprocessing, Data Processing, Pandas, Python, SQL
- Using Twitter to Understand Pizza Delivery Apprehension During COVID, by Arimitra Maiti - Aug 6, 2021.
Analyzing customer sentiment and capturing any specific differences in emotion around ordering Domino's pizza in India during lockdown.
Analytics, COVID-19, Data Science, Retail, Sentiment Analysis, Twitter
- Bootstrap a Modern Data Stack in 5 minutes with Terraform, by Tuan Nguyen - Aug 6, 2021.
What is a Modern Data Stack and how do you deploy one? This guide will motivate you to start on this journey with setup instructions for Airbyte, BigQuery, dbt, Metabase, and everything else you need using Terraform.
BigQuery, Cloud, Data Warehousing, dbt, Modern Data Stack
- Essential Math for Data Science: Introduction to Systems of Linear Equations, by Hadrien Jean - Aug 6, 2021.
In this post, you’ll see how you can use systems of equations and linear algebra to solve a linear regression problem.
Data Science, Linear Algebra, Mathematics
- Be Wary of Automated Feature Selection — Chi Square Test of Independence Example, by Venkat Raman - Aug 5, 2021.
When data scientists use the chi-square test for feature selection, they often just follow the ritualistic “If your p-value is low, the null hypothesis must go”. The automated functions they use behave no differently.
Automated Data Science, Automated Machine Learning, Feature Selection, Statistics
- Most Common Data Science Interview Questions and Answers, by Nate Rosidi - Aug 5, 2021.
After analyzing 900+ data science interview questions from companies over the past few years, the most common data science interview question categories are reviewed in this guide, each explained with an example.
Data Science, Interview Questions
- Artificial Intelligence vs Machine Learning in Cybersecurity, by Peter Baltazar - Aug 5, 2021.
Artificial Intelligence and Machine Learning are next-gen technologies used in various fields. With the rise in online threats, it has become essential to include these technologies in cybersecurity. In this post, we look at the roles AI and ML play in cybersecurity.
AI, Cybersecurity, Machine Learning, Security
- How Visualization is Transforming Exploratory Data Analysis, by Todd Mostak - Aug 4, 2021.
Data analysts are dealing with bigger datasets than ever before, making interrogation difficult. Visualized Exploratory Data Analysis, supported by advanced parallel computing, promises an answer.
Data Analysis, Data Exploration, Data Visualization, Geospatial
- How To Become A Freelance Data Scientist – 4 Practical Tips, by Pau Labarta Bajo - Aug 4, 2021.
If you are a nerd-ish data scientist who wants to start working as an independent (remote) freelance data scientist, then these four practical tips can help you transition from the traditional 9-to-5 job to a dynamic experience as a remote contractor, just as the author did three years ago.
Career Advice, Consulting, Data Scientist, Freelance
- How DeepMind Trains Agents to Play Any Game Without Intervention, by Jesus Rodriguez - Aug 4, 2021.
A recent paper proposes a new architecture and training environment for generally capable agents.
Agents, AI, DeepMind, Games
- Free dataset worth $1350 to test the accent gap!, by DefinedCrowd - Aug 3, 2021.
With so many accent variations, how do speech and voice technologies keep up? In a few words: accented speech training data, representative of diverse groups of people. The more people your model can understand, the more likely you are to acquire and retain customers.
Competition, Dataset, Marketplace, Speech Recognition
- Mastering Clustering with a Segmentation Problem, by Indraneel Dutta Baruah - Aug 3, 2021.
A one-stop shop for implementing the most widely used unsupervised clustering models in Python.
Clustering, DBSCAN, K-means, Machine Learning, Segmentation, Unsupervised Learning
- 30 Most Asked Machine Learning Questions Answered, by Abhay Parashar - Aug 3, 2021.
There is always a lot to learn in machine learning. Whether you are new to the field or a seasoned practitioner and ready for a refresher, understanding these key concepts will keep your skills honed in the right direction.
Beginners, Interview Questions, Machine Learning, Regression, scikit-learn
- How To 2x Your Data Analytics Consulting Rates (Overnight), by Lillian Pierson, P.E. - Aug 3, 2021.
Looking to up your data analytics consulting rates? Learn exactly what most freelancers are charging, and the rates you SHOULD be charging as a business intelligence and analytics consultant. This post will show you what you need to know to achieve maximum results for your data consulting career.
Analytics, Career Advice, Careers, Data Science
- GPU-Powered Data Science (NOT Deep Learning) with RAPIDS, by Tirthajyoti Sarkar - Aug 2, 2021.
How to utilize the power of your GPU for regular data science and machine learning even if you do not do a lot of deep learning work.
Data Science, GPU, Python
- Top Stories, Jul 26 – Aug 1: GitHub Copilot Open Source Alternatives; Why and how should you learn “Productive Data Science”?, by KDnuggets - Aug 2, 2021.
Also: Advice for Learning Data Science from Google’s Director of Research; Design patterns in machine learning; A Brief Introduction to the Concept of Data; 5 Mistakes I Wish I Had Avoided in My Data Science Career
Top stories
- Development & Testing of ETL Pipelines for AWS Locally, by Subhash Sreenivasachar - Aug 2, 2021.
Typically, ETL pipelines are developed and tested on real environments/clusters, which are time-consuming to set up and require maintenance. This article focuses on developing and testing ETL pipelines locally with the help of Docker and LocalStack. The solution gives you the flexibility to test in a local environment without setting up any services in the cloud.
AWS, Data Engineering, ETL, Pipeline
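For the "A Python Data Processing Script Template" entry: a minimal sketch of what such a command line skeleton might look like, using only the standard library (`argparse`, `csv`, `contextlib`). The options (`input`, `--output`, `--column`) and the `process` step are hypothetical placeholders, not the template from the article.

```python
import argparse
import csv
import sys
from contextlib import ExitStack
from typing import Iterable, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Define the command line interface (hypothetical options)."""
    parser = argparse.ArgumentParser(description="Extract one column from a CSV file.")
    parser.add_argument("input", help="path to the input CSV file, or '-' for stdin")
    parser.add_argument("-o", "--output", default="-", help="output path, or '-' for stdout")
    parser.add_argument("-c", "--column", required=True, help="name of the column to keep")
    return parser.parse_args(argv)


def process(rows: Iterable[dict], column: str) -> List[str]:
    """Core logic, kept free of I/O so it can be tested directly.

    Keeps the stripped value of `column` for rows where it is present and non-empty.
    """
    return [row[column].strip() for row in rows if row.get(column)]


def main(argv: Optional[List[str]] = None) -> int:
    args = parse_args(argv)
    # ExitStack closes only the files we actually opened (not stdin/stdout)
    with ExitStack() as stack:
        infile = sys.stdin if args.input == "-" else stack.enter_context(open(args.input, newline=""))
        outfile = sys.stdout if args.output == "-" else stack.enter_context(open(args.output, "w"))
        for value in process(csv.DictReader(infile), args.column):
            outfile.write(value + "\n")
    return 0


# Entry point when run as a script:
# if __name__ == "__main__":
#     sys.exit(main())
```

Uncomment the guard at the bottom and run it as, for example, `python script.py data.csv -c name -o names.txt`. Keeping parsing, processing, and I/O in separate functions is what makes a template like this quick to adapt.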
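For the "Introducing Packed BERT for 2x Training Speed-up" entry: packing combines several short sequences into one fixed-length row so fewer token slots are wasted on padding. The sketch below uses simple first-fit-decreasing bin packing as a stand-in; it is not the packing algorithm described in the article. In a real model the packed sequences would also need attention masks (and related adjustments) so they do not attend to one another, which is not shown here. The sequence lengths are hypothetical.

```python
from typing import List


def pack_sequences(lengths: List[int], max_len: int) -> List[List[int]]:
    """Greedy first-fit-decreasing packing of sequence indices into rows.

    Each returned row lists the indices of sequences whose total length is <= max_len.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    packs: List[List[int]] = []
    remaining: List[int] = []  # free token slots left in each row
    for i in order:
        if lengths[i] > max_len:
            raise ValueError(f"sequence {i} is longer than max_len")
        for p, free in enumerate(remaining):
            if lengths[i] <= free:  # first row with enough room
                packs[p].append(i)
                remaining[p] -= lengths[i]
                break
        else:  # no existing row fits: open a new one
            packs.append([i])
            remaining.append(max_len - lengths[i])
    return packs


def padding_fraction(lengths: List[int], n_rows: int, max_len: int) -> float:
    """Share of token slots that are padding when n_rows rows of max_len are used."""
    return 1 - sum(lengths) / (n_rows * max_len)


lengths = [100, 400, 300, 200, 50, 450]
packs = pack_sequences(lengths, max_len=512)
print(packs)
print(padding_fraction(lengths, len(lengths), 512), padding_fraction(lengths, len(packs), 512))
```

With these lengths, the six sequences fit into three rows of 512 tokens, cutting the padding share from about 51% (one sequence per row) to about 2%.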
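For the "Linear Algebra for Natural Language Processing" entry: once words are represented as vectors, a common way to compare their meaning is cosine similarity, the cosine of the angle between two vectors. The three-dimensional vectors below are invented toy values for illustration, not real embeddings.

```python
import math
from typing import Dict, List


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """cos(theta) = (u . v) / (||u|| * ||v||)"""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy "embeddings", made up for this example only
vectors: Dict[str, List[float]] = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

for word in ("queen", "apple"):
    print(word, round(cosine_similarity(vectors["king"], vectors[word]), 3))
```

Because cosine similarity ignores vector length, two vectors pointing in the same direction score 1.0 and orthogonal vectors score 0.0.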
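For the "Practising SQL without your own database" entry: one way to practise, not necessarily the article's approach (its tags mention Data.world), is Python's built-in `sqlite3` module with an in-memory database, which needs no server or installation. The `orders` table and its rows are made up.

```python
import sqlite3

# An in-memory database: nothing to install or host, discarded when closed
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 7.5)],
)
conn.commit()

# Practice query: total spend per customer, largest first
rows = cur.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()

for customer, total in rows:
    print(customer, total)

conn.close()
```

Swapping `":memory:"` for a file path keeps the data between sessions.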
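For the "Essential Math for Data Science: Introduction to Systems of Linear Equations" entry: fitting a line y = b0 + b1 * x by least squares reduces to a 2x2 system of linear equations, the normal equations. This standard-library sketch solves that system with Cramer's rule; the article itself may set it up or solve it differently (for example with a linear algebra library).

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = b0 + b1 * x via the normal equations.

    [ n       sum(x)   ] [b0]   [ sum(y)  ]
    [ sum(x)  sum(x^2) ] [b1] = [ sum(xy) ]
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of the coefficient matrix
    if det == 0:
        raise ValueError("x values must not all be equal")
    # Cramer's rule: replace one column with the right-hand side at a time
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
```

The data here lie exactly on y = 1 + 2x, so the fit recovers intercept 1 and slope 2; with noisy data the same equations give the least-squares line.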
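For the "Be Wary of Automated Feature Selection: Chi Square Test of Independence Example" entry: one reason to be careful with a fixed p-value cutoff (an illustration, not necessarily the article's argument) is that the Pearson chi-square statistic scales with sample size. Multiplying every cell of a contingency table by 10 keeps the proportions identical but multiplies the statistic by 10. The counts are hypothetical; 3.841 is the standard 5% critical value for one degree of freedom, which is what a 2x2 table has.

```python
from typing import List

CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, alpha = 0.05, 1 degree of freedom


def chi_square_statistic(table: List[List[float]]) -> float:
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


small = [[12, 8], [10, 10]]                       # hypothetical 2x2 counts
large = [[10 * c for c in row] for row in small]  # same proportions, 10x the data

for name, table in (("small", small), ("large", large)):
    stat = chi_square_statistic(table)
    print(name, round(stat, 3), "significant" if stat > CRITICAL_5PCT_DF1 else "not significant")
```

The small table gives about 0.40 (not significant at 5%) while the scaled table gives about 4.04 (significant), even though the association has exactly the same strength in both.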
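For the "How to Detect and Overcome Model Drift in MLOps" entry: one widely used way to flag drift in a feature or score distribution is the Population Stability Index (PSI), which compares the binned shares of a reference sample and a current sample. This is an illustrative sketch, not necessarily the method the article uses; the bin edges, the `eps` floor for empty bins, and the sample values are assumptions. A common rule of thumb treats PSI above about 0.25 as significant drift, though thresholds vary by team.

```python
import bisect
import math
from typing import List, Sequence


def bin_shares(values: Sequence[float], edges: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Share of values per bin [edges[i], edges[i+1]); the last bin also includes edges[-1].

    Values outside [edges[0], edges[-1]] are ignored. Empty bins get a small
    floor (eps) so the logarithm in the PSI stays finite.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        if idx == n_bins and v == edges[-1]:
            idx -= 1  # put the right edge into the last bin
        if 0 <= idx < n_bins:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [max(c / total, eps) for c in counts]


def psi(reference: Sequence[float], current: Sequence[float], edges: Sequence[float]) -> float:
    """Population Stability Index: sum over bins of (c_i - r_i) * ln(c_i / r_i)."""
    r = bin_shares(reference, edges)
    c = bin_shares(current, edges)
    return sum((ci - ri) * math.log(ci / ri) for ri, ci in zip(r, c))


edges = [0, 1, 2, 3, 4]
reference = [0.5, 1.5, 2.5, 3.5]  # hypothetical training-time feature values
current = [2.5, 3.5, 3.5, 3.5]    # hypothetical production values, shifted upward
print(psi(reference, reference, edges), psi(reference, current, edges))
```

Every term of the sum is non-negative, so PSI is 0 only when the binned shares match; in practice the bins are usually derived from quantiles of the reference data.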
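For the "What is Noise?" entry: the book "Noise" by Kahneman, Sibony, and Sunstein (the entry's tags mention Daniel Kahneman) splits the error in a set of judgments of the same case into bias, the average error, and noise, the spread of the judgments, with mean squared error = bias² + noise². The sketch below checks that identity; the judgments and the true value are made up, and the article's own framing may differ.

```python
import statistics
from typing import List, Tuple


def error_decomposition(judgments: List[float], true_value: float) -> Tuple[float, float, float]:
    """Return (mse, bias, noise) for repeated judgments of one true value.

    bias  = mean judgment minus the true value
    noise = population standard deviation of the judgments
    Identity: mse == bias**2 + noise**2
    """
    errors = [j - true_value for j in judgments]
    mse = statistics.fmean([e * e for e in errors])
    bias = statistics.fmean(errors)
    noise = statistics.pstdev(judgments)
    return mse, bias, noise


# Hypothetical: four experts estimate a quantity whose true value is 10
mse, bias, noise = error_decomposition([8, 10, 12, 14], true_value=10)
print(mse, bias, noise ** 2)
```

Here the experts are off by 1 on average (bias) and scattered around their own average (noise), and both contribute to the overall error; removing either one alone does not eliminate it.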