- Hiring or Looking to Get Hired in Data Science/Analytics? The INFORMS Virtual Career Fair is for You, by INFORMS - Mar 31, 2021.
Hiring or looking to get hired in Data Science / Analytics? The INFORMS Virtual Career Fair, April 13, is for you. Register today!
- 3 More Free Top Notch Natural Language Processing Courses, by Matthew Mayo - Mar 31, 2021.
Are you looking to continue your learning of natural language processing? This small collection of 3 free top notch courses will allow you to do just that.
- Data vault: new weaponry in your data science toolkit, by Bas Vlaming - Mar 31, 2021.
Data Vault is a modern data modelling approach for capturing (historical) data in a structurally auditable and tractable way. While very helpful for data engineers, the Data Vault also enables Data Science in practice.
- Introduction to the White-Box AI: the Concept of Interpretability, by SciForce - Mar 31, 2021.
ML model interpretability can be seen as “the ability to explain or to present in understandable terms to a human.” Read this article and learn to go beyond the black box of AI, where algorithms make predictions but the underlying explanation remains unknown and untraceable.
- Sudoku Rules: Using a Decision Engine to Solve Candidate Pairs, by FICO - Mar 30, 2021.
Follow along with the author's most recent installment in their quest to solve Sudoku puzzles, this time with the help of a decision engine to solve candidate pairs.
- Software Engineering Best Practices for Data Scientists, by Madison Hunter - Mar 30, 2021.
This is a crash course on how to bridge the gap between data science and software engineering.
- Why So Many Data Scientists Quit Good Jobs at Great Companies, by Adam Sroka - Mar 30, 2021.
The role of the Data Scientist continues to offer many great opportunities as a career. However, the 'sexiest job of the 21st century' has lost some of its appeal because of unrealized expectations and how organizations might leverage this type of work. Having a better understanding of how data science typically plays out in the business world can help you achieve the success you want.
- Explainable Visual Reasoning: How MIT Builds Neural Networks that can Explain Themselves, by Jesus Rodriguez - Mar 30, 2021.
New MIT research attempts to close the gap between state-of-the-art performance and interpretable models in computer vision tasks.
- Great News for KDnuggets subscribers! You now have access to the WorldData.AI Partners Plan at no cost, by Gregory Piatetsky - Mar 29, 2021.
Great News for KDnuggets subscribers! You now have access to the WorldData.AI Partners Plan at no cost, including access to some of the premium datasets only available to enterprise members. Connect your data to many of the 3.5 billion WorldData datasets and improve your Data Science and Machine Learning models! Subscribe to KDnuggets to get access.
- Top Stories, Mar 22-28: How to Succeed in Becoming a Freelance Data Scientist - Mar 29, 2021.
Also: Top 10 Python Libraries Data Scientists should know in 2021; More Data Science Cheatsheets; The Portfolio Guide for Data Science Beginners; The Best Machine Learning Frameworks & Extensions for Scikit-learn
- How to break a model in 20 days — a tutorial on production model analytics, by Dral & Samuylova - Mar 29, 2021.
This is an article on how models fail in production, and how to spot it.
- What Took Me So Long to Land a Data Scientist Job, by Soner Yildirim - Mar 29, 2021.
Learning all you need to learn about data science is only part of the adventure. Landing that first job is another. While it might take a while to get your foot in the door, there are several key efforts you can make to shorten this time as much as possible.
- Deep Learning Is Becoming Overused, by Michael Grogan - Mar 29, 2021.
Understanding the data is the first port of call.
- MongoDB in the Cloud: Three Solutions for 2021, by Krueger & Franklin - Mar 26, 2021.
An overview of pricing and compatibility for MongoDB Atlas, AWS DocumentDB, Azure Cosmos DB.
- Overview of MLOps, by Steve Shwartz - Mar 26, 2021.
Building a machine learning model is great, but to provide real business value, it must be made useful and maintained to remain useful over time. Machine Learning Operations (MLOps), overviewed here, is a rapidly growing space that encompasses everything required to deploy a machine learning model into production, and is a crucial aspect of delivering this sought-after value.
- Multilingual CLIP with Huggingface + PyTorch Lightning, by Sachin Abeywardana - Mar 26, 2021.
An overview of training OpenAI's CLIP on Google Colab.
- MS in Data Science at Ramapo, by Ramapo College - Mar 25, 2021.
Ramapo College’s Master of Science in Data Science program will teach you to collect, synthesize, and analyze big data, become skilled in programming languages like R and Python, and leverage advanced tools to meet the demands of modern business and science.
- The question that makes your data project more valuable, by Brittany Davis - Mar 25, 2021.
If you are the "data person" for your organization, then providing meaningful results to stakeholder data requests can sometimes feel like shots in the dark. However, you can make sure your data analysis is actionable by asking one magic question before getting started.
- Data Science Curriculum for Professionals, by Brock Taute - Mar 25, 2021.
If you are looking to expand or transition your current professional career that is buried in spreadsheet analysis into one powered by data science, then you are in for an exciting but complex journey with much to explore and master. To begin your adventure, follow this complete road map, which guides you from a gnome in the forest of spreadsheets to an AI wizard known far and wide throughout the kingdom.
- Extraction of Objects In Images and Videos Using 5 Lines of Code, by Ayoola Olafenwa - Mar 25, 2021.
PixelLib is a library created for easy integration of image and video segmentation in real-life applications. Learn to use PixelLib to extract objects in images and videos with minimal code.
- Solve for Success: The Transformative Power of Data Visualization, by Copyright Clearance Center - Mar 24, 2021.
Learn from experts and hear real-world use cases about how you and your organization can optimize data to enable innovation through visualization. Register now.
- 15 Habits I Learned from Highly Effective Data Scientists, by Madison Hunter - Mar 24, 2021.
I’m using these habits in 2021 to become a more effective future data scientist.
- Top 10 Python Libraries Data Scientists should know in 2021, by Terence Shin - Mar 24, 2021.
So many Python libraries exist that offer powerful and efficient foundations for supporting your data science work and machine learning model development. While the list may seem overwhelming, there are certain libraries you should focus your time on, as they are some of the most commonly used today.
- Rejection Sampling with Python, by Michael Grogan - Mar 24, 2021.
Read this article on rejection sampling with examples using the Normal and Cauchy Distributions.
- Applying Natural Language Processing in Healthcare, by Formulated - Mar 23, 2021.
New advances in natural language processing (NLP) based on deep learning and transfer learning have made a whole set of software use cases in healthcare viable. The Healthcare NLP Summit is a free online conference on April 6th and 7th, bringing together 30+ technical sessions from across the community that works to apply these advances in the real world.
- How to Succeed in Becoming a Freelance Data Scientist, by Devin Partida - Mar 23, 2021.
With recent growth in data science, now is the best time to get into freelancing. The following steps will help you get started with finding clients or help you improve your current strategy.
- Metric Matters, Part 2: Evaluating Regression Models, by Susan Sivek - Mar 23, 2021.
In this second part of a review of the many options available for choosing metrics to evaluate machine learning models, learn how to select the most appropriate metric for your analysis of regression models.
- Top YouTube Machine Learning Channels, by Matthew Mayo - Mar 23, 2021.
These are the top 15 YouTube channels for machine learning as determined by our stated criteria, along with some additional data on the channels to help you decide if they may have some content useful for you.
- The Best Machine Learning Frameworks & Extensions for Scikit-learn, by Derrick Mwiti - Mar 22, 2021.
Learn how to use a selection of packages to extend the functionality of Scikit-learn estimators.
- The Portfolio Guide for Data Science Beginners, by Navid Mashinchi - Mar 22, 2021.
Whether you are an aspiring or seasoned Data Scientist, establishing a clear and well-designed online portfolio presence will help you stand out in the industry, and provide potential employers a powerful understanding of your work and capabilities. These tips will help you brainstorm and launch your first data science portfolio.
- Teaching AI to See Like a Human, by Jesus Rodriguez - Mar 22, 2021.
DeepMind Generative Query Networks can infer knowledge as they navigate a visual environment.
- Top Stories, Mar 15-21: More Data Science Cheatsheets - Mar 22, 2021.
Also: How To Overcome The Fear of Math and Learn Math For Data Science; Know your data much faster with the new Sweetviz Python library; Introducing dbt, the ETL and ELT Disrupter; Must Know for Data Scientists and Data Analysts: Causal Design Patterns
- Learning from machine learning mistakes, by Emeli Dral - Mar 19, 2021.
Read this article and discover how to find weak spots of a regression model.
- AI in Dating: Can Algorithms Help You Find Love?, by Yuliya Sychikova - Mar 19, 2021.
Can AI algorithms help us find love? Can they go a step further and replace a human being as a partner in a relationship? Here, we analyze how far technology has come in helping us meet "our" people, find love, and feel less lonely.
- How to build a DAG Factory on Airflow, by Axel Furlan - Mar 19, 2021.
A guide to building efficient DAGs with half of the code.
- Wrangle Summit 2021: All the Best People, Ideas, and Technology in Data Engineering, All in One Place, by Trifacta - Mar 18, 2021.
At Wrangle Summit 2021, Apr 7-9, you’ll get access to all the best people, ideas, and technology in data engineering, all in one place. Learn how to refine raw data and engineer unique data products, and gain insights from your data that can catalyze real, measurable business success.
- More Data Science Cheatsheets, by Matthew Mayo - Mar 18, 2021.
It's time again to look at some data science cheatsheets. Here you can find a short selection of such resources which can cater to different existing levels of knowledge and breadth of topics of interest.
- How to frame the right questions to be answered using data, by Benjamin Obi Tayo - Mar 18, 2021.
Understanding your data first is a key step before going too far into any data science project. But, you can't fully understand your data until you know the right questions to ask of it.
- A Simple Way to Time Code in Python, by Krueger & Franklin - Mar 18, 2021.
Read on to find out how to use a decorator to time your functions.
- Data Annotation: tooling & workflows latest trends, by iMerit - Mar 17, 2021.
As AI continues to boom, improved technologies and processes for data labeling and annotation are on the rise. iMerit, a leader in providing high-quality data for Machine Learning and AI, shares the latest trends in annotation workflow and tooling.
- Automating Machine Learning Model Optimization, by Himanshu Sharma - Mar 17, 2021.
This article presents an overview of using Bayesian Tuning and Bandits for machine learning.
- Introducing dbt, the ETL and ELT Disrupter, by Terence Shin - Mar 17, 2021.
Moving and processing data is happening 24/7/365 world-wide at massive scales that only get larger by the hour. Tools exist to introduce efficiencies in how data can be extracted from sources, transformed through calculations, and loaded into target data repositories. However, on their own, these tools can introduce some restrictions in the processing, especially for the needs of data analytics and data science.
- How to Begin Your NLP Journey, by Diego Lopez Yse - Mar 17, 2021.
In this blog post, learn how to process text using Python.
- Natural Language Processing Pipelines, Explained, by Ram Tavva - Mar 16, 2021.
This article presents a beginner's view of NLP, as well as an explanation of how a typical NLP pipeline might look.
- Metric Matters, Part 1: Evaluating Classification Models, by Susan Sivek - Mar 16, 2021.
You have many options when choosing metrics for evaluating your machine learning models. Select the right one for your situation with this guide that considers metrics for classification models.
- Data Validation and Data Verification – From Dictionary to Machine Learning, by Aggarwal & Bose - Mar 16, 2021.
In this article, we will understand the difference between data verification and data validation, two terms which are often used interchangeably when we talk about data quality. However, these two terms are distinct.
- Sudoku Rules: Using A Decision Engine To Solve Sudoku, by Fernado Donati Jorge - Mar 15, 2021.
See the progress the author has made since last time, after setting themselves the challenge of solving Sudoku puzzles using an optimized inference engine, along with a few other advanced features of FICO® Blaze Advisor®.
- Are you satisfied in your job? Take our Data Community Job Satisfaction Survey, by Matthew Mayo - Mar 15, 2021.
The latest KDnuggets survey is looking to determine the job satisfaction levels of the data community. Take a few moments to contribute your answer and help paint a picture of the current situation.
- 10 Amazing Machine Learning Projects of 2020, by Anupam Chugh - Mar 15, 2021.
So much progress in AI and machine learning happened in 2020, especially in the areas of AI-generating creativity and low-to-no-code frameworks. Check out these trending and popular machine learning projects released last year, and let them inspire your work throughout 2021.
- Forget Telling Stories; Help People Navigate, by Stan Pugsley - Mar 15, 2021.
When designing reporting & visualizations, think of them as part of a navigation framework rather than stand-alone information.
- Top Stories, Mar 8-14: How To Overcome The Fear of Math and Learn Math For Data Science - Mar 15, 2021.
Also: Know your data much faster with the new Sweetviz Python library; Must Know for Data Scientists and Data Analysts: Causal Design Patterns; Are You Still Using Pandas to Process Big Data in 2021? Here are two better options; 3 Mathematical Laws Data Scientists Need To Know
- AI Industry Innovation: Making the Invisible Visible, by AI Accelerator Institute - Mar 12, 2021.
AI Accelerator Festival: Hardware Acceleration for AI at the Edge. The world's only end-user-led event dedicated to accelerating industries by harnessing the power of AI. March 16-19, 2021.
- Kedro-Airflow: Orchestrating Kedro Pipelines with Airflow, by Jo Stitchbury - Mar 12, 2021.
The Kedro team and Astronomer have released Kedro-Airflow 0.4.0 to help you develop modular, maintainable & reproducible code with orchestration superpowers!
- Must Know for Data Scientists and Data Analysts: Causal Design Patterns, by Emily Riederer - Mar 12, 2021.
Industry is a prime setting for observational causal inference, but many companies are blind to causal measurement beyond A/B tests. This formula-free primer illustrates analysis design patterns for measuring causal effects from observational data.
- Know your data much faster with the new Sweetviz Python library, by Francois Bertrand - Mar 12, 2021.
One of the latest exploratory data analysis libraries is a new open-source Python library called Sweetviz, for just the purposes of finding out data types, missing information, distribution of values, correlations, etc. Find out more about the library and how to use it here.
- Top February Stories: We Don’t Need Data Scientists, We Need Data Engineers; How to create stunning visualizations using python from scratch, by Gregory Piatetsky - Mar 11, 2021.
Also: How to Get Your First Job in Data Science without Any Work Experience; Telling a Great Data Story: A Visualization Decision Tree
- Advance your career in Data Science with HSE Master in Data Science, by Coursera - Mar 11, 2021.
HSE’s Master of Data Science is the first fully English-taught online data science Master’s from a Russian university. The degree is designed for students with or without prior coding experience. The final application deadline is June 17th. Learn more about HSE’s Master of Data Science now.
- A Beginner’s Guide to the CLIP Model, by Matthew Brems - Mar 11, 2021.
CLIP is a bridge between computer vision and natural language processing. I'm here to break CLIP down for you in an accessible and fun read! In this post, I'll cover what CLIP is, how CLIP works, and why CLIP is cool.
- The Inferential Statistics Data Scientists Should Know, by Nagesh Chauhan - Mar 11, 2021.
The foundations of Data Science and machine learning algorithms are in mathematics and statistics. To be the best Data Scientists you can be, your skills in statistical understanding should be well-established. The more you appreciate statistics, the better you will understand how machine learning performs its apparent magic.
- A Machine Learning Model Monitoring Checklist: 7 Things to Track, by Emeli Dral & Elena Samuylova - Mar 11, 2021.
Once you deploy a machine learning model in production, you need to make sure it performs. In the article, we suggest how to monitor your models and which open-source tools to use.
- Read This Before You Apply to a Business Analytics Master’s Program, by Carnegie Mellon University - Mar 10, 2021.
Considering a master’s in business analytics? Here are four things to know before you apply.
- How to Speed Up Pandas with Modin, by Michael Galarnyk - Mar 10, 2021.
The Modin library can scale your pandas workflows with a one-line code change, and it integrates with the Python ecosystem and Ray clusters. This tutorial goes over how to get started with Modin and how it can speed up your pandas workflows.
- How To Overcome The Fear of Math and Learn Math For Data Science, by Arnuld On Data - Mar 10, 2021.
Many aspiring Data Scientists, especially when self-learning, fail to learn the necessary math foundations. These recommendations for learning approaches along with references to valuable resources can help you overcome a personal sense of not being "the math type" or belief that you "always failed in math."
- DeepMind’s AlphaFold & the Protein Folding Problem, by Kevin Vu - Mar 10, 2021.
Recently, DeepMind's AlphaFold made impressive headway in the protein structure prediction problem. Read this for an overview and explanation.
- A Solid Investment: Banking on Talent Development, by SAS - Mar 9, 2021.
The demand for analytics skills and talent has never been higher. As the workforce continues to evolve, so do the technology and skillsets needed. Learn how the Millennium Bank partnered with SAS to customize a development and training program that improved skills, knowledge, and retention.
- Document Databases, Explained, by Alex Williams - Mar 9, 2021.
Out of all the NoSQL database types, document stores are considered the most sophisticated ones. They store data in a JSON format, as opposed to a classic rows-and-columns structure.
- 4 Machine Learning Concepts I Wish I Knew When I Built My First Model, by Terence Shin - Mar 9, 2021.
Diving into building your first machine learning model will be an adventure -- one in which you will learn many important lessons the hard way. However, by following these four tips, your first and subsequent models will be put on a path toward excellence.
- Is It Too Late to Learn AI?, by Frederik Bussler - Mar 9, 2021.
Have you missed the train on learning AI?
- 8 Women in AI Who Are Striving to Humanize the World, by Liudmyla Taranenko - Mar 8, 2021.
Some exceptional female researchers and engineers are working on projects to make the world a better place with the help of AI, data science, and machine learning.
- Top Stories, Mar 1-7: Top YouTube Channels for Data Science - Mar 8, 2021.
Also: Are You Still Using Pandas to Process Big Data in 2021? Here are two better options; 3 Mathematical Laws Data Scientists Need To Know; Google’s Model Search is a New Open Source Framework that Uses Neural Networks to Build Neural Networks; Machine Learning Systems Design: A Free Stanford Course
- More Resources for Women in AI, Data Science, and Machine Learning, by Gregory Piatetsky - Mar 8, 2021.
Useful resources to help more women enter and succeed in AI, Data Science, and Machine Learning fields.
- Beautiful decision tree visualizations with dtreeviz, by Eryk Lewinson - Mar 8, 2021.
Improve the old way of plotting the decision trees and never go back!
- 11 Essential Code Blocks for Complete EDA (Exploratory Data Analysis), by Susan Maina - Mar 5, 2021.
This article is a practical guide to exploring any data science project and gaining valuable insights.
- Speeding up Scikit-Learn Model Training, by Michael Galarnyk - Mar 5, 2021.
If your scikit-learn models are taking a bit of time to train, then there are several techniques you can use to make the processing more efficient. From optimizing your model configuration to leveraging libraries to speed up training through parallelization, you can build the best scikit-learn model possible in the least amount of time.
- Bayesian Hyperparameter Optimization with tune-sklearn in PyCaret, by Antoni Baum - Mar 5, 2021.
PyCaret, a low-code Python ML library, offers several ways to tune the hyper-parameters of a created model. In this post, I'd like to show how Ray Tune is integrated with PyCaret, and how easy it is to leverage its algorithms and distributed computing to achieve results superior to the default random search method.
- Start a career in Computer Science with Penn’s Master in Computer Science and Information Technology, by Coursera - Mar 4, 2021.
Penn MS of Computer and Information Technology is an online master's degree tailored for non-CS majors, empowering them to succeed in computing and technology fields. Apply by May 1.
- Reducing the High Cost of Training NLP Models With SRU++, by Tao Lei, PhD - Mar 4, 2021.
The increasing computation time and costs of training natural language processing (NLP) models highlight the importance of inventing computationally efficient models that retain top modeling power with reduced or accelerated computation. A single experiment training a top-performing language model on the 'Billion Word' benchmark would take 384 GPU days and as much as $36,000 using AWS on-demand instances.
- Dask and Pandas: No Such Thing as Too Much Data, by Stephanie Kirmer - Mar 4, 2021.
Do you love pandas, but don't love it when you reach the limits of your memory or compute resources? Dask provides you with the option to use the pandas API with distributed data and computing. Learn how it works, how to use it, and why it’s worth the switch when you need it most.
- 9 Skills You Need to Become a Data Engineer, by Dorian Martin - Mar 4, 2021.
Data engineering is a fast-growing profession with amazing challenges and rewards. Which skills do you need to become a data engineer? In this post, we’ll take a look at both hard and soft skills.
- Evaluating Object Detection Models Using Mean Average Precision, by Ahmed Gad - Mar 3, 2021.
In this article we will see how precision and recall are used to calculate the Mean Average Precision (mAP).
- 15 common mistakes data scientists make in Python (and how to fix them), by Gerold Csendes - Mar 3, 2021.
Writing Python code that works for your data science project and performs the task you expect is one thing. Ensuring your code is readable by others (including your future self), reproducible, and efficient are entirely different challenges that can be addressed by minimizing common bad practices in your development.
- Getting Started with Distributed Machine Learning with PyTorch and Ray, by Galarnyk, Liaw & Nishihara - Mar 3, 2021.
Ray is a popular framework for distributed Python that can be paired with PyTorch to rapidly scale machine learning applications.
- Speech to Text with Wav2Vec 2.0, by Dhilip Subramanian - Mar 2, 2021.
Facebook recently introduced and open-sourced their new framework for self-supervised learning of representations from raw audio data called Wav2Vec 2.0. Learn more about it and how to use it here.
- 3 Mathematical Laws Data Scientists Need To Know, by Cornellius Yudha Wijaya - Mar 2, 2021.
Machine learning and data science are founded on important mathematics in statistics and probability. A few interesting mathematical laws you should understand will especially help you perform better as a Data Scientist, including Benford's Law, the Law of Large Numbers, and Zipf's Law.
- The Ultimate Guide to Acing Coding Interviews for Data Scientists, by Emma Ding & Rob Wang - Mar 2, 2021.
This article covers understanding the 4 types of coding interview questions and preparing for them effectively.
- Google’s Model Search is a New Open Source Framework that Uses Neural Networks to Build Neural Networks, by Jesus Rodriguez - Mar 1, 2021.
The new framework brings state-of-the-art neural architecture search methods to TensorFlow.
- Top Stories, Feb 22-28: We Don’t Need Data Scientists, We Need Data Engineers; Data Science Learning Roadmap for 2021 - Mar 1, 2021.
Also: Powerful Exploratory Data Analysis in just two lines of code; Machine Learning Systems Design: A Free Stanford Course; Telling a Great Data Story: A Visualization Decision Tree
- Are You Still Using Pandas to Process Big Data in 2021? Here are two better options, by Roman Orac - Mar 1, 2021.
When it's time to handle a lot of data -- so much that you are in the realm of Big Data -- what tools can you use to wrangle the data, especially in a notebook environment? Pandas doesn’t handle really Big Data very well, but two other libraries do. So, which one is better and faster?
- Top YouTube Channels for Data Science, by Matthew Mayo - Mar 1, 2021.
Have a look at the top 15 YouTube channels for data science by number of subscribers, along with some additional data on the channels to help you decide if they may have some content useful for you.