- Towards a Responsible and Ethical AI, by Vidhi Chugh - Jul 30, 2021.
It is not the technology at fault, but the intention.
- Data Monetization 101, by John Farrall - Jul 30, 2021.
The evolving marketplace of data now includes many firms that support a variety of needs from organizations looking to grow with data. This listing of the key players categorized by target market provides an interesting picture of this exciting industry sector.
- 10 Machine Learning Model Training Mistakes, by Sandeep Uttamchandani, Ph.D. - Jul 30, 2021.
These common ML model training mistakes are easy to overlook but costly to redeem.
- Online Master’s in Data Science from Northwestern, by Northwestern University - Jul 29, 2021.
Build statistical and analytical expertise as well as the management and leadership skills necessary to implement high-level, data-driven decisions in Northwestern's online Master of Science in Data Science program. Apply now!
- GitHub Copilot Open Source Alternatives, by Matthew Mayo - Jul 29, 2021.
GitHub's Copilot code generation tool is currently only available via approved request. Here are 4 Copilot alternatives that you can use in your programming today.
- MLOps Best Practices, by Siddharth Kashiramka - Jul 29, 2021.
Many technical challenges must be overcome to achieve successful delivery of machine learning solutions at scale. This article shares best practices we encountered while architecting and applying a model deployment platform within a large organization, including required functionality, the recommendation for a scalable deployment pattern, and techniques for testing and performance tuning models to maximize platform throughput.
- A Brief Introduction to the Concept of Data, by Angelica Lo Duca - Jul 29, 2021.
Every aspiring data scientist must know the concept of data and the kind of analysis they can run. This article introduces the concept of data (quantitative and qualitative) and the types of analysis.
- An AI-Based Framework Solution to Address Email Management Challenges, by Expert.ai - Jul 28, 2021.
Expert.ai’s Edge NL API is an on-premise API that can perform NLU tasks with no required training or extra work, offering advanced, out-of-the-box capabilities that address common use cases and can be easily customized to your specific needs.
- The Brutal Truth About Data Science, by Prad Upadrashta - Jul 28, 2021.
Many organizations approach data science as though it were a marketing tool, relabeling things that they already do as ‘data science’ because they involve the use of data. That is not real data science, and it completely misses the point of engaging in data science.
- dbt for Data Transformation – Hands-on Tutorial, by Essi Alizadeh - Jul 28, 2021.
The data build tool (dbt) is gaining in popularity and use, and this hands-on tutorial covers creating complex models, using variables and functions, running tests, generating docs, and many more features.
- Building Machine Learning Pipelines using Snowflake and Dask, by Daniel Foley - Jul 28, 2021.
In this post, I want to share some of the tools that I have been exploring recently and show you how I use them and how they helped improve the efficiency of my workflow. The two I will talk about in particular are Snowflake and Dask. They are two very different tools, but they complement each other well, especially as part of the ML lifecycle.
- ARTIFICIAL INTELLIGENCE (AI), A TEXTBOOK, by Charu Aggarwal - Jul 27, 2021.
This book covers the broader field of AI, carefully balancing coverage between classical AI (logic or deductive reasoning) and modern AI (inductive learning and neural networks).
- Python Data Structures Compared, by Matthew Mayo - Jul 27, 2021.
Let's take a look at 5 different Python data structures and see how they could be used to store data we might be processing in our everyday tasks, as well as the relative memory they use for storage and time they take to create and access.
- Machine Learning Skills – Update Yours This Summer, by Ahmad Anis - Jul 27, 2021.
The process of mastering new knowledge often requires multiple passes to ensure the information is deeply understood. If you already began your journey into machine learning and data science, then you are likely ready for a refresher on topics you previously covered. This eight-week self-learning path will help you recapture the foundations and prepare you for future success in applying these skills.
- Facebook Open Sources a Chatbot That Can Discuss Any Topic, by Jesus Rodriguez - Jul 27, 2021.
The new version expands the capabilities of its predecessor, building a much more natural conversational experience.
- Top Stories, Jul 19-25: Top 6 Data Science Online Courses in 2021; 11 Important Probability Distributions Explained, by KDnuggets - Jul 26, 2021.
Also: Google’s Director of Research Advice for Learning Data Science; Geometric foundations of Deep Learning; How Can You Distinguish Yourself from Hundreds of Other Data Science Candidates?; Design patterns in machine learning
- Not Only for Deep Learning: How GPUs Accelerate Data Science & Data Analytics, by Kevin Vu - Jul 26, 2021.
Modern AI/ML systems’ success has been critically dependent on their ability to process massive amounts of raw data in a parallel fashion using task-optimized hardware. Can we leverage the power of GPU and distributed computing for regular data processing jobs too?
- 5 Mistakes I Wish I Had Avoided in My Data Science Career, by Tessa Xie - Jul 26, 2021.
Everyone makes mistakes, which can be a good thing when they lead to learning and improvements over time. But, we can also try to first learn from others to expedite our personal growth. To get started, consider these lessons learned the hard way, so you don’t have to.
- Why and how should you learn “Productive Data Science”?, by Tirthajyoti Sarkar - Jul 26, 2021.
What is Productive Data Science and what are some of its components?
- Top Python Data Science Interview Questions, by Nate Rosidi - Jul 23, 2021.
Six must-know technical concepts and two types of questions to test them.
- Full cross-validation and generating learning curves for time-series models, by Mehmet Suzen - Jul 23, 2021.
Standard cross-validation on time series data is not possible because the data model is sequential, which does not lend itself well to splitting the data into statistically useful training and validation sets. However, a new approach called Reconstructive Cross-validation may pave the way toward performing this type of important analysis for predictive models with temporal datasets.
- How to Use Kafka Connect to Create an Open Source Data Pipeline for Processing Real-Time Data, by Paul Brebner - Jul 23, 2021.
This article shows you how to create a real-time data pipeline using only pure open source technologies. These include Kafka Connect, Apache Kafka, Kibana and more.
- Overview of Albumentations: Open-source library for advanced image augmentations, by Olga Chernytska - Jul 22, 2021.
With code snippets on augmentations and integrations with PyTorch and TensorFlow pipelines.
- The Lost Art of Decile Analysis, by Venkat Raman - Jul 22, 2021.
Classification is a primary and widely used application of machine learning algorithms. However, without additional analysis of the subtleties in the results of even an apparently straightforward binary classifier, the deeper meaning of your predictions may be obscured.
- ColabCode: Deploying Machine Learning Models From Google Colab, by Kaustubh Gupta - Jul 22, 2021.
New to ColabCode? Learn how to use it to start a VS Code Server, Jupyter Lab, or FastAPI.
- MS in Analytics at Northwestern – Learn about the benefits of corporate sponsorship, by Northwestern University - Jul 21, 2021.
The MS in Analytics program at Northwestern invites you to an info session about project sponsorship, Aug 3 at 5 pm CT. Discover how your business can benefit from actionable machine learning solutions and insights developed by Northwestern students.
- The Best SOTA NLP Course is Free!, by Matthew Mayo - Jul 21, 2021.
Hugging Face has recently released a course on using its libraries and ecosystem for practical NLP, and it appears to be very comprehensive. Have a look for yourself.
- WHT: A Simpler Version of the fast Fourier Transform (FFT) you should know, by Sean O'Connor - Jul 21, 2021.
The fast Walsh Hadamard transform is a simple and useful algorithm for machine learning that was popular in the 1960s and early 1970s. This useful approach should be more widely appreciated and applied for its efficiency.
- Design patterns in machine learning, by Ágoston Török - Jul 21, 2021.
Can we abstract best practices to real design patterns yet?
- When to Retrain a Machine Learning Model? Run these 5 checks to decide on the schedule, by Dral & Samuylov - Jul 20, 2021.
Machine learning models degrade with time and need to be regularly updated. In this article, we suggest how to approach retraining and plan for it in advance.
- 11 Important Probability Distributions Explained, by Terence Shin - Jul 20, 2021.
There are many distribution functions considered in statistics and machine learning, which can seem daunting to understand at first. Many are actually closely related, and with these intuitive explanations of the most important probability distributions, you can begin to appreciate the observations of data these distributions communicate.
- Understanding BERT with Hugging Face, by Kevin Vu - Jul 20, 2021.
We don’t really understand something before we implement it ourselves. So in this post, we will implement a Question Answering Neural Network using BERT and a Hugging Face Library.
- Top Stories, Jul 12-18: Top 6 Data Science Online Courses in 2021; Become an Analytics Engineer in 90 Days, by KDnuggets - Jul 19, 2021.
Also: Data Scientists and ML Engineers Are Luxury Employees; Geometric foundations of Deep Learning; How Can You Distinguish Yourself from Hundreds of Other Data Science Candidates?; A Learning Path To Becoming a Data Scientist
- How Much Memory is your Machine Learning Code Consuming?, by Tirthajyoti Sarkar - Jul 19, 2021.
Learn how to quickly check the memory footprint of your machine learning function/module with a one-line command. Generate a nice report too.
- Advice for Learning Data Science from Google’s Director of Research, by Benjamin Obi Tayo - Jul 19, 2021.
Surfing the professional career wave in data science is a hot prospect for many looking to get their start in the world. The digital revolution continues to create many exciting new opportunities. But, jumping in too fast without fully establishing your foundational skills can be detrimental to your success, as is suggested by this advice for data science newbies from Peter Norvig, the Director of Research at Google.
- Why Saying “We Accept the Null Hypothesis” is Wrong: An Intuitive Explanation, by Venkat Raman - Jul 19, 2021.
“The opposite of ‘Rejecting the Null’ is ‘Accepting’, isn’t it?” Well, it is not as simple as it seems. We need to rise above antonyms and understand one crucial concept.
- How to Create Unbiased Machine Learning Models, by Philip Tannor - Jul 16, 2021.
In this post, we discuss the concepts of bias and fairness in the Machine Learning world, and show how ML biases often reflect existing biases in society. Additionally, we discuss various methods for testing and enforcing fairness in ML models.
- High-Performance Deep Learning: How to train smaller, faster, and better models – Part 5, by Gaurav Menghani - Jul 16, 2021.
Training efficient deep learning models with any software tool is nothing without an infrastructure of robust and performant compute power. Here, the current software and hardware ecosystems that you might consider in your development are reviewed for cases where the highest possible performance is needed.
- Pushing No-Code Machine Learning to the Edge, by Devin Partida - Jul 16, 2021.
Discover the power of no-code machine learning, and what it can accomplish when pushed to edge devices.
- AWS Webinar: How are data-driven companies using ESG and sustainability data to make actionable decisions?, by Roidna - Jul 15, 2021.
In this virtual session, on Jul 29 @ 11AM PT, 2PM ET, our panel of experts will uncover how companies across several verticals use ESG data to move beyond the reporting benchmark, deepen business insights, and create competitive differentiation.
- 7 Open Source Libraries for Deep Learning Graphs, by Kevin Vu - Jul 15, 2021.
In this article we’ll go through 7 up-and-coming open source libraries for graph deep learning, ranked in order of increasing popularity.
- Top 6 Data Science Online Courses in 2021, by Natassha Selvaraj - Jul 15, 2021.
As an aspiring data scientist, it is easy to get overwhelmed by the abundance of resources available on the Internet. With these 6 online courses, you can develop from a novice into an experienced practitioner in less than a year and gain the skills necessary to land a job in data science.
- Date Processing and Feature Engineering in Python, by Matthew Mayo - Jul 15, 2021.
Have a look at some code to streamline the parsing and processing of dates in Python, including the engineering of some useful and common features.
- Shareable data analyses using templates, by Cedric Dussud - Jul 14, 2021.
We've been using shared data analyses in production for three years. Here's how you can create reusable templates for common metrics and analyses.
- Geometric foundations of Deep Learning, by Michael Bronstein, Joan Bruna, Taco Cohen, and PV - Jul 14, 2021.
Geometric Deep Learning is an attempt at the geometric unification of a broad class of machine learning problems from the perspectives of symmetry and invariance. These principles not only underlie the breakthrough performance of convolutional neural networks and the recent success of graph neural networks but also provide a principled way to construct new types of problem-specific inductive biases.
- SQL, Syllogisms, and Explanations, by Adrian Walker - Jul 14, 2021.
Check out the Executable English Platform, for self-explaining applications written in English that you can run in your browser.
- Top June Stories: 5 Tasks To Automate With Python; Data Scientists Will be Extinct in 10 Years, by Gregory Piatetsky - Jul 13, 2021.
5 Tasks To Automate With Python; Data Scientists Will be Extinct in 10 Years; How to Generate Automated PDF Documents with Python; How I Doubled My Income with Data Science and Machine Learning.
- Building Tech Skills in 2021, by SAS - Jul 13, 2021.
With all the workforce changes last year, it is not surprising that employees lack the skills to meet new demands. To be ready for today’s challenges, companies need sound methods to assess what skills their employees have, the ability to identify the gaps, and a plan to upskill them for success. You can read the survey results here, along with predicted learning and development trends, and insights for upskilling, cross-skilling and reskilling your workforce.
- Streamlit Tips, Tricks, and Hacks for Data Scientists, by Kaveh Bakhtiyari - Jul 13, 2021.
Today, I am going to talk about a few tips that I learned over more than a year of using Streamlit, which you can also use to unleash your powerful DS/AI/ML (whatever they may be) applications.
- AGI and the Future of Humanity, by Charles Simon - Jul 13, 2021.
The possibilities for humanity's future very likely include at least one in which computers will exceed human abilities. Artificial General Intelligence (AGI) does not necessarily have to be all doom and gloom. However, we must begin now to understand how this technical evolution might progress and consider what actions to take to prepare.
- How Can You Distinguish Yourself from Hundreds of Other Data Science Candidates?, by Tirthajyoti Sarkar - Jul 13, 2021.
A few easy (and not-so-easy) ways to prove to employers that your skills and attitudes place you in a higher bracket.
- Top Stories, Jul 5-11: Data Scientists and ML Engineers Are Luxury Employees, by KDnuggets - Jul 12, 2021.
Also: Pandas not enough? Here are a few good alternatives to processing larger and faster data in Python; A Learning Path To Becoming a Data Scientist; 5 Lessons McKinsey Taught Me That Will Make You a Better Data Scientist; 5 Python Data Processing Tips & Code Snippets
- KDnuggets Top Blogs Rewards for June 2021, by Gregory Piatetsky - Jul 12, 2021.
These top blogs were winners of KDnuggets Top Blog Rewards Program for June: 5 Tasks To Automate With Python; Data Scientists Will be Extinct in 10 Years; How to Generate Automated PDF Documents with Python; How I Doubled My Income with Data Science and Machine Learning; Pandas vs SQL: When Data Scientists Should Use Each Tool; Top 10 Data Science Projects for Beginners.
- Abstraction and Data Science: Not a great combination, by Venkat Raman - Jul 12, 2021.
This article is about excessive abstraction and how this programming concept, when extended to Data Science, makes Data Science non-intuitive.
- Become an Analytics Engineer in 90 Days, by Tuan Nguyen - Jul 12, 2021.
The new role of Analytics Engineer is an exciting opportunity that crosses the skill sets of a Data Analyst and a Data Engineer. Here, we describe how this position can evolve at an organization, and recommend self-learning resources that can be used to prepare for the multifaceted responsibilities.
- How to Tell if You Have Trained Your Model with Enough Data, by Charles Martin - Jul 12, 2021.
WeightWatcher is an open-source, diagnostic tool for evaluating the performance of (pre)-trained and fine-tuned Deep Neural Networks. It is based on state-of-the-art research into Why Deep Learning Works.
- Exploring the SwAV Method, by Antonio Ferraioli - Jul 9, 2021.
This post discusses the SwAV (Swapping Assignments between multiple Views of the same image) method from the paper “Unsupervised Learning of Visual Features by Contrasting Cluster Assignments” by M. Caron et al.
- High-Performance Deep Learning: How to train smaller, faster, and better models – Part 4, by Gaurav Menghani - Jul 9, 2021.
With the right software, hardware, and techniques at your fingertips, your capability to effectively develop high-performing models now hinges on leveraging automation to expedite the experimental process and building with the most efficient model architectures for your data.
- 5 Python Data Processing Tips & Code Snippets, by Matthew Mayo - Jul 9, 2021.
This is a small collection of Python code snippets that a beginner might find useful for data processing.
- A Lightning Fast Look at Single Line Exploratory Data Analysis, by Harsha Mandala - Jul 8, 2021.
Here's a very quick look at how you can perform EDA with a single line of code using D-Tale.
- Pandas not enough? Here are a few good alternatives to processing larger and faster data in Python, by DaurEd - Jul 8, 2021.
While the Pandas library remains a crucial workhorse in data processing and management for data science, some limitations exist that can impact efficiencies, especially with very large data sets. Here, a few interesting alternatives to Pandas are introduced to improve your large data handling performance.
- MLOps is an Engineering Discipline: A Beginner’s Overview, by Angad Gupta - Jul 8, 2021.
MLOps = ML + DEV + OPS. MLOps is the idea of combining the long-established practice of DevOps with the emerging field of Machine Learning.
- eBook: How to use third-party data to make smarter decisions, by Roidna - Jul 7, 2021.
Get yourself a copy of this eBook and learn how to use third-party data to make smarter decisions.
- Relax! Data Scientists will not go extinct in 10 years, but the role will change, by Gregory Piatetsky - Jul 7, 2021.
About 70% of KDnuggets readers think that the demand for Data Scientists will increase, and 50% think it will increase significantly. At the same time, over 90% think the role of Data Scientist will change. What will the Data Scientist role be in 10 years?
- How to Get Practical Data Science Experience to be Career-Ready, by Terence Shin - Jul 7, 2021.
Becoming a professional in the field of data science takes more than just book smarts. You need experience with real-world data sets and frequently used tools, and an intuition for solutions that you can only gain from hands-on work. These resources will jump-start the development of your practical skills.
- How to Build An Image Classifier in Few Lines of Code with Flash, by Irfan Alghani Khalid - Jul 7, 2021.
Introducing Flash: The high-level deep learning framework for beginners.
- ROC Curve Explained, by Zolzaya Luvsandorj - Jul 6, 2021.
Learn to visualise a ROC curve in Python.
- A Learning Path To Becoming a Data Scientist, by Sara Metwalli - Jul 6, 2021.
Becoming a professional data scientist may not be as easy as "1... 2... 3...", but these 10 steps can be your self-learning roadmap to kickstarting your future in the exciting and ever-expanding field of data science.
- How To Transition From Data Freelancer to Data Entrepreneur (Almost Overnight), by Lillian Pierson - Jul 6, 2021.
Data freelancers trade hours for dollars while data entrepreneurs have found a way to make money while they sleep. Ready to make the transition? Keep reading to learn how to do it as SEAMLESSLY and PROFITABLY as possible.
- Top Stories, Jun 28 – Jul 4: 5 Lessons McKinsey Taught Me That Will Make You a Better Data Scientist, by KDnuggets - Jul 5, 2021.
Also: What will the demand for Data Scientists be in 10 years? Will Data Scientists be extinct?; Add A New Dimension To Your Photos Using Python; Managing Your Reusable Python Code as a Data Scientist; Data Scientists are from Mars and Software Developers are from Venus
- GitHub Copilot: Your AI pair programmer – what is all the fuss about?, by Matthew Mayo - Jul 5, 2021.
GitHub just released Copilot, a code completion tool on steroids dubbed your "AI pair programmer." Read more about it, and see what all the fuss is about.
- Data Scientists and ML Engineers Are Luxury Employees, by Adrien Biarnes - Jul 5, 2021.
Maybe it seems that everyone wants to become a data scientist and every organization wants to hire one as quickly as possible. However, a mismatch often exists between what companies tend to need and what ML practitioners want to do. So, it's time for the field to take another step toward maturity through an enhanced appreciation of the broad range of technical foundations for an organization to become data-driven.
- Predict Customer Churn (the right way) using PyCaret, by Moez Ali - Jul 5, 2021.
A step-by-step guide on how to predict customer churn the right way using PyCaret that actually optimizes the business objective and improves ROI.
- Semantic Search: Measuring Meaning From Jaccard to Bert, by James Briggs - Jul 2, 2021.
In this article, we’ll cover a few of the most interesting and powerful techniques for measuring meaning, focusing specifically on semantic search. We’ll learn how they work, what they’re good at, and how we can implement them ourselves.
- High-Performance Deep Learning: How to train smaller, faster, and better models – Part 3, by Gaurav Menghani - Jul 2, 2021.
Now that you are ready to efficiently build advanced deep learning models with the right software and hardware tools, the techniques involved in implementing such efforts must be explored to improve model quality and obtain the performance that your organization desires.
- Prepare Behavioral Questions for Data Science Interviews, by Zijing Zhu - Jul 2, 2021.
This is part 5 of a series by the author that helps readers nail data science interviews with confidence.
- How to Use NVIDIA GPU Accelerated Libraries, by Kevin Vu - Jul 1, 2021.
If you are wondering how you can take advantage of NVIDIA GPU accelerated libraries for your AI projects, this guide will help answer questions and get you started on the right path.
- Learning Data Science Through Social Media, by Susan Sivek - Jul 1, 2021.
Want your social media algorithms to show you actual algorithms? Spare a moment during your social media scrolling to learn a bit of data science. Here are suggestions for at-a-glance access to good ideas and tips on your favorite platforms.
- 5 Lessons McKinsey Taught Me That Will Make You a Better Data Scientist, by Tessa Xie - Jul 1, 2021.
How to stand out from your peers in the data world.