Blog / News
- The Seven Best ELT Tools for Data Warehouses, by Mozart Data [Prod] - Dec 1, 2021.
ELT helps streamline modern data warehousing and the management of a business’s data. In this post, we’ll discuss some of the best ELT tools to help you clean and transfer important data to your data warehouse.
- 5 Practical Data Science Projects That Will Help You Solve Real Business Problems for 2022, by Terence Shin [Tuto] - Dec 1, 2021.
This curated list of data science projects offers real-life problems that will help you master the skills to demonstrate that you are technically sound and know how to conduct data science projects that add business value.
- Movie Recommendations with Spark Collaborative Filtering, by Rosaria Silipo [Tuto] - Dec 1, 2021.
Not sure what movie to watch? Ask your recommender system.
- KDnuggets™ News 21:n45, Dec 1: Most Common SQL Mistakes on Data Science Interviews; Why Machine Learning Engineers are Replacing Data Scientists, by KDnuggets - Dec 1, 2021.
Most Common SQL Mistakes on Data Science Interviews; Why Machine Learning Engineers are Replacing Data Scientists; Vote in new KDnuggets Poll: What Percentage of Your Machine Learning Models Have Been Deployed? KDnuggets: Personal History and Nuggets of Experience.
- Put Responsible AI into Practice—attend the digital event on December 7, by Microsoft [Prod] - Nov 30, 2021.
Learn best practice guidelines for building AI solutions responsibly. Join AI experts from Microsoft and BCG at Put Responsible AI into Practice—a free Azure digital event on December 7.
- Sentiment Analysis API vs Custom Text Classification: Which one to choose?, by Jérémy Lambert [Tuto] - Nov 30, 2021.
In this article, we compare the sentiment extraction performance of Sentiment Analysis engines and Custom Text Classification engines. The idea is to show the pros and cons of these two types of engines on a concrete dataset.
- KDnuggets: Personal History and Nuggets of Experience, by Gregory Piatetsky [Opin] - Nov 30, 2021.
After 28+ years of publishing and editing KDnuggets, I am retiring and transitioning KDnuggets to Matthew Mayo, who will become the new editor-in-chief. I want to share with you my story of KDnuggets and highlight some of the useful nuggets of experience I learned along this amazing journey.
- Clustering in Crowdsourcing: Methodology and Applications, by Daniil Likhobaba [Tuto] - Nov 30, 2021.
As a result of the efforts outlined in this article, we confirmed that clustering through crowdsourcing is indeed possible and works impressively well.
- Building Massively Scalable Machine Learning Pipelines with Microsoft Synapse ML, by Jesus Rodriguez [Tuto] - Nov 30, 2021.
The new platform provides a single API to abstract dozens of ML frameworks and databases.
- New Poll: What Percentage of Your Machine Learning Models Have Been Deployed?, by Eric Siegel [Opin] - Nov 29, 2021.
Take a moment to participate in the latest KDnuggets poll and let the community know what percentage of your machine learning models have been deployed.
- Why Machine Learning Engineers are Replacing Data Scientists, by Arthur Mello [Opin] - Nov 29, 2021.
The hiring run for data scientists continues at a strong clip around the world. But there are other emerging roles demonstrating key value to organizations that you should consider based on your existing or desired skill sets.
- Top Stories, Nov 22-28: Most Common SQL Mistakes on Data Science Interviews, by KDnuggets [Top] - Nov 29, 2021.
Also: 19 Data Science Project Ideas for Beginners; How to Build a Knowledge Graph with Neo4J and Transformers; Data Scientists: How to Sell Your Project and Yourself; Where NLP is heading
- Sentiment Analysis with KNIME, by Thiel & Rudnitckaia [Tuto] - Nov 29, 2021.
Check out this tutorial on how to approach sentiment classification with supervised machine learning algorithms.
- How to Build a Knowledge Graph with Neo4J and Transformers, by Walid Amamou [Tuto] - Nov 26, 2021.
Learn to use custom Named Entity Recognition and Relation Extraction models.
- PyCaret 2.3.5 Is Here! Learn What’s New, by Moez Ali [Tuto] - Nov 26, 2021.
Read about the new functionalities added in PyCaret’s recent release.
- A Spreadsheet that Generates Python: The Mito JupyterLab Extension, by Roman Orac [Tuto] - Nov 25, 2021.
You can call Mito into your Jupyter Environment and each edit you make will generate the equivalent Python in the code cell below.
- Cartoon: Data Science for Thanksgiving, by Gregory Piatetsky [Opin] - Nov 25, 2021.
A classic KDnuggets Thanksgiving cartoon examines the predicament of one group of fowl Data Scientists.
- What’s the difference between a Data Scientist and a Data Analyst?, by Nisha Arya Ahmed [Opin] - Nov 25, 2021.
Find out the major differences between a Data Analyst and a Data Scientist, and read the author's pointers on what to do if you wish to make the transition from Data Analyst to Data Scientist.
- Can You Become a Data Scientist Online?, by 365 Data Science [Prod] - Nov 24, 2021.
Until November 29th, you can join over 1.5 million students around the globe and gain the skills of successful data science professionals with unlimited annual access to the 365 Data Science Program at 72% OFF. Read on to learn more!
- Top 4 Data Integration Tools for Modern Enterprises, by Ammar Ali [Tuto] - Nov 24, 2021.
Maintaining a centralized data repository can simplify your business intelligence initiatives. Here are four data integration tools that can make data more valuable for modern enterprises.
- Accelerating AI with MLOps, by Yochay Ettun [Opin] - Nov 24, 2021.
Companies are racing to use AI, but despite its vast potential, most AI projects fail. Examining and resolving operational issues upfront can help AI initiatives reach their full potential.
- Common Misconceptions About Differential Privacy, by Lipika Ramaswamy [Tuto] - Nov 24, 2021.
This article will clarify some common misconceptions about differential privacy and what it guarantees.
- Top Stories, Nov 15-21: 19 Data Science Project Ideas for Beginners, by KDnuggets [Top] - Nov 23, 2021.
Also: How I Redesigned over 100 ETL into ELT Data Pipelines; Where NLP is heading; Don’t Waste Time Building Your Data Science Network; Data Scientists: How to Sell Your Project and Yourself
- Most Common SQL Mistakes on Data Science Interviews, by Nate Rosidi [Tuto] - Nov 23, 2021.
Sure, we all make mistakes -- which can be a bit more painful when we are trying to get hired -- so check out these typical errors applicants make while answering SQL questions during data science interviews.
- 5 Advanced Tips on Python Sequences, by Michael Berk [Tuto] - Nov 23, 2021.
Notes from Fluent Python by Luciano Ramalho.
- 5 Tips to Get Your First Data Scientist Job, by Renato Boemer [Opin] - Nov 22, 2021.
Read some of the key things the author has learned during the infamous job-seeking stage.
- On-Device Deep Learning: PyTorch Mobile and TensorFlow Lite, by Dhruv Matani [Tuto] - Nov 22, 2021.
PyTorch and TensorFlow are the two leading AI/ML Frameworks. In this article, we take a look at their on-device counterparts PyTorch Mobile and TensorFlow Lite and examine them more deeply from the perspective of someone who wishes to develop and deploy models for use on mobile platforms.
- Dask DataFrame is not Pandas, by Hugo Shi [Tuto] - Nov 22, 2021.
This article is the second article of an ongoing series on using Dask in practice. Each article in this series will be simple enough for beginners, but provide useful tips for real work. The next article in the series is about parallelizing for loops, and other embarrassingly parallel operations with dask.delayed.
- 3 Differences Between Coding in Data Science and Machine Learning, by Nahla Davies [Tuto] - Nov 19, 2021.
The terms ‘data science’ and ‘machine learning’ are often used interchangeably. But while they are related, there are some glaring differences, so let’s take a look at the differences between the two disciplines, specifically as they relate to programming.
- Stop Blaming Humans for Bias in AI, by Ahmer Inam [Opin] - Nov 19, 2021.
Can artificial intelligence be rid of bias? This is an important question, and it’s equally important that we look in the right place for the answer.
- Difference between distributed learning versus federated learning algorithms, by Aishwarya Srinivasan [Tuto] - Nov 19, 2021.
Want to know the difference between distributed and federated learning? Read this article to find out.
- eBook: 101 Ways to Use Third-Party Data to Make Smarter Decisions, by Roidna [Prod] - Nov 18, 2021.
To guide you in becoming a data-driven organization, AWS Data Exchange has created a new eBook, 101 Ways to Use Third-Party Data to Make Smarter Decisions. Learn how to transform the ‘currency’ of data into actionable business insights.
- Build a Serverless News Data Pipeline using ML on AWS Cloud, by Maria Zentsov [Tuto] - Nov 18, 2021.
This is a guide on how to build a serverless data pipeline on AWS with a Machine Learning model deployed as a SageMaker endpoint.
- Where NLP is heading, by Paul Barba [Opin] - Nov 18, 2021.
Natural language processing research and applications are moving forward rapidly. Several trends have emerged from this progress and point to a future of more exciting possibilities and interesting opportunities in the field.
- Data Scientists: How to Sell Your Project and Yourself, by Ilro Lee [Opin] - Nov 18, 2021.
Follow this formula for the perfect elevator pitch.
- AI meets BI: Key capabilities to look for in a modern BI platform, by Zoho Analytics [Prod] - Nov 17, 2021.
With the customer at their heart, modern augmented BI platforms no longer require scripting/coding skills or the knowledge to build the back-end data models, empowering even laymen to harness the power of raw data. As a user, here are the top AI capabilities that you need to look for in BI software.
- Easy Synthetic Data in Python with Faker, by Matthew Mayo [Tuto] - Nov 17, 2021.
Faker is a Python library that generates fake data to supplement or take the place of real world data. See how it can be used for data science.
- Inside recommendations: how a recommender system recommends, by Sciforce [Tuto] - Nov 17, 2021.
We describe types of recommender systems, more specifically, algorithms and methods for content-based systems, collaborative filtering, and hybrid systems.
- Book Metadata and Cover Retrieval Using OCR and Google Books API, by Cadili & Rudnitckaia [Tuto] - Nov 17, 2021.
With KNIME, extracting critical pieces of information from images becomes as easy as ABC.
- KDnuggets™ News 21:n44, Nov 17: Don’t Waste Time Building Your Data Science Network; 19 Data Science Project Ideas for Beginners, by KDnuggets - Nov 17, 2021.
Don’t Waste Time Building Your Data Science Network; 19 Data Science Project Ideas for Beginners; How I Redesigned over 100 ETL into ELT Data Pipelines; Anecdotes from 11 Role Models in Machine Learning; The Ultimate Guide To Different Word Embedding Techniques In NLP
- How to fast-track machine translation projects, by Defined [Prod] - Nov 16, 2021.
Data is the lifeblood of any successful machine learning model, and machine translation models are no exception. Without relevant and properly labelled data, even the most sophisticated model will be unable to achieve reliable results.
- Virtual Presentation Tips for Data Scientists, by Michael Berk [Tuto] - Nov 16, 2021.
Learn how to effectively communicate your work.
- 10 AI Project Ideas in Computer Vision, by Manika Nagpal [Tuto] - Nov 16, 2021.
The field of computer vision has seen the development of very powerful applications leveraging machine learning. These projects will introduce you to these techniques and guide you to more advanced practice to gain a deeper appreciation for the sophistication now available.
- Two Simple Things You Need to Steal from Agile for Data and Analytics Work, by Jon Loyens [Tuto] - Nov 16, 2021.
Peer Review and Definition of Done: small changes, BIG impact.
- KDnuggets Top Blogs Rewards for October 2021, by Gregory Piatetsky [Top] - Nov 15, 2021.
The October blogs that won KDnuggets Rewards include: How I Tripled My Income With Data Science in 18 Months; What Google Recommends You do Before Taking Their Machine Learning or Data Science Course; How to Build Strong Data Science Portfolio as a Beginner; Data Scientist vs Data Engineer Salary.
- What Are NVIDIA NGC Containers & How to Get Started Using Them, by Kevin Vu [Tuto] - Nov 15, 2021.
NVIDIA, the pioneer in the GPU technologies and deep learning revolution, has come up with an excellent catalog of specialized containers that they call NGC Collections. In this article, we explore their basic usage and some variations.
- 19 Data Science Project Ideas for Beginners, by Zulie Rane [Tuto] - Nov 15, 2021.
This article features 19 data science projects for beginners, categorized into 7 full project tutorials, 5 places to come up with your own data science projects using data, and 7 skills-based data science projects.
- Top Stories, Nov 8-14: Don’t Waste Time Building Your Data Science Network, by KDnuggets [Top] - Nov 15, 2021.
Also: Data Scientist Career Path from Novice to First Job; Design Patterns for Machine Learning Pipelines; What Google Recommends You do Before Taking Their Machine Learning or Data Science Course; Salary Breakdown of the Top Data Science Jobs
- How I Redesigned over 100 ETL into ELT Data Pipelines, by Nicholas Leong [Tuto] - Nov 15, 2021.
Learn how to level up your Data Pipelines!
- Anecdotes from 11 Role Models in Machine Learning, by Robert Munro [Opin] - Nov 12, 2021.
The skills needed to create good data are also the skills needed for good leadership.
- Deep Learning on your phone: PyTorch C++ API for use on Mobile Platforms, by Dhruv Matani [Tuto] - Nov 12, 2021.
The PyTorch Deep Learning framework has a C++ API for use on mobile platforms. This article shows an end-to-end demo of how to write a simple C++ application with Deep Learning capabilities using the PyTorch C++ API such that the same code can be built for use on mobile platforms (both Android and iOS).
- 25 Github Repositories Every Python Developer Should Know, by Abhay Parashar [Tuto] - Nov 12, 2021.
Check out these repositories to help you improve your data science skills.
- Top October Stories: How I Tripled My Income With Data Science in 18 Months; What Google Recommends You do Before Taking Their ML or DS Course, by KDnuggets [Top] - Nov 11, 2021.
Also: How to Build Strong Data Science Portfolio as a Beginner; Data Science Portfolio Project Ideas That Can Get You Hired (Or Not); Exclusive: OpenAI summarizes KDnuggets
- Attend the Data Intelligence Summit to Learn from Data Thought Leaders, by Caserta [Prod] - Nov 11, 2021.
Join Caserta and fellow data and analytics leaders, Nov 17, as they help guide you on how, what and why you need to transform your data ecosystem to cloud-based modern analytics.
- What’s missing from self-serve BI and what we can do about it, by Benn Stancil [Opin] - Nov 11, 2021.
Self-service BI tools came with the expectation that they could provide a magic formula for easily helping everyone understand all the data. But such an end result isn't occurring in practice. To identify a better approach, we need to take a step back and determine what problem we are actually trying to solve.
- Dream Come True: Building websites by thinking about them, by Ajay, Agarwal & Nema [Tuto] - Nov 11, 2021.
From the mind to the computer, make websites using your imagination!
- AWS Data Exchange Webinar: Maintain competitive edge with third-party financial services data, by Roidna [Prod] - Nov 10, 2021.
Join this webinar, Nov 11, to learn how leveraging third-party financial services data can facilitate faster, intelligence-based decision-making that propels your company's business outcomes and digital transformation.
- 5 Things That Set a Data Scientist Apart From Other Professions, by Matthew Mayo [Opin] - Nov 10, 2021.
Here are five things that help set the data scientist apart from other professions.
- The Ultimate Guide To Different Word Embedding Techniques In NLP, by Neeraj Agarwal [Tuto] - Nov 10, 2021.
A machine can only understand numbers. As a result, converting text to numbers, called embedding text, is an actively researched topic. In this article, we review different word embedding techniques for converting text into vectors.
- Don’t Waste Time Building Your Data Science Network, by Kurtis Pykes [Opin] - Nov 10, 2021.
Instead, focus on what matters.
- KDnuggets™ News 21:n43, Nov 10: Data Scientist Career Path from Novice to First Job; Neural Networks from a Bayesian Perspective, by KDnuggets - Nov 10, 2021.
Data Scientist Career Path: from Novice to First Job; Understand Neural Networks from a Bayesian Perspective; The Best Ways for Data Professionals to Market AWS Skills; Build Your Own Automated Machine Learning App.
- KDnuggets Top Blogs Rewards Program Resumes in December, by Gregory Piatetsky [Opin] - Nov 9, 2021.
After a pause, we will be resuming the KDnuggets Top Blogs Rewards Program, starting with blogs published on KDnuggets in December. The program will be bigger, with $3,000 (USD) divided among the top 8 most viewed guest blogs. Original blogs are rewarded at 3X the rate of reposts. Submit your original blog to KDnuggets first!
- SAS Analytics Pro – now available for on-site or containerized cloud-native deployment – providing your entry point into SAS Viya, by SAS [Prod] - Nov 9, 2021.
Now, SAS Analytics Pro includes a new option for containerized cloud-native deployment. This makes SAS Analytics Pro a perfect entry point into SAS Viya.
- OpenAI’s Approach to Solve Math Word Problems, by Jesus Rodriguez [Tuto] - Nov 9, 2021.
OpenAI's latest research aims to solve math word problems. Let's dive a bit deeper into the ideas behind this new research.
- The Common Misconceptions About Machine Learning, by Abid Ali Awan [Opin] - Nov 9, 2021.
Beginners in the field can often have many misconceptions about machine learning that sometimes can be a make-it-or-break-it moment for the individual switching careers or starting fresh. This article clearly describes the ground truth realities about learning new ML skills and eventually working professionally as a machine learning engineer.
- What Comes After HDF5? Seeking a Data Storage Format for Deep Learning, by Davit Buniatyan [Tuto] - Nov 9, 2021.
HDF5 is one of the most popular and reliable formats for non-tabular, numerical data, but it is not optimized for deep learning work. This article suggests what an ML-native data format should look like to truly serve the needs of modern data scientists.
- SigOpt AI & HPC Summit, Nov 16 – Virtual and Free, by Sigopt [Prod] - Nov 8, 2021.
Learn how PayPal, AWS, Intel, Accenture, MIT and Stanford apply experimentation to build better AI at the free SigOpt AI & HPC Summit.
- POS Tagging, Explained, by Filiberto Emanuele [Tuto] - Nov 8, 2021.
Learn about the strengths of part-of-speech tagging, and about how a strong POS tagger can contribute to natural language understanding.
- Top Stories, Nov 1-7: What Google Recommends You do Before Taking Their Machine Learning or Data Science Course, by KDnuggets [Top] - Nov 8, 2021.
Also: Design Patterns for Machine Learning Pipelines; Data Scientist Career Path from Novice to First Job; Salary Breakdown of the Top Data Science Jobs; ORDAINED: The Python Project Template
- 7 Top Open Source Datasets to Train Natural Language Processing (NLP) & Text Models, by Kevin Vu [Tuto] - Nov 8, 2021.
With a lot of excitement and research around NLP, there are growing opportunities to apply these technologies to real-world scenarios. It's not trivial to become familiar with NLP, and these open-source data sets can help you increase your skills.
- Federated Learning: Google’s Take, by Aishwarya Srinivasan [Opin] - Nov 8, 2021.
This blog will be focusing on the work Google has been doing in the Federated Learning space.
- Build Your Own Automated Machine Learning App, by Matthew Mayo [Tuto] - Nov 5, 2021.
In this article, we will create an automated machine learning web app you can actually use.
- Machine Learning Safety: Unsolved Problems, by Dan Hendrycks [Opin] - Nov 5, 2021.
There remain critical challenges in machine learning that, if left unresolved, could lead to unintended consequences and unsafe use of AI in the future. Because this is an important and active area of research, roadmaps are being developed to help guide continued ML research and use toward meaningful and robust applications.
- The Best Ways for Data Professionals to Market AWS Skills in 2022, by Devin Partida [Opin] - Nov 5, 2021.
Knowing your way around Amazon Web Services (AWS) is increasingly useful. Here are five ways to market your AWS skills in today’s job market.
- Toloka 101 Live Demo: Learn how to get reliable training data for machine learning, Nov 11, by Toloka [Prod] - Nov 4, 2021.
Toloka is a crowdsourced data labeling platform that handles data collection and annotation projects for machine learning at any scale. In this Nov 11 Live Demo, learn how to get reliable training data for machine learning.
- A First Principles Theory of Generalization, by Jesus Rodriguez [Opin] - Nov 4, 2021.
New research from the University of California, Berkeley sheds new light on how to quantify neural network knowledge.
- AI Infinite Training & Maintaining Loop, by Roey Mechrez [Tuto] - Nov 4, 2021.
Productizing AI is an infrastructure orchestration problem. In planning your solution design, you should use continuous monitoring, retraining, and feedback to ensure stability and sustainability.
- NLP for Business in the Time of BERTera: Seven Misplaced Passions, by Anand Ramanathan [Tuto] - Nov 4, 2021.
This article is a brief summary of our observations on some common client misperceptions with respect to recent developments in NLP, especially the use of large-scale models and datasets.
- 7 of The Coolest Machine Learning Topics of 2021 at ODSC West, by ODSC [Prod] - Nov 3, 2021.
At our upcoming event this November 16th-18th in San Francisco, ODSC West 2021 will feature a plethora of talks, workshops, and training sessions on machine learning topics, deep learning, NLP, MLOps, and so on. You can register now for 20% off all ticket types, or register for a free AI Expo Pass to see what some big names in AI are doing now.
- Visual Scoring Techniques for Classification Models, by Maarit Widmann [Tuto] - Nov 3, 2021.
Read this article on assessing a model's performance in a broader context.
- Data Scientist Career Path from Novice to First Job, by Nate Rosidi [Tuto] - Nov 3, 2021.
If you are beginning your data science journey, then you must be prepared to plan it out as a step-by-step process that will guide you from being a total newbie to getting your first job as a data scientist. These tips and educational resources should be useful for you and add confidence as you take that first big step.
- Neural Networks from a Bayesian Perspective, by Zeldes & Naor [Tuto] - Nov 3, 2021.
This article looks at neural networks from a Bayesian perspective.
- KDnuggets™ News 21:n42, Nov 3: Google Recommendations Before Taking Their Machine Learning Course; Guide to Data Science Jobs, by KDnuggets - Nov 3, 2021.
What Google Recommends You do Before Taking Their Machine Learning or Data Science Course; A Guide to 14 Different Data Science Jobs; Analyze Python Code in Jupyter Notebooks; Machine Learning Model Development and Model Operations: Principles and Practices; Want to Join a Bank? Everything Data Scientists Need to Know About Working in Fintech
- Three reasons to self-host your product analytics, by PostHog [Prod] - Nov 2, 2021.
Want three reasons to avoid the cloud and host your own analytics platform? More data, more control, more security.
- ORDAINED: The Python Project Template, by Bryan Patrick Wood [Tuto] - Nov 2, 2021.
Recently I decided to take the time to better understand the Python packaging ecosystem and create a project boilerplate template as an improvement over copying a directory tree and doing find and replace.
- Design Patterns for Machine Learning Pipelines, by Davit Buniatyan [Tuto] - Nov 2, 2021.
ML pipeline design has undergone several evolutions in the past decade with advances in memory and processor performance, storage systems, and the increasing scale of data sets. We describe how these design patterns changed, what processes they went through, and their future direction.
- Salary Breakdown of the Top Data Science Jobs, by Matthew Przybyla [Tuto] - Nov 2, 2021.
Machine Learning vs NLP vs Data Engineer vs Data Scientist, and what it means to be in each role.
- Top Stories, Oct 25-31: How I Tripled My Income With Data Science in 18 Months; Machine Learning Model Development and Model Operations: Principles and Practices, by KDnuggets [Top] - Nov 1, 2021.
Also: What Google Recommends You do Before Taking Their Machine Learning or Data Science Course; Learn To Reproduce Papers: Beginner’s Guide; 365 Data Science courses free until 18 November; A Guide to 14 Different Data Science Jobs
- Advanced PyTorch Lightning with TorchMetrics and Lightning Flash, by Kevin Vu [Tuto] - Nov 1, 2021.
In this tutorial we will be diving deeper into two additional tools you should be using: TorchMetrics and Lightning Flash. TorchMetrics unsurprisingly provides a modular approach to define and track useful metrics across batches and devices, while Lightning Flash offers a suite of functionality facilitating more efficient transfer learning and data handling, and a recipe book of state-of-the-art approaches to typical deep learning problems.
- Top 5 Time Series Methods, by Pranay Dave [Tuto] - Nov 1, 2021.
Data that varies in time can offer powerful applications and use cases for data scientists to analyze. This overview considers the top techniques you can learn to understand and gain insight from time-series data.