- 5 Things That Make My Job as a Data Scientist Easier - Aug 23, 2021.
After working as a Data Scientist for a year, I am here to share some things I learnt along the way that I feel are helpful and have increased my efficiency. Hopefully some of these tips can help you in your journey :)
- ROC Curve Explained - Jul 6, 2021.
Learn to visualise a ROC curve in Python.
- KDnuggets™ News 21:n18, May 12: Data Preparation in SQL, with Cheat Sheet!; Rebuilding 7 Python Projects - May 12, 2021.
Data Preparation in SQL, with Cheat Sheet!; Rebuilding My 7 Python Projects; Applying Python’s Explode Function to Pandas DataFrames; Essential Linear Algebra for Data Science and Machine Learning; Similarity Metrics in NLP
- Similarity Metrics in NLP - May 10, 2021.
This post covers the use of euclidean distance, dot product, and cosine similarity as NLP similarity metrics.
- Metric Matters, Part 2: Evaluating Regression Models - Mar 23, 2021.
In this second part review of the many options available for choosing metrics to evaluate machine learning models, learn how to select the most appropriate metric for your analysis of regression models.
- Metric Matters, Part 1: Evaluating Classification Models - Mar 16, 2021.
You have many options when choosing metrics for evaluating your machine learning models. Select the right one for your situation with this guide that considers metrics for classification models.
- 4 Machine Learning Concepts I Wish I Knew When I Built My First Model - Mar 9, 2021.
Diving into building your first machine learning model will be an adventure -- one in which you will learn many important lessons the hard way. However, by following these four tips, your first and subsequent models will be put on a path toward excellence.
- Evaluating Object Detection Models Using Mean Average Precision - Mar 3, 2021.
In this article we will see how precision and recall are used to calculate the Mean Average Precision (mAP).
- KDnuggets™ News 21:n08, Feb 24: Powerful Exploratory Data Analysis in just two lines of code; Cartoon: Data Scientist vs Data Engineer - Feb 24, 2021.
Powerful Exploratory Data Analysis in just two lines of code; Cartoon: Data Scientist vs Data Engineer; Evaluating Deep Learning Models: The Confusion Matrix, Accuracy, Precision, and Recall; Feature Store as a Foundation for Machine Learning; Approaching (Almost) Any Machine Learning Problem
- Evaluating Deep Learning Models: The Confusion Matrix, Accuracy, Precision, and Recall - Feb 19, 2021.
This tutorial discusses the confusion matrix, and how the precision, recall and accuracy are calculated, and how they relate to evaluating deep learning models.
- How to Create Custom Real-time Plots in Deep Learning - Dec 14, 2020.
How to generate real-time visualizations of custom metrics while training a deep learning model using Keras callbacks.
- Essential Math for Data Science: Integrals And Area Under The Curve - Nov 25, 2020.
In this article, you’ll learn about integrals and the area under the curve using the practical data science example of the area under the ROC curve used to compare the performances of two machine learning models.
- Simple Python Package for Comparing, Plotting & Evaluating Regression Models - Nov 25, 2020.
This package helps users plot evaluation metric graphs for widely used regression model metrics with a single line of code, so they can be compared at a glance. It also significantly lowers the barrier for practitioners, amateurs included, to evaluate different machine learning algorithms on their everyday predictive regression problems.
- Most Popular Distance Metrics Used in KNN and When to Use Them - Nov 11, 2020.
For calculating distances KNN uses a distance metric from the list of available metrics. Read this article for an overview of these metrics, and when they should be considered for use.
- Goodhart’s Law for Data Science and what happens when a measure becomes a target? - Oct 14, 2020.
When developing analytics and algorithms to better understand a business target, unintended biases can sneak in that ensure desired outcomes are obtained. Guiding your work with multiple metrics in mind can help avoid such consequences of Goodhart's Law.
- 5 Concepts Every Data Scientist Should Know - Oct 2, 2020.
Once a Data Scientist, there are certain skills you will apply each and every day of your career. Some of these might be common techniques you learned during your education, while others may develop fully only after you become more established in your organization. Continuing to hone these skills will provide you with valuable professional benefits.
- Metrics to Use to Evaluate Deep Learning Object Detectors - Aug 6, 2020.
It's important to understand which metric should be used to evaluate trained object detectors and which one is more important. Is mAP alone enough to evaluate object detector models? Can the same metric be used to evaluate object detectors on the validation set and the test set?
- PyTorch Multi-GPU Metrics Library and More in New PyTorch Lightning Release - Jul 2, 2020.
PyTorch Lightning, a very light-weight structure for PyTorch, recently released version 0.8.1, a major milestone. With incredible user adoption and growth, they are continuing to build tools to easily do AI research.
- 3 Key Data Science Questions to Ask Your Big Data - Jun 3, 2020.
The process of understanding your data begins by asking 3 questions at the highest level, and then iteratively asking hundreds of cascading questions to get deeper insights.
- Model Evaluation Metrics in Machine Learning - May 28, 2020.
A detailed explanation of model evaluation metrics to evaluate a classification machine learning model.
- More Performance Evaluation Metrics for Classification Problems You Should Know - Apr 3, 2020.
When building and optimizing your classification model, measuring how accurately it predicts your expected outcome is crucial. However, this metric alone is never the entire story, as it can still offer misleading results. That's where these additional performance evaluations come into play to help tease out more meaning from your model.
- Why you should NOT use MS MARCO to evaluate semantic search - Apr 2, 2020.
If we want to investigate the power and limitations of semantic vectors (pre-trained or not), we should ideally prioritize datasets that are less biased towards term-matching signals. This piece shows that the MS MARCO dataset is more biased towards those signals than we expected and that the same issues are likely present in many other datasets due to similar data collection designs.
- A simple and interpretable performance measure for a binary classifier - Mar 4, 2020.
Binary classification tasks are the bread and butter of machine learning. However, the standard statistic for its performance is a mathematical tool that is difficult to interpret -- the ROC-AUC. Here, a performance measure is introduced that simply considers the probability of making a correct binary classification.
- Recommender System Metrics: Comparing Apples, Oranges and Bananas - Feb 11, 2020.
This article will discuss a sometimes-overlooked aspect of what distinguishes recommender systems from other machine learning tasks: added uncertainties of measuring them.
- The 5 Most Useful Techniques to Handle Imbalanced Datasets - Jan 22, 2020.
This post is about explaining the various techniques you can use to handle imbalanced datasets.
- Beginner’s Guide to K-Nearest Neighbors in R: from Zero to Hero - Jan 3, 2020.
This post presents a pipeline of building a KNN model in R with various measurement metrics.
- Top KDnuggets tweets, Dec 04-10: AI, Analytics, Machine Learning, Data Science, Deep Learning Research Main Developments in 2019 and Key Trends for 2020 - Dec 11, 2019.
AI, Analytics, Machine Learning, Data Science, Deep Learning Research Main Developments and Key Trends; Down with technical debt! Clean #Python for #DataScientists; Calculate Similarity - the most relevant Metrics in a Nutshell.
- Top KDnuggets tweets, Oct 16-22: How YouTube is Recommending Your Next Video - Oct 23, 2019.
Also: The 5 Classification Evaluation Metrics Every Data Scientist Must Know; How to Recognize a Good Data Scientist Job From a Bad One; How to Easily Deploy Machine Learning Models Using Flask.
- The 5 Classification Evaluation Metrics Every Data Scientist Must Know - Oct 16, 2019.
This post is about various evaluation metrics and how and when to use them.
- KDnuggets™ News 19:n39, Oct 16: Key Ideas in Document Embedding; The problem with metrics is a big problem for AI - Oct 16, 2019.
This week on KDnuggets: Beyond Word Embedding: Key Ideas in Document Embedding; The problem with metrics is a big problem for AI; Activation maps for deep learning models in a few lines of code; There is No Such Thing as a Free Lunch; 8 Paths to Getting a Machine Learning Job Interview; and much, much more.
- Upcoming Webinar, Machine Learning Vital Signs: Metrics and Monitoring Models in Production - Oct 11, 2019.
In this upcoming webinar on Oct 23 @ 10 AM PT, learn why you should invest time in monitoring your machine learning models, the dangers of not paying attention to how a model’s performance can change over time, metrics you should be gathering for each model and what they tell you, and much more.
- The problem with metrics is a big problem for AI - Oct 11, 2019.
The practice of optimizing metrics is not new nor unique to AI, yet AI can be particularly efficient (even too efficient!) at doing so.
- Clustering Metrics Better Than the Elbow Method - Oct 1, 2019.
We show which metrics to use for visualizing and determining the optimal number of clusters, and why they work much better than the usual practice, the elbow method.
- 6 bits of advice for Data Scientists - Sep 25, 2019.
As a data scientist, you can get lost in your daily dives into the data. Consider following this advice in your work to be more diligent and more impactful for your organization.
- 6 Key Concepts in Andrew Ng’s “Machine Learning Yearning” - Aug 12, 2019.
If you are diving into AI and machine learning, Andrew Ng's book is a great place to start. Learn about six important concepts covered to better understand how to use these tools from one of the field's best practitioners and teachers.
- Advanced Keras — Constructing Complex Custom Losses and Metrics - Apr 8, 2019.
In this tutorial I cover a simple trick that will allow you to construct custom loss functions in Keras which can receive arguments other than
- Comparison of the Text Distance Metrics - Jan 7, 2019.
There are many different approaches of how to compare two texts (strings of characters). Each has its own advantages and disadvantages and is good only for a range of specific use cases.
- Using Confusion Matrices to Quantify the Cost of Being Wrong - Oct 11, 2018.
The terms ‘true condition’ (‘positive outcome’) and ‘predicted condition’ (‘negative outcome’) are used when discussing Confusion Matrices. This means that you need to understand the differences (and eventually the costs associated) with Type I and Type II Errors.
- Receiver Operating Characteristic Curves Demystified (in Python) - Jul 20, 2018.
In this blog, I will reveal, step by step, how to plot an ROC curve using Python. After that, I will explain the characteristics of a basic ROC curve.
- Analyzing Personalization Results - Jun 27, 2018.
The 4th part of this series will help answer the following questions: “Should I improve something or make changes to the system? Can it work more effectively? Can I squeeze the lion’s share of it?”
- Choosing the Right Metric for Evaluating Machine Learning Models — Part 2 - Jun 19, 2018.
This part focuses on commonly used classification metrics and why, depending on context, we should prefer some over others.
- 7 Useful Suggestions from Andrew Ng “Machine Learning Yearning” - May 8, 2018.
Machine Learning Yearning is a book by AI and Deep Learning guru Andrew Ng, focusing on how to make machine learning algorithms work and how to structure machine learning projects. Here we present 7 very useful suggestions from the book.
- KDnuggets™ News 18:n18, May 2: Blockchain Explained in 7 Python Functions; Data Science Dirty Secret; Choosing the Right Evaluation Metric - May 2, 2018.
Also: Building Convolutional Neural Network using NumPy from Scratch; Data Science Interview Guide; Implementing Deep Learning Methods and Feature Engineering for Text Data: The GloVe Model; Jupyter Notebook for Beginners: A Tutorial
- Operational Machine Learning: Seven Considerations for Successful MLOps - Apr 30, 2018.
In this article, we describe seven key areas to take into account for successful operationalization and lifecycle management (MLOps) of your ML initiatives.
- Choosing the Right Metric for Evaluating Machine Learning Models – Part 1 - Apr 27, 2018.
Each machine learning model is trying to solve a problem with a different objective using a different dataset and hence, it is important to understand the context before choosing a metric.
- Machine Learning Model Metrics - Jan 23, 2018.
In this article we explore how to calculate machine learning model metrics, using the example of fraud detection. We'll see lots of different ways that we can try to understand just how good our learned model is.
- Learning Curves for Machine Learning - Jan 17, 2018.
But how do we diagnose bias and variance in the first place? And what actions should we take once we've detected something? In this post, we'll learn how to answer both these questions using learning curves.
- The danger in comparing your campaign performance against an average - Oct 26, 2017.
Performance measurement is only meaningful when compared against a benchmark. While “average” is a good, and easy to understand metric, it could be very deceptive.
- Kanri Distance Calculator Free License Version with Demo - Oct 23, 2017.
Kanri invites you to a demo where you can receive a free version of the Kanri Distance Calculator, analytics software that takes big data and individualizes results down to the individual participant.
- The Marketing Metrics and Analytics Summit, Chicago, Sep 26-27 - Jul 26, 2017.
The summit will provide you with all the practical know-how you need to take your organization's marketing measurement game to the next level.
- Kanri Distance Calculator(tm) – patented solution applying power of Big Data to an Individual - Mar 21, 2017.
Kanri's combination of patented statistical and process methods provides a powerful ability to evaluate large data, telling users the exact distance from target and the variable contributions for each participant. Free trial and 88% KDnuggets discount for the first 100 buyers.
- Analytics 101: Comparing KPIs - Mar 20, 2017.
Different business units in the organisation have different behaviours (e.g. turnover rate) and they can’t be compared with each other. So, how can we tell whether the changes in their behaviour are reasons for concern?
- The Top 5 KPIs to Consider When Measuring Your Campaign - Feb 28, 2017.
When it comes to measuring marketing campaign performance or analysing customers in any business, the top 5 Key Performance Indicators (KPIs) below need to be used to strategically drive the business.
- Marketing Metrics and Analytics Summit, New York, Apr 26-27 – KDnuggets Offer - Feb 2, 2017.
Designed to be at the intersection of marketing, data science, and analytics, this summit is a place to discuss common challenges and pain points, discover new cutting-edge technology tools and solutions, and connect and network. Use discount code KDN15 to save.
- The Best Metric to Measure Accuracy of Classification Models - Dec 7, 2016.
Measuring the accuracy of a model for a classification problem (categorical output) is complex and time consuming compared to regression problems (continuous output). Let's understand the key testing metrics for a classification problem with an example.
- Metrics Gone Wrong – How Companies Are Optimizing The Wrong Way - Apr 20, 2016.
A critique of the over-abundant and misguided pursuit of metric completeness, and how it can result in incorrect "optimization."
- Lift Analysis – A Data Scientist’s Secret Weapon - Mar 22, 2016.
Gain insight into using lift analysis as a metric for doing data science. Understand how to use it for evaluating the performance and quality of a machine learning model.
- Employee Engagement – a Tricky Metric for Predictive Analytics - Feb 22, 2016.
Predictive analytics for the workforce has developed significantly in recent times. Here we focus on an important discovery about the Employee Engagement metric and why it is tricky.
- Early bird ends for 3 analytics events, Deadline: Feb. 5 - Jan 26, 2016.
Lights, camera, and analytics action! Early bird rates end Feb 5th for three incredible analytics conferences in San Francisco. Save with KDnuggets code KDN150.
- Interview: Joseph Babcock, Netflix on Curiosity and Courage – Key for Success in Data Science - Jun 17, 2015.
We discuss discovery vs. personalization, advice, trends, desired skills in data scientists, and more.
- Interview: Sheridan Hitchens, Auction.com on Customer Lifetime Value as the Cornerstone for Marketing Analytics - May 15, 2015.
We discuss Customer Lifetime Value (CLV) metric, maturity level for the CLV metric, different models for calculating it, challenges in designing strategy based on CLV and tackling attribution.
- Interview: Haile Owusu, Mashable on Riding the Wave of Viral Content - Apr 29, 2015.
We discuss Mashable’s milestones, data-driven digital publishing, digital media tracking, viral prediction, and Mashable Velocity.
- Sports Analytics Innovation Summit 2014 San Francisco: Day 2 Highlights - Oct 11, 2014.
Highlights from the presentations by Analytics leaders from San Francisco Giants, New York University and LA Dodgers on day 2 of Sports Analytics Innovation Summit 2014 in San Francisco.
- Interview: Arpit Gupta, CEO, Actionable Analytics on Enterprise Challenges in Big Data and Cloud - Aug 24, 2014.
We discuss Actionable Analytics start-up, enterprise challenges in Big Data, relationship with cloud computing, metrics vs. insights, Big Data expectations and more.
- Interview: Pallas Horwitz, Blue Shell Games on Why Data Science is So Critical for Gaming Studios - Aug 14, 2014.
We discuss the role of data science at Blue Shell Games, the importance of "Lean Data", key metrics for online games, cross-product projects, and optimizing how data needs are met across an organization.
- Top KDnuggets tweets, Aug 8-10: Forget SQL vs NoSQL. New trend is HTAP: Hybrid Transaction/Analytical Processing - Aug 11, 2014.
Forget SQL vs NoSQL. New trend is HTAP: Hybrid Transaction/Analytical Processing; Metrics that Matter - The Key to Perfect Dashboards; Machine Learning Tutorial: The Max Entropy Text Classifier; Six Thinking Hats and the Life of a Data Scientist.
- Metrics that Matter – The Key to Perfect Dashboards - Aug 9, 2014.
Create the perfect data visualization dashboards by learning what metrics matter most to your users and displaying them prominently within the design of the dashboard.
- Interview: Aparna Pujar, eBay on Evolution of Behavior Analytics for User Engagement - Jul 25, 2014.
We discuss Behavior Analytics vs. Web Analytics, important metrics for user engagement, challenges of behavior insights domain, future of multi-screen analytics, key soft skill and more.
- Interview: Cliff Lyon, Stubhub on Mastering Recommendation & Personalization Analytics Part 2 - Jul 19, 2014.
We discuss current trends, future vision, interesting correlations, privacy concerns, and advice for Data Science practitioners.
- Interview: Cliff Lyon, Stubhub on Mastering the Art of Recommendation and Personalization Analytics - Jul 18, 2014.
We discuss challenges in designing recommendation and personalization systems, how to select the right metrics, and lessons learned about presenting recommendations on different channels.
- Manufacturing Analytics Summit 2014 Chicago: Day 2 Highlights - Jul 17, 2014.
Highlights from the presentations by Analytics leaders from World Fuel Services, Vigilent Corporation, Caterpillar and SunEdison on day 2 of Manufacturing Analytics Summit 2014 in Chicago.
- Interview: Lloyd Tabb, Chairman & CTO, Looker on Front-line Analytics and Data Democratization - Jun 9, 2014.
We discuss the capabilities of Looker, data democratization across organization, change in the tools being used by analytics-savvy business managers, front-line analytics, competitive landscape and more.
- Amazon: Sr. Business Intelligence Engineer, Video Advertising - Feb 18, 2014.
An outstanding BI engineer to design how our data will be stored and used, extract meaning from billions of data points, and automate processes to feed the right data into our machine learning engine.
- Viewpoint: Social Media Analysis: What is missing - Feb 15, 2014.
Social Media Analysis is a powerful tool if we discover customer sentiment from millions of online sources and not just go behind the numbers. Businesses are using the power of social media to gain a better understanding of their markets.