- Undersampling Will Change the Base Rates of Your Model’s Predictions - Dec 17, 2020.
In classification problems, the proportion of cases in each class largely determines the base rate of the predictions produced by the model. Therefore, if you use sampling techniques that change this proportion, there is a good chance you will want to rescale or calibrate your predictions before using them in the wild.
- Simple & Intuitive Ensemble Learning in R - Dec 2, 2020.
Read about metaEnsembleR, a fully automated R package for heterogeneous ensemble meta-learning (classification and regression).
- Essential Data Science Tips: How to Use One-Vs-Rest and One-Vs-One for Multi-Class Classification - Aug 6, 2020.
Classification, as a predictive modeling task, involves assigning a class label to each example. Algorithms designed for binary classification cannot be applied directly to multi-class classification problems. For such situations, heuristic methods such as One-vs-Rest and One-vs-One come in handy.
- Spam Filter in Python: Naive Bayes from Scratch - Jul 8, 2020.
In this blog post, learn how to build a spam filter using Python and the multinomial Naive Bayes algorithm, with the goal of classifying messages with greater than 80% accuracy.
- A Classification Project in Machine Learning: a gentle step-by-step guide - Jun 17, 2020.
Classification is a core technique in data science and machine learning, used to predict the categories to which data should belong. Follow this learning guide, which demonstrates how to consider multiple classification models to predict data scraped from the web.
- Model Evaluation Metrics in Machine Learning - May 28, 2020.
A detailed explanation of model evaluation metrics to evaluate a classification machine learning model.
- More Performance Evaluation Metrics for Classification Problems You Should Know - Apr 3, 2020.
When building and optimizing your classification model, measuring how accurately it predicts your expected outcome is crucial. However, accuracy alone is never the entire story, as it can still offer misleading results. That's where these additional performance evaluations come into play, helping to tease more meaning out of your model.
- A simple and interpretable performance measure for a binary classifier - Mar 4, 2020.
Binary classification tasks are the bread and butter of machine learning. However, the standard statistic for measuring their performance, the ROC-AUC, is a mathematical tool that is difficult to interpret. Here, a performance measure is introduced that simply considers the probability of making a correct binary classification.
- Linear to Logistic Regression, Explained Step by Step - Mar 3, 2020.
Logistic Regression is a core supervised learning technique for solving classification problems. This article goes beyond the simple code to first explain the concepts behind the approach and how it all emerges from the more basic technique of Linear Regression.
- Classify A Rare Event Using 5 Machine Learning Algorithms - Jan 15, 2020.
Which algorithm works best for unbalanced data? Are there any tradeoffs?
- Idiot’s Guide to Precision, Recall, and Confusion Matrix - Jan 13, 2020.
Building Machine Learning models is fun, but making sure we build the best ones is what makes a difference. Follow this quick guide to appreciate how to effectively evaluate a classification model, especially for projects where accuracy alone is not enough.
- Beginners Guide to the Three Types of Machine Learning - Nov 13, 2019.
The following article is an introduction to classification and regression (which are known as supervised learning) and to unsupervised learning (which, in the context of machine learning applications, often refers to clustering), and includes a walkthrough in the popular Python library scikit-learn.
- Designing Your Neural Networks - Nov 4, 2019.
Check out this step-by-step walkthrough of some of the more confusing aspects of neural nets, meant to guide you in making smart decisions about your neural network architecture.
- Reddit Post Classification - Sep 18, 2019.
This article covers the implementation of a data scraping and natural language processing project with two parts: scrape as many posts from Reddit’s API as allowed, and then use classification models to predict the origin of the posts.
- Understanding Decision Trees for Classification in Python - Aug 21, 2019.
This tutorial covers decision trees for classification, also known as classification trees, including the anatomy of classification trees, how classification trees make predictions, using scikit-learn to build classification trees, and hyperparameter tuning.
- 7 Steps to Mastering Intermediate Machine Learning with Python — 2019 Edition - Jun 3, 2019.
This is the second part of this new learning path series for mastering machine learning with Python. Check out these 7 steps to help master intermediate machine learning with Python!
- Neural Networks seem to follow a puzzlingly simple strategy to classify images - Mar 5, 2019.
We explain why state-of-the-art Deep Neural Networks can still recognize scrambled images perfectly well and how this helps to uncover a puzzlingly simple strategy that DNNs seem to use to classify natural images.
- Using Caret in R to Classify Term Deposit Subscriptions for a Bank - Feb 4, 2019.
This article uses direct marketing campaign data from a Portuguese banking institution to predict if a customer will subscribe for a term deposit. We’ll be working with R’s Caret package to achieve this.
- 7 Steps to Mastering Basic Machine Learning with Python — 2019 Edition - Jan 29, 2019.
With a new year upon us, I thought it would be a good time to revisit the concept and put together a new learning path for mastering machine learning with Python. With these 7 steps you can master basic machine learning with Python!
- The Essence of Machine Learning - Dec 28, 2018.
As an exercise in what may seem to be semantics, let's explore some 30,000-foot definitions of what machine learning is.
- Synthetic Data Generation: A must-have skill for new data scientists - Dec 27, 2018.
A brief rundown of methods/packages/ideas to generate synthetic data for self-driven data science projects and deep diving into machine learning methods.
- Solve any Image Classification Problem Quickly and Easily - Dec 13, 2018.
This article teaches you how to use transfer learning to solve image classification problems. A practical example using Keras and its pre-trained models is given for demonstration purposes.
- KDnuggets™ News 18:n42, Nov 7: The Most in Demand Skills for Data Scientists; How Machines Understand Our Language: Intro to NLP - Nov 7, 2018.
Also: Machine Learning Classification: A Dataset-based Pictorial; Quantum Machine Learning: A look at myths, realities, and future projections; Multi-Class Text Classification Model Comparison and Selection; Top 13 Python Deep Learning Libraries
- Unfolding Naive Bayes From Scratch - Sep 25, 2018.
Whether you are a beginner in machine learning or have been trying hard to understand seemingly supernatural machine learning algorithms and still feel that the dots somehow do not connect, this post is definitely for you!
- AI Knowledge Map: How To Classify AI Technologies - Aug 31, 2018.
This post is an effort to draw an architecture for accessing knowledge on AI and following its emergent dynamics: a gateway to pre-existing knowledge on the topic that will allow you to scout around for additional information and eventually create new knowledge on AI.
- Dimensionality Reduction: Does PCA really improve classification outcome? - Jul 13, 2018.
In this post, I am going to test whether Principal Component Analysis (PCA) really improves classification by using it to try to improve the classification performance of a neural network on a dataset.
- Choosing the Right Metric for Evaluating Machine Learning Models — Part 2 - Jun 19, 2018.
This part focuses on commonly used classification metrics and on why, depending on context, we should prefer some over others.
- Understanding What is Behind Sentiment Analysis – Part 2 - Apr 20, 2018.
Fine-tuning our sentiment classifier...
- Understanding What is Behind Sentiment Analysis – Part 1 - Apr 13, 2018.
Build your first sentiment classifier in 3 steps.
- Using Tensorflow Object Detection to do Pixel Wise Classification - Mar 29, 2018.
TensorFlow recently added new functionality, and now we can extend the API to determine the pixel-by-pixel location of objects of interest. So when would we need this extra granularity?
- 5 Things You Need to Know about Sentiment Analysis and Classification - Mar 23, 2018.
We take a look at the important things you need to know about sentiment analysis, including social media, classification, evaluation metrics and how to visualise the results.
- Hierarchical Classification – a useful approach for predicting thousands of possible categories - Mar 12, 2018.
A detailed look at the flat and hierarchical classification approach to dealing with multi-class classification problems.
- Logistic Regression: A Concise Technical Overview - Feb 16, 2018.
Interested in learning the concepts behind Logistic Regression (LogR)? Looking for a concise introduction to LogR? This article is for you. Includes a Python implementation and links to an R script as well.
- 3 different types of machine learning - Nov 1, 2017.
In this extract from “Python Machine Learning”, top data scientist Sebastian Raschka explains the 3 main types of machine learning: Supervised, Unsupervised, and Reinforcement Learning. Use code PML250KDN to save 50% on the book.
- KDnuggets™ News 17:n29, Aug 2: Machine Learning Exercises in Python; 8 Reasons Why Many Big Data Analytics Solutions Fail - Aug 2, 2017.
Machine Learning Exercises in Python: An Introductory Tutorial Series; The BI & Data Analysis Conundrum: 8 Reasons Why Many Big Data Analytics Solutions Fail to Deliver Value; The Internet of Things: An Introductory Tutorial Series; How to squeeze the most from your training data
- The Machine Learning Abstracts: Classification - Jul 27, 2017.
Classification is the process of categorizing or “classifying” some items into a predefined set of categories or “classes”. The process is exactly the same when a machine does it. Let’s dive a little deeper.
- Machine Learning Crash Course: Part 1 - May 24, 2017.
This post, the first in a series of ML tutorials, aims to make machine learning accessible to anyone willing to learn. We’ve designed it to give you a solid understanding of how ML algorithms work, as well as the knowledge to harness them in your projects.
- Webinar: Improve Your CLASSIFICATION with CART® and RandomForests®, Mar 29 - Mar 27, 2017.
We discuss the advantages of tree based techniques, including automatic variable selection, variable interactions, nonlinear relationships, outliers, and missing values.
- 7 More Steps to Mastering Machine Learning With Python - Mar 1, 2017.
This post is a follow-up to last year's introductory Python machine learning post, which includes a series of tutorials for extending your knowledge beyond the original.
- What I Learned Implementing a Classifier from Scratch in Python - Feb 28, 2017.
In this post, the author implements a machine learning algorithm from scratch, without the use of a library such as scikit-learn, and instead writes all of the code in order to have a working binary classifier algorithm.
- 17 More Must-Know Data Science Interview Questions and Answers - Feb 15, 2017.
17 new must-know Data Science Interview questions and answers include lessons from the failure to predict the 2016 US Presidential election and the Super Bowl LI comeback, understanding bias and variance, why fewer predictors might be better, and how to make a model more robust to outliers.
- The Costs of Misclassifications - Dec 14, 2016.
The importance of correct classification and the hazards of misclassification are subjective and vary from case to case. Let's see how the cost of misclassification is measured from a monetary perspective.
- Data Science Basics: What Types of Patterns Can Be Mined From Data? - Dec 14, 2016.
Why do we mine data? This post is an overview of the types of patterns that can be gleaned from data mining, and some real world examples of said patterns.
- The Best Metric to Measure Accuracy of Classification Models - Dec 7, 2016.
Measuring the accuracy of a model for a classification problem (categorical output) is complex and time-consuming compared to regression problems (continuous output). Let’s understand the key testing metrics for a classification problem with an example.
- arXiv Paper Spotlight: Automated Inference on Criminality Using Face Images - Dec 7, 2016.
This recent paper addresses the use of still facial images in an attempt to differentiate criminals from non-criminals, doing so with the help of 4 different classifiers. Results are as troubling as they are unsettling.
- Random Forests® in Python - Dec 2, 2016.
Random forest is a highly versatile machine learning method with numerous applications ranging from marketing to healthcare and insurance. This is a post about random forests using Python.
- Introduction to Machine Learning for Developers - Nov 28, 2016.
Whether you are integrating a recommendation system into your app or building a chatbot, this guide will help you get started in understanding the basics of machine learning.
- Neighbors Know Best: (Re) Classifying an Underappreciated Beer - Nov 24, 2016.
A look at beer features to determine whether a specific brew might be better served (pun intended) by being classified under a different style. A kNN analysis, supported with in-post plots and a linked IPython notebook.
- Artificial Intelligence Classification Matrix - Nov 3, 2016.
There are several different ways to think about machine intelligence startups; too narrow a framework might be counterproductive given the flexibility of the sector and the ease of moving from one group to another. Check out this categorization matrix.
- MLDB: The Machine Learning Database - Oct 17, 2016.
MLDB is an open-source database designed for machine learning. Send it commands over a RESTful API to store data, explore it using SQL, then train machine learning models and expose them as APIs.
- The Evolution of Classification, Oct 19, Oct 26 Webinars - Oct 7, 2016.
Join us for this two-part webinar series on the Evolution of Classification, presented by Senior Scientist Mikhail Golovnya.
- Neural Designer: Predictive Analytics Software - Sep 26, 2016.
Neural Designer's advanced neural network algorithms, combined with a simple user interface and fast performance, make it a great tool for data scientists. Download the free 15-day trial version.
- A Primer on Logistic Regression – Part I - Aug 24, 2016.
Gain an understanding of logistic regression - what it is, and when and how to use it - in this post.
- Improving Nudity Detection and NSFW Image Recognition - Jun 25, 2016.
This post discusses improvements made in a tricky machine learning classification problem: nude and/or NSFW, or not?
- Machine Learning Key Terms, Explained - May 25, 2016.
An overview of 12 important machine learning concepts, presented in a no-frills, straightforward definition style.
- KDnuggets™ News 16:n16, May 4: How to Remove Duplicates from Large Data; Datasets over Algorithms; When Automation goes too far - May 4, 2016.
How to Remove Duplicates in Large Datasets; The Development of Classification as a Learning Machine; Datasets Over Algorithms; Cartoon: When Automation Goes Too Far, and more.
- The Development of Classification as a Learning Machine - Apr 29, 2016.
An explanation of how classification developed as a learning machine, from LDA to the perceptron, on to logistic regression, and through to support vector machines.
- Salford Predictive Modeler 8: Faster. More Machine Learning. Better results - Apr 4, 2016.
Take a giant step forward with SPM 8: download the just-released version 8, try it for yourself, and get better results.
- What Dog Breed is That? Let AI “fetch” it for you! - Feb 25, 2016.
A recently released AI app identifies dog breeds from pictures and mixes in some fun too.
- Amazon Machine Learning: Nice and Easy or Overly Simple? - Feb 17, 2016.
Amazon Machine Learning is a predictive analytics service with binary/multiclass classification and linear regression features. The service is fast and offers a simple workflow, but it lacks model selection features and has slow execution times.
- Data Analytics Boosting Digital Engagement at Australian Open 2016 - Jan 25, 2016.
Advanced analytics and visualization are enhancing fan experience and operational excellence at Australian Open 2016.
- What questions can data science answer? - Jan 1, 2016.
There are only five questions machine learning can answer: Is this A or B? Is this weird? How much/how many? How is it organized? What should I do next? We examine these questions in detail and what they imply for data science.
- ebook: Learning Apache Mahout Classification - May 15, 2015.
If you are a data scientist with Hadoop experience and an interest in machine learning, this book is for you. Learn about the different classification algorithms in Apache Mahout and build your own classifiers.
- Fundamental methods of Data Science: Classification, Regression And Similarity Matching - Jan 12, 2015.
Data classification, regression, and similarity matching underpin many of the fundamental algorithms in data science to solve business problems like consumer response prediction and product recommendation.
- “Vite fait, bien fait” – Averaging improves both accuracy and speed of time series classification - Dec 21, 2014.
Time series classification using k-nearest neighbors and dynamic time warping can be improved in many practical applications in both speed and accuracy using averaging.
- Upcoming Webcasts on Analytics, Big Data, Data Science – Oct 7 and beyond - Oct 6, 2014.
Evolution of Classification, Billion Dollar Fraud Detection, Big Data Visualization, Deep Learning on Apache Spark, and more.
- One-handed Keystroke Biometric Identification Competition - Oct 2, 2014.
Build a biometric keystroke classifier in this new competition to help identify the features that best predict one-handed typing samples. The prize for first place is a fingerprint scanner.
- Upcoming Webcasts on Analytics, Big Data, Data Science – Sep 30 and beyond - Sep 29, 2014.
Not all graph databases are created equal, Evolution of Classification, Governing Big Data, Big Data Visualization, Best Practices for Applying Advanced Analytics in Hadoop, and more.
- Upcoming Webcasts on Analytics, Big Data, Data Science – Sep 22 and beyond - Sep 22, 2014.
Future of Hadoop Analytics, What Works: Open Source Analytics Software, Data Mining: Failure To Launch, Evolution of Classification, Not all Graph Databases are created equal, Best Practices for Applying Advanced Analytics in Hadoop, and more.
- Data Analytics for Business Leaders Explained - Sep 22, 2014.
Learn about a variety of different approaches to data analytics and their advantages and limitations from a business leader's perspective in part 1 of this post on data analytics techniques.
- Interview: Vita Markman, LinkedIn on Discovering Customer Insights through Sentiment Mining - Aug 5, 2014.
We discuss examples of discovery through sentiment mining, current trends, innovative applications, important soft skills, and more.
- Top KDnuggets tweets, Aug 1-3: Open Source Data Science Masters plan - Aug 4, 2014.
Open Source #DataScience Masters plan, with courses from Coursera, Stanford, edX; Book: Data Classification: Algorithms and Applications; Markov Chains, key #MachineLearning technique, nice visual explanation; Data Science with #Python: Part 1.
- Interview: Vita Markman, LinkedIn on Practical Solutions for Sentiment Mining Challenges - Aug 4, 2014.
We discuss sentiment data models, significance of linguistic features, handling the noise in social conversations, industry challenges, important use cases and the appropriateness of over-simplified binary classification.
- Book: Data Classification: Algorithms and Applications - Aug 2, 2014.
Learn a wide variety of data classification techniques and their methods, domains, and variations in this comprehensive survey of the area of data classification.
- Interview: Kavita Ganesan, FindiLike on Building Decision Support Systems based on User Opinions - Jul 27, 2014.
We discuss the founding story of FindiLike, Opinion-driven Decision Support Systems (ODSS), challenges in analyzing user opinions, future of Sentiment Analysis, favorite books and more.
- Book: Data Classification: Algorithms and Applications - Jun 14, 2014.
This new book explores the underlying algorithms of classification and applications in text, multimedia, social network, biological data, and other domains. 25% off with KDnuggets discount.
- Interview: Vasanth Kumar, Principal Data Scientist, Live Nation - May 2, 2014.
We discuss challenges in analyzing bursty data, real-time classification, relevance of statistics and advice for newcomers to Data Science.
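The undersampling entry at the top of this list notes that changing class proportions shifts the base rate of a model's predictions, so scores may need rescaling before use. One standard way to do this is a Bayes prior-shift correction. The Python sketch below assumes the resampling changed only the class proportions, not the distribution of features within each class; the function name `correct_for_undersampling` and its parameters are hypothetical and not taken from that article.

```python
def correct_for_undersampling(p_sampled: float,
                              train_pos_rate: float,
                              true_pos_rate: float) -> float:
    """Map a positive-class probability from a model trained on resampled data
    back to the original base rate (Bayes prior-shift correction).

    Assumes resampling changed only the class proportions, not P(x | class).
    """
    # Reweight each class by (true prior / prior seen during training).
    w_pos = true_pos_rate / train_pos_rate
    w_neg = (1 - true_pos_rate) / (1 - train_pos_rate)
    numerator = p_sampled * w_pos
    # Renormalize so the two corrected class probabilities sum to 1.
    return numerator / (numerator + (1 - p_sampled) * w_neg)


# Model trained on a 50/50 undersample; the real positive rate is 10%.
print(correct_for_undersampling(0.5, train_pos_rate=0.5, true_pos_rate=0.1))
```

With a balanced training sample and a true positive rate of 10%, a raw score of 0.5 maps back to roughly 0.1. When only the majority class is undersampled, the correction amounts to multiplying the predicted odds by the fraction of negatives that were kept.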
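Several entries in this list, including the spam filter post and "Unfolding Naive Bayes From Scratch", build multinomial Naive Bayes by hand. The following is one minimal pure-Python way to do it, using add-alpha (Laplace) smoothing and log probabilities to avoid numeric underflow. It is not the code from those posts. The function names, the toy messages, and the `alpha=1.0` default are illustrative assumptions.

```python
import math
from collections import Counter


def train_multinomial_nb(messages, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha smoothing.

    `messages` are lists of word tokens; `labels` are the class names.
    Returns log priors and per-class log likelihoods over the vocabulary.
    """
    vocab = {w for msg in messages for w in msg}
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for msg, c in zip(messages, labels):
        word_counts[c].update(msg)

    n = len(labels)
    log_prior = {c: math.log(k / n) for c, k in class_counts.items()}
    log_lik = {}
    for c, wc in word_counts.items():
        # Smoothed denominator: total words in class + alpha per vocab word.
        denom = sum(wc.values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((wc[w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik


def classify(message, log_prior, log_lik):
    """Return the class with the highest log posterior score.

    Words never seen in training are skipped.
    """
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in message if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


train = [["win", "money", "now"], ["free", "money"],
         ["meeting", "at", "noon"], ["lunch", "at", "noon", "tomorrow"]]
labels = ["spam", "spam", "ham", "ham"]
log_prior, log_lik = train_multinomial_nb(train, labels)
print(classify(["free", "money", "now"], log_prior, log_lik))  # spam
```

Summing log probabilities instead of multiplying raw probabilities keeps long messages from underflowing to zero. The smoothing term ensures that a word seen in only one class does not force the other class's score to negative infinity.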
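The precision, recall, and evaluation-metric entries in this list argue that accuracy alone can mislead, especially on rare-event data. The following pure-Python sketch shows how accuracy, precision, and recall all derive from the four confusion-matrix counts. It also shows how a trivial "always negative" predictor can score well on accuracy while finding no positive cases. The function names are hypothetical and do not come from any of the listed articles.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def summarize(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall from the confusion-matrix counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Rare-event example: predicting "negative" for everyone looks accurate
# but never finds a positive case.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [0] * 10
print(summarize(y_true, y_pred))  # (0.8, 0.0, 0.0)
```

Here, 80% accuracy hides a recall of zero, which is the kind of gap that precision and recall are meant to expose.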