- The Architecture Used at LinkedIn to Improve Feature Management in Machine Learning Models - May 11, 2020.
The new typed feature schema streamlined the reusability of features across thousands of machine learning models.
- Interpretability: Cracking open the black box, Part 2 - Dec 11, 2019.
The second part in a series on leveraging techniques to take a look inside the black box of AI, this guide considers post-hoc interpretation that is useful when the model is not transparent.
- 5 Great New Features in Latest Scikit-learn Release - Dec 10, 2019.
From not sweating missing values, to determining feature importance for any estimator, to support for stacking, and a new plotting API, here are 5 new features of the latest release of Scikit-learn which deserve your attention.
- An Eight-Step Checklist for An Analytics Project - Nov 6, 2019.
Follow these eight headings of an audit sheet that business analysts should address before submitting the results of their analytics project. One recommended approach is to rewrite each step as a question, answer it, and then attach it to your project.
- KDnuggets™ News 19:n41, Oct 30: Feature Selection: Beyond feature importance?; Time Series Analysis Using KNIME and Spark - Oct 30, 2019.
This week in KDnuggets: Feature Selection: Beyond feature importance?; Time Series Analysis: A Simple Example with KNIME and Spark; 5 Advanced Features of Pandas and How to Use Them; How to Measure Foot Traffic Using Data Analytics; Introduction to Natural Language Processing (NLP); and much, much more!
- Feature Selection: Beyond feature importance? - Oct 24, 2019.
In this post, you will see three different techniques for performing feature selection on your datasets and learn how to build an effective predictive model.
- Proptech and the proper use of technology for house sales prediction - Aug 22, 2019.
Using the ATTOM dataset, we extracted data on sales transactions in the USA, loans, and estimated values of property. We developed an optimal prediction model from correlations in the time and status of ownership as well as the time of the year of sales fluctuations.
- Feature selection by random search in Python - Aug 6, 2019.
Feature selection is one of the most important tasks in machine learning. Learn how to use a simple random search in Python to get good results in less time.
- Opening Black Boxes: How to leverage Explainable Machine Learning - Aug 1, 2019.
A machine learning model that predicts some outcome provides value. One that explains why it made the prediction creates even more value for your stakeholders. Learn how Interpretable and Explainable ML technologies can help while developing your model.
- The Hitchhiker’s Guide to Feature Extraction - Jun 3, 2019.
Check out this collection of tricks and code for Kaggle and everyday work.
- 7 Steps to Mastering Intermediate Machine Learning with Python — 2019 Edition - Jun 3, 2019.
This is the second part of this new learning path series for mastering machine learning with Python. Check out these 7 steps to help master intermediate machine learning with Python!
- A Quick Guide to Feature Engineering - Feb 11, 2019.
Feature engineering plays a key role in machine learning, data mining, and data analytics. This article provides a general definition for feature engineering, together with an overview of the major issues, approaches, and challenges of the field.
- Implementing Automated Machine Learning Systems with Open Source Tools - Oct 25, 2018.
What if you want to implement an automated machine learning pipeline of your very own, or automate particular aspects of a machine learning pipeline? Rest assured that there is no need to reinvent any wheels.
- Step Forward Feature Selection: A Practical Example in Python - Jun 18, 2018.
When it comes to disciplined approaches to feature selection, wrapper methods are those which marry the feature selection process to the type of model being built, evaluating feature subsets to compare model performance across them and subsequently selecting the best-performing subset.
- How (dis)similar are my train and test data? - Jun 7, 2018.
This article examines a scenario where your machine learning model can fail.
- Multi-objective Optimization for Feature Selection - Dec 5, 2017.
By having the model analyze the important signals, we can focus on the right set of attributes for optimization. As a side effect, fewer attributes also mean that you can train your models faster, making them less complex and easier to understand.
- Evolutionary Algorithms for Feature Selection - Nov 29, 2017.
Feature selection is a very important technique in machine learning. In this post we discuss one of the most common optimization algorithms for multi-modal fitness landscapes: evolutionary algorithms.
- Automated Feature Engineering for Time Series Data - Nov 20, 2017.
We introduce a general framework for developing time series models, generating features and preprocessing the data, and exploring the potential to automate this process in order to apply advanced machine learning algorithms to almost any time series problem.
- Basic Concepts of Feature Selection - Nov 15, 2017.
Feature selection is a key part of data science, but is it still relevant in the age of support vector machines (SVMs) and Deep Learning? Yes, absolutely. We explain why.
- KDnuggets™ News 17:n23, Jun 14: The Practice of Machine Learning, Data Science Implementation, and Feature Selection - Jun 14, 2017.
A Practical Guide to Machine Learning; Your Checklist to Get Data Science Implemented in Production; The Practical Importance of Feature Selection; Machine Learning in Real Life: Tales from the Trenches.
- The Practical Importance of Feature Selection - Jun 12, 2017.
Feature selection is useful on a variety of fronts: it is the best weapon against the Curse of Dimensionality; it can reduce overall training times; and it is a powerful defense against overfitting, increasing generalizability.
- Must-Know: Why it may be better to have fewer predictors in Machine Learning models? - Apr 4, 2017.
There are a few reasons why it might be a better idea to have fewer predictor variables rather than having many of them. Read on to find out more.
- Kanri Distance Calculator™ – patented solution applying power of Big Data to an Individual - Mar 21, 2017.
Kanri's combination of patented statistical and process methods provides a powerful ability to evaluate large data, telling users the exact distance from target and the variable contributions for each participant. Free trial and 88% KDnuggets discount for the first 100 buyers.
- 17 More Must-Know Data Science Interview Questions and Answers, Part 2 - Feb 22, 2017.
The second part of 17 new must-know Data Science Interview questions and answers covers overfitting, ensemble methods, feature selection, ground truth in unsupervised learning, the curse of dimensionality, and parallel algorithms.
- Identifying Variables That Might Be Better Predictors - Feb 2, 2017.
This blog serves to expand on the approach that the data science team uses to identify (and quantify) which variables and metrics are better predictors of performance.
- Data Analytics Models in Quantitative Finance and Risk Management - Dec 13, 2016.
We review how key data science algorithms, such as regression, feature selection, and Monte Carlo, are used in financial instrument pricing and risk management.
- Clustering Key Terms, Explained - Oct 18, 2016.
Getting started with Data Science or need a refresher? Clustering is among the most used tools of Data Scientists. Check out these 10 Clustering-related terms and their concise definitions.
- Data Mining Tip: How to Use High-cardinality Attributes in a Predictive Model - Aug 29, 2016.
High-cardinality nominal attributes can be difficult to include in predictive models. There are, however, a few ways to do so, which are put forward here.
- MDL Clustering: Unsupervised Attribute Ranking, Discretization, and Clustering - Aug 26, 2016.
MDL Clustering is a free software suite for unsupervised attribute ranking, discretization, and clustering based on the Minimum Description Length principle and built on the Weka Data Mining platform.
- Approaching (Almost) Any Machine Learning Problem - Aug 18, 2016.
If you're looking for an overview of how to approach (almost) any machine learning problem, this is a good place to start. Read on as a Kaggle competition veteran shares his pipelines and approach to problem-solving.
- Contest 2nd Place: Automating Data Science - Aug 3, 2016.
This post discusses some considerations, options, and opportunities for automating aspects of data science and machine learning. It is the second place recipient (tied) in the recent KDnuggets blog contest.
- And the Winner is… Stepwise Regression - Aug 1, 2016.
This post evaluates several methods for automating the feature selection process in large-scale linear regression models and shows that for marketing applications the winner is stepwise regression.
- Nutrition & Principal Component Analysis: A Tutorial - Jun 16, 2016.
A great overview of Principal Component Analysis (PCA), with an example application in the field of nutrition.
- Data Science of Variable Selection: A Review - Jun 7, 2016.
There are as many approaches to selecting features as there are statisticians since every statistician and their sibling has a POV or a paper on the subject. This is an overview of some of these approaches.
- scikit-feature: Open-Source Feature Selection Repository in Python - Mar 3, 2016.
scikit-feature is an open-source feature selection repository in Python, with around 40 popular algorithms in feature selection research. It is developed by the Data Mining and Machine Learning Lab at Arizona State University.