
The 20 Python Packages You Need For Machine Learning and Data Science - Oct 14, 2021.
Do you do Python? Do you do data science and machine learning? Then, you need to do these crucial Python libraries that enable nearly all you will want to do.
Data Science, Keras, Machine Learning, Matplotlib, numpy, Pandas, Plotly, Python, PyTorch, scikit-learn, TensorFlow
- AutoML: An Introduction Using Auto-Sklearn and Auto-PyTorch - Oct 11, 2021.
AutoML is a broad category of techniques and tools for applying automated search to your automated search and learning to your learning. In addition to Auto-Sklearn, the Freiburg-Hannover AutoML group has also developed an Auto-PyTorch library. We’ll use both of these as our entry point into AutoML in the following simple tutorial.
Automated Machine Learning, AutoML, Python, PyTorch, scikit-learn
- 30 Most Asked Machine Learning Questions Answered - Aug 3, 2021.
There is always a lot to learn in machine learning. Whether you are new to the field or a seasoned practitioner and ready for a refresher, understanding these key concepts will keep your skills honed in the right direction.
Beginners, Interview Questions, Machine Learning, Regression, scikit-learn
- A Comprehensive Guide to Ensemble Learning – Exactly What You Need to Know - May 6, 2021.
This article covers ensemble learning methods, and exactly what you need to know in order to understand and implement them.
CatBoost, Ensemble Methods, Machine Learning, Python, random forests algorithm, scikit-learn, XGBoost
- Gradient Boosted Decision Trees – A Conceptual Explanation - Apr 30, 2021.
Gradient boosted decision trees involve implementing several models and aggregating their results. These boosted models have become popular thanks to their performance in machine learning competitions on Kaggle. In this article, we’ll see what gradient boosted decision trees are all about.
CatBoost, Decision Trees, Gradient Boosting, Machine Learning, Python, scikit-learn, XGBoost
The Most In-Demand Skills for Data Scientists in 2021 - Apr 15, 2021.
If you are preparing to make a career as a Data Scientist or are looking for opportunities to skill-up in your current role, this analysis of in-demand skills for 2021, based on over 15,000 Data Scientist job postings, should offer you a good idea as to which programming languages and software tools are increasing and decreasing in importance.
AWS, Data Science Skills, Python, PyTorch, R, scikit-learn, SQL, TensorFlow

Top 10 Python Libraries Data Scientists should know in 2021 - Mar 24, 2021.
So many Python libraries exist that offer powerful and efficient foundations for supporting your data science work and machine learning model development. While the list may seem overwhelming, there are certain libraries you should focus your time on, as they are some of the most commonly used today.
Data Science, Keras, numpy, Pandas, Python, scikit-learn, Seaborn, TensorFlow
- KDnuggets™ News 21:n12, Mar 24: More Data Science Cheat sheets; Top YouTube Channels for Machine Learning - Mar 24, 2021.
Happy with your job or not? Either way, vote in KDnuggets Poll on Data Job Satisfaction to help us understand the current situation.
In this issue: More data science cheat sheets; How to create your data science portfolio; The best machine learning frameworks and extensions for scikit-learn; Top YouTube channels for machine learning; dbt, the ETL and ELT disrupter.
Cheat Sheet, Machine Learning, Portfolio, scikit-learn, Youtube
The Best Machine Learning Frameworks & Extensions for Scikit-learn - Mar 22, 2021.
Learn how to use a selection of packages to extend the functionality of Scikit-learn estimators.
Machine Learning, Python, scikit-learn
- KDnuggets™ News 21:n10, Mar 10: More Resources for Women in AI, Data Science, and Machine Learning; Speeding up Scikit-Learn Model Training - Mar 10, 2021.
More Resources for Women in AI, Data Science, and Machine Learning; Speeding up Scikit-Learn Model Training; Dask and Pandas: No Such Thing as Too Much Data; 9 Skills You Need to Become a Data Engineer; 8 Women in AI Who Are Striving to Humanize the World
AI, Dask, Data Engineer, Data Science, Machine Learning, Modeling, Pandas, scikit-learn, Training, Women
- Speeding up Scikit-Learn Model Training - Mar 5, 2021.
If your scikit-learn models are taking a bit of time to train, then there are several techniques you can use to make the processing more efficient. From optimizing your model configuration to leveraging libraries to speed up training through parallelization, you can build the best scikit-learn model possible in the least amount of time.
Distributed Computing, Machine Learning, Optimization, scikit-learn
- Bayesian Hyperparameter Optimization with tune-sklearn in PyCaret - Mar 5, 2021.
PyCaret, a low-code Python ML library, offers several ways to tune the hyper-parameters of a created model. In this post, I'd like to show how Ray Tune is integrated with PyCaret, and how easy it is to leverage its algorithms and distributed computing to achieve results superior to the default random search method.
Bayesian, Hyperparameter, Machine Learning, Optimization, PyCaret, Python, scikit-learn
- Distributed and Scalable Machine Learning [Webinar] - Feb 17, 2021.
Mike McCarty and Gil Forsyth work at the Capital One Center for Machine Learning, where they are building internal PyData libraries that scale with Dask and RAPIDS. For this webinar, Feb 23 @ 2 pm PST, 5pm EST, they’ll join Hugo Bowne-Anderson and Matthew Rocklin to discuss their journey to scale data science and machine learning in Python.
Capital One, Dask, Distributed, Machine Learning, Python, scikit-learn, XGBoost
- How to Speed up Scikit-Learn Model Training - Feb 11, 2021.
Scikit-Learn is an easy-to-use Python library for machine learning. However, sometimes scikit-learn models can take a long time to train. The question becomes, how do you create the best scikit-learn model in the least amount of time?
Distributed Systems, Hyperparameter, Machine Learning, Optimization, Parallelism, Python, scikit-learn, Training
Build Your First Data Science Application - Feb 4, 2021.
Check out these seven Python libraries to make your first data science MVP application.
API, Data Science, Jupyter, Keras, numpy, Pandas, Plotly, Python, PyTorch, scikit-learn
- KDnuggets™ News 21:n04, Jan 27: The Ultimate Scikit-Learn Machine Learning Cheatsheet; Building a Deep Learning Based Reverse Image Search - Jan 27, 2021.
The Ultimate Scikit-Learn Machine Learning Cheatsheet; Building a Deep Learning Based Reverse Image Search; Data Engineering — the Cousin of Data Science, is Troublesome; Going Beyond the Repo: GitHub for Career Growth in AI & Machine Learning; Popular Machine Learning Interview Questions
Cheat Sheet, Data Engineering, Data Science, Deep Learning, GitHub, Image Recognition, Machine Learning, scikit-learn, Search
The Ultimate Scikit-Learn Machine Learning Cheatsheet - Jan 25, 2021.
With the power and popularity of scikit-learn for machine learning in Python, this library is a foundation of any practitioner's toolset. Preview its core methods with this review of predictive modelling, clustering, dimensionality reduction, feature importance, and data transformation.
Cheat Sheet, Machine Learning, scikit-learn
K-Means 8x faster, 27x lower error than Scikit-learn in 25 lines - Jan 15, 2021.
K-means clustering is a powerful algorithm for similarity searches, and Facebook AI Research's faiss library is turning out to be a speed champion. With only a handful of lines of code shared in this demonstration, faiss outperforms the implementation in scikit-learn in speed and accuracy.
Algorithms, K-means, Machine Learning, scikit-learn
- How to use Machine Learning for Anomaly Detection and Conditional Monitoring - Dec 16, 2020.
This article explains the goals of anomaly detection and outlines the approaches used to solve specific use cases for anomaly detection and condition monitoring.
Anomaly Detection, Machine Learning, Python, scikit-learn, Unsupervised Learning
- 5 Most Useful Machine Learning Tools every lazy full-stack data scientist should use - Nov 18, 2020.
If you consider yourself a Data Scientist who can take any project from data curation to solution deployment, then you know there are many tools available today to help you get the job done. The trouble is that there are too many choices. Here is a review of five sets of tools that should turn you into the most efficient full-stack data scientist possible.
Data Science Tools, Data Scientist, GitHub, Heroku, Machine Learning, Postgres, PyCharm, PyTorch, scikit-learn, Streamlit
- Most Popular Distance Metrics Used in KNN and When to Use Them - Nov 11, 2020.
For calculating distances KNN uses a distance metric from the list of available metrics. Read this article for an overview of these metrics, and when they should be considered for use.
K-nearest neighbors, Metrics, scikit-learn
- Feature Ranking with Recursive Feature Elimination in Scikit-Learn - Oct 19, 2020.
This article covers using scikit-learn to obtain the optimal number of features for your machine learning project.
Feature Selection, Machine Learning, Python, scikit-learn
Modern Data Science Skills: 8 Categories, Core Skills, and Hot Skills - Sep 8, 2020.
We analyze the results of the Data Science Skills poll, including 8 categories of skills, 13 core skills that over 50% of respondents have, the emerging/hot skills that data scientists want to learn, and what is the top skill that Data Scientists want to learn.
Communication, Data Preparation, Data Science Skills, Data Visualization, Excel, GitHub, Mathematics, Poll, Python, Reinforcement Learning, scikit-learn, SQL, Statistics
- 10 Things You Didn’t Know About Scikit-Learn - Sep 3, 2020.
Check out these 10 things you didn’t know about Scikit-Learn... until now.
Machine Learning, Python, scikit-learn
- Why would you put Scikit-learn in the browser? - Jul 22, 2020.
Honestly? I don’t know. But I do think WebAssembly is a good target for ML/AI deployment (in the browser and beyond).
Deployment, Development, scikit-learn, Virtualization
- Simplified Mixed Feature Type Preprocessing in Scikit-Learn with Pipelines - Jun 16, 2020.
There is a quick and easy way to perform preprocessing on mixed feature type data in Scikit-Learn, which can be integrated into your machine learning pipelines.
Data Preprocessing, Pipeline, Python, scikit-learn
- Dataset Splitting Best Practices in Python - May 26, 2020.
If you are splitting your dataset into training and testing data, you need to keep some things in mind. This discussion of 3 best practices for doing so includes a demonstration of how to implement these particular considerations in Python.
Datasets, Python, scikit-learn, Training Data, Validation
- 5 Great New Features in Scikit-learn 0.23 - May 15, 2020.
Check out 5 new features of the latest Scikit-learn release, including the ability to visualize estimators in notebooks, improvements to both k-means and gradient boosting, some new linear model implementations, and sample weight support for a pair of existing regressors.
Gradient Boosting, Jupyter, K-means, Machine Learning, Python, Regression, scikit-learn
- Introduction to the K-nearest Neighbour Algorithm Using Examples - Apr 1, 2020.
Read this concise summary of KNN, a supervised and pattern classification learning algorithm which helps us find which class the new input belongs to when k nearest neighbours are chosen and distance is calculated between them.
Algorithms, K-nearest neighbors, Machine Learning, Python, scikit-learn
- Practical Hyperparameter Optimization - Feb 13, 2020.
An introduction on how to fine-tune Machine and Deep Learning models using techniques such as: Random Search, Automated Hyperparameter Tuning and Artificial Neural Networks Tuning.
Automated Machine Learning, AutoML, Deep Learning, Hyperparameter, Machine Learning, Optimization, Python, scikit-learn
- 5 Great New Features in Latest Scikit-learn Release - Dec 10, 2019.
From not sweating missing values, to determining feature importance for any estimator, to support for stacking, and a new plotting API, here are 5 new features of the latest release of Scikit-learn which deserve your attention.
Data Preparation, Data Preprocessing, Ensemble Methods, Feature Selection, Gradient Boosting, K-nearest neighbors, Machine Learning, Missing Values, Python, scikit-learn, Visualization
- Beginners Guide to the Three Types of Machine Learning - Nov 13, 2019.
The following article is an introduction to classification and regression — which are known as supervised learning — and unsupervised learning — which in the context of machine learning applications often refers to clustering — and will include a walkthrough in the popular python library scikit-learn.
Beginners, Classification, Machine Learning, Python, Regression, scikit-learn, Supervised Learning, Unsupervised Learning
- KDnuggets™ News 19:n41, Oct 30: Feature Selection: Beyond feature importance?; Time Series Analysis Using KNIME and Spark - Oct 30, 2019.
This week in KDnuggets: Feature Selection: Beyond feature importance?; Time Series Analysis: A Simple Example with KNIME and Spark; 5 Advanced Features of Pandas and How to Use Them; How to Measure Foot Traffic Using Data Analytics; Introduction to Natural Language Processing (NLP); and much, much more!
Apache Spark, Data Analytics, Feature Selection, Knime, NLP, Pandas, Python, scikit-learn, Time Series
- How to Extend Scikit-learn and Bring Sanity to Your Machine Learning Workflow - Oct 29, 2019.
In this post, learn how to extend Scikit-learn code to make your experiments easier to maintain and reproduce.
Machine Learning, Python, scikit-learn, Software Engineering, Workflow
- Scikit-Learn & More for Synthetic Dataset Generation for Machine Learning - Sep 19, 2019.
While mature algorithms and extensive open-source libraries are widely available for machine learning practitioners, sufficient data to apply these techniques remains a core challenge. Discover how to leverage scikit-learn and other tools to generate synthetic data appropriate for optimizing and fine-tuning your models.
Dataset, Machine Learning, scikit-learn, Synthetic Data
- KDnuggets™ News 19:n35, Sep 18: Which Data Science Skills are core and which are hot/emerging ones?; There is No Free Lunch in Data Science Features - Sep 18, 2019.
Check the results of KDnuggets' latest poll to find out which data science skills are core and which are hot/emerging ones; why is there no free lunch in data science?; training Scikit-learn 100x faster; poking fun at unsupervised machine learning; exploring the case for ensemble learning. All this and much more this week on KDnuggets.
Data Science, Data Science Skills, Ensemble Methods, scikit-learn, Training, Unsupervised Learning
Train sklearn 100x Faster - Sep 11, 2019.
As compute gets cheaper and time to market for machine learning solutions becomes more critical, we’ve explored options for speeding up model training. One of those solutions is to combine elements from Spark and scikit-learn into our own hybrid solution.
Distributed Systems, Machine Learning, Python, scikit-learn, Training
- Scikit-Learn vs mlr for Machine Learning - Sep 10, 2019.
How does the scikit-learn machine learning library for Python compare to the mlr package for R? Follow along with a machine learning workflow through each approach, and see if you can gain a competitive advantage by knowing both frameworks.
Exxact, Machine Learning, R, scikit-learn
- Understanding Decision Trees for Classification in Python - Aug 21, 2019.
This tutorial covers decision trees for classification also known as classification trees, including the anatomy of classification trees, how classification trees make predictions, using scikit-learn to make classification trees, and hyperparameter tuning.
Classification, Decision Trees, Python, scikit-learn
How to Learn Python for Data Science the Right Way - Jun 14, 2019.
The biggest mistake you can make while learning Python for data science is to learn Python programming from courses meant for programmers. Avoid this mistake, and learn Python the right way by following this approach.
Advice, Data Science, Jupyter, Matplotlib, Pandas, Python, scikit-learn, StatsModels
What you need to know: The Modern Open-Source Data Science/Machine Learning Ecosystem - Jun 10, 2019.
We identify the 6 tools in the modern open-source Data Science ecosystem, examine the Python vs R question, and determine which tools are used the most with Deep Learning and Big Data.
Anaconda, Apache Spark, Big Data Software, Deep Learning, Excel, Keras, Poll, Python, R, RapidMiner, scikit-learn, Software, SQL, Tableau, TensorFlow
7 Steps to Mastering Intermediate Machine Learning with Python — 2019 Edition - Jun 3, 2019.
This is the second part of this new learning path series for mastering machine learning with Python. Check out these 7 steps to help master intermediate machine learning with Python!
7 Steps, Classification, Cross-validation, Dimensionality Reduction, Feature Engineering, Feature Selection, Image Classification, K-nearest neighbors, Machine Learning, Modeling, Naive Bayes, numpy, Pandas, PCA, Python, scikit-learn, Transfer Learning

Python leads the 11 top Data Science, Machine Learning platforms: Trends and Analysis - May 30, 2019.
Python continues to lead the top Data Science platforms, but R and RapidMiner hold their share; Almost 50% have used Deep Learning tools; SQL is steady; Consolidation continues.
Anaconda, Apache Spark, Deep Learning, Excel, Keras, Poll, Python, R, RapidMiner, scikit-learn, Software, SQL, TensorFlow
- Naive Bayes: A Baseline Model for Machine Learning Classification Performance - May 7, 2019.
We can use Pandas to apply Bayes' theorem and scikit-learn to implement the Naive Bayes algorithm. We take a step-by-step approach to understanding Bayes and implementing the different options in scikit-learn.
Algorithms, Data Science, Machine Learning, Naive Bayes, Python, scikit-learn, Statistics
- Unleash a faster Python on your data - Apr 18, 2019.
Intel’s optimized Python packages deliver quick repeatable results compared to standard Python packages. Intel offers optimized Scikit-learn, Numpy, and SciPy to help data scientists get rapid results on their Intel® hardware. Download now.
Data Science, Intel, numpy, Python, scikit-learn, SciPy
- A Beginner’s Guide to Linear Regression in Python with Scikit-Learn - Mar 29, 2019.
What linear regression is and how it can be implemented for both two variables and multiple variables using Scikit-Learn, which is one of the most popular machine learning libraries for Python.
Beginners, Linear Regression, Python, scikit-learn
- Feature Reduction using Genetic Algorithm with Python - Mar 25, 2019.
This tutorial discusses how to use the genetic algorithm (GA) for reducing the feature vector extracted from the Fruits360 dataset in Python mainly using NumPy and Sklearn.
Deep Learning, Feature Engineering, Genetic Algorithm, Neural Networks, numpy, Python, scikit-learn
- Top KDnuggets tweets, Feb 13-19: Intro to Scikit Learn: The Gold Standard of Python ML; The Essential Data Science Venn Diagram - Feb 20, 2019.
Also: Cartoon: #MachineLearning Problems in 2118 #ValentinesDay; A must-read tutorial when you are starting your journey with #DeepLearning.
Data Science, Deep Learning, scikit-learn, Top tweets, Valentine's Day, Venn Diagram
Python Data Science for Beginners - Feb 20, 2019.
Python’s syntax is very clean and short in length. Python is an open-source, portable language which supports a large standard library. But why Python for data science? Read on to find out more.
Beginners, Data Science, Matplotlib, numpy, Pandas, Python, scikit-learn, SciPy
- KDnuggets™ News 19:n08, Feb 20: The Gold Standard of Python Machine Learning; The Analytics Engineer – new role in the data team - Feb 20, 2019.
Intro to scikit-learn; how to set up a Python ML environment; why there should be a new role in the Data Science team; how to learn one of the hardest parts of being a Data Scientist; and how explainable is BERT?
BERT, Python, scikit-learn
An Introduction to Scikit Learn: The Gold Standard of Python Machine Learning - Feb 13, 2019.
If you’re going to do Machine Learning in Python, Scikit Learn is the gold standard. Scikit-learn provides a wide selection of supervised and unsupervised learning algorithms. Best of all, it’s by far the easiest and cleanest ML library.
Machine Learning, Python, scikit-learn
- Automated Machine Learning in Python - Jan 18, 2019.
An organization can also reduce the cost of hiring many experts by applying AutoML in their data pipeline. AutoML also reduces the amount of time it would take to develop and test a machine learning model.
Automated Machine Learning, AutoML, H2O, Keras, Machine Learning, Python, scikit-learn
A Guide to Decision Trees for Machine Learning and Data Science - Dec 24, 2018.
What makes decision trees special in the realm of ML models is really their clarity of information representation. The “knowledge” learned by a decision tree through training is directly formulated into a hierarchical structure.
Algorithms, Data Science, Decision Trees, Machine Learning, Python, scikit-learn
- KDnuggets™ News 18:n41, Oct 31: Introduction to Deep Learning with Keras; Easy Named Entity Recognition with Scikit-Learn - Oct 31, 2018.
Also: Generative Adversarial Networks - Paper Reading Road Map; How I Learned to Stop Worrying and Love Uncertainty; Implementing Automated Machine Learning Systems with Open Source Tools; Notes on Feature Preprocessing: The What, the Why, and the How
Automated Machine Learning, Data Preprocessing, Deep Learning, Generative Adversarial Network, Keras, NLP, Python, scikit-learn
- Notes on Feature Preprocessing: The What, the Why, and the How - Oct 26, 2018.
This article covers a few important points related to the preprocessing of numeric data, focusing on the scaling of feature values, and the broad question of dealing with outliers.
Data Preparation, Data Preprocessing, numpy, Python, scikit-learn, SciPy
- Unleash a Faster Python on Your Data - Oct 2, 2018.
Intel provides optimized Scikit-learn, the most used Python package for classical machine learning. Get faster scikit-learn through Intel® Distribution for Python.
Analytics, Intel, Python, scikit-learn
- Iterative Initial Centroid Search via Sampling for k-Means Clustering - Sep 12, 2018.
Thinking about ways to find a better set of initial centroid positions is a valid approach to optimizing the k-means clustering process. This post outlines just such an approach.
Clustering, K-means, Python, Sampling, scikit-learn
- Deploying scikit-learn Models at Scale - Aug 29, 2018.
Find out how to serve your scikit-learn model in an auto-scaling, serverless environment! Today, we’ll take a trained scikit-learn model and deploy it on Cloud ML Engine.
Cloud, Google, Google Cloud, Machine Learning, Python, scikit-learn
- Multi-Class Text Classification with Scikit-Learn - Aug 27, 2018.
The vast majority of text classification articles and tutorials on the internet cover binary text classification, such as email spam filtering and sentiment analysis. Real-world problems are much more complicated than that.
NLP, Python, scikit-learn, Text Classification, Text Mining
- Building Reliable Machine Learning Models with Cross-validation - Aug 9, 2018.
Cross-validation is frequently used to train, measure and finally select a machine learning model for a given dataset because it helps assess how the results of a model will generalize to an independent data set in practice.
Comet.ml, Cross-validation, Machine Learning, Modeling, scikit-learn
- [ebook] Apache Spark™ Under the Hood - Jun 27, 2018.
Learn how to install and run Spark yourself; a summary of Spark's core architecture and concepts; Spark's powerful language APIs and how you can use them.
Apache Spark, Databricks, ebook, PyTorch, R, scikit-learn, TensorFlow
Top 20 Python Libraries for Data Science in 2018 - Jun 27, 2018.
Our selection actually contains more than 20 libraries, as some of them are alternatives to each other and solve the same problem. Therefore we have grouped them as it's difficult to distinguish one particular leader at the moment.
Bokeh, Data Science, Keras, Matplotlib, NLTK, numpy, Pandas, Plotly, Python, PyTorch, scikit-learn, SciPy, Seaborn, TensorFlow, XGBoost
The 6 components of Open-Source Data Science/ Machine Learning Ecosystem; Did Python declare victory over R? - Jun 6, 2018.
We find 6 tools form the modern open source Data Science / Machine Learning ecosystem; examine whether Python declared victory over R; and review which tools are most associated with Deep Learning and Big Data.
Anaconda, Apache Spark, Data Science, Keras, Machine Learning, Open Source, Poll, Python, R, RapidMiner, Scala, scikit-learn, TensorFlow
- How I Unknowingly Contributed To Open Source - Apr 24, 2018.
This article explains what is meant by the term 'open source' and why all data scientists should be a part of it.
fast.ai, GitHub, Jeremy Howard, Open Source, Python, scikit-learn
Top 20 Python AI and Machine Learning Open Source Projects - Feb 20, 2018.
We update the top AI and Machine Learning projects in Python. Tensorflow has moved to the first place with triple-digit growth in contributors. Scikit-learn dropped to 2nd place, but still has a very large base of contributors.
GitHub, Machine Learning, Open Source, Python, scikit-learn, TensorFlow
- Introduction to Python Ensembles - Feb 9, 2018.
In this post, we'll take you through the basics of ensembles — what they are and why they work so well — and provide a hands-on tutorial for building basic ensembles.
Decision Trees, Ensemble Methods, Machine Learning, Python, random forests algorithm, ROC-AUC, scikit-learn, XGBoost
5 Machine Learning Projects You Should Not Overlook - Feb 8, 2018.
It's about that time again... 5 more machine learning or machine learning-related projects you may not yet have heard of, but may want to consider checking out!
Bayesian, Gradient Boosting, Keras, Machine Learning, Overlook, PHP, Python, scikit-learn
- KDnuggets™ News 18:n05, Jan 31: Feynman Technique to become a Data Scientist; 4 Big Data Trends for 2018; Data Scientist – best job in America - Jan 31, 2018.
Also: How To Grow As A Data Scientist; A Beginner Guide to Data Engineering; Exclusive Interview: Doug Laney on Big Data and Infonomics
Advice, Data Engineering, Data Scientist, scikit-learn, Trends
- Using AutoML to Generate Machine Learning Pipelines with TPOT - Jan 29, 2018.
This post will take a different approach to constructing pipelines. Certainly the title gives away this difference: instead of hand-crafting pipelines and hyperparameter optimization, and performing model selection ourselves, we will instead automate these processes.
Automated Machine Learning, Hyperparameter, Optimization, Pipeline, Python, scikit-learn, Workflow
- Managing Machine Learning Workflows with Scikit-learn Pipelines Part 3: Multiple Models, Pipelines, and Grid Searches - Jan 24, 2018.
In this post, we will use grid search to optimize models built from a number of different types of estimators. We will then compare them and properly evaluate the best hyperparameters that each model has to offer.
Data Preprocessing, Hyperparameter, Optimization, Pipeline, Python, scikit-learn, Workflow
- Managing Machine Learning Workflows with Scikit-learn Pipelines Part 2: Integrating Grid Search - Jan 19, 2018.
Another simple yet powerful technique we can pair with pipelines to improve performance is grid search, which attempts to optimize model hyperparameter combinations.
Data Preprocessing, Hyperparameter, Optimization, Pipeline, Python, scikit-learn, Workflow
- Managing Machine Learning Workflows with Scikit-learn Pipelines Part 1: A Gentle Introduction - Dec 7, 2017.
Scikit-learn's Pipeline class is designed as a manageable way to apply a series of data transformations followed by the application of an estimator.
Data Preprocessing, Pipeline, Python, scikit-learn, Workflow
- Choosing an Open Source Machine Learning Library: TensorFlow, Theano, Torch, scikit-learn, Caffe - Nov 8, 2017.
Open source is at the heart of innovation and the rapid evolution of technologies these days. Here we discuss how to choose open source machine learning tools for different use cases.
Caffe, Machine Learning, Open Source, scikit-learn, TensorFlow, Theano, Torch
- Visualizing Cross-validation Code - Sep 5, 2017.
Cross-validation helps to improve your prediction using the K-Fold strategy. What is K-Fold, you ask? Check out this post for a visualized explanation.
Cross-validation, Machine Learning, Python, scikit-learn
- Introducing Dask-SearchCV: Distributed hyperparameter optimization with Scikit-Learn - May 12, 2017.
We introduce a new library for doing distributed hyperparameter optimization with Scikit-Learn estimators. We compare it to the existing Scikit-Learn implementations, and discuss when it may be useful compared to other approaches.
Dask, Distributed Computing, Distributed Systems, Machine Learning, Optimization, scikit-learn
The Guerrilla Guide to Machine Learning with Python - May 1, 2017.
Here is a bare bones take on learning machine learning with Python, a complete course for the quick study hacker with no time (or patience) to spare.
Deep Learning, Machine Learning, Pandas, Python, scikit-learn, Sebastian Raschka
- Top KDnuggets tweets, Apr 19-25: 10 Free Must-Read Books for Machine Learning and Data Science - Apr 26, 2017.
Also: Practical #DeepLearning For Coders-18 hours of free lessons; Different views of #Machinelearning #cartoon #humor; Scikit-learn #MachineLearning classification algorithms.
Deep Learning, Free ebook, Machine Learning, scikit-learn, Top tweets
5 Machine Learning Projects You Can No Longer Overlook, April - Apr 13, 2017.
It's about that time again... 5 more machine learning or machine learning-related projects you may not yet have heard of, but may want to consider checking out. Find tools for data exploration, topic modeling, high-level APIs, and feature selection herein.
Data Exploration, Deep Learning, Java, Machine Learning, Neural Networks, Overlook, Python, Scala, scikit-learn, Topic Modeling
- Email Spam Filtering: An Implementation with Python and Scikit-learn - Mar 17, 2017.
This post is an overview of a spam filtering implementation using Python and Scikit-learn. The results of 2 classifiers are contrasted and compared: multinomial Naive Bayes and support vector machines.
Machine Learning, Python, scikit-learn
- K-Means & Other Clustering Algorithms: A Quick Intro with Python - Mar 8, 2017.
In this intro cluster analysis tutorial, we'll check out a few algorithms in Python so you can get a basic understanding of the fundamentals of clustering on a real dataset.
Clustering, K-means, Python, scikit-learn
- A Simple XGBoost Tutorial Using the Iris Dataset - Mar 7, 2017.
This is an overview of the XGBoost machine learning algorithm, which is fast and shows good results. This example uses multiclass prediction with the Iris dataset from Scikit-learn.
Python, scikit-learn, XGBoost
- Top /r/MachineLearning Posts, February: Oxford Deep NLP Course; Data Visualization for Scikit-learn Results - Mar 6, 2017.
Oxford Deep NLP Course; scikit-plot: Data Visualization for Scikit-learn Results; Machine Learning at Berkeley's ML Crash Course: Neural Networks; Predicting parking difficulty with machine learning; TensorFlow 1.0 Release
Data Visualization, Deep Learning, Google, Machine Learning, Natural Language Processing, NLP, Oxford, Reddit, scikit-learn, TensorFlow
7 More Steps to Mastering Machine Learning With Python - Mar 1, 2017.
This post is a follow-up to last year's introductory Python machine learning post, which includes a series of tutorials for extending your knowledge beyond the original.
7 Steps, Classification, Clustering, Deep Learning, Ensemble Methods, Gradient Boosting, Machine Learning, Python, scikit-learn, Sebastian Raschka
- Moving from R to Python: The Libraries You Need to Know - Feb 24, 2017.
Are you considering making a move from R to Python? Here are the libraries you need to know, how they stack up to their R contemporaries, and why you should learn them.
Jupyter, Pandas, Programming, Python, R, scikit-learn, Yhat
- What is a Support Vector Machine, and Why Would I Use it? - Feb 23, 2017.
Support Vector Machine has become an extremely popular algorithm. In this post I try to give a simple explanation for how it works and give a few examples using the Python Scikits libraries.
Python, scikit-learn, Support Vector Machines, SVM, Yhat
- Learn how to Develop and Deploy a Gradient Boosting Machine Model - Jan 20, 2017.
GBM is one of the hottest machine learning methods. Learn how to create a GBM using SciKit-Learn and Python, and understand the steps required to transform features, train, and deploy a GBM.
Gradient Boosting, Open Data Group, Python, scikit-learn
- Top KDnuggets tweets, Jan 04-10: Cartoon: When Self-Driving Car takes you too far; A massive collection of free programming books - Jan 11, 2017.
Also AI #DataScience #MachineLearning: Main Developments 2016, Key Trends 2017; Scikit-Learn Cheat Sheet: #Python #MachineLearning
2017 Predictions, Free ebook, Programming, scikit-learn, Self-Driving Car
5 Machine Learning Projects You Can No Longer Overlook, January - Jan 2, 2017.
There are a lot of popular machine learning projects out there, but many more that are not. Which of these are actively developed and worth checking out? Here is an offering of 5 such projects, the most recent in an ongoing series.
Boosting, C++, Data Preparation, Decision Trees, Machine Learning, Neural Networks, Optimization, Overlook, Pandas, Python, scikit-learn
- Introduction to Machine Learning for Developers - Nov 28, 2016.
Whether you are integrating a recommendation system into your app or building a chat bot, this guide will help you get started in understanding the basics of machine learning.
Beginners, Classification, Clustering, Machine Learning, Pandas, Python, R, scikit-learn, Software Developer
Top 20 Python Machine Learning Open Source Projects, updated - Nov 21, 2016.
Open source is at the heart of innovation and the rapid evolution of technologies these days. This article presents the Top 20 Python Machine Learning Open Source Projects of 2016, along with interesting insights and trends found during the analysis.
GitHub, Machine Learning, Open Source, Python, scikit-learn
- Automated Machine Learning: An Interview with Randy Olson, TPOT Lead Developer - Oct 28, 2016.
Read an insightful interview with Randy Olson, Senior Data Scientist at University of Pennsylvania Institute for Biomedical Informatics, and lead developer of TPOT, an open source Python tool that intelligently automates the entire machine learning process.
Automated Data Science, Automated Machine Learning, Machine Learning, Python, scikit-learn
- KDnuggets™ News 16:n38, Oct 26: Free Machine Learning EBooks; Neural Networks in Python with Scikit-learn - Oct 26, 2016.
5 EBooks to Read Before Getting into A Machine Learning Career; A Beginner's Guide to Neural Networks with Python and Scikit-learn 0.18!; New Poll: What was the largest dataset you analyzed / data mined?; Jupyter Notebook Best Practices for Data Science
Free ebook, Machine Learning, Neural Networks, Poll, Python, scikit-learn
A Beginner’s Guide to Neural Networks with Python and SciKit Learn 0.18! - Oct 20, 2016.
This post outlines setting up a neural network in Python using Scikit-learn, the latest version of which now has built-in support for Neural Network models.
Beginners, Machine Learning, Neural Networks, Python, scikit-learn
Automated Data Science & Machine Learning: An Interview with the Auto-sklearn Team - Oct 4, 2016.
This is an interview with the authors of the recent winning KDnuggets Automated Data Science and Machine Learning blog contest entry, which provided an overview of the Auto-sklearn project. Learn more about the authors, the project, and automated data science.
Automated, Automated Data Science, Automated Machine Learning, Competition, Machine Learning, scikit-learn
- O’Reilly Live Training–Real-time. Real experts. Real learning. - Sep 26, 2016.
Get intensive, hands-on training from O'Reilly's expert network on critical data topics - from SQL fundamentals to distributed computing; enterprise strategy to data science at scale.
Apache Spark, Courses, Distributed Systems, Hadoop, O'Reilly, scikit-learn, SQL
- Top Machine Learning Projects for Julia - Aug 19, 2016.
Julia is gaining traction as a legitimate alternative programming language for analytics tasks. Learn more about these 5 machine learning related projects.
Deep Learning, Julia, Machine Learning, Open Source, scikit-learn
- Contest Winner: Winning the AutoML Challenge with Auto-sklearn - Aug 5, 2016.
This post is the first place prize recipient in the recent KDnuggets blog contest. Auto-sklearn is an open-source Python tool that automatically determines effective machine learning pipelines for classification and regression datasets. It is built around the successful scikit-learn library and won the recent AutoML challenge.
Automated, Automated Data Science, Automated Machine Learning, Competition, Hyperparameter, scikit-learn, Weka
- Would You Survive the Titanic? A Guide to Machine Learning in Python Part 1 - Jul 25, 2016.
Check out the first of a 3-part introductory series on machine learning in Python, fueled by the Titanic dataset. This is a great place to start for a machine learning newcomer.
Machine Learning, Python, scikit-learn, Titanic
- Semi-supervised Feature Transfer: The Practical Benefit of Deep Learning Today? - Jul 12, 2016.
This post evaluates four different strategies for solving a problem with machine learning, where customized models built from semi-supervised "deep" features using transfer learning outperform models built from scratch, and rival state-of-the-art methods.
API, Deep Learning, indico, Machine Learning, scikit-learn, Sentiment Analysis
- 5 Machine Learning Projects You Can No Longer Overlook - May 19, 2016.
We all know the big machine learning projects out there: Scikit-learn, TensorFlow, Theano, etc. But what about the smaller niche projects that are actively developed, providing useful services to users? Here are 5 such projects.
Data Cleaning, Deep Learning, Machine Learning, Open Source, Overlook, Pandas, Python, scikit-learn, Theano
- TPOT: A Python Tool for Automating Data Science - May 13, 2016.
TPOT is an open-source Python data science automation tool, which operates by optimizing a series of feature preprocessors and models, in order to maximize cross-validation accuracy on data sets.
Automated Data Science, Automated Machine Learning, Hyperparameter, Machine Learning, Python, scikit-learn
Scikit Flow: Easy Deep Learning with TensorFlow and Scikit-learn - Feb 12, 2016.
Scikit Flow is a new easy-to-use interface for TensorFlow from Google based on the Scikit-learn fit/predict model. Does it succeed in making deep learning more accessible?
Deep Learning, Google, Matthew Mayo, Python, scikit-learn, TensorFlow
- Auto-Scaling scikit-learn with Spark - Feb 11, 2016.
Databricks gives us an overview of the spark-sklearn library, which automatically and seamlessly distributes model tuning on a Spark cluster, without impacting workflow.
Apache Spark, Databricks, Open Source, scikit-learn
- Scikit-learn and Python Stack Tutorials: Introduction, Implementing Classifiers - Jan 18, 2016.
A small collection of introductory scikit-learn and Python stack tutorials for those with an existing understanding of machine learning looking to jump right into using a new set of tools.
IPython, Python, scikit-learn, Tutorials
Top 10 Machine Learning Projects on Github - Dec 14, 2015.
The top 10 machine learning projects on Github include a number of libraries, frameworks, and education resources. Have a look at the tools others are using, and the resources they are learning from.
GitHub, Machine Learning, Matthew Mayo, Open Source, scikit-learn, Top 10
- Top New Features in Orange 3 Data Mining Platform - Dec 10, 2015.
The main technical advantage of Orange 3 is its integration with NumPy and SciPy libraries. Other improvements include reading online data, working through queries for SQL and pre-processing.
Data Mining, Data Visualization, numpy, Orange, Python, scikit-learn
- Make Beautiful Interactive Data Visualizations Easily, Dec 15 Webinar - Dec 7, 2015.
Learn how to use Bokeh interactive visualization framework for open data science to create rich, interactive visualizations in the browser, without writing a line of JavaScript, HTML, or CSS.
Anaconda, Bokeh, Continuum Analytics, Data Visualization, scikit-learn
- 7 Steps to Mastering Machine Learning With Python - Nov 19, 2015.
There are many Python machine learning resources freely available online. Where to begin? How to proceed? Go from zero to Python machine learning hero in 7 steps!
7 Steps, Anaconda, Caffe, Deep Learning, Machine Learning, Matthew Mayo, Python, scikit-learn, Theano
- R vs Python: head to head data analysis - Oct 13, 2015.
The epic battle between R and Python goes on. Here we compare the two on generic data science tasks such as reading CSV files, summarizing data, PCA, model building, plotting, and many more.
Data Visualization, Python, Python vs R, R, scikit-learn, Vik Paruchuri
- Top 10 Quora Data Science Writers and Their Best Advice - Sep 17, 2015.
Top Quora data science writers give their advice on pursuing a career in the field, approaching interviews, and selecting appropriate technologies.
Data Science, Quora, scikit-learn, Top 10
- NYC Data Science Academy courses & bootcamps in Data Engineering, Data Science, R, Python, and Machine Learning - Jul 31, 2015.
Upcoming training from NYC Data Science Academy: 6-Week Intensive Data Engineering Bootcamp, 12-Week Data Science Bootcamp, courses in R, Python, Data Science and Machine Learning, and more.
Apache Spark, Bootcamp, Data Science Education, Hadoop, Machine Learning, New York City, NY, NYC Data Science Academy, Python, R, scikit-learn
- Continually Updated Data Science IPython Notebooks - Jul 13, 2015.
Continually updated Data Science IPython Notebooks: Spark, Hadoop MapReduce, HDFS, AWS, Kaggle, scikit-learn, matplotlib, pandas, NumPy, SciPy, and various command lines.
GitHub, IPython, Python, scikit-learn
- Top 20 Python Machine Learning Open Source Projects - Jun 1, 2015.
We examine the top Python machine learning open source projects on Github, both in terms of contributors and commits, and identify the most popular and most active ones.
GitHub, Machine Learning, Open Source, Python, scikit-learn
- Top /r/MachineLearning Posts, Apr 5-11: Amazon Machine Learning, Numerical Optimization, and Conditional Random Fields - Apr 14, 2015.
Amazon Machine Learning as a Service, Numerical Optimization, Extracting data from NYTimes recipes, Intro to Machine Learning with sci-kit, and more.
Amazon, Deep Learning, Kaggle, Machine Learning, Probability, Python, Reddit, scikit-learn
- Top /r/MachineLearning Posts, Mar 29-Apr 4: Andrew Ng AMA, Deep Learning for NLP, and OpenCL Convnets - Apr 10, 2015.
Andrew Ng's upcoming AMA, scikit-learn updates, Richard Socher's Deep Learning NLP videos, Criteo's huge new dataset, and convolutional neural networks on OpenCL are the top topics discussed this week on /r/MachineLearning.
Andrew Ng, Convolutional Neural Networks, Datasets, Deep Learning, NLP, Python, Reddit, scikit-learn
- NYC Data Science Courses, Bootcamps, Meetups - Mar 17, 2015.
NYC Data Science Academy spring schedule includes 3 classes, 3 Meetups, 7 bootcamp events on Data Science, R, Python, Machine Learning, scikit-learn, and related topics.
Bootcamp, Knewton, Machine Learning, Meetup, New York City, NY, NYC Data Science Academy, Python, R, scikit-learn
- Machine Learning Table of Elements Decoded - Mar 11, 2015.
Machine learning packages for Python, Java, Big Data, Lua/JS/Clojure, Scala, C/C++, CV/NLP, and R/Julia are represented using a cute but ill-fitting metaphor of a periodic table. We extract the useful links.
Big Data Software, Java, Julia, Machine Learning, NLP, Python, R, Scala, scikit-learn, Weka
- Top /r/MachineLearning Posts, Mar 1-7: Stanford Deep Learning for NLP, Machine Learning with Scikit-learn - Mar 9, 2015.
This week on /r/MachineLearning, we have a new NLP-focused deep learning course from Stanford, an introduction to scikit-learn, visualization of music collections, an implementation of DeepMind, and NLP using deep learning and Torch.
Deep Learning, DeepMind, Facebook, GPU, Python, Reddit, scikit-learn, Torch
- Open Source Tools for Machine Learning - Dec 17, 2014.
Open source machine learning software makes it easier to implement machine learning solutions on single computers and at scale, and the diversity of packages provides more options for implementers.
Free Data Mining Software, Free Software, Open Source, scikit-learn, Weka
- Top KDnuggets tweets, Dec 8-9: On the effects Analytics bring to enterprises; Use IBM #WatsonAnalytics to Crunch Data For Free - Dec 10, 2014.
On the effects Analytics bring to enterprises; Anyone Can Now Use IBM #WatsonAnalytics to Crunch Data For Free; Economists are NOT nonpartisan - @FiveThirtyEight quantifies their bias; Geoff Hinton AMA: Neural Networks, the Brain, and Machine Learning.
Alan Turing, FiveThirtyEight, Geoff Hinton, IBM Watson, KPMG, Pinterest, Python, scikit-learn
- Top KDnuggets tweets, Sep 3-9: What is Big Data – definitions from thought leaders - Sep 12, 2014.
What Is #BigData? Definitions from 40+ thought leaders; Fewer companies are hiring Data Scientists but #DataScience is still hot; Choosing the right estimator scikit-learn #CheatSheet; How do Twitter Analytics show followers' gender, when they don't ask?
Big Data, scikit-learn, Twitter
- Top KDnuggets tweets, Aug 4-5: Ensemble Methods, a brief history; Data Scientist role shifting - Aug 6, 2014.
Ensemble Methods are the backbone of #MachineLearning - a brief history; Data Scientist role shifting, with companies focusing on Developers; To add #MachineLearning for Python, scikit-learn; for Hadoop: Mahout; Meet Fortune 2014 #BigData All-Stars: data scientists, entrepreneurs, CEOs.
Apache Mahout, Data Scientist, Ensemble Methods, Machine Learning, scikit-learn
- Top KDnuggets tweets, Jun 6-8: Statistical-learning tutorial w. scikit-learn; Data science vs the hunch - Jun 9, 2014.
A tutorial on statistical learning with scikit-learn; Data science vs the hunch: When data contradicts manager gut instinct; Stanford University: Data Analyst; Data Lakes vs Data Warehouses.
Data Lakes, Data Science, Hunch, scikit-learn, Stanford, Tutorial
- Top KDnuggets tweets, Apr 16-17 - Apr 19, 2014.
Scikit-Learn: a great Python library for machine learning; A map of where nobody lives in the US; Apache Spark, the hot new trend in Big Data; NYU @aghose on Est. Demand for Mobile Apps - Learn more: NYU Stern MS in Biz Analytics.
Apache Spark, MS in Business Analytics, NYU, Python, scikit-learn, US Census
- Top KDnuggets tweets, Mar 10-11: Deep Learning overview, free book; Best machine learning interview questions - Mar 12, 2014.
Deep Learning: Methods and Application, free book from Microsoft; Best interview questions to evaluate a machine learning researcher; Good list of Machine Learning Libraries in Python: scikit-learn, pandas, Theano, NLTK.
Dancing, Deep Learning, Healthcare, Interview Questions, Machine Learning, Python, scikit-learn