- How to Deal with Categorical Data for Machine Learning - May 24, 2021.
Check out this guide to implementing different types of encoding for categorical data, including a cheat sheet on when to use what type.
- A Comprehensive Guide to Ensemble Learning – Exactly What You Need to Know - May 6, 2021.
This article covers ensemble learning methods, and exactly what you need to know in order to understand and implement them.
- Gradient Boosted Decision Trees – A Conceptual Explanation - Apr 30, 2021.
Gradient boosted decision trees involve implementing several models and aggregating their results. These boosted models have become popular thanks to their performance in machine learning competitions on Kaggle. In this article, we’ll see what gradient boosted decision trees are all about.
- The Most In-Demand Skills for Data Scientists in 2021 - Apr 15, 2021.
If you are preparing to make a career as a Data Scientist or are looking for opportunities to skill-up in your current role, this analysis of in-demand skills for 2021, based on over 15,000 Data Scientist job postings, should offer you a good idea as to which programming languages and software tools are increasing and decreasing in importance.
- Top 10 Python Libraries Data Scientists should know in 2021 - Mar 24, 2021.
So many Python libraries exist that offer powerful and efficient foundations for supporting your data science work and machine learning model development. While the list may seem overwhelming, there are certain libraries you should focus your time on, as they are some of the most commonly used today.
- KDnuggets™ News 21:n12, Mar 24: More Data Science Cheat sheets; Top YouTube Channels for Machine Learning - Mar 24, 2021.
Happy with your job or not? Either way, vote in the KDnuggets Poll on Data Job Satisfaction to help us understand the current situation.
In this issue: More data science cheatsheets; How to create your data science portfolio; The best machine learning frameworks and extensions for scikit-learn; Top YouTube channels for machine learning; dbt, the ETL and ELT disrupter.
- The Best Machine Learning Frameworks & Extensions for Scikit-learn - Mar 22, 2021.
Learn how to use a selection of packages to extend the functionality of Scikit-learn estimators.
- KDnuggets™ News 21:n10, Mar 10: More Resources for Women in AI, Data Science, and Machine Learning; Speeding up Scikit-Learn Model Training - Mar 10, 2021.
More Resources for Women in AI, Data Science, and Machine Learning; Speeding up Scikit-Learn Model Training; Dask and Pandas: No Such Thing as Too Much Data; 9 Skills You Need to Become a Data Engineer; 8 Women in AI Who Are Striving to Humanize the World
- Speeding up Scikit-Learn Model Training - Mar 5, 2021.
If your scikit-learn models are taking a bit of time to train, then there are several techniques you can use to make the processing more efficient. From optimizing your model configuration to leveraging libraries to speed up training through parallelization, you can build the best scikit-learn model possible in the least amount of time.
- Bayesian Hyperparameter Optimization with tune-sklearn in PyCaret - Mar 5, 2021.
PyCaret, a low-code Python ML library, offers several ways to tune the hyper-parameters of a created model. In this post, I'd like to show how Ray Tune is integrated with PyCaret, and how easy it is to leverage its algorithms and distributed computing to achieve results superior to the default random search method.
- Distributed and Scalable Machine Learning [Webinar] - Feb 17, 2021.
Mike McCarty and Gil Forsyth work at the Capital One Center for Machine Learning, where they are building internal PyData libraries that scale with Dask and RAPIDS. For this webinar, Feb 23 @ 2 pm PST, 5pm EST, they’ll join Hugo Bowne-Anderson and Matthew Rocklin to discuss their journey to scale data science and machine learning in Python.
- How to Speed up Scikit-Learn Model Training - Feb 11, 2021.
Scikit-Learn is an easy-to-use Python library for machine learning. However, sometimes scikit-learn models can take a long time to train. The question becomes, how do you create the best scikit-learn model in the least amount of time?
- Build Your First Data Science Application - Feb 4, 2021.
Check out these seven Python libraries to make your first data science MVP application.
- KDnuggets™ News 21:n04, Jan 27: The Ultimate Scikit-Learn Machine Learning Cheatsheet; Building a Deep Learning Based Reverse Image Search - Jan 27, 2021.
The Ultimate Scikit-Learn Machine Learning Cheatsheet; Building a Deep Learning Based Reverse Image Search; Data Engineering — the Cousin of Data Science, is Troublesome; Going Beyond the Repo: GitHub for Career Growth in AI & Machine Learning; Popular Machine Learning Interview Questions
- The Ultimate Scikit-Learn Machine Learning Cheatsheet - Jan 25, 2021.
With the power and popularity of scikit-learn for machine learning in Python, this library is a foundation of any practitioner's toolset. Preview its core methods with this review of predictive modelling, clustering, dimensionality reduction, feature importance, and data transformation.
- K-Means 8x faster, 27x lower error than Scikit-learn in 25 lines - Jan 15, 2021.
K-means clustering is a powerful algorithm for similarity searches, and Facebook AI Research's faiss library is turning out to be a speed champion. With only a handful of lines of code shared in this demonstration, faiss outperforms the implementation in scikit-learn in speed and accuracy.
- How to use Machine Learning for Anomaly Detection and Conditional Monitoring - Dec 16, 2020.
This article explains the goals of anomaly detection and outlines the approaches used to solve specific use cases for anomaly detection and condition monitoring.
- 5 Most Useful Machine Learning Tools every lazy full-stack data scientist should use - Nov 18, 2020.
If you consider yourself a Data Scientist who can take any project from data curation to solution deployment, then you know there are many tools available today to help you get the job done. The trouble is that there are too many choices. Here is a review of five sets of tools that should turn you into the most efficient full-stack data scientist possible.
- Most Popular Distance Metrics Used in KNN and When to Use Them - Nov 11, 2020.
To calculate distances, KNN uses a distance metric chosen from a list of available metrics. Read this article for an overview of these metrics, and when they should be considered for use.
- Feature Ranking with Recursive Feature Elimination in Scikit-Learn - Oct 19, 2020.
This article covers using scikit-learn to obtain the optimal number of features for your machine learning project.
- Modern Data Science Skills: 8 Categories, Core Skills, and Hot Skills - Sep 8, 2020.
We analyze the results of the Data Science Skills poll, including 8 categories of skills, 13 core skills that over 50% of respondents have, the emerging/hot skills that data scientists want to learn, and what is the top skill that Data Scientists want to learn.
- 10 Things You Didn’t Know About Scikit-Learn - Sep 3, 2020.
Check out these 10 things you didn’t know about Scikit-Learn... until now.
- Why would you put Scikit-learn in the browser? - Jul 22, 2020.
Honestly? I don’t know. But I do think WebAssembly is a good target for ML/AI deployment (in the browser and beyond).
- Simplified Mixed Feature Type Preprocessing in Scikit-Learn with Pipelines - Jun 16, 2020.
There is a quick and easy way to perform preprocessing on mixed feature type data in Scikit-Learn, which can be integrated into your machine learning pipelines.
- Centroid Initialization Methods for k-means Clustering - Jun 10, 2020.
This article is the first in a series of articles looking at the different aspects of k-means clustering, beginning with a discussion on centroid initialization.
- Dataset Splitting Best Practices in Python - May 26, 2020.
If you are splitting your dataset into training and testing data you need to keep some things in mind. This discussion of 3 best practices to keep in mind when doing so includes demonstration of how to implement these particular considerations in Python.
- Sparse Matrix Representation in Python - May 19, 2020.
Leveraging sparse matrix representations for your data when appropriate can save you memory. Have a look at the reasons why, see how to create sparse matrices in Python using Scipy, and compare the memory requirements for standard and sparse representations of the same data.
- 5 Great New Features in Scikit-learn 0.23 - May 15, 2020.
Check out 5 new features of the latest Scikit-learn release, including the ability to visualize estimators in notebooks, improvements to both k-means and gradient boosting, some new linear model implementations, and sample weight support for a pair of existing regressors.
- Introduction to the K-nearest Neighbour Algorithm Using Examples - Apr 1, 2020.
Read this concise summary of KNN, a supervised pattern classification algorithm which helps us find which class a new input belongs to, based on its k nearest neighbours and the distances calculated between them.
- Practical Hyperparameter Optimization - Feb 13, 2020.
An introduction on how to fine-tune Machine and Deep Learning models using techniques such as: Random Search, Automated Hyperparameter Tuning and Artificial Neural Networks Tuning.
- 5 Great New Features in Latest Scikit-learn Release - Dec 10, 2019.
From not sweating missing values, to determining feature importance for any estimator, to support for stacking, and a new plotting API, here are 5 new features of the latest release of Scikit-learn which deserve your attention.
- Beginners Guide to the Three Types of Machine Learning - Nov 13, 2019.
The following article is an introduction to classification and regression — which are known as supervised learning — and unsupervised learning — which in the context of machine learning applications often refers to clustering — and will include a walkthrough in the popular python library scikit-learn.
- KDnuggets™ News 19:n41, Oct 30: Feature Selection: Beyond feature importance?; Time Series Analysis Using KNIME and Spark - Oct 30, 2019.
This week in KDnuggets: Feature Selection: Beyond feature importance?; Time Series Analysis: A Simple Example with KNIME and Spark; 5 Advanced Features of Pandas and How to Use Them; How to Measure Foot Traffic Using Data Analytics; Introduction to Natural Language Processing (NLP); and much, much more!
- How to Extend Scikit-learn and Bring Sanity to Your Machine Learning Workflow - Oct 29, 2019.
In this post, learn how to extend Scikit-learn code to make your experiments easier to maintain and reproduce.
- Scikit-Learn & More for Synthetic Dataset Generation for Machine Learning - Sep 19, 2019.
While mature algorithms and extensive open-source libraries are widely available for machine learning practitioners, sufficient data to apply these techniques remains a core challenge. Discover how to leverage scikit-learn and other tools to generate synthetic data appropriate for optimizing and fine-tuning your models.
- KDnuggets™ News 19:n35, Sep 18: Which Data Science Skills are core and which are hot/emerging ones?; There is No Free Lunch in Data Science Features - Sep 18, 2019.
Check the results of KDnuggets' latest poll to find out which data science skills are core and which are hot/emerging ones; why is there no free lunch in data science?; training Scikit-learn 100x faster; poking fun at unsupervised machine learning; exploring the case for ensemble learning. All this and much more this week on KDnuggets.
- Train sklearn 100x Faster - Sep 11, 2019.
As compute gets cheaper and time to market for machine learning solutions becomes more critical, we’ve explored options for speeding up model training. One of those solutions is to combine elements from Spark and scikit-learn into our own hybrid solution.
- Scikit-Learn vs mlr for Machine Learning - Sep 10, 2019.
How does the scikit-learn machine learning library for Python compare to the mlr package for R? Follow along with a machine learning workflow through each approach, and see if you can gain a competitive advantage by knowing both frameworks.
- Understanding Decision Trees for Classification in Python - Aug 21, 2019.
This tutorial covers decision trees for classification, also known as classification trees, including the anatomy of classification trees, how classification trees make predictions, using scikit-learn to make classification trees, and hyperparameter tuning.
- How to Learn Python for Data Science the Right Way - Jun 14, 2019.
The biggest mistake you can make while learning Python for data science is to learn Python programming from courses meant for programmers. Avoid this mistake, and learn Python the right way by following this approach.
- What you need to know: The Modern Open-Source Data Science/Machine Learning Ecosystem - Jun 10, 2019.
We identify the 6 tools in the modern open-source Data Science ecosystem, examine the Python vs R question, and determine which tools are used the most with Deep Learning and Big Data.
- 7 Steps to Mastering Intermediate Machine Learning with Python — 2019 Edition - Jun 3, 2019.
This is the second part of this new learning path series for mastering machine learning with Python. Check out these 7 steps to help master intermediate machine learning with Python!
- Python leads the 11 top Data Science, Machine Learning platforms: Trends and Analysis - May 30, 2019.
Python continues to lead the top Data Science platforms, but R and RapidMiner hold their share; Almost 50% have used Deep Learning tools; SQL is steady; Consolidation continues.
- Naive Bayes: A Baseline Model for Machine Learning Classification Performance - May 7, 2019.
We can use Pandas to work through Bayes' Theorem and Scikit-learn to implement the Naive Bayes algorithm. We take a step-by-step approach to understanding Bayes and implementing the different options in Scikit-learn.
- Unleash a faster Python on your data - Apr 18, 2019.
Intel’s optimized Python packages deliver quick repeatable results compared to standard Python packages. Intel offers optimized Scikit-learn, Numpy, and SciPy to help data scientists get rapid results on their Intel® hardware. Download now.
- A Beginner’s Guide to Linear Regression in Python with Scikit-Learn - Mar 29, 2019.
What linear regression is and how it can be implemented for both two variables and multiple variables using Scikit-Learn, which is one of the most popular machine learning libraries for Python.
- Feature Reduction using Genetic Algorithm with Python - Mar 25, 2019.
This tutorial discusses how to use the genetic algorithm (GA) for reducing the feature vector extracted from the Fruits360 dataset in Python mainly using NumPy and Sklearn.
- Top KDnuggets tweets, Feb 13-19: Intro to Scikit Learn: The Gold Standard of Python ML; The Essential Data Science Venn Diagram - Feb 20, 2019.
Also: Cartoon: #MachineLearning Problems in 2118 #ValentinesDay; A must-read tutorial when you are starting your journey with #DeepLearning.
- Python Data Science for Beginners - Feb 20, 2019.
Python’s syntax is very clean and short in length. Python is open-source and a portable language which supports a large standard library. But why Python for data science? Read on to find out more.
- KDnuggets™ News 19:n08, Feb 20: The Gold Standard of Python Machine Learning; The Analytics Engineer – new role in the data team - Feb 20, 2019.
Intro to scikit-learn; how to set up a Python ML environment; why there should be a new role in the Data Science team; how to learn one of the hardest parts of being a Data Scientist; and how explainable is BERT?
- An Introduction to Scikit Learn: The Gold Standard of Python Machine Learning - Feb 13, 2019.
If you’re going to do Machine Learning in Python, Scikit Learn is the gold standard. Scikit-learn provides a wide selection of supervised and unsupervised learning algorithms. Best of all, it’s by far the easiest and cleanest ML library.
- Automated Machine Learning in Python - Jan 18, 2019.
An organization can also reduce the cost of hiring many experts by applying AutoML in their data pipeline. AutoML also reduces the amount of time it would take to develop and test a machine learning model.
- A Guide to Decision Trees for Machine Learning and Data Science - Dec 24, 2018.
What makes decision trees special in the realm of ML models is really their clarity of information representation. The “knowledge” learned by a decision tree through training is directly formulated into a hierarchical structure.
- KDnuggets™ News 18:n41, Oct 31: Introduction to Deep Learning with Keras; Easy Named Entity Recognition with Scikit-Learn - Oct 31, 2018.
Also: Generative Adversarial Networks - Paper Reading Road Map; How I Learned to Stop Worrying and Love Uncertainty; Implementing Automated Machine Learning Systems with Open Source Tools; Notes on Feature Preprocessing: The What, the Why, and the How
- Notes on Feature Preprocessing: The What, the Why, and the How - Oct 26, 2018.
This article covers a few important points related to the preprocessing of numeric data, focusing on the scaling of feature values, and the broad question of dealing with outliers.
- Unleash a Faster Python on Your Data - Oct 2, 2018.
Intel provides optimized Scikit-learn, the most used Python package for classical machine learning. Get faster scikit-learn through Intel® Distribution for Python.
- Iterative Initial Centroid Search via Sampling for k-Means Clustering - Sep 12, 2018.
Thinking about ways to find a better set of initial centroid positions is a valid approach to optimizing the k-means clustering process. This post outlines just such an approach.
- Deploying scikit-learn Models at Scale - Aug 29, 2018.
Find out how to serve your scikit-learn model in an auto-scaling, serverless environment! Today, we’ll take a trained scikit-learn model and deploy it on Cloud ML Engine.
- Multi-Class Text Classification with Scikit-Learn - Aug 27, 2018.
The vast majority of text classification articles and tutorials on the internet cover binary text classification, such as email spam filtering and sentiment analysis. Real-world problems are much more complicated than that.
- Building Reliable Machine Learning Models with Cross-validation - Aug 9, 2018.
Cross-validation is frequently used to train, measure and finally select a machine learning model for a given dataset because it helps assess how the results of a model will generalize to an independent data set in practice.
- [ebook] Apache Spark™ Under the Hood - Jun 27, 2018.
Learn how to install and run Spark yourself; a summary of Spark core architecture and concepts; Spark's powerful language APIs and how you can use them.
- Top 20 Python Libraries for Data Science in 2018 - Jun 27, 2018.
Our selection actually contains more than 20 libraries, as some of them are alternatives to each other and solve the same problem. Therefore we have grouped them as it's difficult to distinguish one particular leader at the moment.
- The 6 components of Open-Source Data Science/ Machine Learning Ecosystem; Did Python declare victory over R? - Jun 6, 2018.
We find 6 tools form the modern open source Data Science / Machine Learning ecosystem; examine whether Python declared victory over R; and review which tools are most associated with Deep Learning and Big Data.
- How I Unknowingly Contributed To Open Source - Apr 24, 2018.
This article explains what is meant by the term 'open source' and why all data scientists should be a part of it.
- Top 20 Python AI and Machine Learning Open Source Projects - Feb 20, 2018.
We update the top AI and Machine Learning projects in Python. Tensorflow has moved to the first place with triple-digit growth in contributors. Scikit-learn dropped to 2nd place, but still has a very large base of contributors.
- Introduction to Python Ensembles - Feb 9, 2018.
In this post, we'll take you through the basics of ensembles — what they are and why they work so well — and provide a hands-on tutorial for building basic ensembles.
- 5 Machine Learning Projects You Should Not Overlook - Feb 8, 2018.
It's about that time again... 5 more machine learning or machine learning-related projects you may not yet have heard of, but may want to consider checking out!
- KDnuggets™ News 18:n05, Jan 31: Feynman Technique to become a Data Scientist; 4 Big Data Trends for 2018; Data Scientist – best job in America - Jan 31, 2018.
Also: How To Grow As A Data Scientist; A Beginner Guide to Data Engineering; Exclusive Interview: Doug Laney on Big Data and Infonomics
- Using AutoML to Generate Machine Learning Pipelines with TPOT - Jan 29, 2018.
This post will take a different approach to constructing pipelines. Certainly the title gives away this difference: instead of hand-crafting pipelines and hyperparameter optimization, and performing model selection ourselves, we will instead automate these processes.
- Managing Machine Learning Workflows with Scikit-learn Pipelines Part 3: Multiple Models, Pipelines, and Grid Searches - Jan 24, 2018.
In this post, we will be using grid search to optimize models built from a number of different types of estimators, which we will then compare, properly evaluating the best hyperparameters that each model has to offer.
- Managing Machine Learning Workflows with Scikit-learn Pipelines Part 2: Integrating Grid Search - Jan 19, 2018.
Another simple yet powerful technique we can pair with pipelines to improve performance is grid search, which attempts to optimize model hyperparameter combinations.
- Managing Machine Learning Workflows with Scikit-learn Pipelines Part 1: A Gentle Introduction - Dec 7, 2017.
Scikit-learn's Pipeline class is designed as a manageable way to apply a series of data transformations followed by the application of an estimator.
- Choosing an Open Source Machine Learning Library: TensorFlow, Theano, Torch, scikit-learn, Caffe - Nov 8, 2017.
Open Source is at the heart of innovation and the rapid evolution of technologies these days. Here we discuss how to choose open source machine learning tools for different use cases.
- Visualizing Cross-validation Code - Sep 5, 2017.
Cross-validation helps to improve your prediction using the K-Fold strategy. What is K-Fold, you ask? Check out this post for a visualized explanation.
- Simplifying Decision Tree Interpretability with Python & Scikit-learn - May 19, 2017.
This post will look at a few different ways of attempting to simplify decision tree representation and, ultimately, interpretability. All code is in Python, with Scikit-learn being used for the decision tree modeling.
- Introducing Dask-SearchCV: Distributed hyperparameter optimization with Scikit-Learn - May 12, 2017.
We introduce a new library for doing distributed hyperparameter optimization with Scikit-Learn estimators. We compare it to the existing Scikit-Learn implementations, and discuss when it may be useful compared to other approaches.
- The Guerrilla Guide to Machine Learning with Python - May 1, 2017.
Here is a bare bones take on learning machine learning with Python, a complete course for the quick study hacker with no time (or patience) to spare.
- Top KDnuggets tweets, Apr 19-25: 10 Free Must-Read Books for Machine Learning and Data Science - Apr 26, 2017.
Also: Practical #DeepLearning For Coders-18 hours of free lessons; Different views of #Machinelearning #cartoon #humor; Scikit-learn #MachineLearning classification algorithms.
- 5 Machine Learning Projects You Can No Longer Overlook, April - Apr 13, 2017.
It's about that time again... 5 more machine learning or machine learning-related projects you may not yet have heard of, but may want to consider checking out. Find tools for data exploration, topic modeling, high-level APIs, and feature selection herein.
- Email Spam Filtering: An Implementation with Python and Scikit-learn - Mar 17, 2017.
This post is an overview of a spam filtering implementation using Python and Scikit-learn. The results of 2 classifiers are contrasted and compared: multinomial Naive Bayes and support vector machines.
- K-Means & Other Clustering Algorithms: A Quick Intro with Python - Mar 8, 2017.
In this intro cluster analysis tutorial, we'll check out a few algorithms in Python so you can get a basic understanding of the fundamentals of clustering on a real dataset.
- A Simple XGBoost Tutorial Using the Iris Dataset - Mar 7, 2017.
This is an overview of the XGBoost machine learning algorithm, which is fast and shows good results. This example uses multiclass prediction with the Iris dataset from Scikit-learn.
- Top /r/MachineLearning Posts, February: Oxford Deep NLP Course; Data Visualization for Scikit-learn Results - Mar 6, 2017.
Oxford Deep NLP Course; scikit-plot: Data Visualization for Scikit-learn Results; Machine Learning at Berkeley's ML Crash Course: Neural Networks; Predicting parking difficulty with machine learning; TensorFlow 1.0 Release
- 7 More Steps to Mastering Machine Learning With Python - Mar 1, 2017.
This post is a follow-up to last year's introductory Python machine learning post, which includes a series of tutorials for extending your knowledge beyond the original.
- Moving from R to Python: The Libraries You Need to Know - Feb 24, 2017.
Are you considering making a move from R to Python? Here are the libraries you need to know, how they stack up to their R contemporaries, and why you should learn them.
- What is a Support Vector Machine, and Why Would I Use it? - Feb 23, 2017.
Support Vector Machine has become an extremely popular algorithm. In this post I try to give a simple explanation for how it works and give a few examples using the Python Scikits libraries.
- Learn how to Develop and Deploy a Gradient Boosting Machine Model - Jan 20, 2017.
GBM is one of the hottest machine learning methods. Learn how to create GBM using SciKit-Learn and Python and understand the steps required to transform features, train, and deploy a GBM.
- Top KDnuggets tweets, Jan 04-10: Cartoon: When Self-Driving Car takes you too far; A massive collection of free programming books - Jan 11, 2017.
Also: AI #DataScience #MachineLearning: Main Developments 2016, Key Trends 2017; Scikit-Learn Cheat Sheet: #Python #MachineLearning
- 5 Machine Learning Projects You Can No Longer Overlook, January - Jan 2, 2017.
There are a lot of popular machine learning projects out there, but many more that are not. Which of these are actively developed and worth checking out? Here is an offering of 5 such projects, the most recent in an ongoing series.
- Introduction to Machine Learning for Developers - Nov 28, 2016.
Whether you are integrating a recommendation system into your app or building a chat bot, this guide will help you get started in understanding the basics of machine learning.
- Top 20 Python Machine Learning Open Source Projects, updated - Nov 21, 2016.
Open Source is at the heart of innovation and the rapid evolution of technologies these days. This article presents the Top 20 Python Machine Learning Open Source Projects of 2016, along with very interesting insights and trends found during the analysis.
- Automated Machine Learning: An Interview with Randy Olson, TPOT Lead Developer - Oct 28, 2016.
Read an insightful interview with Randy Olson, Senior Data Scientist at University of Pennsylvania Institute for Biomedical Informatics, and lead developer of TPOT, an open source Python tool that intelligently automates the entire machine learning process.
- KDnuggets™ News 16:n38, Oct 26: Free Machine Learning EBooks; Neural Networks in Python with Scikit-learn - Oct 26, 2016.
5 EBooks to Read Before Getting into A Machine Learning Career; A Beginner's Guide to Neural Networks with Python and Scikit-learn 0.18!; New Poll: What was the largest dataset you analyzed / data mined?; Jupyter Notebook Best Practices for Data Science
- A Beginner’s Guide to Neural Networks with Python and SciKit Learn 0.18! - Oct 20, 2016.
This post outlines setting up a neural network in Python using Scikit-learn, the latest version of which now has built-in support for Neural Network models.
- Automated Data Science & Machine Learning: An Interview with the Auto-sklearn Team - Oct 4, 2016.
This is an interview with the authors of the recent winning KDnuggets Automated Data Science and Machine Learning blog contest entry, which provided an overview of the Auto-sklearn project. Learn more about the authors, the project, and automated data science.
- O’Reilly Live Training–Real-time. Real experts. Real learning. - Sep 26, 2016.
Get intensive, hands-on training from O'Reilly's expert network on critical data topics - from SQL fundamentals to distributed computing; enterprise strategy to data science at scale.
- Top Machine Learning Projects for Julia - Aug 19, 2016.
Julia is gaining traction as a legitimate alternative programming language for analytics tasks. Learn more about these 5 machine learning related projects.
- Contest Winner: Winning the AutoML Challenge with Auto-sklearn - Aug 5, 2016.
This post is the first place prize recipient in the recent KDnuggets blog contest. Auto-sklearn is an open-source Python tool that automatically determines effective machine learning pipelines for classification and regression datasets. It is built around the successful scikit-learn library and won the recent AutoML challenge.
- Would You Survive the Titanic? A Guide to Machine Learning in Python Part 1 - Jul 25, 2016.
Check out the first of a 3 part introductory series on machine learning in Python, fueled by the Titanic dataset. This is a great place to start for a machine learning newcomer.
- Semi-supervised Feature Transfer: The Practical Benefit of Deep Learning Today? - Jul 12, 2016.
This post evaluates four different strategies for solving a problem with machine learning, where customized models built from semi-supervised "deep" features using transfer learning outperform models built from scratch, and rival state-of-the-art methods.
- 5 Machine Learning Projects You Can No Longer Overlook - May 19, 2016.
We all know the big machine learning projects out there: Scikit-learn, TensorFlow, Theano, etc. But what about the smaller niche projects that are actively developed, providing useful services to users? Here are 5 such projects.
- TPOT: A Python Tool for Automating Data Science - May 13, 2016.
TPOT is an open-source Python data science automation tool, which operates by optimizing a series of feature preprocessors and models, in order to maximize cross-validation accuracy on data sets.
- Scikit Flow: Easy Deep Learning with TensorFlow and Scikit-learn - Feb 12, 2016.
Scikit Flow is a new easy-to-use interface for TensorFlow from Google based on the Scikit-learn fit/predict model. Does it succeed in making deep learning more accessible?
- Auto-Scaling scikit-learn with Spark - Feb 11, 2016.
Databricks gives us an overview of the spark-sklearn library, which automatically and seamlessly distributes model tuning on a Spark cluster, without impacting workflow.
- Scikit-learn and Python Stack Tutorials: Introduction, Implementing Classifiers - Jan 18, 2016.
A small collection of introductory scikit-learn and Python stack tutorials for those with an existing understanding of machine learning looking to jump right into using a new set of tools.
- Top 10 Machine Learning Projects on Github - Dec 14, 2015.
The top 10 machine learning projects on Github include a number of libraries, frameworks, and education resources. Have a look at the tools others are using, and the resources they are learning from.
- Top New Features in Orange 3 Data Mining Platform - Dec 10, 2015.
The main technical advantage of Orange 3 is its integration with NumPy and SciPy libraries. Other improvements include reading online data, working through queries for SQL and pre-processing.
- Make Beautiful Interactive Data Visualizations Easily, Dec 15 Webinar - Dec 7, 2015.
- 7 Steps to Mastering Machine Learning With Python - Nov 19, 2015.
There are many Python machine learning resources freely available online. Where to begin? How to proceed? Go from zero to Python machine learning hero in 7 steps!
- R vs Python: head to head data analysis - Oct 13, 2015.
The epic battle between R and Python goes on. Here we compare the two on generic data science tasks such as reading CSV files, summarizing data, PCA, model building, plotting, and more.
- Top 10 Quora Data Science Writers and Their Best Advice - Sep 17, 2015.
Top Quora data science writers give their advice on pursuing a career in the field, approaching interviews, and selecting appropriate technologies.
- NYC Data Science Academy courses & bootcamps in Data Engineering, Data Science, R, Python, and Machine Learning - Jul 31, 2015.
Upcoming training from NYC Data Science Academy: 6-Week Intensive Data Engineering Bootcamp, 12-Week Data Science Bootcamp, courses in R, Python, Data Science and Machine Learning, and more.
- Continually Updated Data Science IPython Notebooks - Jul 13, 2015.
Continually updated Data Science IPython Notebooks: Spark, Hadoop MapReduce, HDFS, AWS, Kaggle, scikit-learn, matplotlib, pandas, NumPy, SciPy, and various command lines.
- Top 20 Python Machine Learning Open Source Projects - Jun 1, 2015.
We examine the top Python machine learning open source projects on Github, in terms of both contributors and commits, and identify the most popular and most active ones.
- Top /r/MachineLearning Posts, Apr 5-11: Amazon Machine Learning, Numerical Optimization, and Conditional Random Fields - Apr 14, 2015.
Amazon Machine Learning as a Service, Numerical Optimization, Extracting data from NYTimes recipes, Intro to Machine Learning with scikit-learn, and more.
- Top /r/MachineLearning Posts, Mar 29-Apr 4: Andrew Ng AMA, Deep Learning for NLP, and OpenCL Convnets - Apr 10, 2015.
Andrew Ng's upcoming AMA, scikit-learn updates, Richard Socher's Deep Learning NLP videos, Criteo's huge new dataset, and convolutional neural networks on OpenCL are the top topics discussed this week on /r/MachineLearning.
- NYC Data Science Courses, Bootcamps, Meetups - Mar 17, 2015.
NYC Data Science Academy spring schedule includes 3 classes, 3 Meetups, 7 bootcamp events on Data Science, R, Python, Machine Learning, scikit-learn, and related topics.
- Machine Learning Table of Elements Decoded - Mar 11, 2015.
Machine learning packages for Python, Java, Big Data, Lua/JS/Clojure, Scala, C/C++, CV/NLP, and R/Julia are represented using a cute but ill-fitting metaphor of a periodic table. We extract the useful links.
- Top /r/MachineLearning Posts, Mar 1-7: Stanford Deep Learning for NLP, Machine Learning with Scikit-learn - Mar 9, 2015.
This week on /r/MachineLearning, we have a new NLP-focused deep learning course from Stanford, an introduction to scikit-learn, visualization of music collections, an implementation of DeepMind, and NLP using deep learning and Torch.
- Open Source Tools for Machine Learning - Dec 17, 2014.
Open source machine learning software makes it easier to implement machine learning solutions on single computers and at scale, and the diversity of packages provides more options for implementers.
- Top KDnuggets tweets, Dec 8-9: On the effects Analytics bring to enterprises; Use IBM #WatsonAnalytics to Crunch Data For Free - Dec 10, 2014.
On the effects Analytics bring to enterprises; Anyone Can Now Use IBM #WatsonAnalytics to Crunch Data For Free; Economists are NOT nonpartisan - @FiveThirtyEight quantifies their bias; Geoff Hinton AMA: Neural Networks, the Brain, and Machine Learning.
- Top KDnuggets tweets, Sep 3-9: What is Big Data – definitions from thought leaders - Sep 12, 2014.
What Is #BigData? Definitions from 40+ thought leaders; Fewer companies are hiring Data Scientists but #DataScience is still hot; Choosing the right estimator scikit-learn #CheatSheet; How do Twitter Analytics show followers' gender, when they don't ask?
- Top KDnuggets tweets, Aug 4-5: Ensemble Methods, a brief history; Data Scientist role shifting - Aug 6, 2014.
Ensemble Methods are the backbone of #MachineLearning - a brief history; Data Scientist role shifting, with companies focusing on Developers; To add #MachineLearning for Python, scikit-learn; for Hadoop: Mahout; Meet Fortune 2014 #BigData All-Stars: data scientists, entrepreneurs, CEOs.
- Top KDnuggets tweets, Jun 6-8: Statistical-learning tutorial w. scikit-learn; Data science vs the hunch - Jun 9, 2014.
A tutorial on statistical learning with scikit-learn; Data science vs the hunch: When data contradicts manager gut instinct; Stanford University: Data Analyst; Data Lakes vs Data Warehouses.
- Top KDnuggets tweets, Apr 16-17 - Apr 19, 2014.
Scikit-Learn: a great Python library for machine learning; A map of where nobody lives in the US; Apache Spark, the hot new trend in Big Data; NYU @aghose on Est. Demand for Mobile Apps - Learn more: NYU Stern MS in Biz Analytics.
- Top KDnuggets tweets, Mar 10-11: Deep Learning overview, free book; Best machine learning interview questions - Mar 12, 2014.
Deep Learning: Methods and Application, free book from Microsoft; Best interview questions to evaluate a machine learning researcher; Good list of Machine Learning Libraries in Python: scikit-learn, pandas, Theano, NLTK.