- Managing Machine Learning Cycles: Five Learnings from comparing Data Science Experimentation/ Collaboration Tools - Jan 29, 2020.
Machine learning projects require handling different versions of data, source code, hyperparameters, and environment configuration. Numerous tools are on the market for managing this variety, and this review features important lessons learned from an ongoing evaluation of the current landscape.
- Live Webinar: Learn how to build better machine learning pipelines - Jan 6, 2020.
In this webinar, Jan 15 @ 12PM EST, we'll offer solutions to the common challenges data scientists and data engineers face when building a machine learning pipeline. Register now to attend live or to watch a recording afterwards.
- Build Pipelines with Pandas Using pdpipe - Dec 13, 2019.
We show how to build intuitive and useful pipelines with Pandas DataFrame using a wonderful little library called pdpipe.
- Spark NLP 101: LightPipeline - Nov 27, 2019.
A Pipeline is specified as a sequence of stages, and each stage is either a Transformer or an Estimator. These stages are run in order, and the input DataFrame is transformed as it passes through each stage. Now let’s see how this can be done in Spark NLP using Annotators and Transformers.
- Automated Machine Learning Project Implementation Complexities - Nov 22, 2019.
To demonstrate the implementation complexity differences along the AutoML highway, let's have a look at how 3 specific software projects approach the implementation of just such an AutoML "solution," namely Keras Tuner, AutoKeras, and automl-gs.
- Testing Your Machine Learning Pipelines - Nov 14, 2019.
Let’s take a look at traditional testing methodologies and how we can apply these to our data/ML pipelines.
- 5 Step Guide to Scalable Deep Learning Pipelines with d6tflow - Sep 16, 2019.
How to turn a typical PyTorch script into a scalable d6tflow DAG for faster research & development.
- Data Pipelines, Luigi, Airflow: Everything you need to know - Mar 27, 2019.
This post focuses on the workflow management system (WMS) Airflow: what it is, what you can do with it, and how it differs from Luigi.
- Webinar: The Value-Based Return on Creating a High-Quality Data Pipeline, Sep 12 - Sep 7, 2018.
Learn why data quality and data integration are key to delivering meaningful, actionable results, and how to develop data and analytics strategies that offer visibility into healthcare cost and quality.
- Manage your Machine Learning Lifecycle with MLflow – Part 1 - Jul 5, 2018.
Reproducibility, good management, and experiment tracking are necessary for making it easy to test others' work and analysis. In this first part, we will start learning, with simple examples, how to record and query experiments and how to package machine learning models so they are reproducible and can be run on any platform using MLflow.
- KDnuggets™ News 18:n22, Jun 6: 10 More Free Must-Read Books for Machine Learning and Data Science; Beginner Guide to Data Science Pipeline - Jun 6, 2018.
Summer. Time to sit back and unwind. Or get your hands on some free machine learning and data science books and learn! Here is a great selection to get started.
- A Beginner’s Guide to the Data Science Pipeline - May 29, 2018.
On one end was a pipe with an entrance and at the other end an exit. The pipe was also labeled with five distinct letters: "O.S.E.M.N."
- Deep Learning With Apache Spark: Part 1 - Apr 18, 2018.
The first part of a full discussion of how to do distributed deep learning with Apache Spark. This part covers what Spark is, the basics of Spark+DL, and a little more.
- A Beginner’s Guide to Data Engineering – Part II - Mar 15, 2018.
In this post, I share more technical details on how to build good data pipelines and highlight ETL best practices. Primarily, I will use Python, Airflow, and SQL for our discussion.
- Using AutoML to Generate Machine Learning Pipelines with TPOT - Jan 29, 2018.
This post will take a different approach to constructing pipelines. Certainly the title gives away this difference: instead of hand-crafting pipelines, tuning hyperparameters, and performing model selection ourselves, we will automate these processes.
- A Beginner’s Guide to Data Engineering – Part I - Jan 25, 2018.
Data Engineering: The Close Cousin of Data Science.
- Managing Machine Learning Workflows with Scikit-learn Pipelines Part 3: Multiple Models, Pipelines, and Grid Searches - Jan 24, 2018.
In this post, we will use grid search to optimize models built from a number of different types of estimators, then compare the models and properly evaluate the best hyperparameters each one has to offer.
- KDnuggets™ News 18:n04, Jan 24: TensorFlow vs XGBoost; Machine Learning Pipelines in Python; Semi-Supervised Machine Learning - Jan 24, 2018.
Gradient Boosting in TensorFlow vs XGBoost; Managing Machine Learning Workflows with Scikit-learn Pipelines Part 2; Using Genetic Algorithm for Optimizing Recurrent Neural Networks; The Value of Semi-Supervised Machine Learning; Comparing Machine Learning as a Service: Amazon, Microsoft Azure, Google Cloud AI
- Managing Machine Learning Workflows with Scikit-learn Pipelines Part 2: Integrating Grid Search - Jan 19, 2018.
Another simple yet powerful technique we can pair with pipelines to improve performance is grid search, which attempts to optimize model hyperparameter combinations.
- Managing Machine Learning Workflows with Scikit-learn Pipelines Part 1: A Gentle Introduction - Dec 7, 2017.
Scikit-learn's Pipeline class is designed as a manageable way to apply a series of data transformations followed by the application of an estimator.
- How to Build a Data Science Pipeline - Jul 14, 2017.
Start with y. Concentrate on formalizing the predictive problem, building the workflow, and turning it into production rather than optimizing your predictive model. Once the former is done, the latter is easy.
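
Several of the posts listed above describe a pipeline as an ordered sequence of stages: transformers that reshape the data, followed by an estimator that learns from the result. As a rough illustration of that idea only, here is a minimal pure-Python sketch. The class names (`Scale`, `ThresholdClassifier`, `Pipeline`) and the toy data are hypothetical and are not the API of scikit-learn, Spark, or any other library mentioned in these posts.

```python
class Scale:
    """Transformer stage (hypothetical): rescales values to the range [0, 1]."""

    def fit(self, xs, ys=None):
        # Remember the range seen during training.
        self.lo, self.hi = min(xs), max(xs)
        return self

    def transform(self, xs):
        span = (self.hi - self.lo) or 1.0  # avoid division by zero
        return [(x - self.lo) / span for x in xs]


class ThresholdClassifier:
    """Estimator stage (hypothetical): predicts 1 at or above a learned threshold."""

    def fit(self, xs, ys):
        pos = [x for x, y in zip(xs, ys) if y == 1]
        neg = [x for x, y in zip(xs, ys) if y == 0]
        # Threshold is the midpoint between the two class means.
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, xs):
        return [1 if x >= self.threshold else 0 for x in xs]


class Pipeline:
    """Runs transformers in order, then hands the result to the estimator."""

    def __init__(self, transformers, estimator):
        self.transformers = transformers
        self.estimator = estimator

    def fit(self, xs, ys):
        for stage in self.transformers:
            xs = stage.fit(xs, ys).transform(xs)
        self.estimator.fit(xs, ys)
        return self

    def predict(self, xs):
        # Reuse the fitted transformers; do not refit on new data.
        for stage in self.transformers:
            xs = stage.transform(xs)
        return self.estimator.predict(xs)


# Toy, made-up data: small values belong to class 0, large values to class 1.
train_x = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
train_y = [0, 0, 0, 1, 1, 1]

model = Pipeline([Scale()], ThresholdClassifier()).fit(train_x, train_y)
predictions = model.predict([2.5, 7.0])
print(predictions)
```

The point of bundling the stages is that the same fitted transformations are applied at prediction time as at training time, so the preprocessing and the model cannot drift apart.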