- Difference between distributed learning versus federated learning algorithms - Nov 19, 2021.
Want to know the difference between distributed and federated learning? Read this article to find out.
Algorithms, Distributed Systems, Federated Learning
- How to Speed Up Pandas with Modin - Mar 10, 2021.
The Modin library can scale your pandas workflows with a one-line code change, and it integrates with the Python ecosystem and Ray clusters. This tutorial covers how to get started with Modin and how it can speed up your pandas workflows.
Data Science, Distributed Systems, Modin, Pandas, Python, Workflow
- Getting Started with Distributed Machine Learning with PyTorch and Ray - Mar 3, 2021.
Ray is a popular framework for distributed Python that can be paired with PyTorch to rapidly scale machine learning applications.
Distributed Systems, Machine Learning, Python, PyTorch
- How to Speed up Scikit-Learn Model Training - Feb 11, 2021.
Scikit-Learn is an easy-to-use Python library for machine learning. However, scikit-learn models can sometimes take a long time to train. The question becomes: how do you create the best scikit-learn model in the least amount of time?
Distributed Systems, Hyperparameter, Machine Learning, Optimization, Parallelism, Python, scikit-learn, Training
- Train sklearn 100x Faster - Sep 11, 2019.
As compute gets cheaper and time to market for machine learning solutions becomes more critical, we’ve explored options for speeding up model training. One of those solutions is to combine elements from Spark and scikit-learn into our own hybrid solution.
Distributed Systems, Machine Learning, Python, scikit-learn, Training
- Distributed Artificial Intelligence: A primer on Multi-Agent Systems, Agent-Based Modeling, and Swarm Intelligence - Apr 18, 2019.
Distributed Artificial Intelligence (DAI) is a class of technologies and methods that span from swarm intelligence to multi-agent technologies. It is one of the subsets of AI where simulation matters more than point prediction.
AI, Distributed Systems, Modeling, Swarm Intelligence
- Introduction to Apache Spark - Jul 6, 2018.
This is the first post in a series on analyzing Big Data with Spark. It provides an introduction to Spark and its ecosystem.
Apache Spark, Data Processing, Distributed Systems
- Ranking Popular Distributed Computing Packages for Data Science - Mar 20, 2018.
We examined 140 frameworks and distributed programming packages and came up with a list of the top 20 distributed computing packages useful for Data Science, based on a combination of GitHub, Stack Overflow, and Google results.
Apache Spark, Data Science, Distributed Systems, GitHub, Hadoop
- Introducing Dask-SearchCV: Distributed hyperparameter optimization with Scikit-Learn - May 12, 2017.
We introduce a new library for doing distributed hyperparameter optimization with Scikit-Learn estimators. We compare it to the existing Scikit-Learn implementations, and discuss when it may be useful compared to other approaches.
Dask, Distributed Computing, Distributed Systems, Machine Learning, Optimization, scikit-learn
- 5 Machine Learning Projects You Can No Longer Overlook, May - May 10, 2017.
In this month's installment of Machine Learning Projects You Can No Longer Overlook, we find some data preparation and exploration tools, a (the?) reinforcement learning "framework," a new automated machine learning library, and yet another distributed deep learning library.
Automated Machine Learning, Data Exploration, Deep Learning, Distributed Systems, Machine Learning, Overlook, Pandas, Reinforcement Learning
- Dask and Pandas and XGBoost: Playing nicely between distributed systems - Apr 27, 2017.
This blog post gives a quick example of using Dask.dataframe for distributed Pandas data wrangling. It then uses a new dask-xgboost package to set up an XGBoost cluster inside the Dask cluster and perform the handoff.
Dask, Distributed Systems, Pandas, Python, XGBoost
- XGBoost: Implementing the Winningest Kaggle Algorithm in Spark and Flink - Mar 24, 2016.
An overview of XGBoost4J, a JVM-based implementation of XGBoost, one of the most successful recent machine learning algorithms in Kaggle competitions, with distributed support for Spark and Flink.
Apache Spark, Distributed Systems, Flink, Kaggle, XGBoost
- Top Spark Ecosystem Projects - Mar 2, 2016.
Apache Spark has developed a rich ecosystem, including both official and third party tools. We have a look at 5 third party projects which complement Spark in 5 different ways.
Apache Mesos, Apache Spark, Cassandra, Databricks, Distributed Systems
- Distributed TensorFlow Has Arrived - Mar 1, 2016.
Google has open sourced its distributed version of TensorFlow. Get the info on it here, and catch up on some other TensorFlow news at the same time.
Deep Learning, Distributed Systems, Google, Matthew Mayo, TensorFlow
- Deep Learning with Spark and TensorFlow - Jan 28, 2016.
The integration of TensorFlow with Spark leverages the distributed framework for hyperparameter tuning and model deployment at scale. Both time savings and improved error rates are demonstrated.
Apache Spark, Deep Learning, Distributed Systems, TensorFlow
- Spark + Deep Learning: Distributed Deep Neural Network Training with SparkNet - Dec 4, 2015.
Training deep neural nets can take precious time and resources. By leveraging an existing distributed batch processing framework, SparkNet can train neural nets quickly and efficiently.
Apache Spark, Caffe, Deep Learning, Distributed Systems, H2O, Matthew Mayo, Neural Networks