Difference between distributed learning and federated learning algorithms
Want to know the difference between distributed and federated learning? Read this article to find out.
By Aishwarya Srinivasan, AI & ML Innovation Leader
Image by Bufflk on Pixabay
A distributed machine learning algorithm runs on a multi-node system that builds a model by training independently on different nodes in parallel. Having a distributed training system accelerates training on huge amounts of data. When working with big data, training time on a single machine grows steeply, which makes scalability and online re-training difficult.
For example, let’s say we want to build a recommendation model and re-train it every day based on user interactions. We could see hundreds of clicks per user across millions of users. In this case, an enormous amount of data has to be used to train the recommendation model on a daily basis. This is where a distributed learning system helps: training can be parallelized across nodes to reduce training time.
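As a rough illustration of this idea, the sketch below trains a tiny linear model with data-parallel gradient descent: a centrally stored dataset is split into shards, each worker computes a gradient on its own shard, and the combined gradient updates one shared model. Everything in it is an assumption made for illustration (the function names, the toy data, the learning rate, and the use of threads as stand-ins for separate machines); a real recommendation system would rely on a distributed training framework rather than code like this.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_gradient(shard, w, b):
    """Mean-squared-error gradient for y ~ w*x + b on one shard of data."""
    n = len(shard)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in shard) / n
    return grad_w, grad_b, n


def split_into_shards(data, num_workers):
    """Split the centrally stored dataset into one shard per worker."""
    return [data[i::num_workers] for i in range(num_workers)]


def average_gradients(results):
    """Combine per-shard gradients, weighting each by its shard size."""
    total = sum(n for _, _, n in results)
    grad_w = sum(gw * n for gw, _, n in results) / total
    grad_b = sum(gb * n for _, gb, n in results) / total
    return grad_w, grad_b


def train_data_parallel(data, num_workers=4, lr=0.1, steps=2000):
    """Hypothetical training loop: workers share one model, split the data."""
    shards = split_into_shards(data, num_workers)
    w, b = 0.0, 0.0
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for _ in range(steps):
            # Every worker receives the same current parameters.
            results = list(pool.map(shard_gradient, shards,
                                    [w] * len(shards), [b] * len(shards)))
            grad_w, grad_b = average_gradients(results)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Toy dataset generated from y = 3x + 1 (an assumption for the demo).
data = [(i / 20, 3.0 * (i / 20) + 1.0) for i in range(20)]
w, b = train_data_parallel(data)
print(f"learned w={w:.3f}, b={b:.3f}")
```

Because the shard gradients are weighted by shard size, their average equals the gradient over the full dataset, so the result matches single-machine training and only the work is divided. Threads in CPython will not actually speed up this arithmetic; they only mark where separate nodes would do the computation.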
Compared to distributed learning, federated learning algorithms are fundamentally different and primarily address data privacy. In a traditional data science pipeline, the data is collected on a single server and used to build and train a centralized model. Federated learning, in effect, produces a centralized model using decentralized model training. In a federated learning system, a seed parameter set is sent to independent nodes that hold data, and the model is trained on each local node using only the data stored on that node. Once a node has finished training independently, it sends its updated model weights back to the central server, where the weights from all nodes are combined into a single global model. Because the combined model learns from data across all nodes, it can perform better than a model trained on any single node's data. The raw data never leaves its node, which helps each node adhere to data privacy policies and reduces the risk of a data leak or breach.
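The round trip described here (send seed parameters out, train locally, send weights back, combine) can be sketched in a few lines. The example below is a minimal simulation under stated assumptions: the article does not say how the server combines the weights, so the sketch uses a sample-size-weighted average in the style of federated averaging, and all function names, the toy linear model, and the hyperparameters are hypothetical.

```python
def local_train(weights, local_data, lr=0.2, epochs=5):
    """Train y ~ w*x + b on one node, using only that node's own data."""
    w, b = weights
    n = len(local_data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in local_data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in local_data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    # Only the updated weights and a sample count leave the node.
    return (w, b), n


def federated_average(client_updates):
    """Server step: average client weights, weighted by sample count."""
    total = sum(n for _, n in client_updates)
    w = sum(weights[0] * n for weights, n in client_updates) / total
    b = sum(weights[1] * n for weights, n in client_updates) / total
    return (w, b)


def run_federated_training(clients, rounds=500, seed_weights=(0.0, 0.0)):
    """Hypothetical driver: repeat broadcast, local training, aggregation."""
    global_weights = seed_weights
    for _ in range(rounds):
        # The server sees weights only, never the raw (x, y) pairs.
        updates = [local_train(global_weights, data) for data in clients]
        global_weights = federated_average(updates)
    return global_weights


# Three simulated devices, each holding a private slice of y = 2x - 0.5.
clients = [
    [(i / 10, 2.0 * (i / 10) - 0.5) for i in range(0, 4)],
    [(i / 10, 2.0 * (i / 10) - 0.5) for i in range(4, 7)],
    [(i / 10, 2.0 * (i / 10) - 0.5) for i in range(7, 10)],
]
w, b = run_federated_training(clients)
print(f"global model: w={w:.3f}, b={b:.3f}")
```

Each simulated device sees only a narrow range of inputs, yet the averaged model recovers the relationship across the whole range. Production systems typically add protections that this sketch leaves out, such as secure aggregation or noise added to the updates.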
As an example, we can look at an autocomplete model. The user data on a phone, tablet, or laptop can be confidential, and users would not want it to be stored centrally anywhere. In this case, federated learning (FL) can be used to build a central autocomplete model that is effectively trained on the data of millions of users, on their own devices, by combining the independently trained model weights.
Wrapping up, we can say that distributed learning is about having centralized data but distributing the model training across different nodes, while federated learning is about keeping both the data and the training decentralized and, in effect, producing a central model.
Bio: Aishwarya Srinivasan: AI & ML Innovation Leader | LinkedIn Top Voice 2020 | 250k+ Followers | Unicorn in Data Science | Speaker