How to Implement a Federated Learning Project with Healthcare Data

Learn about Federated Learning and how you can use it in the healthcare sector.



Federated Learning (FL) is a machine learning approach that trains a model across multiple decentralized devices or institutions without centralizing the data on a single server. Instead of moving the data to the model, FL moves the model to the data: each participant trains on its own records and shares only model updates. FL has been used in industries ranging from mobile device keyboards to autonomous vehicles to oil rigs. It is particularly useful in healthcare, where patient data is sensitive and strict regulations protect individuals' privacy. In this blog post, we discuss practical steps for implementing a federated learning project with healthcare data.

First, understand the requirements and constraints of your project. These include the type of data you will be working with and the regulations that govern its use to protect individuals' privacy, for example HIPAA in the United States or GDPR in the European Union. You may also need to secure approvals and permissions to use the data for your project, e.g. Institutional Review Board (IRB) approvals.

Next, prepare your data. This involves several tasks:

- Extracting data from different clinical systems.
- Harmonizing data across sites. Each site may encode the data differently, store it in different formats, and have different distributions.
- Annotating the data. This sometimes requires a physician to review and label the records.
- Splitting the data into training, validation, and test sets.

Make sure the data is properly balanced and representative of the overall population, so that the model's results are accurate for the patients it will be used on.
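To make the harmonization and splitting steps concrete, here is a minimal sketch. It assumes two hypothetical sites:

- `site_a` records glucose in mg/dL and codes sex as "M"/"F".
- `site_b` records glucose in mmol/L and codes sex as 1/2.

The field names, site names, and the approximate conversion factor of 18.0 are illustrative assumptions, not taken from any real clinical system. Each site would run the same code on its own records, so the splits are created locally.

```python
import random
from collections import defaultdict

# Hypothetical site conventions (illustrative only):
#   site_a: glucose in mg/dL, sex coded "M"/"F"
#   site_b: glucose in mmol/L, sex coded 1/2
MMOL_TO_MGDL = 18.0  # approximate glucose conversion factor (mmol/L -> mg/dL)
SEX_CODES = {
    "site_a": {"M": "male", "F": "female"},
    "site_b": {1: "male", 2: "female"},
}

def harmonize(record, site):
    """Map one site-specific record onto a shared schema."""
    glucose = record["glucose"]
    if site == "site_b":
        glucose *= MMOL_TO_MGDL  # convert so every site reports mg/dL
    return {
        "sex": SEX_CODES[site][record["sex"]],
        "glucose_mgdl": round(glucose, 1),
        "label": int(record["label"]),
    }

def stratified_split(records, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split one site's records into train/val/test, keeping the label mix similar in each."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_label = defaultdict(list)
    for r in records:
        by_label[r["label"]].append(r)
    train, val, test = [], [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_train = int(len(group) * fractions[0])
        n_val = int(len(group) * fractions[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to test
    return train, val, test
```

Stratifying by label keeps rare outcomes from ending up entirely in one split. This matters for small per-site datasets.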

Once your data is prepared, choose a federated learning framework. Options include NVIDIA FLARE, TensorFlow Federated, PySyft, OpenFL, and Flower. Each has its own features and capabilities, so choose the one that best meets your project's needs. We've found that NVIDIA FLARE provides a robust framework that can work with any underlying ML framework (PyTorch, TensorFlow, scikit-learn, etc.).

Next, set up the infrastructure for your federated learning project. This involves two parts:

- Choosing a cloud server that orchestrates the FL process and hosts the resulting model.
- Setting up a local server at each participating site. This includes installing the required software, making the local dataset accessible to that server, and ensuring the server can communicate with your cloud server.

Depending on the FL framework you selected, you may also need to set up a secure communication channel between each site's local server and the cloud server. This protects the privacy and security of the data and model updates in transit.
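Many FL frameworks set up the secure channel for you. For illustration only, the sketch below shows what a mutual-TLS setup could look like with Python's standard `ssl` module. In mutual TLS, the server and each site present certificates signed by a trusted CA. The function names and certificate paths are hypothetical placeholders, and using mutual TLS at all is an assumption about your deployment.

```python
import ssl

def make_server_context(certfile=None, keyfile=None, client_ca=None):
    """TLS context for the central FL server that requires client certificates (mutual TLS)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject sites without a valid certificate
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)  # the server's own certificate and key
    if client_ca:
        ctx.load_verify_locations(cafile=client_ca)  # CA that signed the site certificates
    return ctx

def make_site_context(server_ca=None, certfile=None, keyfile=None):
    """TLS context for a participating site that connects out to the FL server."""
    # Verifies the server's certificate and hostname by default.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=server_ca)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)  # the site's identity for mutual TLS
    return ctx
```

The server would wrap incoming sockets with `ctx.wrap_socket(sock, server_side=True)`. Each site would wrap its outgoing socket with `ctx.wrap_socket(sock, server_hostname="fl.example.org")`, where the hostname is a placeholder.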

Once the infrastructure is in place, you can begin training. You provide your model architecture to the cloud server, which then orchestrates FL training in a loop:

1. The server sends the current global model to the participating devices or institutions.
2. Each site trains the model on its local data.
3. The sites send their local model updates, not their data, back to the server.
4. The server aggregates the updates into an updated global model. A common choice is FedAvg, which averages the weights in proportion to each site's number of training examples.

This process repeats until the global model converges to an acceptable level of accuracy.
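The training loop above can be sketched in plain Python with federated averaging (FedAvg) on a tiny linear model. The following are all illustrative assumptions, not how any particular framework implements training:

- The model is y ≈ w0 + w1·x with squared loss, trained by SGD.
- There are three synthetic sites.
- The learning rate and round count are arbitrary choices.

The point is the structure: sites train locally and return weights, and the server computes a sample-weighted average.

```python
import random

def local_train(weights, data, lr=0.05, epochs=5):
    """One site's local update: a few epochs of SGD for y ~ w0 + w1 * x under squared loss."""
    w0, w1 = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w0 + w1 * x) - y  # prediction error for this example
            w0 -= lr * err           # gradient of 0.5 * err**2 w.r.t. w0
            w1 -= lr * err * x       # gradient of 0.5 * err**2 w.r.t. w1
    return [w0, w1]

def fed_avg(site_datasets, rounds=50):
    """Server loop: send out the global model, collect local weights, average them."""
    global_w = [0.0, 0.0]
    total = sum(len(d) for d in site_datasets)
    for _ in range(rounds):
        # Each site trains from the current global model; raw data never leaves the site.
        local_ws = [local_train(global_w, d) for d in site_datasets]
        # Weighted average: sites with more examples contribute proportionally more.
        global_w = [
            sum(len(d) * w[i] for d, w in zip(site_datasets, local_ws)) / total
            for i in range(len(global_w))
        ]
    return global_w

# Synthetic data: three hypothetical sites of different sizes, all from y = 2 + 3x + noise.
rng = random.Random(42)

def make_site(n):
    xs = [rng.random() for _ in range(n)]
    return [(x, 2.0 + 3.0 * x + rng.gauss(0, 0.1)) for x in xs]

sites = [make_site(n) for n in (30, 50, 20)]
weights = fed_avg(sites)
print(f"w0={weights[0]:.2f}, w1={weights[1]:.2f}")
```

On this synthetic data the global weights should end up near the generating coefficients (2 and 3). Real sites usually have non-identical distributions, which is why frameworks offer aggregation strategies beyond plain FedAvg.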

Finally, evaluate the model's performance and confirm that it meets your project's requirements. This involves testing the model on a held-out dataset or using it to make predictions on real-world data. In many cases you will also iterate on the model architecture, the underlying datasets, and/or preprocessing to improve performance.
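Evaluation can be federated too. One possible approach, sketched below, has each site reduce its predictions on its local test set to confusion-matrix counts. The server then sums the counts and computes pooled metrics. The function names and the choice of sensitivity, specificity, and accuracy are illustrative assumptions. Pooling counts gives the same metrics as evaluating on the combined test data. Simply averaging per-site scores does not.

```python
import math

def local_confusion(y_true, y_pred):
    """Runs at each site: reduce patient-level predictions to four aggregate counts."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1:
            counts["tp" if p == 1 else "fn"] += 1
        else:
            counts["fp" if p == 1 else "tn"] += 1
    return counts

def _ratio(num, den):
    return num / den if den else math.nan  # undefined when the denominator is zero

def pooled_metrics(site_counts):
    """Runs on the server: sum counts across sites, then compute metrics on the totals."""
    tot = {k: sum(c[k] for c in site_counts) for k in ("tp", "tn", "fp", "fn")}
    return {
        "sensitivity": _ratio(tot["tp"], tot["tp"] + tot["fn"]),
        "specificity": _ratio(tot["tn"], tot["tn"] + tot["fp"]),
        "accuracy": _ratio(tot["tp"] + tot["tn"], sum(tot.values())),
    }

site_a = local_confusion([1, 1, 0, 0], [1, 0, 0, 0])
site_b = local_confusion([1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0])
print(pooled_metrics([site_a, site_b]))
```

In this toy example the pooled accuracy is 0.8 (8 of 10 correct). Averaging the per-site accuracies (0.75 and about 0.83) would give about 0.79. Keeping the per-site counts also lets you spot sites where the model underperforms.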

These steps may seem complex, but luckily there are FL platforms like Rhino Health that make this entire process simple and seamless. Robust end-to-end FL platforms take care of infrastructure provisioning and provide strong security capabilities. They support every step of a federated project, from data preprocessing through model training and results analysis. They also offer maximum flexibility, letting data scientists use their data analysis and processing tools and ML/FL frameworks of choice. As a result, federated projects feel much more like projects that use centralized data.

The future of healthcare innovation relies on access to large amounts of data for analysis and model training. Federated learning is a powerful tool for using that data without moving it out of the institutions that hold it, which greatly reduces privacy risk. This makes it a promising way to improve patient care and advance the field of healthcare. By following these steps and taking the necessary precautions to protect patient privacy, you can successfully implement a federated learning project and make a positive impact in the healthcare industry.
Yuval Baror is the CTO and a co-founder of Rhino Health. He has nearly 20 years of experience in software engineering, management, and startups, including founding a startup that was successfully acquired. Over the past decade he has built AI-based production systems at three different companies. He enjoys the deep challenges of artificial intelligence and the excitement of building production systems that drive substantial impact for customers. He also values the unique cross-section of making AI work in real-world systems.