Building Massively Scalable Machine Learning Pipelines with Microsoft Synapse ML

The new platform provides a single API to abstract dozens of ML frameworks and databases.

Image Credit: Microsoft Research


I recently started a new newsletter focus on AI education and already has over 50,000 subscribers. TheSequence is a no-BS( meaning no hype, no news etc) AI-focused newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers and concepts. Please give it a try by subscribing below:



Building large scale ML solutions is nothing short of a nightmare. Even if you have the perfect architecture, highly scalable ML pipelines typically require combining many infrastructure platforms and frameworks that are not precisely designed for seamless integration. The process of orchestrating different ML tools results challenging even for the most experienced ML developers. Microsoft Research just open sourced a new framework designed to address this challenge.

SynapseML is the new version of MMLSpark, an open source library designed from the ground up to implement massively scalable ML pipelines. Functionally, SynapseML extends the capabilities of Apache Spark to better support requirements of massively scalable ML solutions. The platform uses a distributed programming model to distribute a given ML workload across thousands of machines while ensuring that the GPUs-CPUs are utilized at full capacity. More importantly, SynapseML manages to achieve that using a single API that underneath can rely on frameworks like LightGBM or XGBoost.

Image Credit: Microsoft Research


SynapseML API provides a data, platform and language agnostic model to interact with ML frameworks. This allows ML engineers to rapidly orchestrate different ML tools and frameworks without sacrificing the developer experience. Another important aspect of the API is that abstracts the underlying file and database interactions.

Adding a nice distribution touch to the SynapseML release, Microsoft added the framework to the Azure Synapse Analytics platform. This ensures that the platform is available as a native Azure service with the corresponding enterprise support. SynapseML is one of the most interesting efforts to prevent the increasing fragmentation in the ML tools and frameworks market. Its going to be interesting to track how this platform is received and adopted by the ML community.

Bio: Jesus Rodriguez is currently a CTO at Intotheblock. He is a technology expert, executive investor and startup advisor. Jesus founded Tellago, an award winning software development firm focused helping companies become great software organizations by leveraging new enterprise software trends.

Original. Reposted with permission.