How to Speed Up a Deep Learning Language Model by Almost 50x at Half the Cost
In this blog post, we show how to accelerate fine-tuning the ALBERT language model while also reducing costs by using Determined’s built-in support for distributed training with AWS spot instances.
One of the big headaches in deep learning is that models take forever to train. As an ML engineer, waiting hours or days for training to complete makes iteratively improving your model a slow and frustrating process. You can speed up model training by using more GPUs, but this raises two challenges:
- Distributed training is a hassle because it requires changing your model code and dealing with DevOps headaches like server management, cluster scheduling, networking, etc.
- Using many GPUs at once can quickly cause your training costs to skyrocket, especially when using on-demand cloud GPUs.
Originally, ALBERT took over 36 hours to train on a single V100 GPU and cost $112 on AWS. With distributed training and spot instances, training the model using 64 V100 GPUs took only 48 minutes and cost only $47! That’s both a 46x performance improvement and a 58% reduction in cost!
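To make the arithmetic behind those headline numbers concrete, here is a minimal sketch in Python. It uses only the figures quoted above; the function names are our own for illustration. Because the single-GPU run took "over 36 hours", plugging in exactly 36 hours gives a lower bound on the speedup, which is why it comes out slightly under the 46x quoted above.

```python
def speedup(baseline_minutes, new_minutes):
    """How many times faster the new run is than the baseline."""
    return baseline_minutes / new_minutes


def cost_reduction(baseline_cost, new_cost):
    """Fraction of the baseline cost that was saved."""
    return (baseline_cost - new_cost) / baseline_cost


# Figures quoted in the post. The baseline took "over 36 hours",
# so 36 hours is a lower bound on the real training time.
baseline_minutes = 36 * 60
distributed_minutes = 48
baseline_cost_usd = 112
distributed_cost_usd = 47

lower_bound_speedup = speedup(baseline_minutes, distributed_minutes)
savings = cost_reduction(baseline_cost_usd, distributed_cost_usd)

print(f"Speedup: at least {lower_bound_speedup:.0f}x")
print(f"Cost reduction: {savings:.0%}")
```

The same two ratios are a quick sanity check for any distributed training run: compare wall-clock time and total bill against the single-GPU baseline, not just one of them.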
Best of all, realizing these performance gains and cost reductions required nothing more than changing a few configuration settings. As we detail in the full article, switching to distributed training and leveraging spot instances in Determined can be done without changing your model code, without needing to understand the details of using spot instances, and with no manual server wrangling required.
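As a rough illustration of what "changing a few configuration settings" can look like, the sketch below models an experiment configuration as a plain Python dictionary and changes only the number of GPUs requested per trial. We are assuming that `resources.slots_per_trial` is the relevant setting in the Determined experiment configuration; the experiment name and the helper function are hypothetical, and spot instance settings, which live in the cluster setup, are not shown. The full article has the actual configuration.

```python
import copy


def with_slots_per_trial(experiment_config, slots):
    """Return a copy of the config that requests `slots` GPUs per trial.

    Hypothetical helper for illustration; it only edits a dictionary and
    does not call any Determined API.
    """
    updated = copy.deepcopy(experiment_config)
    updated.setdefault("resources", {})["slots_per_trial"] = slots
    return updated


# Hypothetical single-GPU configuration (the name is made up for this sketch).
single_gpu_config = {
    "name": "albert_squad_example",
    "resources": {"slots_per_trial": 1},
}

# Assumed equivalent of the 64-GPU run: same config, only the slot count changes.
distributed_config = with_slots_per_trial(single_gpu_config, 64)

print(distributed_config["resources"])
```

The point of the sketch is that the model code does not appear anywhere: the only difference between the two runs is one value in the configuration.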
In the full article, we show you how we fine-tuned ALBERT on the SQuAD 2.0 dataset (using the Hugging Face implementation), and how to save money by training with Determined using spot instances. You can read the full article, “ALBERT on Determined: Distributed Training with Spot Instances”, on our blog, and find the experiment in the Determined repository.