From PyTorch to Mosaic: An Overview and First Impressions

Figure 1 Hallelujah - we can finally simplify our LLM training loop with Mosaic Composer

Our Aimpoint Digital Labs team has been training large language models from scratch for a while now and has learned a few lessons the hard way. Scaling models to large sizes can become almost more of an engineering challenge than an AI one. Our codebase was originally built by combining several repositories, including (but not limited to) the excellent OLMo codebase. As it turns out, the OLMo models were trained with help from the MosaicML team, which has also released some nifty code packages of its own. In our constant search for better training efficiency and ease of use, we recently had the opportunity to try out some of these Mosaic offerings. When we first set out to integrate packages like Composer into our PyTorch codebase, we had questions: How does Composer compare to other popular frameworks like PyTorch Lightning? What do integrations with frameworks like DeepSpeed and FSDP/DDP look like? What are the pros and cons of using the Mosaic stack? We also did not know how much effort would be required to convert a custom transformer codebase from PyTorch to Composer and ultimately submit training runs using MCLI.

Over a series of three blogs, we hope to provide a technical reference on how some of these popular frameworks compare, how to migrate a PyTorch codebase to Composer, and why you would want to do that in the first place. The series assumes a working knowledge of AI and good familiarity with Python frameworks like PyTorch. We will also cover some technical concepts specific to Transformers; if you need a quick primer, we recommend the PyTorch documentation as a guide. This first blog covers an overview of Composer. Be sure to check out our follow-up blogs, where we dive deeper into unlocking Composer and migrating.

What is Composer?

Figure 2 Not this kind of composer

Mosaic Composer is a free, open-source framework that streamlines the training of advanced AI systems by making it easy to integrate both prebuilt and custom algorithms into your training loop.

Composer can help speed up and simplify training workflows for models such as transformers, diffusion models, CNNs, and more. We will get into more specific details later, but some of the main benefits of using Composer to train your models are listed below (a minimal usage sketch follows the list):

  • Custom speedup algorithms
    • Composer features a variety of specially developed algorithms that can be easily inserted into a training workflow using callbacks
  • OOM prevention
    • Automatic right-sizing of microbatch sizes based on available GPU VRAM to help prevent out-of-memory (OOM) errors
  • Checkpoint resumption
    • Automatically resume from your most recent checkpoint after any training failure simply by rerunning your training loop
  • Easy logging
    • Stream logs and checkpoints to cloud storage and other easily accessible destinations
  • Elastic checkpoint sharding
    • Resume training on GPU clusters of varying sizes without heavy config file edits
  • Direct integration with cloud data storage
    • Stream data from inexpensive cloud storage using the MDS data format
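
To make these features concrete, here is a minimal sketch of what a Composer training loop can look like. The model, dataset, algorithm choices, and hyperparameters below are illustrative assumptions rather than a recommended recipe, so check the Composer documentation for the exact API of your installed version:

```python
import torch
import torchvision
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

from composer import Trainer
from composer.algorithms import BlurPool, LabelSmoothing
from composer.loggers import FileLogger
from composer.models import ComposerClassifier

# Any image-classification dataset works here; CIFAR-10 is only an example.
train_dataset = datasets.CIFAR10(
    "data/", train=True, download=True, transform=transforms.ToTensor()
)
train_dataloader = DataLoader(train_dataset, batch_size=256, shuffle=True)

# Wrap a plain torch.nn.Module so Composer knows how to compute loss and metrics.
model = ComposerClassifier(
    module=torchvision.models.resnet18(num_classes=10), num_classes=10
)

trainer = Trainer(
    run_name="composer-demo",                      # a fixed run name lets autoresume find its checkpoints
    model=model,
    train_dataloader=train_dataloader,
    optimizers=torch.optim.AdamW(model.parameters(), lr=1e-3),
    max_duration="2ep",                            # train for two epochs
    algorithms=[BlurPool(), LabelSmoothing(0.1)],  # plug-in speedup/quality algorithms
    device_train_microbatch_size="auto",           # auto-size microbatches to help avoid OOMs
    save_folder="checkpoints",                     # local path or cloud URI (e.g. s3://...)
    save_interval="1ep",
    autoresume=True,                               # rerun the script to resume from the latest checkpoint
    loggers=[FileLogger()],
)
trainer.fit()
```

The same `Trainer` call touches several of the bullets above: the `algorithms` list injects speedup methods, `device_train_microbatch_size="auto"` handles OOM prevention, and `save_folder` plus `autoresume` cover checkpoint resumption.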

You can get up and running with Composer by following the helpful QuickStart guides in the documentation. In the next section we will get into more specific details on some of the above-mentioned features and how to integrate them into your codebase. 

What are some key features of Composer and Mosaic MCLI?

Before getting into more specific details about the Composer package, let's first review the MCLI terminal integration, which we used during some recent model runs and which integrates easily with the open-source Composer package.

MCLI

MCLI is the command-line interface used to submit pretraining and finetuning runs to Mosaic's cloud compute. Once a compute cluster is attached, you can set up your account and log in to the simple-to-use GUI:

Figure 3 Mosaic ML GUI

The GUI lets you see the compute clusters attached to your organization for advanced training runs. Note that, following Mosaic's recent partnership with Databricks, some previously available features (inference and finetuning) have been integrated more directly into the Databricks platform.

You can set up MCLI in just a couple of terminal commands.
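As a rough sketch (exact commands and prompts may differ across MCLI versions, so confirm against the MCLI docs), installation and authentication look something like:

```bash
# Install the MosaicML CLI and authenticate against your organization's account.
pip install mosaicml-cli
mcli init              # interactive setup that stores your API key locally
mcli get clusters      # verify which compute clusters are attached
```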

We will go step by step through a complete run submission with code examples in a subsequent blog, but we recommend reviewing the MCLI documentation as well. MCLI provides an environment in which you don't have to manage compute yourself, so you can focus on AI engineering rather than environment setup. You can easily submit a training run using a YAML file:
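The run configuration below is a hedged sketch of a minimal "hello world" MCLI YAML; the field names and cluster value are assumptions that can vary by account and MCLI version:

```yaml
# hello_world.yaml -- illustrative minimal MCLI run config.
name: hello-world
image: python:3.10        # any Docker image accessible to your cluster
compute:
  cluster: my-cluster     # placeholder; list yours with `mcli get clusters`
  gpus: 0                 # a hello-world echo needs no GPUs
command: |
  echo "Hello World!"
```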

And then launch it from the terminal:
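Assuming the file above is saved as hello_world.yaml, submission looks roughly like this (flags may vary by MCLI version):

```bash
# Submit the run; `mcli logs <run-name>` can then stream its output.
mcli run -f hello_world.yaml
```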

You'll see run progress in your terminal (note that the screenshot below is from an actual training run; a "hello world" echo typically requires far less compute :D):

Next, we'll dive into a comprehensive guide to unlocking the core benefits of Mosaic Composer.

Author: Aaron McClendon, Head of AI
