Introducing MPT-7B: A New Open-Source LLM

An LLM trained on 1T tokens of text and code, the latest in MosaicML's Foundation Series.

Image by Author


Large language models (LLMs) are everywhere at the moment. However, if your organization does not have the right resources, it can be challenging to jump on the LLM wave: training and deploying large language models is difficult and expensive, and it is easy to feel left out. Open-source LLMs, such as Meta's LLaMA series, have made LLM resources more widely available. 

The latest addition to that open-source collection comes from MosaicML Foundations: MPT-7B.


What is MPT-7B?


MPT stands for MosaicML Pretrained Transformer. MPT models are GPT-style decoder-only transformers that come with many improvements: 

  • Performance-optimized layer implementations
  • Greater training stability due to architecture changes
  • No context length limitations
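The "no context length limitations" point comes from MPT replacing positional embeddings with ALiBi (Attention with Linear Biases), which penalizes attention scores in proportion to the distance between query and key, letting the model extrapolate to sequences longer than those seen in training. A minimal sketch of the bias computation in pure Python (illustration only, not MosaicML's implementation):

```python
# Sketch of ALiBi (Attention with Linear Biases), the technique MPT uses
# in place of positional embeddings to avoid a hard context length limit.

def alibi_slopes(n_heads: int) -> list[float]:
    # For n_heads a power of two, head h gets slope 2^(-8 * (h + 1) / n_heads),
    # so different heads penalize distance at different rates.
    return [2 ** (-8 * (h + 1) / n_heads) for h in range(n_heads)]

def alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    # Bias added to a head's attention scores: -slope * (i - j), the distance
    # between query position i and key position j (causal, so j <= i).
    return [[-slope * (i - j) for j in range(i + 1)] for i in range(seq_len)]

slopes = alibi_slopes(8)          # first head's slope is 0.5
bias = alibi_bias(4, slopes[0])   # row i holds biases for keys 0..i
```

Because the bias is a simple function of distance rather than a learned table of positions, the same formula applies unchanged at any sequence length.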

MPT-7B is a transformer model trained from scratch on 1T tokens of text and code. Yes, 1 TRILLION! It was trained on the MosaicML platform over 9.5 days with zero human intervention, costing MosaicML ~$200k.
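A quick back-of-the-envelope from those figures (illustrative arithmetic only, derived from the numbers in this post):

```python
# Rough throughput and cost implied by the headline numbers:
# 1T tokens, 9.5 days, ~$200k.
total_tokens = 1_000_000_000_000  # 1T tokens
days = 9.5
cost_usd = 200_000

tokens_per_day = total_tokens / days                       # ~105B tokens/day
usd_per_billion_tokens = cost_usd / (total_tokens / 1e9)   # ~$200 per 1B tokens
```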

It is open-source and licensed for commercial use, which could be a game changer for how businesses and organizations approach predictive analytics and decision-making. 

The main features of MPT-7B are:

  • Licensed for commercial use 
  • Trained on a large amount of data (1T tokens)
  • Can handle extremely long inputs
  • Optimized for fast training and inference
  • Highly efficient open-source training code

MPT-7B is the base model and has been shown to outperform other open-source models in the 7B to 20B range. The quality of MPT-7B matches that of LLaMA-7B. To evaluate quality, MosaicML Foundation compiled 11 open-source benchmarks and evaluated the models in an industry-standard manner.

Image by MosaicML Foundation


MosaicML Foundations is also releasing three additional fine-tuned models:

  1. MPT-7B-Instruct
  2. MPT-7B-Chat
  3. MPT-7B-StoryWriter-65k+


MPT-7B-Instruct

The MPT-7B-Instruct model is for short-form instruction following. With 26,834 downloads as of the 14th of May, MPT-7B-Instruct lets you ask quick, short questions and get an instant response. Have a question and just want a simple answer? Use MPT-7B-Instruct.

Why is this so great? Typically, LLMs are taught to continue generating text based on the input provided. However, some users want an LLM that treats their input as an instruction to follow. Instruction finetuning trains LLMs to produce instruction-following outputs. 
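In practice, instruction finetuning wraps the user's input in a fixed prompt template during both training and inference. A minimal sketch, assuming a dolly-style template of the kind MPT-7B-Instruct's training data follows (the exact wording here is an assumption, not the official template):

```python
# Hypothetical instruction-prompt template; the specific phrasing is an
# assumption modeled on common dolly-style instruction datasets.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n{instruction}\n### Response:\n"
)

def format_instruction(instruction: str) -> str:
    # Wrap the raw user input so the model sees it as a task to complete,
    # not just text to continue.
    return PROMPT_TEMPLATE.format(instruction=instruction)

prompt = format_instruction("Explain what a tokenizer does in one sentence.")
```

The model then generates text after the `### Response:` marker, which is why a finetuned model answers the question instead of merely continuing it.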


MPT-7B-Chat

Yes, we have another chatbot. MPT-7B-Chat generates dialogue. For example, if you want it to write a speech, give it the context and it will generate text in a conversational manner. Or perhaps you want a tweet that paraphrases a paragraph from an article; it can generate that dialogue for you!

Why is this so great? MPT-7B Chat is ready and well-equipped for a variety of conversational tasks, delivering more seamless, engaging multi-turn interactions for users.
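Multi-turn chat models like this are trained on conversations serialized with explicit role markers. A minimal sketch of assembling such a prompt in the ChatML-style format associated with MPT-7B-Chat (the exact markers here are an assumption based on that convention):

```python
# Hypothetical multi-turn prompt assembly using ChatML-style role markers;
# the marker strings are assumptions, not an official API.
def format_chat(turns: list[tuple[str, str]]) -> str:
    # Each turn is (role, message); roles are "system", "user", or "assistant".
    parts = [f"<|im_start|>{role}\n{msg}<|im_end|>\n" for role, msg in turns]
    # Leave an open assistant turn so the model knows it is its turn to speak.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = format_chat([
    ("system", "You are a helpful assistant."),
    ("user", "Paraphrase this paragraph as a tweet."),
])
```

Keeping prior turns in the prompt is what gives the model the conversation history it needs for seamless multi-turn interaction.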


MPT-7B-StoryWriter-65k+

This is for the story writers! For those who want to write stories that have a long context, MPT-7B-StoryWriter-65k+ is a model designed for exactly that. The model was built by fine-tuning MPT-7B with a context length of 65k tokens, and it can extrapolate beyond 65k tokens. MosaicML Foundation has been able to generate 84k tokens on a single node of A100-80GB GPUs. 

Why is this so great? Most open-source LLMs can only handle sequences of up to a few thousand tokens. But using just a single node of 8xA100-80GB on the MosaicML platform, you can finetune MPT-7B to handle context lengths of up to 65k! 


More on How MPT-7B was Built


The MosaicML team built these models in only a few weeks, handling the data preparation, training, finetuning, and deployment in that time. 

The data came from a variety of sources, with a billion tokens available from each source, and the number of effective tokens remained a billion per source. The team used EleutherAI's GPT-NeoX-20B tokenizer, allowing them to train on a diverse mix of data, apply consistent space delimitation, and more. 

All the MPT-7B models were trained on the MosaicML platform, using A100-40GB and A100-80GB GPUs from Oracle Cloud. 

If you would like to know more about the tooling and costs behind MPT-7B, have a read of the MPT-7B blog post.


Wrapping it up


The MosaicML platform is a strong starting point for organizations, whether private, commercial, or community-based, to build custom LLMs. Having this open-source resource available will let organizations feel freer about using these tools to tackle their current organizational challenges. 

Customers can train LLMs on any computing provider or data source, while maintaining efficiency, privacy, and cost transparency.

What do you think you will be using MPT-7B for? Let us know in the comments below!

Nisha Arya is a Data Scientist, Freelance Technical Writer, and Community Manager at KDnuggets. She is particularly interested in providing Data Science career advice, tutorials, and theory-based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence can benefit the longevity of human life. A keen learner, she seeks to broaden her tech knowledge and writing skills, while helping guide others.