Microsoft, Beihang release MoRA, an efficient LLM fine-tuning technique
MoRA is a new LLM fine-tuning technique that solves some of the problems of other parameter-efficient techniques such as LoRA.
Researchers from Microsoft and Beihang University have introduced a new technique for fine-tuning large language models (LLMs) at a fraction of the cost it usually takes.
The new method, called MoRA, is a parameter-efficient fine-tuning (PEFT) technique that addresses some of the limitations of other popular approaches such as low-rank adaptation (LoRA). MoRA is especially useful for fine-tuning tasks that require the model to acquire new knowledge. With PEFT methods becoming increasingly popular in the enterprise, MoRA can become an important addition to the growing toolset of LLM application developers.
The limitations of LoRA
Classic fine-tuning requires updating all the parameters of an LLM. When the model contains billions of parameters, full fine-tuning can become costly and slow. Parameter-efficient fine-tuning techniques are based on the premise that when fine-tuning LLMs for downstream applications, you do not need to update all the parameters. Instead, PEFT methods train a small subset of parameters that is sufficient to adapt the model to the target task.
LoRA has gained popularity as a PEFT technique because it represents the weight update as the product of two low-rank matrices, confining the change to a small subspace of the full weight matrix. LoRA significantly reduces memory requirements and facilitates the storage and deployment of fine-tuned models.
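To make the low-rank mechanism concrete, here is a minimal PyTorch sketch of a LoRA-style linear layer. The class name, initialization scheme and default rank are illustrative assumptions, not code from the paper:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: the frozen base weight W is effectively
    updated as W + (alpha / r) * B @ A, where A and B are small
    low-rank factors. Names and defaults are illustrative."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the adapter is trained

        d_out, d_in = base.weight.shape
        # A: (r, d_in), B: (d_out, r), so B @ A has shape (d_out, d_in)
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))  # zero init: no change at start

        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Only A and B receive gradients, so the checkpoint for a fine-tuned model consists of just the two small factors rather than a full copy of the weights.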
However, while LoRA performs well on tasks such as text classification and instruction tuning, it struggles with more complex tasks that require enhancing the knowledge and capabilities of LLMs, such as mathematical reasoning and continual pre-training. Several studies have found that LoRA’s low-rank updating mechanism may limit the ability of large language models to effectively learn and memorize new knowledge.
Since the rank of the LoRA adapter is significantly smaller than the full rank of the model, “this limitation restricts capacity to store new information via fine-tuning,” the researchers write.
MoRA
To address the limitations of LoRA, the researchers introduce MoRA, a PEFT technique that uses a single square matrix instead of a pair of low-rank matrices. The main idea behind MoRA is to arrange the same budget of trainable parameters so that the update achieves the highest possible rank.
Unlike LoRA, the input and output dimensions of the MoRA adapter do not match those of the original model, which makes it impossible to apply them in the same matrix multiplication. To bridge this gap, the researchers developed compression and decompression functions that transform inputs between the two spaces. This design allows MoRA to be easily plugged into LLMs of different sizes.
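The paper evaluates several concrete choices for these compression/decompression functions, including truncation, sharing and rotation. The sketch below uses a simple grouping scheme as an illustrative stand-in, so it is an assumption about the general shape of the method rather than the authors' exact implementation:

```python
import torch
import torch.nn as nn

class MoRALinear(nn.Module):
    """Hedged sketch of MoRA's high-rank update: all trainable
    parameters live in one square matrix M of shape (r_hat, r_hat),
    bracketed by non-parametric compress/decompress steps. The
    grouping scheme here is an illustrative assumption."""

    def __init__(self, base: nn.Linear, r_hat: int = 256):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only M is trained

        self.d_out, self.d_in = base.weight.shape
        assert self.d_in % r_hat == 0 and self.d_out % r_hat == 0
        self.r_hat = r_hat
        # One square matrix; its update can be full-rank (up to r_hat).
        self.M = nn.Parameter(torch.zeros(r_hat, r_hat))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compress: fold the d_in features into groups of size r_hat and sum.
        g = x.reshape(*x.shape[:-1], self.d_in // self.r_hat, self.r_hat).sum(dim=-2)
        h = g @ self.M.T  # (..., r_hat)
        # Decompress: broadcast each compressed feature to a block of outputs.
        delta = h.repeat_interleave(self.d_out // self.r_hat, dim=-1)
        return self.base(x) + delta
```

The parameter counts line up: with a hidden size of 4,096, a rank-8 LoRA adapter has 8 × (4,096 + 4,096) = 65,536 trainable parameters, the same as a 256×256 square matrix, but the square matrix can produce an update of rank up to 256 instead of 8.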
The square weight matrix gives MoRA a stronger capacity to learn new knowledge than a LoRA model of the same size, according to the researchers.
MoRA in action
The researchers compared equally sized LoRA and MoRA models on various tasks and settings. On memorization tasks, MoRA significantly outperformed LoRA and came much closer to the performance of a fully fine-tuned model with fewer parameters and training steps.
“Our method shows significant improvements over LoRA with the same number of trainable parameters, benefiting from high-rank updating,” the researchers write.
In instruction tuning and mathematical reasoning tasks, MoRA showed performance that is almost on par with LoRA. However, for continual pretraining in biomedical and financial domains, MoRA outperformed LoRA, benefiting from its high-rank updating to memorize new knowledge.
The researchers also found that increasing the rank of the MoRA adapter can eliminate the performance gap between PEFT and full fine-tuning in mathematical reasoning tasks, though it comes at higher training and storage costs.
PEFT for the enterprise
Fine-tuning is an important use case for enterprise LLM applications. In addition to increasing the capabilities and accuracy of LLMs on proprietary knowledge, fine-tuning can enable companies to use smaller models for tasks that previously required expensive frontier models.
Currently, LoRA and its variants are the gold standard for parameter-efficient fine-tuning. There is a rich ecosystem of tools and platforms for creating LoRA adapters. For example, S-LoRA is a framework that enables developers to run thousands of LoRA adapters on a single GPU, unlocking applications that require many fine-tuned LLMs, such as models customized for each user.
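As a sense of what that ecosystem looks like in practice, attaching a LoRA adapter to a model typically takes only a few lines with Hugging Face's peft library; the model name below is a placeholder:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder checkpoint name; substitute any causal LM.
model = AutoModelForCausalLM.from_pretrained("your-base-model")

# Standard LoRA configuration: rank, scaling and target projection layers.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically a small fraction of the base model
```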
The researchers at Microsoft and Beihang have released an open-source implementation of MoRA that is compatible with LoRA. It could prove to be an important tool for enterprise applications that want to add new knowledge to base models.