Low-rank adaptation (LoRA) is among the most widely used and effective techniques for efficiently training custom LLMs. For those interested in open-source LLMs, it's an essential technique worth familiarizing oneself with.
Last month, I shared an article with several LoRA experiments, based on the open-source Lit-GPT repository that I co-maintain with my colleagues at Lightning AI. This Ahead of AI article aims to discuss the primary lessons I derived from my experiments. Additionally, I'll address some of the frequently asked questions related to the topic. If you are interested in finetuning custom LLMs, I hope these insights will save you some time in "the long run" (no pun intended).
Despite the inherent randomness of LLM training (or when training models on GPUs in general), the outcomes remain remarkably consistent across multiple runs.
QLoRA presents a trade-off that might be worthwhile if you're constrained by GPU memory. It offers 33% memory savings at the cost of a 33% increase in runtime.