The Practical Guide to LoRA Fine-tuning Parameters
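When fine-tuning large language models, Low-Rank Adaptation (LoRA) is one of the most widely used parameter-efficient methods. It freezes the pretrained weights and trains small low-rank matrices added alongside selected weight matrices (originally the attention projections). This sharply reduces the number of trainable parameters and the GPU memory needed for gradients and optimizer state.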
However, configuring LoRA requires setting three critical parameters:

1. **Rank (r)**
2. **Alpha (scaling factor)**
3. **Target Modules**
In this post, we explain how to choose these settings to get strong results on consumer hardware.
1. What is LoRA Rank (r)?
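The rank parameter sets the inner dimension of the two adapter matrices: for a frozen weight matrix of shape d_out x d_in, LoRA trains a matrix B of shape d_out x r and a matrix A of shape r x d_in. A higher rank means more trainable parameters, allowing the adapter to learn more complex relationships. However, higher ranks also require more VRAM and increase the risk of overfitting.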
- **Typical values:** 8, 16, 32, 64.
- **Our recommendation:** Use a rank of **16** for general domain training (like adapting a model to medical or legal text). Use **8** for style mimicking or simple classification tasks. Higher ranks (64+) are rarely needed unless you are teaching the model entirely new conceptual frameworks.
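To make the cost of a higher rank concrete, here is a minimal sketch that counts the trainable parameters LoRA adds to a single weight matrix. The helper name `lora_param_count` and the 4096 x 4096 layer size are illustrative assumptions, not values from any particular model.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # A has shape (r, d_in) and B has shape (d_out, r).
    return r * d_in + d_out * r


# Hypothetical square projection; substitute your model's real dimensions.
d = 4096
full = d * d  # parameters in the frozen matrix

for r in (8, 16, 32, 64):
    added = lora_param_count(d, d, r)
    print(f"r={r:>2}: {added:>9,} trainable params ({added / full:.2%} of the frozen matrix)")
```

The count grows linearly with the rank, so doubling r doubles the adapter size and the optimizer state that goes with it. Under these assumed dimensions, r = 16 adds 131,072 parameters per matrix, which is under 1% of the frozen matrix.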
2. Choosing LoRA Alpha
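LoRA alpha is a scaling factor for the adapter output: the low-rank update is multiplied by alpha / r before it is added to the frozen layer's output. In practice it behaves much like a learning-rate multiplier for the low-rank updates.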
- **Standard practice:** Set **alpha = 2 * r** (e.g., if rank is 16, set alpha to 32).
- **Why?** With alpha = 2 * r, the ratio alpha / r stays fixed at 2, so the adapter's contribution keeps the same scale when you change the rank. That usually spares you from retuning the learning rate every time you try a different rank.
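The role of alpha is easiest to see in the forward pass. The sketch below uses plain Python lists and assumes the standard alpha / r scaling from the original LoRA formulation; the function names and the toy matrices are made up for illustration.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x) for one input vector x."""
    r = len(A)                       # A is r x d_in, B is d_out x r
    scale = alpha / r                # the scaling that alpha controls
    base = matvec(W, x)              # frozen pretrained path
    delta = matvec(B, matvec(A, x))  # low-rank adapter path
    return [b + scale * d for b, d in zip(base, delta)]


# Toy example with rank 1 (illustrative values only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[0.5], [0.0]]
print(lora_forward(W, A, B, x=[2.0, 3.0], alpha=2))

# With alpha = 2 * r the scale is the same for every rank.
print([(2 * r) / r for r in (8, 16, 32, 64)])
```

Because only the ratio alpha / r enters the computation, changing the rank while holding alpha fixed silently changes how strongly the adapter contributes. Tying alpha to the rank avoids that.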
3. Target Modules
Target modules specify which weight matrices in the transformer block should receive adapter layers. In early implementations, adapters were only applied to query (`q_proj`) and value (`v_proj`) projections.
More recent work, such as the QLoRA paper, reports that applying adapters to all linear projection layers gives better results than adapting the attention projections alone. In Llama-style models these layers are:

* `q_proj`, `k_proj`, `v_proj`, `o_proj` (attention layers)
* `gate_proj`, `up_proj`, `down_proj` (feed-forward layers)
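This raises the trainable parameter count several-fold compared with adapting only `q_proj` and `v_proj`, but the adapters still amount to a small fraction of the base model's size, and it tends to make the fine-tuning process more stable.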
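As a rough check on how much the choice of target modules costs, the sketch below compares adapter sizes for two target sets at rank 16. The layer shapes are assumptions modeled on a Llama-7B-style architecture (hidden size 4096, feed-forward size 11008, 32 layers), and `adapter_params` is a hypothetical helper; read the real values from your model's config.

```python
# Assumed Llama-7B-like shapes; check your model's config for the real values.
HIDDEN, INTERMEDIATE, LAYERS = 4096, 11008, 32

# (d_in, d_out) for each linear projection in one transformer block.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def adapter_params(target_modules, r):
    """Total LoRA parameters across all layers for the given target modules."""
    per_layer = sum(r * (SHAPES[m][0] + SHAPES[m][1]) for m in target_modules)
    return per_layer * LAYERS


qv_only = adapter_params(["q_proj", "v_proj"], r=16)
all_linear = adapter_params(list(SHAPES), r=16)

print(f"q_proj + v_proj only: {qv_only:,}")
print(f"all linear layers:    {all_linear:,}")
print(f"ratio:                {all_linear / qv_only:.2f}x")
```

Under these assumptions, targeting every linear layer yields roughly 40 million trainable parameters versus about 8.4 million for query and value alone. That is a real increase, yet still well under 1% of a 7-billion-parameter base model.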
About the author
Dr. Amara Osei is a verified AI trainer on our platform. To schedule a 1-on-1 model training session with them, visit their profile in our directory.