AI Features

Quantized Low-Rank Adaptation (QLoRA)

Learn about the components and working of the Quantized Low-Rank Adaptation (QLoRA) technique.

Quantized Low-Rank Adaptation (QLoRA), as the name suggests, combines two widely used efficiency techniques: LoRA and quantization. Where LoRA uses low-rank matrices to reduce the number of trainable parameters, QLoRA extends it by further reducing the model's memory footprint through quantization of the pretrained weights.
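The combination can be sketched in a few lines: the pretrained weight is frozen and stored in 4 bits, while the small LoRA matrices stay in full precision and are the only trainable parameters. This is a minimal illustration, not QLoRA's actual implementation; it uses a simple absmax int4 scheme as a stand-in for the NF4 data type, and the sizes, scale factor, and initialization constants are toy values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 16, 4                      # hidden size and LoRA rank (toy values)
W = rng.normal(0, 0.02, (d, d))   # pretrained weight, frozen

# Quantize the frozen weight to 4 bits (simple absmax int4 here,
# standing in for QLoRA's NF4 data type).
scale = np.abs(W).max() / 7.0                          # int4 range is [-8, 7]
W_q = np.clip(np.round(W / scale), -8, 7).astype(np.int8)

# LoRA adapters stay in full precision and are the only trainable part.
A = rng.normal(0, 0.02, (r, d))   # small random init
B = np.zeros((d, r))              # zero init, so the adapter starts as a no-op
alpha = 8.0                       # LoRA scaling hyperparameter

def forward(x):
    W_deq = W_q.astype(np.float32) * scale             # dequantize for the matmul
    return W_deq @ x + (alpha / r) * (B @ (A @ x))     # frozen path + adapter path

x = rng.normal(size=d)
y = forward(x)
print(y.shape)  # (16,)
```

During training, gradients flow only into A and B; the quantized base weight is dequantized on the fly for each forward pass and never updated.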

Overview of a single layer in QLoRA

Components of QLoRA

The following are the three main components of QLoRA:

  • 4-bit NormalFloat quantization

  • Double quantization

  • Paged optimizers

Let’s dive into the details of each component.

4-bit NormalFloat quantization

The NormalFloat (NF) is a theoretically optimal data type that uses ...
