Quantization

Also known as: Model Quantization, Weight Quantization

In one sentence

A compression technique that reduces AI model size and memory usage by using lower-precision numbers, making models faster and cheaper to run.

Explain like I'm 12

Like compressing a huge photo file to make it smaller—you lose a tiny bit of quality, but now it loads way faster and takes up less space.

In context

Quantizing a 70B-parameter model from 16-bit to 8-bit or 4-bit precision shrinks it from roughly 140GB to about 70GB or 35GB respectively, enabling local deployment on consumer hardware.
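
To make the arithmetic and the technique concrete, here is a minimal sketch of symmetric 8-bit weight quantization in NumPy. The tensor and single-scale scheme are illustrative assumptions, not any particular library's implementation; production tools typically use per-channel or per-group scales with calibration data.

    import numpy as np

    # Hypothetical weight tensor in float32 (4 bytes per value).
    weights = np.random.randn(4, 4).astype(np.float32)

    # Symmetric quantization: map [-max|w|, +max|w|] onto int8's [-127, 127].
    scale = np.abs(weights).max() / 127.0
    q_weights = np.round(weights / scale).astype(np.int8)  # 1 byte per value

    # Dequantize for use in computation; the small rounding error is the
    # "tiny bit of quality" traded for a 4x smaller tensor.
    restored = q_weights.astype(np.float32) * scale
    print("max error:", np.abs(weights - restored).max())

    # The same arithmetic behind the 70B example: bytes = params * bits / 8.
    params = 70e9
    for bits in (16, 8, 4):
        print(f"{bits}-bit: ~{params * bits / 8 / 1e9:.0f} GB")

With round-to-nearest, each weight's error is bounded by half the scale, which is why well-behaved weight distributions lose little accuracy at 8 bits.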
