
Quantization

Reducing the numerical precision of model weights (e.g., from 32-bit floating point to 8-bit integers) to decrease model size and increase inference speed with minimal accuracy loss.
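
A minimal sketch of the idea, using symmetric per-tensor int8 quantization with NumPy (the function names here are illustrative, not from any particular library):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization of float32 weights to int8."""
    # Scale maps the largest absolute weight onto the int8 limit (127).
    # A tiny floor avoids division by zero for an all-zero tensor.
    scale = max(np.max(np.abs(weights)) / 127.0, 1e-8)
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from the int8 values."""
    return q.astype(np.float32) * scale

# Quantize a small random weight matrix and measure the rounding error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.max(np.abs(w - w_hat)))
print("storage per weight: 4 bytes -> 1 byte")
```

Each stored value shrinks from 4 bytes to 1, a 4x reduction, while the dequantized weights stay close enough to the originals that model accuracy is typically only slightly affected.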
