Quantization is the process of reducing the numerical precision of a model’s weights and activations (e.g., converting from 32-bit floats to 8-bit integers) to decrease memory footprint and improve inference speed with minimal impact on model accuracy.
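To make the mapping concrete, here is a minimal sketch of affine (asymmetric) quantization in plain Python: floats are mapped to unsigned 8-bit integers via a scale and zero point, then mapped back on dequantization. The function names and the choice of asymmetric uint8 quantization are illustrative, not from the original text; production frameworks (e.g., PyTorch, ONNX Runtime) implement the same idea with additional calibration options.

```python
def quantize(weights, num_bits=8):
    """Map a list of floats to integers in [0, 2**num_bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(weights), max(weights)
    # Scale maps the float range onto the integer range; guard against
    # a degenerate (constant) tensor where hi == lo.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    # Zero point is the integer that represents the float value lo.
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from quantized integers."""
    return [(qi - zero_point) * scale for qi in q]
```

Round-tripping a value introduces at most about one quantization step (`scale`) of error, which is the "minimal impact on accuracy" the definition refers to:

```python
weights = [-1.0, 0.0, 0.5, 1.0]
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
# Each restored value is within one scale step of the original.
```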