229. Quantization using PyTorch

Quantization

Quantization is a technique that lowers the numeric precision (data type) used to compute a neural network, typically from 32-bit floating point to 8-bit integers, in exchange for faster inference and a smaller memory footprint.

Once a model is deployed, there is no need to backpropagate (a step that is sensitive to numeric precision). This means that, if a slight drop in accuracy is acceptable, we can lower the precision used to compute the network and gain faster inference by applying quantization.
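As a minimal sketch of what this means at the tensor level, PyTorch can map a float32 tensor onto 8-bit integers and dequantize it back; the scale and zero point below are illustrative values, not ones derived from real calibration:

```python
import torch

# A float32 tensor (e.g., a weight matrix after training).
w = torch.randn(4, 4)

# Affine quantization to 8-bit integers: q = round(w / scale) + zero_point.
# scale and zero_point are illustrative here; in practice they are chosen
# from the observed value range of the tensor.
wq = torch.quantize_per_tensor(w, scale=0.1, zero_point=0, dtype=torch.qint8)

print(wq.dtype)          # torch.qint8
print(wq.dequantize())   # approximate reconstruction of w
```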

3 Quantization Methods Supported by PyTorch

  1. Dynamic Quantization
    Quantization is applied at run time. Only layers whose types belong to the set passed to the function are quantized (see the sketch after this list).
  2. Post-Training Quantization
    Quantization is applied after training and before run time. It is applied automatically to all supported layers.
  3. Quantization-Aware Training
    Quantization is applied at train time. This is the most tedious to set up, but yields the highest accuracy.
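As a minimal sketch of the first method, dynamic quantization, here is the standard torch.quantization.quantize_dynamic workflow; the toy model and input sizes are illustrative assumptions, not taken from the original post:

```python
import torch
import torch.nn as nn

# A small stand-in for an already-trained model (layer sizes are illustrative).
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
model.eval()

# Dynamic quantization: only the layer types listed in the set are converted
# (here nn.Linear). Weights are stored as int8; activations are quantized
# on the fly at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Inference runs exactly as before, just with the quantized layers.
x = torch.randn(1, 128)
with torch.no_grad():
    out = quantized_model(x)
print(out.shape)  # torch.Size([1, 10])
```

Post-training (static) quantization and quantization-aware training follow a longer prepare/calibrate-or-train/convert workflow; the PyTorch quantization documentation covers those steps in detail.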

Reference: Document