Pruning
“Pruning” means making a network sparse by removing weights so that inference runs faster. Many of the weights in a trained network contribute very little to its output, so pruning helps when resources are limited, for example when running inference on edge devices.
Methods
There are two main methods to prune a model.
- Unstructured Pruning
  This method removes individual unnecessary weights. All neurons remain, so some neurons may stay fully connected while others become sparsely connected.
- Structured Pruning
  This method removes whole neurons whose weights are unnecessary, so that all remaining neurons are fully connected.
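As a rough illustration of the two methods, the following NumPy sketch prunes a single weight matrix in which each row holds the incoming weights of one neuron. The helper names `prune_unstructured` and `prune_structured`, the magnitude and L2-norm criteria used to decide which weights count as "unnecessary", and the example values are all assumptions made for illustration; they are not taken from any particular library.

```python
import numpy as np


def prune_unstructured(weights, sparsity):
    """Zero out roughly the smallest-magnitude `sparsity` fraction of weights.

    Assumption: "unnecessary" is approximated by small absolute value.
    The matrix keeps its shape; only individual entries become zero.
    """
    k = int(sparsity * weights.size)  # number of weights to drop
    if k == 0:
        return weights.copy()
    magnitudes = np.abs(weights)
    # The k-th smallest magnitude acts as the cut-off (ties are dropped too).
    threshold = np.partition(magnitudes.ravel(), k - 1)[k - 1]
    mask = magnitudes > threshold
    return weights * mask


def prune_structured(weights, n_remove):
    """Remove the `n_remove` rows (neurons) with the smallest L2 norm.

    Assumption: a neuron is "unnecessary" when its weight vector is small.
    The result is a smaller, fully dense matrix.
    """
    norms = np.linalg.norm(weights, axis=1)
    # Indices of the rows to keep, restored to their original order.
    keep = np.sort(np.argsort(norms)[n_remove:])
    return weights[keep]


# Hypothetical 3-neuron layer with 3 inputs each.
W = np.array([
    [0.90, -0.05, 0.40],
    [0.01, 0.02, -0.03],
    [-0.70, 0.60, 0.10],
])

sparse_W = prune_unstructured(W, sparsity=0.5)  # same shape, more zeros
dense_W = prune_structured(W, n_remove=1)       # one neuron (row) removed

print(sparse_W.shape, np.count_nonzero(sparse_W))
print(dense_W.shape)
```

The unstructured result keeps its original shape and only contains more zeros, so it typically needs sparse-aware storage or kernels before it runs faster, whereas the structured result is simply a smaller dense matrix. In a full network, removing a neuron would also mean removing the matching input column in the next layer, which this sketch leaves out.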