Model Compression

Resources:

https://medium.com/gsi-technology/an-overview-of-model-compression-techniques-for-deep-learning-in-space-3fd8d4ce84e5
https://arxiv.org/abs/1710.09282
https://towardsdatascience.com/machine-learning-models-compression-and-quantization-simplified-a302ddf326f2

Pruning

Pruning removes weights by comparing their magnitudes to a threshold value: weights whose magnitude falls below the threshold are dropped.
Types:
- Unstructured pruning: remove individual weights (connections) or neurons
- Structured pruning: remove entire channels or filters
Others:

unstructured pruning

https://arxiv.org/pdf/1506.02626.pdf

Set individual weights in a weight matrix to zero → the network becomes sparser
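
A minimal sketch of magnitude-based unstructured pruning, assuming a plain NumPy weight matrix. The names `magnitude_prune` and `sparsity` and the threshold of 0.1 are illustrative choices, not taken from the linked paper.

```python
import numpy as np

def magnitude_prune(weights, threshold):
    """Zero every weight whose absolute value is below `threshold`.

    Returns the pruned matrix and the boolean mask of kept weights.
    """
    mask = np.abs(weights) >= threshold
    return np.where(mask, weights, 0.0), mask

def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    return float(np.mean(weights == 0))

# Toy weight matrix (values are made up for illustration).
W = np.array([[0.8, -0.05, 0.3],
              [0.01, -0.6, 0.02]])

pruned, mask = magnitude_prune(W, threshold=0.1)
print(pruned)
print("sparsity:", sparsity(pruned))
```

The matrix keeps its shape; only the number of zero entries changes. Turning that sparsity into a real speed or memory gain typically needs sparse storage or hardware support.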

structured pruning
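
A minimal sketch of removing entire filters from a convolution weight tensor. It assumes the layout (out_channels, in_channels, kernel_h, kernel_w) and ranks filters by L1 norm. The ranking criterion and the name `prune_filters` are assumptions for illustration, not something these notes specify.

```python
import numpy as np

def prune_filters(conv_weights, keep):
    """Keep the `keep` filters with the largest L1 norm and drop the rest.

    conv_weights: array of shape (out_channels, in_channels, kh, kw).
    Returns the smaller tensor and the indices of the kept filters.
    """
    norms = np.abs(conv_weights).sum(axis=(1, 2, 3))
    kept = np.sort(np.argsort(norms)[-keep:])  # preserve original order
    return conv_weights[kept], kept

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3, 3, 3))
W[1] *= 0.01  # make filter 1 nearly zero so it is the one removed

smaller, kept = prune_filters(W, keep=3)
print(smaller.shape, kept)
```

Unlike unstructured pruning, the result is a smaller dense tensor. In a real network, the next layer's input channels would have to be sliced with the same `kept` indices.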

Quantization

Low-rank factorization

Knowledge distillation

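Transferring knowledge from a large trained model (or an ensemble of models) to a smaller model
training & inference → different tasks with different requirements, so the model deployed for inference does not have to be the one that was trained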
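
A minimal sketch of one common way to transfer knowledge: train the student to match the teacher's temperature-softened output distribution. The soft-target cross-entropy, the temperature of 2.0 and all names here are assumptions for illustration. These notes do not say which loss is used.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature))
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())

# Made-up logits for one example with three classes.
teacher = np.array([[5.0, 1.0, 0.0]])
good_student = np.array([[4.0, 1.0, 0.0]])
bad_student = np.array([[0.0, 1.0, 5.0]])

print(distillation_loss(good_student, teacher))
print(distillation_loss(bad_student, teacher))
```

A student whose logits resemble the teacher's gets a lower loss. In practice this term is usually combined with the ordinary loss on the true labels.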