To reduce the memory requirement of neural network models, we proposed exponent sharing [1] for weights stored in floating-point precision. The concept is depicted in the following figure. With the proposed storage method, each float is stored as a sign, a mantissa, and an index into a table of the exponents shared within a layer.
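The sketch below illustrates the idea in NumPy; names such as `share_exponents` and `reconstruct` are illustrative and not necessarily the repository's API. The 8-bit exponent of each Float32 weight is replaced by an index into a per-layer table of the distinct exponents, while the sign and mantissa are kept unchanged, so the representation is lossless.

```python
import numpy as np

def share_exponents(weights):
    """Split float32 weights into sign bits, mantissas, and indices into a
    shared table of the layer's distinct exponents (illustrative sketch)."""
    bits = weights.astype(np.float32).ravel().view(np.uint32)
    sign = (bits >> 31) & 0x1                      # 1 bit per weight
    exponent = ((bits >> 23) & 0xFF).astype(np.uint8)  # 8-bit biased exponent
    mantissa = bits & 0x7FFFFF                     # 23 bits per weight
    # The distinct exponents of one layer are few, so store them once in a table
    # and keep only a short index per weight instead of the full exponent.
    table, index = np.unique(exponent, return_inverse=True)
    return sign, index.astype(np.uint8), mantissa, table

def reconstruct(sign, index, mantissa, table, shape):
    """Rebuild the original float32 weights from the shared representation."""
    bits = (sign.astype(np.uint32) << 31) \
         | (table[index].astype(np.uint32) << 23) \
         | mantissa
    return bits.view(np.float32).reshape(shape)

w = np.random.randn(4, 4).astype(np.float32)
packed = share_exponents(w)
assert np.array_equal(reconstruct(*packed, w.shape), w)  # lossless round trip
```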
Matrix multiplications are at the core of neural networks. This repository contains code for General Matrix Multiplication (GEMM) with and without exponent sharing. Any pre-trained model can benefit from exponent sharing in terms of storage: layerwise exponent sharing saves at least 9% of the weight memory with no accuracy loss when the weights are in IEEE Float32 format, since each weight keeps only a short index in place of its full 8-bit exponent. One such example is included here.
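For intuition on the 9% figure and on how a GEMM can consume the shared representation, here is a hedged sketch that reuses the `share_exponents` and `reconstruct` helpers from the block above. It assumes a layer uses at most 32 distinct exponents, so a 5-bit index replaces the 8-bit exponent; the exact index width in the repository's implementation may differ.

```python
import numpy as np

# Storage per weight (bits), assuming at most 32 distinct exponents per layer:
#   plain Float32:     1 (sign) + 8 (exponent) + 23 (mantissa) = 32
#   exponent sharing:  1 (sign) + 5 (index)    + 23 (mantissa) = 29
# Saving: 3/32 ≈ 9.4% per weight, plus a negligible shared table of <= 32 exponents.

def gemm_shared(x, sign, index, mantissa, table, shape):
    """GEMM whose weight operand is stored in the shared-exponent format and
    expanded back to float32 just before the multiply, so the result is
    identical to a plain float32 GEMM (illustrative sketch)."""
    w = reconstruct(sign, index, mantissa, table, shape)  # helper from the sketch above
    return x @ w

x = np.random.randn(2, 4).astype(np.float32)
w = np.random.randn(4, 3).astype(np.float32)
packed = share_exponents(w)
assert np.array_equal(gemm_shared(x, *packed, w.shape), x @ w)  # same result
```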
[1] P. Kashikar, S. Sinha and A. K. Verma, "Exploiting Weight Statistics for Compressed Neural Network Implementation on Hardware," 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2021, pp. 1-4, doi: 10.1109/AICAS51828.2021.9458581.