Lower weight compression memory footprint by sorting weights according to their size (#2803)

### Changes

Sort weights for compression:

```
all_weight_params = sorted(all_weight_params, key=lambda wp: wp.num_weights, reverse=True)
```

### Reason for changes

During weight compression, the memory footprint gradually increases as new low-bit constants are created. At the same time, there are temporary spikes in memory footprint during the computation of compressed weights. For example, here:

```
if invert_scale:
    scale = fns.power(scale, -1)
    compressed_weights = weight * scale
else:
    compressed_weights = weight / scale
if zero_point is not None:
    compressed_weights += zero_point.astype(weight.dtype)
compressed_weights = fns.round(compressed_weights)
compressed_weights = fns.clip(compressed_weights, level_low, level_high).astype(dtype)
```

Several temporary full-precision arrays have to be created here. They are garbage-collected afterwards, but while they exist they cause the temporary spikes in memory footprint mentioned above.

Taking this into account, it makes sense to compress large constants first, while there are not yet many low-bit constants taking up memory. Embedding matrices contribute most to this effect.

Please see the memory figures below. They were obtained during 8-bit weight compression. The figures were gathered with [memory_logger.py](#2801), using memory_type=SYSTEM_NORMALIZED.

| Backend | Model | Before | After |
|---------|-------|--------|-------|
| OV | qwen2-7b | ![system-normalized_memory_usage](https://github.com/openvinotoolkit/nncf/assets/23343961/0fd9a423-71e0-474b-9e44-8eb3accd464c) | ![system-normalized_memory_usage](https://github.com/openvinotoolkit/nncf/assets/23343961/ae53b89a-a5a6-4cf4-bad8-91b853602227) |
| PT | qwen2-7b | ![system-normalized_memory_usage](https://github.com/openvinotoolkit/nncf/assets/23343961/2fe7e377-36ee-4d67-9d57-3b5563a8e349) | ![system-normalized_memory_usage](https://github.com/openvinotoolkit/nncf/assets/23343961/1f89e657-cd1a-4af1-ae5a-8275fdd72679) |

For example, for the qwen2-7b OV model the peak footprint drops from ~12GB to ~7GB. The values for the OV backend are much lower than for PT because OV models are read using mmap, which avoids allocating memory for the whole full-precision model.

### Related tickets

144501
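
### Illustration (toy model)

The ordering argument from the reasoning section can be sketched with a small toy model. This is not NNCF code: `WeightParam` is a hypothetical stand-in that only carries the `num_weights` attribute used by the sort key, and `temp_copies`, `bytes_fp` and `bytes_low` are assumed values chosen for illustration. The model also ignores the memory held by the original full-precision weights, so the numbers are unitless and only show the trend.

```python
from dataclasses import dataclass


@dataclass
class WeightParam:
    """Hypothetical stand-in for a weight parameter; only num_weights matters here."""
    name: str
    num_weights: int


def peak_footprint(weight_params, temp_copies=3, bytes_fp=4, bytes_low=1):
    """Estimate the peak footprint of compressing weights in the given order.

    Assumptions (not taken from NNCF): compressing one weight briefly needs
    `temp_copies` full-precision temporaries, and afterwards leaves behind a
    resident low-bit constant of `bytes_low` bytes per element.
    """
    resident = 0  # low-bit constants created so far
    peak = 0
    for wp in weight_params:
        spike = temp_copies * wp.num_weights * bytes_fp  # temporary arrays
        peak = max(peak, resident + spike)
        resident += wp.num_weights * bytes_low  # temporaries are freed, constant stays
    return peak


# 20 equally sized layers followed by one large embedding matrix.
original_order = [WeightParam(f"layer_{i}", 100) for i in range(20)]
original_order.append(WeightParam("embedding", 600))

# Same sort as in the commit: largest weights first.
sorted_order = sorted(original_order, key=lambda wp: wp.num_weights, reverse=True)

peak_before = peak_footprint(original_order)
peak_after = peak_footprint(sorted_order)
print(peak_before, peak_after)
```

In this toy setup the large spike of the embedding matrix lands on top of 20 already created low-bit constants in the original order, whereas after sorting it happens while nothing else is resident yet, so the estimated peak is lower.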