
Lower weight compression memory footprint by sorting weights according to their size #2803

Conversation


@nikita-savelyevv nikita-savelyevv commented Jul 10, 2024

Changes

Sort weights for compression:

all_weight_params = sorted(all_weight_params, key=lambda wp: wp.num_weights, reverse=True)

Reason for changes

During weight compression, the memory footprint gradually increases as new low-bit constants are created. On top of that, there are temporary spikes in the footprint during compressed weight computation, for example here:

if invert_scale:
    scale = fns.power(scale, -1)         # reciprocal of scale
    compressed_weights = weight * scale  # full-precision temporary
else:
    compressed_weights = weight / scale  # full-precision temporary
if zero_point is not None:
    compressed_weights += zero_point.astype(weight.dtype)  # astype creates another temporary
compressed_weights = fns.round(compressed_weights)
compressed_weights = fns.clip(compressed_weights, level_low, level_high).astype(dtype)  # final cast to low-bit dtype

Multiple temporary full-precision arrays have to be created here. They are garbage-collected afterwards, but, as noted above, they produce temporary spikes in the memory footprint. Taking this into account, it makes sense to compress the largest constants first, while few low-bit constants are occupying memory yet. The largest spikes come from embedding matrices.
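
A toy model of this behavior (hypothetical sizes and factors, not NNCF code) illustrates why the large-first order lowers the peak:

```python
# Toy model: peak at each step ~= low-bit constants created so far plus the
# temporary full-precision arrays for the weight currently being compressed.
# Sizes and factors below are hypothetical, for illustration only.
weights_mb = [2000] + [50] * 100  # e.g. one embedding matrix + many small layers

def peak_footprint_mb(order, temp_factor=3.0, low_bit_factor=0.25):
    created = peak = 0.0
    for w in order:
        peak = max(peak, created + temp_factor * w)  # spike during computation
        created += low_bit_factor * w  # compressed constant stays allocated
    return peak

print(peak_footprint_mb(sorted(weights_mb)))                # small-first: ~7250 MB
print(peak_footprint_mb(sorted(weights_mb, reverse=True)))  # large-first: ~6000 MB
```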

Please see the memory figures below. They were obtained during 8-bit weight compression and gathered with memory_logger.py (memory_type=SYSTEM_NORMALIZED).

| Backend | Model | Before | After |
|---------|-------|--------|-------|
| OV | qwen2-7b | system-normalized memory usage plot | system-normalized memory usage plot |
| PT | qwen2-7b | system-normalized memory usage plot | system-normalized memory usage plot |

For example, for the qwen2-7b OV model the peak footprint drops from ~12 GB to ~7 GB.
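
As a rough stand-in for what a SYSTEM_NORMALIZED-style measurement might look like (system memory in use minus the baseline at start, sampled in a background thread), one could use something like the sketch below; the helper here is a hypothetical illustration, not the actual memory_logger.py API:

```python
import threading
import time

import psutil

def monitor(samples, stop_event, interval=0.1):
    """Sample system memory usage relative to the baseline at start."""
    baseline = psutil.virtual_memory().used
    while not stop_event.is_set():
        samples.append(psutil.virtual_memory().used - baseline)
        time.sleep(interval)

samples, stop = [], threading.Event()
t = threading.Thread(target=monitor, args=(samples, stop))
t.start()
# ... run nncf.compress_weights(model) here ...
stop.set()
t.join()
print(f"Peak footprint: {max(samples) / 2**30:.2f} GiB")
```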

The much lower values for the OV backend compared to PT come from the fact that OV models are read using mmap, which avoids allocating memory for the whole full-precision model.
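
For reference, a minimal sketch of that OV flow (file paths are hypothetical; mmap-based reading of the IR weights is the default behavior described above):

```python
import openvino as ov
import nncf

core = ov.Core()
# Constants in the .bin file are memory-mapped rather than copied into RAM,
# so reading the model does not allocate the full-precision weights upfront.
model = core.read_model("qwen2-7b/openvino_model.xml")  # hypothetical path
compressed = nncf.compress_weights(model)  # 8-bit weight compression by default
ov.save_model(compressed, "qwen2-7b-int8/openvino_model.xml")
```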

Related tickets

144501

@nikita-savelyevv nikita-savelyevv requested a review from a team as a code owner July 10, 2024 13:38
@github-actions github-actions bot added the NNCF PTQ Pull requests that updates NNCF PTQ label Jul 10, 2024
@andreyanufr

> @andreyanufr Could you please confirm that AWQ and Scale Estimation algorithms are agnostic to the weight ordering? @alexsu52 And also GPTQ?

AWQ and Scale Estimation are agnostic to the weight ordering, but GPTQ is not.

@nikita-savelyevv

Weight compression build with id 103 has passed

@ljaljushkin ljaljushkin merged commit 232b435 into openvinotoolkit:develop Jul 12, 2024
12 checks passed
AlexanderDokuchaev pushed a commit that referenced this pull request Aug 26, 2024
### Changes

During nncf weight compression, a `rich` progress bar is used to display
the progress. This PR makes the progress bar weighted according to model
weight sizes, so that each weight contributes an amount proportional to
its size to the progress bar.

The iteration number was removed from the weight compression progress bar
to avoid confusion between progress measured in percent and in iterations,
now that a single weight may contribute 5-10% to the whole progress.

### Reason for changes

The time it takes to compress a weight is roughly proportional to its
size, so incrementing the progress by 1 for each weight is not ideal.

This is especially true after #2803, which added weight sorting: the
largest weights now come first and the smallest ones are compressed at
the end, so the time estimate becomes misleading when every weight
contributes equally to the progress.
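
A minimal sketch of the weighted-progress idea using `rich` (the `compress_one_weight` helper and the surrounding loop are hypothetical placeholders, not the actual NNCF code; `wp.num_weights` mirrors the field used for sorting in #2803):

```python
from rich.progress import Progress

total = sum(wp.num_weights for wp in all_weight_params)
with Progress() as progress:
    task = progress.add_task("Compressing weights", total=total)
    for wp in all_weight_params:
        compress_one_weight(wp)  # hypothetical helper
        # Each weight advances the bar proportionally to its size, so a
        # large embedding matrix may account for 5-10% in a single step.
        progress.update(task, advance=wp.num_weights)
```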

Weight sizes for tinyllama-1.1b, for reference:

![weight_size_hist](https://github.com/user-attachments/assets/30ba1e1b-0fc5-4d6b-84db-948362672bf2)


![weight_size_cumsum_hist](https://github.com/user-attachments/assets/b00e79e8-5000-44a4-97a5-4102c9aed0ae)