merge decoder and decoder with past to stateful for seq2seq #1078
base: main
Conversation
Looks great @eaidova 🔥
Ticket: 159473. Optimum-intel PR: huggingface/optimum-intel#1078. This PR switches optimum-intel in tests to the stateful seq2seq branch. Tests check both stateful and with-past decoders. Once the optimum-intel PR is merged I'll switch the version back to master.
LGTM !
Great PR, thanks @eaidova! Left two comments to make sure stateful-compatible models are exported as expected; good to merge once resolved.
Co-authored-by: Ilyas Moutawwakil <[email protected]>
Failed flux tests are not related; they are caused by updated models on the HF Hub. I prepared a PR to fix this issue.
What does this PR do?
This PR introduces a stateful approach, similar to the one used for decoder-only models, for encoder-decoder architectures such as Whisper, T5, etc.
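A minimal usage sketch of the existing optimum-intel seq2seq entry point (the model id and prompt are illustrative; any flag that toggles the stateful decoder is not shown here). With this PR the exported pipeline is expected to contain a single stateful decoder instead of separate decoder and decoder-with-past models:

```python
# Sketch: export a seq2seq model to OpenVINO and generate with it.
from transformers import AutoTokenizer
from optimum.intel import OVModelForSeq2SeqLM

model_id = "google-t5/t5-small"  # illustrative model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForSeq2SeqLM.from_pretrained(model_id, export=True)

inputs = tokenizer("translate English to French: Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```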
Background
Decoders in seq2seq models have additional key-value cache pairs produced by the decoder cross-attention over the encoder state. Within a generation cycle the encoder is called once, so these states are identical on every decoder inference step, unlike the self-attention cache, whose sequence length grows on each step. For efficiency, these values should be calculated only once, during the first decoder step, but that leads to differences in the model graph or requires condition blocks (if we want to have a single model). The current optimum-intel export solution uses 2 decoders, which maintains optimal performance but increases pipeline memory consumption due to full weight duplication in memory for the decoders.
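To make the difference concrete, here is a toy sketch (pure NumPy, not the exported graph; shapes and helper names are illustrative): the self-attention cache grows with every generated token, while the cross-attention cache is computed once from the encoder output and then stays fixed.

```python
import numpy as np

def compute_cross_kv(encoder_hidden_states, num_heads=8, head_dim=64):
    # Cross-attention K/V are projections of the fixed encoder output,
    # so their shape never changes during generation (zeros as stand-ins).
    batch, enc_len, _ = encoder_hidden_states.shape
    return np.zeros((batch, num_heads, enc_len, head_dim))

def decoder_step(self_kv, cross_kv, encoder_hidden_states):
    if cross_kv is None:
        # Only needed on the first decoder call.
        cross_kv = compute_cross_kv(encoder_hidden_states)
    # Self-attention cache grows by one position per generated token.
    batch, num_heads, _, head_dim = self_kv.shape
    new_entry = np.zeros((batch, num_heads, 1, head_dim))
    self_kv = np.concatenate([self_kv, new_entry], axis=2)
    return self_kv, cross_kv

encoder_out = np.zeros((1, 24, 512))   # encoder is called once per generation
self_kv = np.zeros((1, 8, 0, 64))      # empty self-attention cache
cross_kv = None
for step in range(3):
    self_kv, cross_kv = decoder_step(self_kv, cross_kv, encoder_out)
    print(step, self_kv.shape, cross_kv.shape)
# self-attention cache: (1, 8, 1, 64), (1, 8, 2, 64), (1, 8, 3, 64)
# cross-attention cache: (1, 8, 24, 64) on every step
```

The `if cross_kv is None` branch is exactly the kind of condition that either changes the graph between the first and subsequent steps, or requires a conditional block if only one decoder model is exported.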
This PR lifts that limitation and exports only one decoder, which conditionally calculates the cross_attn cache on demand during generation. Additionally, it moves cache management to the plugin level, which simplifies model usage and opens more possibilities for memory and performance optimizations on the runtime side. Our experiments show a 20-30% performance boost compared with the model with 2 stateless decoders.
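Conceptually, "stateful" here means the KV cache lives inside the OpenVINO InferRequest rather than being passed in and out by the pipeline. A minimal sketch of that pattern at the raw runtime level (the model path is illustrative, and other decoder inputs such as encoder hidden states or beam_idx are omitted for brevity):

```python
import numpy as np
import openvino as ov

core = ov.Core()
compiled = core.compile_model("openvino_decoder_model.xml", "CPU")  # illustrative path
request = compiled.create_infer_request()

request.reset_state()  # start a new sequence: clears the internal KV cache
for step in range(3):
    next_token = np.array([[step]], dtype=np.int64)  # placeholder token ids
    request.infer({"input_ids": next_token})         # cache tensors stay inside `request`
```

Because the cache never crosses the application/plugin boundary, the runtime can manage its layout and memory itself, which is what makes the memory and performance optimizations mentioned above possible.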
Before submitting