
merge decoder and decoder with past to stateful for seq2seq #1078

Open
eaidova wants to merge 13 commits into main from ea/stateful_seq2seq
Conversation

@eaidova (Collaborator) commented Dec 18, 2024

What does this PR do?

This PR introduces a stateful approach, similar to the one we use for decoder-only models, for encoder-decoder architectures such as Whisper, T5, etc.

Background

Decoders in seq2seq models have additional key-value cache pairs produced by the decoder's cross-attention over the encoder state. Since the encoder is called only once per generation cycle, these states are identical for every decoder inference step, unlike the self-attention cache, which grows by one position on each step. For efficiency, the cross-attention values should be calculated once during the first decoder step, but that leads to differences in the model graph or requires condition blocks (if we want a single model). The current optimum-intel export solution uses two decoders, which maintains optimal performance but increases pipeline memory consumption due to full duplication of the decoder weights in memory.
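
For illustration only, a minimal PyTorch-style sketch of the two cache types during a greedy decoding loop (the shapes and tensor names here are assumptions for the example, not the actual optimum-intel export code):

```python
import torch

# Assumed illustrative sizes: batch=1, 8 heads, head_dim=64, encoder length 24.
enc_len, num_heads, head_dim = 24, 8, 64

# Cross-attention K/V depend only on the encoder output,
# so they can be computed once and reused on every decoder step.
cross_k = cross_v = None

# Self-attention K/V grow by one position on every decoder step.
self_k = torch.zeros(1, num_heads, 0, head_dim)
self_v = torch.zeros(1, num_heads, 0, head_dim)

for step in range(5):
    if cross_k is None:  # first decoder step only
        cross_k = torch.randn(1, num_heads, enc_len, head_dim)
        cross_v = torch.randn(1, num_heads, enc_len, head_dim)

    # New self-attention K/V for the current token are appended to the cache.
    self_k = torch.cat([self_k, torch.randn(1, num_heads, 1, head_dim)], dim=2)
    self_v = torch.cat([self_v, torch.randn(1, num_heads, 1, head_dim)], dim=2)

print(cross_k.shape)  # fixed: (1, 8, 24, 64)
print(self_k.shape)   # grew to: (1, 8, 5, 64)
```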

This PR overcomes that limitation by exporting a single decoder that conditionally calculates the cross_attn cache on demand during generation. Additionally, it moves cache management to the plugin level, which simplifies model usage and opens up more possibilities for memory and performance optimizations on the runtime side. Our experiments show a 20-30% performance boost compared with the model using two stateless decoders.
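
As a rough sketch of the idea, a merged decoder layer might fill its cross-attention KV cache only when it is still empty; this is illustrative pseudologic under assumed names (MergedDecoderLayer, cross_kv), not the actual exported graph or the OpenVINO plugin behaviour:

```python
import torch
import torch.nn as nn


class MergedDecoderLayer(nn.Module):
    """Decoder layer that lazily fills its cross-attention KV cache (illustrative)."""

    def __init__(self, hidden: int = 512, heads: int = 8):
        super().__init__()
        self.k_proj = nn.Linear(hidden, hidden)
        self.v_proj = nn.Linear(hidden, hidden)
        self.heads = heads
        self.head_dim = hidden // heads

    def forward(self, hidden_states, encoder_hidden_states, cross_kv=None):
        bsz = hidden_states.shape[0]
        if cross_kv is None:
            # First call in the generation loop: compute cross-attention K/V
            # from the encoder output and keep them for all following steps.
            k = self.k_proj(encoder_hidden_states)
            v = self.v_proj(encoder_hidden_states)
            shape = (bsz, -1, self.heads, self.head_dim)
            cross_kv = (k.view(*shape).transpose(1, 2),
                        v.view(*shape).transpose(1, 2))
        # ... self-attention and cross-attention using cross_kv would go here ...
        return hidden_states, cross_kv
```

With a single model containing this condition, the runtime can keep the cross-attention cache as internal state instead of duplicating a second "with past" decoder's weights.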

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@eaidova force-pushed the ea/stateful_seq2seq branch 2 times, most recently from ce180de to 6b9dc88 on December 18, 2024 at 16:54
@echarlaix (Collaborator) left a comment
Looks great @eaidova 🔥

optimum/exporters/openvino/stateful.py (outdated, resolved)
optimum/exporters/openvino/convert.py (outdated, resolved)
tests/openvino/test_quantization.py (resolved)
@eaidova force-pushed the ea/stateful_seq2seq branch from 75c653d to 40f3dac on January 9, 2025 at 09:57
@eaidova requested a review from echarlaix on January 9, 2025 at 10:00
github-merge-queue bot pushed a commit to openvinotoolkit/openvino.genai that referenced this pull request Jan 13, 2025
Ticket: 159473
Optimum-intel PR: huggingface/optimum-intel#1078
This PR switches optimum-intel in tests to stateful seq2seq branch.
Tests check both stateful and with past decoders. Once optimum-intel PR
is merged I'll switch version back to master.
@IlyasMoutawwakil (Member) left a comment
LGTM !

@echarlaix (Collaborator) left a comment

Great PR, thanks @eaidova! Left two comments to make sure stateful-compatible models are exported as expected; good to merge once resolved.

optimum/exporters/openvino/convert.py (resolved)
tests/openvino/test_modeling.py (resolved)
@eaidova (Collaborator, Author) commented Jan 14, 2025

The failed Flux tests are not related; they are caused by updated models on the HF Hub. I prepared a PR to fix this issue: #1108
