
Reenable SDPA's FA2 During Training with torch.compile #30442

Merged
7 commits merged into huggingface:main on Apr 29, 2024

Conversation

warner-benjamin (Contributor)

This PR resolves #30010 and completes #30070 by reenabling the SDPA Flash Attention 2 kernel for torch.compile when the model is training. During eval, SDPA dispatches to the efficient kernel with the same logic as in #30070.

This PR fixes the situation where SDPA attention models use little memory during eager-mode training but use far more memory, or OOM, once compiled, because torch.compile dispatches to the wrong SDPA kernel. It shouldn't affect exporting or generation when the model is in eval mode.

Moving the is_causal dispatch logic from an inline expression to an if statement is required to support both fullgraph=True and dynamic=True. The current code errors out with dynamic=True because q_len > 1 is not the expected bool type, but wrapping it as bool(q_len > 1) to fix dynamic=True breaks fullgraph=True.
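
A minimal sketch of the dispatch pattern this change targets (illustrative names, not the exact transformers code): deciding is_causal in a standalone statement before calling SDPA, rather than inline in the call, is what lets torch.compile handle both fullgraph=True and dynamic=True.

```python
import torch
import torch.nn.functional as F

def sdpa_attention_sketch(query, key, value, causal_mask=None, dropout_p=0.0, training=False):
    # Hypothetical helper for illustration only.
    q_len = query.shape[-2]

    # Dispatch to SDPA's Flash Attention kernel when there is no explicit mask
    # and more than one query token; otherwise an explicit mask is passed and
    # SDPA falls back to another kernel. Keeping this as its own statement
    # (not inline in the SDPA call) is the pattern described above.
    is_causal = True if causal_mask is None and q_len > 1 else False

    return F.scaled_dot_product_attention(
        query,
        key,
        value,
        attn_mask=causal_mask,
        dropout_p=dropout_p if training else 0.0,
        is_causal=is_causal,
    )
```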

The Llama tests that I could run all either pass or fail in the same way as on main (LlamaIntegrationTest::test_conversion & LlamaIntegrationTest::test_compile_static_cache). I couldn't run the Gemma tests due to a model gating error despite having access to Gemma.

@warner-benjamin (Contributor, Author)

Tagging @ArthurZucker and @younesbelkada for review.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator) left a comment


I hate having this if else.... but I guess it's for the best here.
I wish this was natively supported.

Quick question: I guess this adds a guard?
Otherwise, LGTM, and the slow tests will be triggered once merged.

Review thread on src/transformers/models/cohere/modeling_cohere.py (outdated, resolved)
@ArthurZucker (Collaborator)

fyi @fxmarty when you come back

@warner-benjamin (Contributor, Author)

warner-benjamin commented Apr 24, 2024

Not sure why the CI errored out after these formatting changes. Locally I still have LlamaIntegrationTest::test_conversion & LlamaIntegrationTest::test_compile_static_cache failing, both of which also fail on main. Every other slow test passes for Llama.

@fxmarty (Contributor) left a comment


Thank you! It looks okay to me, just suggested a style change.

Review thread on src/transformers/models/cohere/modeling_cohere.py (outdated, resolved)
@fxmarty (Contributor)

fxmarty commented Apr 29, 2024

@warner-benjamin can you make sure the CI is green? The failing checks seem unrelated; merging main may fix them.

@fxmarty fxmarty merged commit 9df8b30 into huggingface:main Apr 29, 2024
21 checks passed
eigen2017 pushed a commit to eigen2017/transformers that referenced this pull request Apr 30, 2024
…0442)

* Reenable SDPA's FA2 during training with torch.compile

* fix Olmo's SDPA FA2 dispatching too

* update formatting

* improved SDPA comment

* formatting and explanatory comment

* is_causal if statement to one-liner
itazap pushed a commit that referenced this pull request May 14, 2024
@tombousso

This PR causes a dynamo graph break at torch.all(attention_mask == 1) when running with torch.compile in training mode, due to the dynamic control flow. Is there a way to get around this?
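
For context, a toy repro of this kind of break (hypothetical function, not the transformers code): converting a data-dependent tensor check into a Python bool inside a compiled region forces dynamo to split the graph at that point.

```python
import torch

@torch.compile  # fullgraph=False (the default): compiles, but splits into subgraphs
def toy_forward(x, attention_mask):
    # The Python `if` needs a concrete bool, so dynamo must evaluate the
    # data-dependent torch.all(...) eagerly, causing a graph break here.
    if torch.all(attention_mask == 1):
        return x * 2.0
    return x

x = torch.randn(2, 4)
mask = torch.ones(2, 4, dtype=torch.bool)
print(toy_forward(x, mask))
```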

@fxmarty (Contributor)

fxmarty commented Jun 4, 2024

@tombousso Yes. Why is it an issue for you? Do you see perf degradation?

AFAIK there is no obvious way around it; maybe the newer APIs from pytorch/pytorch#114823 and other PRs could help.

@fxmarty (Contributor)

fxmarty commented Jun 4, 2024

https://pytorch.org/docs/main/cond.html may be the way?
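
A rough sketch of how torch.cond (a prototype control-flow operator in recent PyTorch releases; see the link above) could keep such a data-dependent branch inside the graph instead of breaking on a Python if. This is a possible workaround under that assumption, not what transformers currently does.

```python
import torch

def toy_forward_with_cond(x, attention_mask):
    def all_ones_branch(x):
        return x * 2.0

    def has_padding_branch(x):
        return x

    pred = torch.all(attention_mask == 1)  # 0-dim bool tensor, stays in the graph
    # torch.cond traces both branches and selects at runtime, avoiding a graph break.
    return torch.cond(pred, all_ones_branch, has_padding_branch, (x,))

compiled = torch.compile(toy_forward_with_cond, fullgraph=True)
print(compiled(torch.randn(2, 4), torch.ones(2, 4, dtype=torch.bool)))
```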

@tombousso

tombousso commented Jun 4, 2024

Yes, I was seeing perf degradation. I was hoping to get a graph with no breaks to make it easier to see what's going on, and to give the compiler the best opportunity to make optimizations.

@fxmarty (Contributor)

fxmarty commented Jun 5, 2024

@tombousso Could you open an issue for that?

Successfully merging this pull request may close these issues.

Llama uses significantly more memory in 4.38 & 4.39 than 4.37 with identical code