BetterTransformer support training & autocast for all archs #1225

fxmarty · 2023-07-25T11:39:18Z

WIP: adds training support for (almost) all architectures.

For now, though, encoders still pass attention_mask to SDPA, so it will only dispatch to the math path (not the flash or memory-efficient kernels). I tried using nested tensors (pytorch/pytorch#105913) without much success.

We should probably be more flexible and allow using xformers or Hazy-flash, which support either a custom mask or indexing. Alternatively, for training, we could simply ignore the mask.
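
To make the mask trade-off concrete, here is a minimal pure-Python sketch of single-query scaled dot-product attention with an optional padding mask. It is not the Optimum or PyTorch implementation. The helper names `softmax` and `attention` and the toy inputs are made up for illustration. It only shows what the mask does numerically: masked (padding) keys get zero weight, while ignoring the mask lets them contribute to the output.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats.

    Entries equal to -inf get weight 0. Assumes at least one finite score.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, keys, values, key_mask=None):
    """Single-query scaled dot-product attention, written out explicitly.

    q: list of d floats; keys and values: one list of floats per token.
    key_mask: optional list of bools, True = real token, False = padding.
    Returns (output vector, attention weights).
    """
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    if key_mask is not None:
        # Padding positions are pushed to -inf so softmax gives them weight 0.
        scores = [s if keep else float("-inf") for s, keep in zip(scores, key_mask)]
    weights = softmax(scores)
    dim_v = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
    return out, weights


# Toy sequence: two real tokens followed by one padding token.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
values = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
mask = [True, True, False]

masked_out, masked_w = attention(q, keys, values, key_mask=mask)
unmasked_out, unmasked_w = attention(q, keys, values)

print("with mask:   ", masked_w, masked_out)
print("without mask:", unmasked_w, unmasked_out)
```

In this toy case the padding token dominates the output once the mask is dropped, so ignoring the mask during training looks safe only when batches contain no padding (or when the padded positions are excluded some other way, for example in the loss).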

cc @younesbelkada

Fixes #1081 #952 #971

Some tests still need to be added or fixed, and the docs need to be made more precise.

fxmarty added 4 commits July 25, 2023 13:36

support training

212aa3a

encoders and encoder+decoder all work

abd8920

warning about training decoders with padding

db0c561

leave to an other PR the backward for some archs

bac435b

fxmarty requested a review from younesbelkada July 26, 2023 13:37

fxmarty added 4 commits July 26, 2023 15:39

nit

d1f160a

fix tests

c70a3db

hopefully tests pass

dd67595

fix

0fcdff8

fxmarty merged commit 38061a6 into huggingface:main Jul 26, 2023

This was referenced Jul 26, 2023

Training Support for BetterTransformer #971

Closed

Autocast is not supported for BetterTransformer integration. #1081

Closed

Enable AMP for BetterTransformer #952

Closed

fxmarty mentioned this pull request Aug 3, 2023

[Docs / BetterTransformer ] Added more details about flash attention + SDPA huggingface/transformers#25265

Merged
