
Fix Gradient Accumulation issue #34191

Open · wants to merge 49 commits into main
Conversation

ArthurZucker (Collaborator) commented Oct 16, 2024

What does this PR do?

First draft

End goal is to make it easy for anyone to:

  • change the loss for their model (see the sketch below the TODO list)
  • contribute a new loss for a model (e.g. vision models, EnCodec, etc.)
  • allow passing arbitrary kwargs to the loss, for interfacing

TODO:

  • Fix Deformable DETR loss computation
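
To make the end goal concrete, here is a minimal sketch of how a class-name-keyed loss registry could let users swap in their own loss. This is illustrative only, not the exact code in this PR; `default_cross_entropy_loss` and `my_custom_loss` are hypothetical names.

import torch.nn.functional as F

def default_cross_entropy_loss(logits, labels, vocab_size, **kwargs):
    # Shift so that tokens < n predict token n, then flatten and take cross-entropy.
    shift_logits = logits[..., :-1, :].contiguous().view(-1, vocab_size)
    shift_labels = labels[..., 1:].contiguous().view(-1)
    return F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)

# Losses are keyed by head-class suffix rather than by concrete class.
LOSS_MAPPING = {
    "ForCausalLM": default_cross_entropy_loss,
}

def my_custom_loss(logits, labels, vocab_size, **kwargs):
    # Hypothetical user-contributed loss; any extra keyword arguments simply
    # flow through **kwargs without changing the model code.
    shift_logits = logits[..., :-1, :].contiguous().view(-1, vocab_size)
    shift_labels = labels[..., 1:].contiguous().view(-1)
    return F.cross_entropy(shift_logits, shift_labels, ignore_index=-100, label_smoothing=0.1)

# Swapping the loss for every causal LM head is then a one-line registration.
LOSS_MAPPING["ForCausalLM"] = my_custom_loss

Keying on the class-name suffix keeps the lookup independent of any particular model's inheritance tree, which is what makes per-model overrides and newly contributed losses cheap.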

BenjaminBossan (Member) left a comment:
Thanks for coming forward with this fix so quickly. There is probably not much I can help with, but I took a look and added some comments.

if loss_type is None:
    raise ValueError(
        "We could not determine which loss function to use "
        f"based on the class name. Make sure you add `{self.__class__.__name__}` to the `LOSS_MAPPING`"

Member:
I think users could be really confused when they read this message. They don't know what and where LOSS_MAPPING is and they don't know what value they should add there.

Collaborator (Author):
Yep will update this

    )
if loss_type not in LOSS_MAPPING and getattr(self.config, "loss_type", None) is not None:
    raise ValueError(
        f"`loss_type={loss_type}` was set in the config but it is unrecognised"

Member:
Similar issue with potential confusion.
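
For illustration, a more self-explanatory message could spell out both remedies directly. This is a sketch only, not the PR's final wording:

raise ValueError(
    f"Could not automatically infer a loss function for `{self.__class__.__name__}`. "
    "Either set `config.loss_type` to one of the supported types (the keys of "
    "`LOSS_MAPPING`), or make sure the class name ends with a supported head "
    "suffix such as `ForCausalLM`."
)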



LOSS_MAPPING = {
    "ForCausalLM": DefaultCrossEntropyLoss,

Member:
Just wondering aloud: Instead of matching based on class name, could we do a mapping from class to loss, and then do something like:

for key, val in LOSS_MAPPING.items():
    if isinstance(self, key):
        loss = val
        break
else:  # no break: no matching class was found
    raise ValueError(f"No loss registered for {type(self).__name__}")

I assume the matching exists for custom classes that are out there in the wild. If name is a more reliable predictor than inheritance or if I'm misunderstanding, please disregard my comment.

Collaborator (Author):
Will look into improving, but this looks super slow

Member:
This would only be run once because of the LRU cache, right?

Collaborator (Author):
Also, we get the class name from the class itself and want to have good defaults instead of matching against the full name!
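
For context, here is a hedged sketch of the kind of cached, class-name-suffix lookup being described; the helper name `_get_loss_function` and the exact cache placement are assumptions rather than the code this PR ships.

from functools import lru_cache

@lru_cache(maxsize=None)
def _get_loss_function(class_name: str):
    # "LlamaForCausalLM" ends with the registered suffix "ForCausalLM", so any
    # model with that head type falls back to a sensible default loss.
    for suffix, loss_fn in LOSS_MAPPING.items():
        if class_name.endswith(suffix):
            return loss_fn
    raise ValueError(
        "We could not determine which loss function to use based on the class name. "
        f"Make sure `{class_name}` matches a suffix in `LOSS_MAPPING` or set `config.loss_type`."
    )

# Because the lookup is keyed on the string name, the scan above runs at most
# once per class thanks to the LRU cache:
# loss_fn = _get_loss_function(self.__class__.__name__)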

LysandreJik (Member) left a comment:
Very good! I'll ask Daniel if he's down to review; it would be very useful to have his opinion. Thanks
