Suggestion for introducing "shift_labels" argument for Trainer #17960

Closed
seungeunrho opened this issue Jun 30, 2022 · 6 comments · Fixed by #17987

Comments

seungeunrho commented Jun 30, 2022

Feature request

Add an argument that determines whether or not to shift the labels.

An argument named shift_labels should be added to the TrainingArguments class.

During training, at the two linked places in the modeling code, the model must check both that labels is not None and that self.shift_labels is True, e.g.:

if labels is not None and self.shift_labels:          # changed
    # Shift so that tokens < n predict n
    shift_logits = lm_logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()

The default value of shift_labels would be False, except for causal language models such as GPT2PreTrainedModel.

Related to GPT-2: @patil-suraj, and to the Trainer: @sgugger.
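
A minimal sketch of the proposed TrainingArguments field (hypothetical; as the discussion below explains, this exact flag was never merged):

from dataclasses import dataclass, field

@dataclass
class TrainingArguments:
    # ... existing fields ...
    shift_labels: bool = field(
        default=False,
        metadata={"help": "Whether the model shifts labels internally when computing the loss."},
    )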

Motivation

In the current state of the code, whether the labels are shifted when training GPT2LMHeadModel changes depending on whether label_smoothing is used, which I assume is unintended.

Specifically, when training a GPT2LMHeadModel with args.label_smoothing_factor == 0 (the default), the code shifts the labels and computes the loss inside model.forward(). This path assumes the labels have not been shifted, i.e. that they are still aligned one-to-one with the corresponding input_ids.

However, if I train GPT2LMHeadModel with args.label_smoothing_factor > 0, the loss is instead computed inside the compute_loss() function of the Trainer. This path assumes the labels are already shifted and does not shift them again.

I believe whether to shift the labels should be controlled explicitly by its own argument, not implicitly by an unrelated one like label_smoothing_factor. Our team was very frustrated to find that our training results were totally different after only changing label_smoothing, with the same labels and input_ids. The cause was the misalignment of labels and input_ids once label_smoothing was turned on.
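
To illustrate the two paths (a simplified sketch using plain cross-entropy in place of the actual label smoother):

import torch
import torch.nn.functional as F

vocab = 100
logits = torch.randn(1, 5, vocab)         # (batch, seq_len, vocab)
labels = torch.randint(0, vocab, (1, 5))  # unshifted, aligned with input_ids

# Path 1 (label_smoothing_factor == 0): loss inside model.forward(), labels shifted
shift_logits = logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()
loss_shifted = F.cross_entropy(shift_logits.view(-1, vocab), shift_labels.view(-1))

# Path 2 (pre-fix, label_smoothing_factor > 0): label smoother, labels NOT shifted
loss_unshifted = F.cross_entropy(logits.view(-1, vocab), labels.view(-1))

# The two paths optimize different objectives: position t predicting token t+1
# (correct for a causal LM) versus position t predicting token t.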

Your contribution

I'm willing to make a PR after your confirmation.

sgugger commented Jun 30, 2022

I don't think a new TrainingArgument is the right answer here. Some models shift the labels internally; I think it's all the causal LM models, not just GPT-2. So instead of a flag, when the Trainer computes the loss for label smoothing, it should check whether the model's class name is in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES (imported from the auto module) and, if so, shift the labels.
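
A sketch of that check (this mirrors the shape the fix eventually took; compare the Trainer snippet quoted later in this thread):

from transformers.modeling_utils import unwrap_model
from transformers.models.auto.modeling_auto import MODEL_FOR_CAUSAL_LM_MAPPING_NAMES

# Inside Trainer.compute_loss, when label smoothing is active:
if labels is not None:
    if unwrap_model(model)._get_name() in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.values():
        # Causal LMs shift labels inside forward(), so the smoother must shift too.
        loss = self.label_smoother(outputs, labels, shift_labels=True)
    else:
        loss = self.label_smoother(outputs, labels)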

Let me know if you'd like to proceed with a PR for this fix!

seungeunrho (author) commented:
Thanks for the quick reply. Your approach seems plausible, and I'd like to proceed with it.
I've read the contribution guide thoroughly. Can I just start now, or is there anything I should know before I begin?

sgugger commented Jun 30, 2022

You can start, good luck! :-)

seungeunrho added a commit to seungeunrho/transformers that referenced this issue Jul 1, 2022
When training a causal LM, the loss is computed within the model's forward() function and
the labels are shifted internally. However, if label smoothing is applied, the loss is
computed in the Trainer's compute_loss function and the labels are not shifted.
This causes unintended misalignment between the labels and the corresponding
inputs. This commit resolves that confusion.

Resolves huggingface#17960

On branch shift_labels_for_causalLM
Changes to be committed:
	modified:   src/transformers/trainer.py
	modified:   src/transformers/trainer_pt_utils.py
sgugger added a commit that referenced this issue Jul 1, 2022
* Shifting labels for causal LM when using label smoother

When training a causal LM, the loss is computed within the model's forward() function and
the labels are shifted internally. However, if label smoothing is applied, the loss is
computed in the Trainer's compute_loss function and the labels are not shifted.
This causes unintended misalignment between the labels and the corresponding
inputs. This commit resolves that confusion.

Resolves #17960

On branch shift_labels_for_causalLM
Changes to be committed:
	modified:   src/transformers/trainer.py
	modified:   src/transformers/trainer_pt_utils.py

* Update trainer.py

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <[email protected]>

Co-authored-by: Sylvain Gugger <[email protected]>
viclzhu pushed a commit to viclzhu/transformers that referenced this issue Jul 18, 2022
Junpliu commented Dec 12, 2023

I want to know more about what the predicted text looks like in the label-smoothing case before the bug fix. Does the model learn an identity transformation and always predict the last input token repeatedly? I am curious about this.

zzaebok commented Aug 9, 2024

@sgugger May I ask how to shift labels for a custom causal LM model?
Let's assume I made a CustomCausalLM model that is not mapped in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.
Like other models, my causal LM (modeled after modeling_llama.py) shifts labels in its forward function.
When I use the HF Trainer, my CustomCausalLM, and the label smoother, how do I shift the labels in this case?

The current Trainer code is:

if labels is not None:
    if unwrap_model(model)._get_name() in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.values():
        loss = self.label_smoother(outputs, labels, shift_labels=True)
    else:
        loss = self.label_smoother(outputs, labels)

I think the workaround is to add shift_labels=True in the else branch.
Is there a correct, or better, way to shift labels for my custom causal LM model when using the label smoother?
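
One way to apply that workaround without patching the library is to subclass the Trainer and force the shift (a hypothetical sketch; MyTrainer is illustrative, and it assumes label_smoothing_factor > 0 so that self.label_smoother is set):

from transformers import Trainer

class MyTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        # Force shift_labels=True for a custom causal LM that is not
        # registered in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.
        loss = self.label_smoother(outputs, labels, shift_labels=True)
        return (loss, outputs) if return_outputs else loss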

ArthurZucker commented:
By default, if you copy the end of the forward pass of LlamaForCausalLM, you will see that there is a part where the labels are shifted!
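
The pattern referred to here looks like this (a simplified excerpt of the standard causal-LM loss computation, not the exact Llama source):

import torch.nn as nn

loss = None
if labels is not None:
    # Shift so that tokens < n predict n
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    loss_fct = nn.CrossEntropyLoss()
    loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))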
