fix: the source part should not participate in loss calculation in SFT stage #762

Open · wants to merge 1 commit into base: master
Conversation

@xffxff (Contributor) commented Oct 10, 2023

fix #660

In the SFT stage, the source part should not contribute to the loss; only the completion part should be considered. To fix this, I set the labels for the source part to -100. This value corresponds to the default ignore_index of torch.nn.CrossEntropyLoss. Both the OPT and LLaMA models use torch.nn.CrossEntropyLoss for their loss computation (see OPTForCausalLM and LlamaForCausalLM), so no change to the loss code itself is needed: positions labeled -100 are skipped automatically.
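
A minimal sketch of the idea (the function name and tokenizer handling are illustrative, not the exact DeepSpeed-Chat code; a Hugging Face-style tokenizer is assumed):

```python
# Sketch: mask the source (prompt) tokens so only the completion contributes
# to the loss. Illustrative only, not the actual DeepSpeed-Chat implementation.
import torch

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def build_labels(tokenizer, source: str, completion: str, max_len: int = 512):
    source_ids = tokenizer(source, add_special_tokens=False)["input_ids"]
    completion_ids = tokenizer(completion, add_special_tokens=False)["input_ids"]
    completion_ids = completion_ids + [tokenizer.eos_token_id]

    input_ids = (source_ids + completion_ids)[:max_len]
    labels = list(input_ids)

    # Mask the source part: CrossEntropyLoss skips positions labeled -100,
    # so the model is only trained to predict the completion tokens.
    n_masked = min(len(source_ids), max_len)
    labels[:n_masked] = [IGNORE_INDEX] * n_masked

    return {
        "input_ids": torch.tensor(input_ids),
        "labels": torch.tensor(labels),
    }
```

Because causal-LM heads in transformers shift the labels internally, passing labels that mirror input_ids except for the -100 entries is sufficient.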

The training loss of opt-350m with Dahoas/rm-static as the dataset:
[image: training loss curve]

@AndyW-llm commented:
  1. The function has now been moved to DeepSpeedExamples/applications/DeepSpeed-Chat/dschat/utils/data/data_utils.py.
  2. This solution seems to assume a single-turn conversation; please also consider the multi-turn case (see the sketch after this list).
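
A hedged sketch of how the same -100 masking could extend to multi-turn data, assuming each example is a simple list of (role, text) turns; this is not code from the PR:

```python
# Sketch: extend the -100 masking to multi-turn conversations. Only assistant
# turns keep their real labels; user/system turns are excluded from the loss.
IGNORE_INDEX = -100

def build_multi_turn_labels(tokenizer, turns):
    input_ids, labels = [], []
    for role, text in turns:
        ids = tokenizer(text, add_special_tokens=False)["input_ids"]
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)                         # train on the model's replies
        else:
            labels.extend([IGNORE_INDEX] * len(ids))   # mask user/system turns
    return input_ids, labels
```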

Successfully merging this pull request may close these issues.

[Bug] Step1: Not mask source part