fix a bug in pretrain_bert.py #355

Merged
1 commit merged on Feb 27, 2024

@lzzmm lzzmm commented Feb 27, 2024

Fix a bug that causes the following error:

...
File "pretrain_bert.py", line 91, in loss_func
    lm_loss_, sop_logits = output_tensor
    ^^^^^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)

This occurs when running examples_deepspeed/bert_with_pile/ds_pretrain_bert.sh, examples/pretrain_bert.sh, and examples/pretrain_bert_distributed.sh.

Environment: CUDA 11.7, PyTorch 2.0.1.

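output_tensor has three elements (two tensors followed by a nested tuple that repeats them), so unpacking it into two names fails. It looks like this: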

(tensor([[ 7.6910,  8.2171,  7.6419,  ...,  7.6605,  7.7664,  7.9584],
        [ 8.0546,  7.6694,  7.9774,  ...,  7.8288,  7.9537,  7.6055],
        [ 8.0042, 11.0503,  7.7699,  ...,  7.5501,  7.7662,  7.9076],
        [ 8.0456,  7.8315,  7.9037,  ...,  7.6657,  7.7722,  7.8742]],
       device='cuda:0', grad_fn=<CloneBackward0>), 
 tensor([[ 0.3318, -0.0834],
        [ 0.1063, -0.3003],
        [-0.2517, -0.0669],
        [ 0.1223,  0.1731]], device='cuda:0', grad_fn=<ToCopyBackward0>), 
 (tensor([[ 7.6910,  8.2171,  7.6419,  ...,  7.6605,  7.7664,  7.9584],
        [ 8.0546,  7.6694,  7.9774,  ...,  7.8288,  7.9537,  7.6055],
        [ 8.0042, 11.0503,  7.7699,  ...,  7.5501,  7.7662,  7.9076],
        [ 8.0456,  7.8315,  7.9037,  ...,  7.6657,  7.7722,  7.8742]],
       device='cuda:0', grad_fn=<CloneBackward0>), tensor([[ 0.3318, -0.0834],
        [ 0.1063, -0.3003],
        [-0.2517, -0.0669],
        [ 0.1223,  0.1731]], device='cuda:0', grad_fn=<ToCopyBackward0>))
)
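
A minimal sketch of why the unpacking fails on a tuple shaped like the dump above, and one way to unpack it defensively. This is an illustration only and not necessarily the change made in this PR: plain lists stand in for the CUDA tensors so the snippet runs without PyTorch, and the helper name unpack_two is hypothetical.

```python
# Stand-ins for the two tensors in the dump; plain lists keep the sketch
# dependency-free (assumption: only the tuple structure matters here).
lm_loss = [[7.6910, 8.2171], [8.0546, 7.6694]]
sop_logits = [[0.3318, -0.0834], [0.1063, -0.3003]]

# Same structure as the observed output_tensor: two values followed by a
# nested tuple that repeats them, three elements in total.
output_tensor = (lm_loss, sop_logits, (lm_loss, sop_logits))


def unpack_two(output):
    """Hypothetical helper: take the first two elements, ignore any extras."""
    first, second, *_ = output
    return first, second


# Reproduce the failing line from the traceback and capture its message.
error_message = None
try:
    lm_loss_, sop_logits_ = output_tensor
except ValueError as err:
    error_message = str(err)

# Star-unpacking tolerates the extra third element.
lm_loss_, sop_logits_ = unpack_two(output_tensor)
```

The starred target absorbs whatever follows the first two elements, so the same line also works if the model returns exactly two values.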

Thanks for reviewing :-)

@conglongli merged commit a9856ce into microsoft:main on Feb 27, 2024
1 check passed