Meet c10::DistBackendError when finetuning Qwen2-VL with video dataset #5417
Comments
Please see the example scripts and increase the
Why does a timeout occur? Can you explain the real reason?
Increasing the ddp_timeout parameter does not solve my error when I use the script "llamafactory-cli train examples/train_full/qwen2vl_full_sft.yaml" to start training. Errors always occur:
Reminder
System Info
llamafactory version: 0.8.4.dev0
Reproduction
I used the following script:
And received the error message below from the command line when tokenizing the dataset:
I tried different datasets and seeds, but they all fail after the same amount of time (~30 min), so I'm wondering if there's something wrong with Qwen2-VL or the fine-tuning code.
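One possible reading of the fixed ~30 min failure time, offered as a sketch and not as a confirmed diagnosis: in multi-GPU runs, the other ranks usually wait at a collective while one rank tokenizes the dataset, and that wait is bounded by the distributed timeout. The snippet below only does the arithmetic. It assumes the default `ddp_timeout` is 1800 seconds (the documented default in Hugging Face `TrainingArguments`); the 45-minute preprocessing time and the helper name `waiting_ranks_time_out` are hypothetical.

```python
from datetime import timedelta

# Assumption: 1800 s is the default `ddp_timeout` (Hugging Face TrainingArguments).
DEFAULT_DDP_TIMEOUT = timedelta(seconds=1800)


def waiting_ranks_time_out(
    preprocess_time: timedelta,
    ddp_timeout: timedelta = DEFAULT_DDP_TIMEOUT,
) -> bool:
    """Return True if ranks waiting on preprocessing would exceed the timeout."""
    return preprocess_time > ddp_timeout


# Hypothetical: tokenizing the video dataset takes 45 minutes on one rank.
preprocess = timedelta(minutes=45)

print(waiting_ranks_time_out(preprocess))                                # True
print(waiting_ranks_time_out(preprocess, timedelta(seconds=180000000)))  # False
```

If the failure still appears at the same point after `ddp_timeout` has been raised, as reported in the comments, it may be worth checking whether the larger value actually reaches the process group in this launch path.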
Expected behavior
Process the dataset and run the finetuning normally without error
Others
No response