Add num_devices in Engine for multi-gpu training #3778
I'm not sure OTX already supports multi-GPU training. As far as I know, we haven't checked that all models can be trained on multiple GPUs. I also think we should add an integration test to validate distributed training if we really support it. If this is just preparation, then I think it's OK to merge after reverting the documentation.
Yes, you are right, but this PR fixes the issue that there is currently no way to use multiple GPUs at all. I agree that we should validate all models, but we should at least make this work in OTX. (In any case, all of them work for Classification.)
I understand. But honestly, I think it's hard to say that OTX supports multi-GPU training. Currently, YOLOX models can't be trained on multiple GPUs (please refer to #3635). I propose that the documentation at least state that only classification has been validated.
I agree with Eunwoo's comment. We need to validate all tasks/models and handle the errors for those that cannot be trained on multiple GPUs.
Summary
I noticed that multi-GPU settings are not available through `Engine`; this PR fixes that.
https://jira.devtools.intel.com/browse/CVS-148420
This allows multi-GPU training to be configured through both the API and the CLI.
Add a `num_devices` property and setter function in `Engine`.
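The property/setter change can be pictured with a minimal Python sketch. The `Engine` class below is a hypothetical stand-in, not OTX's actual `Engine`: the internal `_trainer_kwargs` dict and the `trainer_kwargs()` helper are assumptions made for illustration, and the `devices` key only mirrors the `devices` argument of Lightning's `Trainer`. It shows a `num_devices` property whose setter validates the value before storing it.

```python
from typing import Any, Dict


class Engine:
    """Hypothetical, simplified stand-in for an OTX-style Engine (not the real class)."""

    def __init__(self, num_devices: int = 1) -> None:
        # Hypothetical storage for keyword arguments later handed to a trainer.
        self._trainer_kwargs: Dict[str, Any] = {}
        # Route the constructor argument through the setter so validation lives in one place.
        self.num_devices = num_devices

    @property
    def num_devices(self) -> int:
        """Number of devices (e.g. GPUs) to use for training."""
        return self._trainer_kwargs["devices"]

    @num_devices.setter
    def num_devices(self, value: int) -> None:
        # Reject bools (a subclass of int), non-ints, and values below 1.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"num_devices must be a positive int, got {value!r}")
        self._trainer_kwargs["devices"] = value

    def trainer_kwargs(self) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate the internal state.
        return dict(self._trainer_kwargs)


engine = Engine()
engine.num_devices = 2
print(engine.num_devices)      # 2
print(engine.trainer_kwargs())  # {'devices': 2}
```

Sending the constructor argument through the setter means an API call and a CLI-driven configuration would both hit the same check, which is one way to surface errors early for setups that have not been validated.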
How to test
Checklist
License
Feel free to contact the maintainers if that's a concern.