
Train multiple models on a single GPU in parallel #10204

Closed
Programmer-RD-AI opened this issue Oct 28, 2021 Discussed in #10159 · 4 comments

Comments

@Programmer-RD-AI
Contributor

Discussed in #10159

Originally posted by grudloff October 27, 2021
Is there a recommended way of training multiple models in parallel in a single GPU? I tried using joblib's Parallel & delayed but I got a CUDA OOM with two instances even though a single model uses barely a fourth of the total memory. And is a speedup compared to sequential calling expected?
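A likely cause of the OOM is that joblib's default backend spawns separate worker processes, and each process that touches the GPU allocates its own CUDA context (several hundred MB) on top of the model itself. A minimal sketch of a thread-based alternative, which keeps all workers in one process so they share a single CUDA context, might look like the following. `train_one_model` is a hypothetical stand-in for a real training loop; the function names and the two-worker setup are illustrative assumptions, not code from this issue:

```python
# Sketch: thread-based parallelism shares one process (and thus one CUDA
# context), unlike joblib's default process-based backend, where every
# worker pays the per-process CUDA context overhead.
from concurrent.futures import ThreadPoolExecutor


def train_one_model(model_id: int) -> str:
    # In a real script this would build the model, move it to the GPU,
    # and run its training loop. Kept as a stub here so the sketch is
    # self-contained.
    return f"model-{model_id}-done"


with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(train_one_model, range(2)))

print(results)  # ['model-0-done', 'model-1-done']
```

Note that Python threads only help here because GPU kernels release the GIL while they run; whether this yields a real speedup over sequential training depends on how well the two workloads overlap on the device.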

@Programmer-RD-AI
Contributor Author

Solution: #2807

@awaelchli
Contributor

Hey @Programmer-RD-AI
Please don't duplicate posts from the discussion forum here into GitHub issues.

GitHub issues are for:

  • Bug reports
  • Feature requests
  • Anything related to the development of Lightning

GitHub issues are not:

  • For pure question answering / implementation help
  • Broad discussions not directly related to PL development

Thanks for your understanding.
