Enable moving the model to gpu within a work locally. #15699

Closed
tchaton opened this issue Nov 16, 2022 · 1 comment · Fixed by #15923
Labels
app (removed) — Generic label for Lightning App package · priority: 0 — High priority task

Comments

@tchaton
Contributor

tchaton commented Nov 16, 2022

🚀 Feature

Motivation

The Lightning App MPBackend uses multiprocessing.Process to run the works. When a work tries to move a model (or any tensor) to the GPU locally, it raises RuntimeError: Cannot re-initialize CUDA in forked subprocess.

import lightning as L
import torch

class Work(L.LightningWork):

    def run(self):
        # Fails locally: the work runs in a forked subprocess,
        # and CUDA cannot be re-initialized after a fork.
        torch.zeros(1, device="cuda")

app = L.LightningApp(Work())

Running the app locally fails with:

  File "/home/thomas/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/thomas/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/thomas/lightning/src/lightning/app/utilities/proxies.py", line 418, in __call__
    raise e
  File "/home/thomas/lightning/src/lightning/app/utilities/proxies.py", line 401, in __call__
    self.run_once()
  File "/home/thomas/lightning/src/lightning/app/utilities/proxies.py", line 549, in run_once
    self.work.on_exception(e)
  File "/home/thomas/lightning/src/lightning/app/core/work.py", line 564, in on_exception
    raise exception
  File "/home/thomas/lightning/src/lightning/app/utilities/proxies.py", line 514, in run_once
    ret = self.run_executor_cls(self.work, work_run, self.delta_queue)(*args, **kwargs)
  File "/home/thomas/lightning/src/lightning/app/utilities/proxies.py", line 350, in __call__
    return self.work_run(*args, **kwargs)
  File "gpu_app.py", line 8, in run
    torch.zeros(1, device="cuda")
  File "/home/thomas/Dreambooth_app/.venv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 207, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
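As the error message says, CUDA cannot be initialized in a process created with the default "fork" start method on Linux. A minimal sketch of the usual workaround, using plain multiprocessing rather than Lightning (the function names here are illustrative, not part of any Lightning API): create the child process with the "spawn" start method, so it starts a fresh interpreter in which CUDA can be initialized safely.

```python
import multiprocessing as mp

def child(q):
    # A spawned process starts a fresh interpreter, so libraries that
    # cannot survive fork (such as CUDA) can be initialized safely here.
    q.put("initialized in " + mp.current_process().name)

def run_in_spawned_process():
    # Request the "spawn" start method instead of the default "fork" on Linux.
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=child, args=(q,), name="spawned-work")
    p.start()
    p.join()
    return q.get()

if __name__ == "__main__":
    print(run_in_spawned_process())  # initialized in spawned-work
```

Note that "spawn" requires the target function to be importable from the main module (defined at top level), since the child re-imports it rather than inheriting the parent's memory.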

Pitch

Alternatives

Additional context


If you enjoy Lightning, check out our other projects! ⚡

  • Metrics: Machine learning metrics for distributed, scalable PyTorch applications.

  • Lite: Enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.

  • Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.

  • Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.

  • Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.

cc @tchaton

@Borda
Member

Borda commented Dec 5, 2022

@tchaton was this fixed?
