Currently, fine-tuning the `fcmae` model for the virtual staining tasks with the `ddp` strategy appears to cause a memory leak. A possible fix is to expose the relevant DataLoader parameters (`persistent_workers` and `prefetch_factor`) through the `ViscyTrainer`.

Using PyTorch Lightning's `CombinedLoader` with Distributed Data Parallel (DDP) spawns multiple processes (one per GPU) and seems to lead to excessive memory accumulation in a subset of worker processes. Setting `persistent_workers=False` restarts the DataLoader workers at the start of each epoch, which prevents the accumulation of memory or disk space, at a performance cost. Reducing the hardcoded `prefetch_factor` from 4 to 2 involves a similar performance trade-off.
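A minimal sketch of what exposing these knobs could look like. This is not VisCy's actual HCS datamodule or `ViscyTrainer` API; `ExampleDataModule` and its parameter names are hypothetical, and only illustrate passing `persistent_workers` and `prefetch_factor` through to the DataLoader instead of hardcoding them:

```python
# Hypothetical datamodule exposing the DataLoader knobs discussed above,
# so a DDP run can set persistent_workers=False and a smaller prefetch_factor.
from lightning.pytorch import LightningDataModule
from torch.utils.data import DataLoader, Dataset


class ExampleDataModule(LightningDataModule):
    def __init__(
        self,
        train_dataset: Dataset,
        batch_size: int = 32,
        num_workers: int = 8,
        persistent_workers: bool = False,  # restart workers each epoch to avoid accumulation
        prefetch_factor: int = 2,  # proposed default, down from the hardcoded 4
    ):
        super().__init__()
        self.train_dataset = train_dataset
        self.batch_size = batch_size
        self.num_workers = num_workers
        self.persistent_workers = persistent_workers
        self.prefetch_factor = prefetch_factor

    def train_dataloader(self) -> DataLoader:
        return DataLoader(
            self.train_dataset,
            batch_size=self.batch_size,
            num_workers=self.num_workers,
            shuffle=True,
            persistent_workers=self.persistent_workers,
            # prefetch_factor is only valid when num_workers > 0
            prefetch_factor=self.prefetch_factor if self.num_workers > 0 else None,
        )
```

With Lightning CLI-style configuration, these would then be settable per run from the YAML/CLI arguments rather than requiring a code change.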
When I enable pinned memory in #195, I see this issue: pytorch/pytorch#97432. However, this is likely unrelated to the HCS datamodule, since that one does not use pinned memory.