
GPU ram memory increase until overflow when using PSNR and SSIM #2597

Open
ouioui199 opened this issue Jun 14, 2024 · 2 comments
Labels
bug / fix Something isn't working question Further information is requested v1.3.x

Comments

ouioui199 commented Jun 14, 2024

🐛 Bug

Hello all,

I'm implementing CycleGAN with Lightning and use PSNR and SSIM from torchmetrics for evaluation.
During training, GPU RAM usage increases non-stop until it overflows and the whole training shuts down.
This might be similar to #2481.

To Reproduce

Add this to the __init__ method of the model class:

self.train_metrics = MetricCollection({"PSNR": PeakSignalNoiseRatio(), "SSIM": StructuralSimilarityIndexMeasure()})
self.valid_metrics = self.train_metrics.clone(prefix='val_')

In the training_step method:
train_metrics = self.train_metrics(fake, real)

In the validation_step method:
valid_metrics = self.valid_metrics(fake, real)
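
Since the snippets above are only fragments, here is a minimal self-contained sketch of how they fit together (the module name, placeholder generator, loss, and optimizer below are assumptions for illustration, not the original CycleGAN code):

import torch
from torch import nn
import pytorch_lightning as pl
from torchmetrics import MetricCollection, PeakSignalNoiseRatio, StructuralSimilarityIndexMeasure

class LitModel(pl.LightningModule):  # hypothetical stand-in for the CycleGAN module
    def __init__(self):
        super().__init__()
        self.generator = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # placeholder generator
        self.train_metrics = MetricCollection({"PSNR": PeakSignalNoiseRatio(), "SSIM": StructuralSimilarityIndexMeasure()})
        self.valid_metrics = self.train_metrics.clone(prefix='val_')

    def training_step(self, batch, batch_idx):
        real = batch
        fake = self.generator(real)
        loss = nn.functional.l1_loss(fake, real)  # placeholder loss
        # fake still carries grad_fn here, so the metric values end up attached to the graph
        train_metrics = self.train_metrics(fake, real)
        self.log_dict(train_metrics)
        return loss

    def validation_step(self, batch, batch_idx):
        real = batch
        fake = self.generator(real)
        valid_metrics = self.valid_metrics(fake, real)
        self.log_dict(valid_metrics)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=2e-4)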

Environment

  • TorchMetrics version: 1.3.0 installed via pip
  • Python: 3.11.7
  • PyTorch: 2.1.2
  • Issue encountered when training on Windows 10

Easy fix proposition

I tried to debug the code.
When inspecting train_metrics, I get this:

"{'PSNR': tensor(10.5713, device='cuda:0', grad_fn=<SqueezeBackward0>), 'SSIM': tensor(0.0373, device='cuda:0', grad_fn=<SqueezeBackward0>)}"

which is weird because metric values aren't supposed to be attached to the computational graph.
When inspecting valid_metrics, I don't see any grad_fn.
Guessing that's the issue, I tried calling fake.detach() when computing train_metrics.
Now the training is stable and GPU memory no longer keeps growing.
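
In training_step, the workaround looks like the sketch below (same placeholder generator and loss as in the sketch above; only the detach() call is new):

    def training_step(self, batch, batch_idx):
        real = batch
        fake = self.generator(real)
        loss = nn.functional.l1_loss(fake, real)  # placeholder loss
        # detach the generator output before updating the metrics so the metric
        # update does not keep a reference to the computational graph
        train_metrics = self.train_metrics(fake.detach(), real)
        self.log_dict(train_metrics)
        return loss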

@ouioui199 ouioui199 added bug / fix Something isn't working help wanted Extra attention is needed labels Jun 14, 2024

Hi! Thanks for your contribution, great first issue!


Borda commented Aug 21, 2024

@ouioui199 Looking at your example (could you please share the full sample code?), I'm wondering whether you also call compute in the epoch-end hook?
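
For reference, a typical epoch-end pattern with torchmetrics in Lightning looks like the sketch below (a generic pattern, not taken from the reporter's code):

    def on_train_epoch_end(self):
        # aggregate the metric states accumulated during the epoch, then reset them
        epoch_metrics = self.train_metrics.compute()
        self.log_dict(epoch_metrics)
        self.train_metrics.reset()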

@Borda Borda added question Further information is requested and removed help wanted Extra attention is needed labels Aug 21, 2024