Regression metrics raise error in semanticsegmentation task #905

bartonp2 · 2021-11-02T13:08:25Z

❓ Questions and Help

What is your question?

I am trying to train a SemanticSegmentation() task. Some metrics (not the loss), such as torchmetrics.MeanSquaredError() and other regression metrics, don't work in the Flash training loop. The following error is raised:
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment.

Is there a way to avoid this error?

Full Error Trace:

  File "/home/patrick/agro/weedetect/flash_training.py", line 43, in <module>
    trainer.finetune(model, datamodule=datamodule, strategy="freeze")
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/flash/core/trainer.py", line 165, in finetune
    return super().fit(model, train_dataloader, val_dataloaders, datamodule)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 552, in fit
    self._run(model)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 922, in _run
    self._dispatch()
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 990, in _dispatch
    self.accelerator.start_training(self)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
    self._results = trainer.run_stage()
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1000, in run_stage
    return self._run_train()
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1049, in _run_train
    self.fit_loop.run()
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 200, in advance
    epoch_output = self.epoch_loop.run(train_dataloader)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 130, in advance
    batch_output = self.batch_loop.run(batch, self.iteration_count, self._dataloader_idx)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 100, in run
    super().run(batch, batch_idx, dataloader_idx)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 149, in advance
    self.batch_outputs[opt_idx].append(deepcopy(result.training_step_output))
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/torch/_tensor.py", line 55, in __deepcopy__
    raise RuntimeError("Only Tensors created explicitly by the user "
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment

What's your environment?

OS: Ubuntu 18
Packaging: pip
Version: 0.5.1


ethanwharris · 2021-11-03T19:28:43Z

Hi @bartonp2 I think this is due to the format of the predictions and targets in the semantic segmentation task. The targets are integer tensors where the value in each position is the index of the target class at that position. The predictions are tensors of logits with an additional dimension compared to the targets. This means that they cannot be meaningfully compared by a regression metric like mean squared error. Can you describe a bit more about your use case here?

One option is to extend the SemanticSegmentation task and override the to_metrics_format hook with some custom logic to change the predictions into the format required by the metric. Alternatively you could wrap the metric object to change the format there instead.
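
The second option (wrapping the metric) could look roughly like the sketch below. It is not Flash's or torchmetrics' API: `ArgmaxMetricWrapper` is a hypothetical name, and a plain `torch.nn.MSELoss` stands in for the metric so that the example needs only PyTorch. It assumes predictions are logits of shape (N, C, H, W) and targets are class indices of shape (N, H, W). Whether a regression metric on class indices is meaningful depends on the use case.

```python
import torch
from torch import nn


class ArgmaxMetricWrapper(nn.Module):
    """Hypothetical wrapper: turns logits into class indices before calling a metric."""

    def __init__(self, metric: nn.Module, class_dim: int = 1):
        super().__init__()
        self.metric = metric
        self.class_dim = class_dim

    def forward(self, preds: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Collapse the class dimension so preds has the same shape as target.
        indices = preds.argmax(dim=self.class_dim)
        # Regression-style metrics expect floating-point inputs.
        return self.metric(indices.float(), target.float())


# Stand-in metric (assumption): any callable taking (preds, target) would do.
wrapped = ArgmaxMetricWrapper(nn.MSELoss())

logits = torch.zeros(2, 3, 4, 4)
logits[:, 1] = 1.0  # class 1 has the highest logit at every pixel
target = torch.ones(2, 4, 4, dtype=torch.long)

value = wrapped(logits, target)
print(value)
```

Because `argmax` is not differentiable, the value returned by this wrapper carries no autograd history.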

bartonp2 · 2021-11-04T08:41:56Z

Thanks for the suggestions @ethanwharris! While I admit that I have slightly modified the segmentation task to regress each pixel rather than predicting a specific class, I have already taken care of ensuring the correct shapes and types of input and target tensors.

The input format does not appear to be the problem as the metric computes without error. The error is only raised by the deepcopy function called within advance of the TrainingBatchLoop(). I am unsure of the internal workings of these training loops and why the deepcopy is needed. Further, I wouldn't know how to best modify this loop without subclassing the Trainer class.

The deepcopy problem is also not limited to regression metrics. It also applies to image metrics like SSIM for example.
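
The error message itself can be reproduced outside the training loop. The following is a minimal sketch using only PyTorch, under the assumption (not confirmed in this thread) that the copied step output held a tensor still attached to the autograd graph. The names `weights` and `output` are illustrative only.

```python
import copy

import torch

# A leaf tensor: created directly by the user.
weights = torch.ones(3, requires_grad=True)

# A non-leaf tensor: the result of an operation tracked by autograd.
output = (weights * 2).sum()

try:
    copy.deepcopy(output)
    error_message = None
except RuntimeError as err:
    # Non-leaf tensors do not support the deepcopy protocol.
    error_message = str(err)

print(error_message)

# Detaching drops the autograd history, so the copy succeeds.
detached = output.detach()
copied = copy.deepcopy(detached)
print(copied)
```

Detaching is only appropriate for values that are logged or stored. The loss used for the backward pass has to stay attached to the graph.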

ethanwharris · 2021-11-05T12:37:01Z

@bartonp2 Ah, that makes sense. We just encountered the same bug in #892 and are fixing it there. This should also go away when using PL 1.5 once #933 is merged 😃

bartonp2 added the question Further information is requested label Nov 2, 2021

ethanwharris mentioned this issue Nov 5, 2021 in "Tabular regression task and example #892" (merged)

ethanwharris closed this as completed in #892 Nov 5, 2021
