This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Regression metrics raise error in semanticsegmentation task #905

Closed
bartonp2 opened this issue Nov 2, 2021 · 3 comments · Fixed by #892
Labels
question Further information is requested

Comments

@bartonp2
Contributor

bartonp2 commented Nov 2, 2021

❓ Questions and Help

What is your question?

I am trying to train a SemanticSegmentation() task. Some metrics (not the loss), such as torchmetrics.MeanSquaredError() and other regression metrics, don't work in the Flash training loop. The following error is raised:
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment.

Is there a way to avoid this error?

Full Error Trace:

  File "/home/patrick/agro/weedetect/flash_training.py", line 43, in <module>
    trainer.finetune(model, datamodule=datamodule, strategy="freeze")
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/flash/core/trainer.py", line 165, in finetune
    return super().fit(model, train_dataloader, val_dataloaders, datamodule)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 552, in fit
    self._run(model)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 922, in _run
    self._dispatch()
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 990, in _dispatch
    self.accelerator.start_training(self)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
    self._results = trainer.run_stage()
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1000, in run_stage
    return self._run_train()
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1049, in _run_train
    self.fit_loop.run()
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 200, in advance
    epoch_output = self.epoch_loop.run(train_dataloader)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 130, in advance
    batch_output = self.batch_loop.run(batch, self.iteration_count, self._dataloader_idx)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 100, in run
    super().run(batch, batch_idx, dataloader_idx)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 149, in advance
    self.batch_outputs[opt_idx].append(deepcopy(result.training_step_output))
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/home/patrick/miniconda3/envs/weedector/lib/python3.7/site-packages/torch/_tensor.py", line 55, in __deepcopy__
    raise RuntimeError("Only Tensors created explicitly by the user "
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment
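The last frame shows torch.Tensor.__deepcopy__ rejecting a tensor. As a minimal sketch in plain PyTorch (independent of Flash and Lightning), the same error appears whenever a tensor produced by an operation on a requires_grad=True tensor, i.e. a non-leaf with a grad_fn, is deep-copied, while a detached copy goes through. The helper name try_deepcopy is hypothetical and only used for illustration.

```python
import copy

import torch


def try_deepcopy(t: torch.Tensor) -> str:
    """Return 'ok' if the tensor can be deep-copied, otherwise the error message."""
    try:
        copy.deepcopy(t)
        return "ok"
    except RuntimeError as err:
        return str(err)


weights = torch.randn(3, requires_grad=True)  # a graph leaf created by the user
preds = weights * 2.0                         # a non-leaf: it carries a grad_fn

print(weights.is_leaf, preds.is_leaf)  # True False
print(try_deepcopy(weights))           # ok
print(try_deepcopy(preds))             # message mentioning "graph leaves"
print(try_deepcopy(preds.detach()))    # ok: detach() returns a leaf tensor
```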

What's your environment?

  • OS: Ubuntu 18
  • Packaging: pip
  • Version: 0.5.1
@bartonp2 added the question (Further information is requested) label on Nov 2, 2021
@ethanwharris
Collaborator

Hi @bartonp2, I think this is due to the format of the predictions and targets in the semantic segmentation task. The targets are integer tensors in which the value at each position is the index of the target class at that position. The predictions are tensors of logits with one more dimension (the class dimension) than the targets. This means they cannot be meaningfully compared by a regression metric such as mean squared error. Can you describe your use case in a bit more detail?
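To make the mismatch concrete, here is a small sketch with made-up sizes (batch of 2, 4 classes, 8x8 images; all illustrative assumptions). The logits carry an extra class dimension and a float dtype, while the targets hold one integer class index per pixel; taking the argmax over the class dimension is one way to obtain predictions with the same shape and dtype as the targets.

```python
import torch

batch, num_classes, height, width = 2, 4, 8, 8  # illustrative sizes only

# Targets: one class index per pixel, shape (N, H, W), integer dtype.
target = torch.randint(0, num_classes, (batch, height, width))

# Predictions: one logit per class per pixel, shape (N, C, H, W), float dtype.
logits = torch.randn(batch, num_classes, height, width)

print(target.shape, target.dtype)  # torch.Size([2, 8, 8]) torch.int64
print(logits.shape, logits.dtype)  # torch.Size([2, 4, 8, 8]) torch.float32

# Collapsing the class dimension yields class-index predictions shaped like the target.
pred_classes = logits.argmax(dim=1)
print(pred_classes.shape == target.shape)  # True
```

An element-wise regression metric such as MSE expects float tensors of matching shape, so it has no meaningful way to compare the (N, C, H, W) logits with the (N, H, W) class indices directly.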

One option is to extend the SemanticSegmentation task and override the to_metrics_format hook with custom logic that converts the predictions into the format the metric expects. Alternatively, you could wrap the metric object and do the conversion there.
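The second option (wrapping the metric) might look roughly like the sketch below. FormatPreds is a hypothetical wrapper, not a Flash or torchmetrics API; it applies a user-supplied transform to the predictions before calling the wrapped metric. torch.nn.functional.mse_loss stands in for the metric here; wrapping a stateful torchmetrics metric in practice would also require forwarding its update, compute and reset methods. The per-pixel regression shapes, with a singleton channel of shape (N, 1, H, W) against targets of shape (N, H, W), are an assumption for illustration.

```python
from typing import Callable

import torch
import torch.nn.functional as F


class FormatPreds(torch.nn.Module):
    """Hypothetical wrapper: reformat predictions before passing them to a metric."""

    def __init__(self, metric: Callable, transform: Callable[[torch.Tensor], torch.Tensor]):
        super().__init__()
        self.metric = metric        # any callable taking (preds, target)
        self.transform = transform  # converts raw model output into the metric's format

    def forward(self, preds: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        return self.metric(self.transform(preds), target)


# Assumed per-pixel regression setup: drop the singleton channel so shapes match.
mse = FormatPreds(F.mse_loss, transform=lambda p: p.squeeze(1))

preds = torch.zeros(2, 1, 4, 4)
target = torch.ones(2, 4, 4)
print(mse(preds, target))  # tensor(1.)
```

Keeping the conversion in the wrapper leaves the task untouched; overriding to_metrics_format instead keeps the conversion next to the model, which may be preferable when several metrics share the same format.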

@bartonp2
Contributor Author

bartonp2 commented Nov 4, 2021

Thanks for the suggestions @ethanwharris! While I admit that I have slightly modified the segmentation task to regress each pixel rather than predict a specific class, I have already made sure that the input and target tensors have the correct shapes and types.

The input format does not appear to be the problem, as the metric computes without error. The error is only raised by the deepcopy call inside the advance method of TrainingBatchLoop. I am unsure how these training loops work internally or why the deepcopy is needed, and I wouldn't know how best to modify this loop without subclassing the Trainer class.

The deepcopy problem is not limited to regression metrics; it also affects image metrics such as SSIM.
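One plausible reading of the trace (an assumption, not confirmed in this thread) is that the training-step output being deep-copied contains metric values that are still attached to the autograd graph. Regression and image metrics such as MSE or SSIM are computed directly from the float predictions, so their results keep a grad_fn and are therefore not graph leaves. A hedged workaround sketch is to detach every tensor in the step output except the loss, which the optimizer still needs. The helpers detach_tensors and detach_except_loss and the {"loss": ..., "logs": {...}} output layout are hypothetical, and this is not necessarily how #892 fixes the issue.

```python
import copy
from typing import Any

import torch


def detach_tensors(obj: Any) -> Any:
    """Recursively detach every tensor inside nested dicts and plain lists/tuples."""
    if isinstance(obj, torch.Tensor):
        return obj.detach()
    if isinstance(obj, dict):
        return {k: detach_tensors(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(detach_tensors(v) for v in obj)
    return obj


def detach_except_loss(output: dict) -> dict:
    """Hypothetical helper: keep "loss" attached for backprop, detach everything else."""
    return {k: v if k == "loss" else detach_tensors(v) for k, v in output.items()}


preds = torch.randn(2, 4, requires_grad=True) * 1.0  # non-leaf, like a model output
target = torch.randn(2, 4)
mse = ((preds - target) ** 2).mean()                 # metric result still has a grad_fn
output = {"loss": mse, "logs": {"train_mse": mse}}

safe = detach_except_loss(output)
copy.deepcopy(safe["logs"])        # succeeds: the logged metric is now a leaf
print(safe["loss"].requires_grad)  # True: the loss can still be backpropagated
```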

@ethanwharris
Collaborator

@bartonp2 Ah, that makes sense. We just encountered the same bug in #892 and are fixing it there. It should also go away with PL 1.5 once #933 is merged 😃
