IndexError with multiple validation loaders and fast_dev_run #2531

Closed
lucmos opened this issue Jul 6, 2020 · 2 comments · Fixed by #2581
Labels
bug Something isn't working help wanted Open to be worked on

Comments

@lucmos
Contributor

lucmos commented Jul 6, 2020

🐛 Bug

An IndexError is raised when using multiple validation dataloaders together with fast_dev_run=True.

To Reproduce

Steps to reproduce the behavior:

  1. Use multiple val_dataloaders
  2. Use fast_dev_run=True

Code sample

https://colab.research.google.com/drive/107nKJxF4ttWPtQbo8-Wb0RG3Sa_fxjQP?usp=sharing

Traceback

Traceback (most recent call last):
  File "/home/luca/Repositories/set-operations/src/run_experiment.py", line 73, in <module>
    trainer.fit(model,)
  File "/home/luca/.cache/pypoetry/virtualenvs/set-operations-GbjOlTQ2-py3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 979, in fit
    self.single_gpu_train(model)
  File "/home/luca/.cache/pypoetry/virtualenvs/set-operations-GbjOlTQ2-py3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 185, in single_gpu_train
    self.run_pretrain_routine(model)
  File "/home/luca/.cache/pypoetry/virtualenvs/set-operations-GbjOlTQ2-py3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1156, in run_pretrain_routine
    self.train()
  File "/home/luca/.cache/pypoetry/virtualenvs/set-operations-GbjOlTQ2-py3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 370, in train
    self.run_training_epoch()
  File "/home/luca/.cache/pypoetry/virtualenvs/set-operations-GbjOlTQ2-py3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 470, in run_training_epoch
    self.run_evaluation(test_mode=False)
  File "/home/luca/.cache/pypoetry/virtualenvs/set-operations-GbjOlTQ2-py3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 409, in run_evaluation
    eval_results = self._evaluate(self.model, dataloaders, max_batches, test_mode)
  File "/home/luca/.cache/pypoetry/virtualenvs/set-operations-GbjOlTQ2-py3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 270, in _evaluate
    dl_max_batches = max_batches[dataloader_idx]
IndexError: list index out of range

Exception ignored in: <function tqdm.__del__ at 0x7fe5848ba710>
Traceback (most recent call last):
  File "/home/luca/.cache/pypoetry/virtualenvs/set-operations-GbjOlTQ2-py3.7/lib/python3.7/site-packages/tqdm/std.py", line 1086, in __del__
  File "/home/luca/.cache/pypoetry/virtualenvs/set-operations-GbjOlTQ2-py3.7/lib/python3.7/site-packages/tqdm/std.py", line 1293, in close
  File "/home/luca/.cache/pypoetry/virtualenvs/set-operations-GbjOlTQ2-py3.7/lib/python3.7/site-packages/tqdm/std.py", line 1471, in display
  File "/home/luca/.cache/pypoetry/virtualenvs/set-operations-GbjOlTQ2-py3.7/lib/python3.7/site-packages/tqdm/std.py", line 1089, in __repr__
  File "/home/luca/.cache/pypoetry/virtualenvs/set-operations-GbjOlTQ2-py3.7/lib/python3.7/site-packages/tqdm/std.py", line 1433, in format_dict
TypeError: cannot unpack non-iterable NoneType object

Process finished with exit code 1

Reason

If fast_dev_run=True, max_batches is set to [1] here, regardless of how many validation dataloaders there are:
https://github.com/PyTorchLightning/pytorch-lightning/blob/afdfba1dc6061c5e1ee6eaf215500d6a56e95482/pytorch_lightning/trainer/evaluation_loop.py#L376-L377

Because max_batches is already a list rather than an int, it does not pass this check later on, so it is never expanded to one entry per dataloader and remains [1]:
https://github.com/PyTorchLightning/pytorch-lightning/blob/afdfba1dc6061c5e1ee6eaf215500d6a56e95482/pytorch_lightning/trainer/evaluation_loop.py#L256-L257

The loop then iterates over all the dataloaders, raising an IndexError at line 270 on the second iteration:
https://github.com/PyTorchLightning/pytorch-lightning/blob/afdfba1dc6061c5e1ee6eaf215500d6a56e95482/pytorch_lightning/trainer/evaluation_loop.py#L260-L270
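The failure mode can be reproduced without Lightning. The sketch below is a simplified, hypothetical model of the control flow described above, not Lightning's actual code: `choose_max_batches` and `evaluate` are illustrative names, and it assumes (as the linked lines suggest) that only an int max_batches is expanded to one entry per dataloader.

```python
from typing import List, Sequence, Union


def choose_max_batches(fast_dev_run: bool, num_val_batches: List[int]) -> Union[int, List[int]]:
    """Hypothetical stand-in for the max_batches selection in run_evaluation."""
    if fast_dev_run:
        # Mirrors the reported behaviour: a one-element list,
        # no matter how many validation dataloaders exist.
        return [1]
    return num_val_batches


def evaluate(dataloaders: Sequence[Sequence[int]], max_batches: Union[int, List[int]]) -> List[int]:
    """Hypothetical stand-in for _evaluate: returns how many batches each loader ran."""
    # Only an int is expanded to one entry per dataloader; a list passes through unchanged.
    if isinstance(max_batches, int):
        max_batches = [max_batches] * len(dataloaders)

    batches_run = []
    for dataloader_idx, dataloader in enumerate(dataloaders):
        # Raises IndexError when max_batches has fewer entries than there are dataloaders.
        dl_max_batches = max_batches[dataloader_idx]
        batches_run.append(min(len(dataloader), dl_max_batches))
    return batches_run


val_loaders = [[0, 1, 2], [3, 4, 5]]  # two toy "dataloaders" with three batches each
print(evaluate(val_loaders, choose_max_batches(False, [3, 3])))  # [3, 3]
try:
    evaluate(val_loaders, choose_max_batches(True, [3, 3]))
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```

With a single validation loader the one-element list happens to be long enough, which is why the bug only shows up once a second loader is added.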

Possible solution

  • Let fast_dev_run=True use all validation loaders
  • Modify the evaluation for-loop to use only the first validation loader
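Both options can be sketched on a toy evaluation loop. This is an illustration under stated assumptions, not the fix that was merged: `fast_dev_run_all_loaders`, `fast_dev_run_first_loader` and `evaluate` are hypothetical helpers, and "dataloaders" are plain lists standing in for real DataLoader objects.

```python
from typing import List, Sequence, Tuple

Loaders = Sequence[Sequence[int]]


def fast_dev_run_all_loaders(dataloaders: Loaders) -> Tuple[Loaders, List[int]]:
    """Option 1 (hypothetical): one batch from every validation loader."""
    return dataloaders, [1] * len(dataloaders)


def fast_dev_run_first_loader(dataloaders: Loaders) -> Tuple[Loaders, List[int]]:
    """Option 2 (hypothetical): one batch from the first validation loader only."""
    return dataloaders[:1], [1]


def evaluate(dataloaders: Loaders, max_batches: List[int]) -> List[int]:
    """Simplified evaluation loop: collect the first max_batches batches of each loader."""
    seen = []
    for dataloader_idx, dataloader in enumerate(dataloaders):
        dl_max_batches = max_batches[dataloader_idx]
        for batch_idx, batch in enumerate(dataloader):
            if batch_idx >= dl_max_batches:
                break
            seen.append(batch)
    return seen


val_loaders = [[0, 1, 2], [3, 4, 5]]
print(evaluate(*fast_dev_run_all_loaders(val_loaders)))   # [0, 3]
print(evaluate(*fast_dev_run_first_loader(val_loaders)))  # [0]
```

In both variants max_batches has exactly one entry per loader that is iterated, so the IndexError cannot occur. Option 1 exercises every validation loader, so a problem specific to the second dataset still surfaces during a fast_dev_run; option 2 is slightly cheaper but only ever touches the first loader.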

Environment

  • CUDA:
    • GPU:
    • available: False
    • version: 10.1
  • Packages:
    • numpy: 1.18.5
    • pyTorch_debug: False
    • pyTorch_version: 1.5.1+cu101
    • pytorch-lightning: 0.8.4
    • tensorboard: 2.2.2
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.6.9
    • version: 1 SMP Wed Feb 19 05:26:34 PST 2020
@lucmos added the bug and help wanted labels Jul 6, 2020
@rohitgr7
Contributor

rohitgr7 commented Jul 6, 2020

Let fast_dev_run=True use all validation loaders

This is the better choice, since the datasets behind different dataloaders can differ, and fast_dev_run should check all of them.

@Borda
Member

Borda commented Jul 7, 2020

@lucmos it seems you have already dug into this... would you mind sending a PR?
