Subclassing a data module from a config file results in exit code 2 when the module is defined in the same file as the CLI #12362

Closed
kklemon opened this issue Mar 17, 2022 · 2 comments
Labels
bug Something isn't working lightningcli pl.cli.LightningCLI

Comments


kklemon commented Mar 17, 2022

🐛 Bug

If I define a custom base LightningDataModule class, set subclass_mode_data=True in LightningCLI (following the instructions here), and then provide the data configuration in a config file, the process exits with code 2 and no error message when the custom data module is defined in the same file as the CLI.

Note that this only seems to happen if the config is in a subfolder.

To Reproduce

project/
  config/
    base.yaml
  main.py

main.py:

import torch
import torch.nn as nn
import pytorch_lightning as pl

from pytorch_lightning.utilities.cli import LightningCLI
from torch.utils.data import DataLoader


class DummyLitModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(10, 1)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=1.0)

    def forward(self, x):
        return self.layer(x).mean()

    def training_step(self, batch, batch_idx):
        return self(batch)


class BaseDummyDataModule(pl.LightningDataModule):
    pass


class DummyDataModule(BaseDummyDataModule):
    def train_dataloader(self):
        return DataLoader(torch.randn(512, 10))


LightningCLI(
    DummyLitModule,
    BaseDummyDataModule,
    subclass_mode_data=True
)

config/base.yaml:

fit:
  data:
    class_path: main.DummyDataModule
    init_args: {}

The training is executed with:

python main.py --config=config/base.yaml fit

This crash can be resolved in the following ways:

  • Put base.yaml in the root folder
  • Put the data classes (both the base module and implementation) in a custom module, e.g. data.py
  • Provide the configuration for the data module as CLI argument instead of using a configuration file

Expected behavior

Training runs without an error.

Environment

  • CUDA:
    - GPU:
    - available: False
    - version: None
  • Packages:
    - numpy: 1.20.0
    - pyTorch_debug: False
    - pyTorch_version: 1.10.1+cpu
    - pytorch-lightning: 1.5.10
    - tqdm: 4.62.3
  • System:
    - OS: Linux
    - architecture:
      - 64bit
      - ELF
    - processor: x86_64
    - python: 3.8.0b4

cc @carmocca @mauvilsa

@akihironitta akihironitta added bug Something isn't working lightningcli pl.cli.LightningCLI labels Mar 18, 2022
Contributor

mauvilsa commented Mar 18, 2022

Regarding the missing error message, there was a regression in recent versions of jsonargparse, see #12303. It has been fixed, but I haven't had the time to release it. Will try to do so today.

The bug here is in your main.py script and not in pytorch-lightning or in jsonargparse. The problem is that LightningCLI should be inside an if __name__ == '__main__': block; otherwise main.py is not an importable module. In the config you also need to change the class path to __main__.DummyDataModule.

What is happening is that you run main.py and LightningCLI starts parsing the config. Since config/base.yaml is in a different directory, the working directory temporarily changes to config/ so that relative paths in base.yaml, if any, work as expected. At some point there is an attempt to import the class path main.DummyDataModule. In Python the main script is imported as __main__, and Python is a bit dumb and does not consider main.* to be the same module. So main.py is imported once again, which leads to LightningCLI being executed again. At this point it fails because the path config/base.yaml is no longer valid due to the working directory being different. I am not sure what could be done to fix this, since it is just a weird behavior that Python has. The class path also needs to change because Python considers __main__.BaseDummyDataModule and main.BaseDummyDataModule to be different classes.
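The double import described above can be reproduced with the standard library alone. The following is a minimal sketch, not what LightningCLI or jsonargparse actually do: it writes a throwaway main.py next to a config/base.yaml, and the child script uses a plain os.chdir as a stand-in for the temporary directory change during config parsing. The class name Base and the helper run_demo are hypothetical.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Contents of the throwaway main.py. It has no __main__ guard around the
# module-level print, so that line runs on every execution of the file.
SCRIPT = textwrap.dedent("""
    import os

    class Base:
        pass

    print(f"__name__={__name__} config_visible={os.path.exists('config/base.yaml')}")

    if __name__ == "__main__":
        os.chdir("config")  # assumption: mimics the temporary cwd change
        import main         # executes this same file again, now named "main"
        print(f"same_class={Base is main.Base}")
""")


def run_demo():
    """Run the script in a temporary project and return its output lines."""
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        (project / "config").mkdir()
        (project / "config" / "base.yaml").write_text("fit: {}\n")
        script = project / "main.py"
        script.write_text(SCRIPT)
        result = subprocess.run(
            [sys.executable, str(script)],
            cwd=tmp,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.splitlines()


for line in run_demo():
    print(line)
```

The first line of output comes from the script running as __main__ with config/base.yaml visible, the second from the same file being executed again as main after the directory change (the relative path no longer resolves), and the third reports that the two Base classes are distinct objects.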

To observe the behavior I explained, keep the config/base.yaml file and run a main.py script containing:

import os
from jsonargparse.typing import Path_fr
from jsonargparse.util import change_to_path_dir

class MyClass:
    pass

print(f'main.py imported with __name__={__name__} and cwd={os.getcwd()}')
with change_to_path_dir(Path_fr('config/base.yaml')):
    __import__('main', fromlist=['MyClass'])

Author

kklemon commented Mar 18, 2022

Regarding the missing error message, there was a regression in recent versions of jsonargparse, see #12303. It has been fixed, but I haven't had the time to release it. Will try to do so today.

Makes sense. Thanks for the explanation.

The bug here is in your main.py script and not in pytorch-lightning or in jsonargparse. The problem is that LightningCLI should be inside an if __name__ == '__main__': block; otherwise main.py is not an importable module. In the config you also need to change the class path to __main__.DummyDataModule.

That was my fault. I noticed this bug in a larger project some time ago where the CLI object was not exposed globally, but forgot to do the same when I tried to replicate the behaviour with minimal code.

When not exposing the CLI object globally, I indeed get the correct and original error message:

main.py: error: Configuration check failed :: Parser key "data": "main.DummyDataModule" is not a subclass of BaseDummyDataModule

Following your advice, this can indeed be fixed by replacing main.DummyDataModule with __main__.DummyDataModule in the config file.
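Why the module prefix in the class path matters can be illustrated with a generic resolver. This is a rough sketch under the assumption that a class path is split into a module name and a class name, imported, and then checked against the base class; it is not jsonargparse's actual implementation, and resolve_class_path is a hypothetical helper.

```python
import importlib
from collections import OrderedDict


def resolve_class_path(class_path: str, base: type) -> type:
    """Import "pkg.module.ClassName" and check that it subclasses `base`."""
    module_name, _, class_name = class_path.rpartition(".")
    # Importing by name means "main" and "__main__" yield different module
    # objects, and therefore different class objects, for the same file.
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    if not (isinstance(cls, type) and issubclass(cls, base)):
        raise TypeError(f'"{class_path}" is not a subclass of {base.__name__}')
    return cls


# Standard-library classes stand in for the data modules from the thread.
print(resolve_class_path("collections.OrderedDict", dict))
```

With a resolver of this shape, a class found through the main module fails the subclass check against a base class that lives in __main__, which matches the error message quoted above.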

I guess this is just some non-intuitive behaviour that needs to be taken care of. From my side, this issue can be closed.

@kklemon kklemon closed this as completed Mar 18, 2022