Subclassing a data module from a config file results in exit code 2 when the module is defined in the same file as the CLI #12362

kklemon · 2022-03-17T18:45:58Z

🐛 Bug

If I define a custom base LightningDataModule class, set subclass_mode_data=True in LightningCLI following the documented instructions, and then provide the data configuration in a config file, the program exits with code 2 and no error message whenever the custom data module is defined in the same file as the CLI.

Note that this only seems to happen if the config file is in a subfolder.

To Reproduce

project/
  config/
    base.yaml
  main.py

main.py:

import torch
import torch.nn as nn
import pytorch_lightning as pl

from pytorch_lightning.utilities.cli import LightningCLI
from torch.utils.data import DataLoader


class DummyLitModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(10, 1)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=1.0)

    def forward(self, x):
        return self.layer(x).mean()

    def training_step(self, batch, batch_idx):
        return self(batch)


class BaseDummyDataModule(pl.LightningDataModule):
    pass


class DummyDataModule(BaseDummyDataModule):
    def train_dataloader(self):
        return DataLoader(torch.randn(512, 10))


LightningCLI(
    DummyLitModule,
    BaseDummyDataModule,
    subclass_mode_data=True
)

config/base.yaml:

fit:
  data:
    class_path: main.DummyDataModule
    init_args: {}

The training is executed with:

python main.py --config=config/base.yaml fit

This crash can be resolved in the following ways:

- Put base.yaml in the root folder
- Put the data classes (both the base module and the implementation) in a separate module, e.g. data.py
- Provide the configuration for the data module as a CLI argument instead of using a configuration file

Expected behavior

Training runs without an error.

Environment

CUDA:
  - GPU:
  - available: False
  - version: None
Packages:
  - numpy: 1.20.0
  - pyTorch_debug: False
  - pyTorch_version: 1.10.1+cpu
  - pytorch-lightning: 1.5.10
  - tqdm: 4.62.3
System:
  - OS: Linux
  - architecture:
    - 64bit
    - ELF
  - processor: x86_64
  - python: 3.8.0b4

cc @carmocca @mauvilsa

mauvilsa · 2022-03-18T08:58:45Z

Regarding the missing error message: there was a regression in recent versions of jsonargparse, see #12303. It has been fixed, but I haven't had time to release it. I will try to do so today.

The bug here is in your main.py script and not in pytorch-lightning or in jsonargparse. The problem is that the LightningCLI call should be inside an if __name__ == '__main__': block; otherwise main.py cannot be imported as a module without running the CLI again. In the config you also need to change the class path to __main__.DummyDataModule.
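The effect of the guard can be reproduced without Lightning. The following is only a sketch: run_cli() is a hypothetical stand-in for the LightningCLI(...) call, and the importlib.import_module("main") line stands in for resolving the class path main.DummyDataModule. It writes two variants of main.py to a temporary directory and counts how often the "CLI" starts in each.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for main.py. run_cli() is a hypothetical placeholder for
# LightningCLI(...); it is not part of any library.
COMMON = '''
import importlib

def run_cli():
    print("CLI started")
    # Stand-in for resolving the class path "main.DummyDataModule".
    importlib.import_module("main")

'''
UNGUARDED = COMMON + "run_cli()\n"
GUARDED = COMMON + 'if __name__ == "__main__":\n    run_cli()\n'


def count_cli_starts(source: str) -> int:
    """Write source to a temporary main.py, run it, and count CLI starts."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "main.py"
        script.write_text(source)
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout.count("CLI started")


unguarded_starts = count_cli_starts(UNGUARDED)
guarded_starts = count_cli_starts(GUARDED)
print(f"unguarded: {unguarded_starts}, guarded: {guarded_starts}")
```

Without the guard, importing main executes the module body a second time and the stand-in CLI starts twice; with the guard it starts once.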

Here is what happens. You run main.py and LightningCLI starts parsing the config. Since config/base.yaml is in a different directory, the working directory temporarily changes to config/ so that relative paths in base.yaml, if any, work as expected. At some point there is an attempt to import the class path main.DummyDataModule. Python imports the main script under the name __main__, and Python is a bit dumb here and does not consider main.* to be the same module. So main.py is imported once again, which executes LightningCLI a second time. That second run fails because the relative path config/base.yaml is no longer valid from the changed working directory. I am not sure what could be done to fix this, since it is just a quirk of how Python imports scripts. The class path also needs to change because Python considers __main__.BaseDummyDataModule and main.BaseDummyDataModule to be different classes.

To observe the behavior I explained, keep the config/base.yaml file and run a main.py script containing:
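The working-directory part of this explanation can be illustrated in isolation. This is a minimal sketch, assuming a temporary project layout like the one in the report; working_directory is a hypothetical helper written for this example and is not jsonargparse's implementation.

```python
import contextlib
import os
import tempfile
from pathlib import Path


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the working directory (hypothetical helper)."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def check_relative_config():
    """Check whether config/base.yaml resolves from the root and from config/."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "config").mkdir()
        (root / "config" / "base.yaml").write_text("fit: {}\n")
        relative = Path("config") / "base.yaml"

        with working_directory(root):
            # First CLI run: the relative path is valid from the project root.
            seen_from_root = relative.exists()
            with working_directory(root / "config"):
                # Second CLI run, triggered while the cwd is config/:
                # the same relative path now points at config/config/base.yaml.
                seen_from_config = relative.exists()
    return seen_from_root, seen_from_config


seen_from_root, seen_from_config = check_relative_config()
print(f"valid from root: {seen_from_root}, valid from config/: {seen_from_config}")
```

The same relative path resolves from the project root but not from inside config/, which matches the failure of the second CLI run described above.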

import os
from jsonargparse.typing import Path_fr
from jsonargparse.util import change_to_path_dir

class MyClass:
    pass

print(f'main.py imported with __name__={__name__} and cwd={os.getcwd()}')
with change_to_path_dir(Path_fr('config/base.yaml')):
    __import__('main', fromlist=['MyClass'])

kklemon · 2022-03-18T19:16:30Z

Regarding the missing error message: there was a regression in recent versions of jsonargparse, see #12303. It has been fixed, but I haven't had time to release it. I will try to do so today.

Makes sense. Thanks for the explanation.

The bug here is in your main.py script and not in pytorch-lightning or in jsonargparse. The problem is that LightningCLI should be inside an if __name__ == '__main__': block, otherwise main.py is not an importable module. In the config you also need to change the class path to __main__.DummyDataModule.

That was my fault. I noticed this bug some time ago in a larger project where the CLI object was not exposed globally, but I forgot to do the same when I tried to replicate the behaviour with minimal code.

When not exposing the CLI object globally, I indeed get the correct and original error message:

main.py: error: Configuration check failed :: Parser key "data": "main.DummyDataModule" is not a subclass of BaseDummyDataModule

Following your advice, this can indeed be fixed by replacing main.DummyDataModule with __main__.DummyDataModule in the config file.
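The failed subclass check can be reproduced without Lightning or jsonargparse. The sketch below reuses the class names from the report as plain Python classes (an assumption made for illustration), runs a temporary main.py, and compares the classes seen under __main__ with those seen under main.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Plain-Python stand-in for main.py; the classes only mirror the names
# used in the report and do not depend on pytorch-lightning.
SCRIPT = '''
import sys

class BaseDummyDataModule:
    pass

class DummyDataModule(BaseDummyDataModule):
    pass

print(f"module body executed with __name__={__name__}")

if __name__ == "__main__":
    import main  # executes this file again under the name "main"
    print("same module object:", sys.modules["__main__"] is main)
    print("same class object:", DummyDataModule is main.DummyDataModule)
    print("is subclass:", issubclass(main.DummyDataModule, BaseDummyDataModule))
'''


def run_demo() -> str:
    """Run the stand-in script from a temporary directory and return its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "main.py"
        script.write_text(SCRIPT)
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout


output = run_demo()
print(output)
```

The class imported as main.DummyDataModule derives from main.BaseDummyDataModule, not from the __main__.BaseDummyDataModule that the running script passed to the CLI, so the subclass check reports False. Using __main__.DummyDataModule in the config refers to the class object that the running script already defined.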

I guess this is just some non-intuitive behaviour that needs to be taken care of. From my side, this issue can be closed.

akihironitta added the bug and lightningcli labels Mar 18, 2022

kklemon closed this as completed Mar 18, 2022
