Regression: `CSVLogger` not working on version 2.2.0 #19432

ramon-adalia-lmd · 2024-02-08T08:26:35Z

Bug description

CSVLogger throws the following error when used in version 2.2.0:

  File "/usr/lib/python3.10/csv.py", line 157, in writerows
    return self.writer.writerows(map(self._dict_to_list, rowdicts))
  File "/usr/lib/python3.10/csv.py", line 149, in _dict_to_list
    raise ValueError("dict contains fields not in fieldnames: "
ValueError: dict contains fields not in fieldnames: 'train_hit_rate', 'train_precision'

Going back to 2.1.4 solves the issue.

What version are you seeing the problem on?

master

How to reproduce the bug

No response

Error messages and logs

  File "/usr/lib/python3.10/csv.py", line 157, in writerows
    return self.writer.writerows(map(self._dict_to_list, rowdicts))
  File "/usr/lib/python3.10/csv.py", line 149, in _dict_to_list
    raise ValueError("dict contains fields not in fieldnames: "
ValueError: dict contains fields not in fieldnames: 'train_hit_rate', 'train_precision'

Environment

Current environment

CUDA:
- GPU: None
- available: False
- version: 12.1
Lightning:
- lightning: 2.2.0
- lightning-utilities: 0.10.1
- pytorch-lightning: 2.2.0
- torch: 2.2.0
- torch-geometric: 2.4.0
- torchmetrics: 1.3.0.post0
Packages:
- aiohttp: 3.9.3
- aiosignal: 1.3.1
- async-timeout: 4.0.3
- attrs: 23.2.0
- certifi: 2024.2.2
- charset-normalizer: 3.3.2
- classify-imports: 4.2.0
- cloudpickle: 3.0.0
- contourpy: 1.2.0
- cycler: 0.12.1
- filelock: 3.13.1
- fonttools: 4.48.1
- frozenlist: 1.4.1
- fsspec: 2024.2.0
- idna: 3.6
- jinja2: 3.1.3
- joblib: 1.3.2
- kiwisolver: 1.4.5
- lightning: 2.2.0
- lightning-utilities: 0.10.1
- markdown-it-py: 3.0.0
- markupsafe: 2.1.5
- matplotlib: 3.8.2
- mdurl: 0.1.2
- mpmath: 1.3.0
- multidict: 6.0.5
- networkx: 3.2.1
- numpy: 1.26.4
- nvidia-cublas-cu12: 12.1.3.1
- nvidia-cuda-cupti-cu12: 12.1.105
- nvidia-cuda-nvrtc-cu12: 12.1.105
- nvidia-cuda-runtime-cu12: 12.1.105
- nvidia-cudnn-cu12: 8.9.2.26
- nvidia-cufft-cu12: 11.0.2.54
- nvidia-curand-cu12: 10.3.2.106
- nvidia-cusolver-cu12: 11.4.5.107
- nvidia-cusparse-cu12: 12.1.0.106
- nvidia-nccl-cu12: 2.19.3
- nvidia-nvjitlink-cu12: 12.3.101
- nvidia-nvtx-cu12: 12.1.105
- overrides: 7.7.0
- packaging: 23.2
- pandas: 2.2.0
- pillow: 10.2.0
- pip: 24.0
- psutil: 5.9.8
- pygments: 2.17.2
- pynvml: 11.4.1
- pyparsing: 3.1.1
- python-dateutil: 2.8.2
- pytorch-lightning: 2.2.0
- pytz: 2024.1
- pyupgrade: 3.15.0
- pyyaml: 6.0.1
- rdkit: 2023.9.4
- reorder-python-imports: 3.12.0
- requests: 2.31.0
- rich: 13.7.0
- ruff: 0.2.1
- scalene: 1.5.34
- scikit-learn: 1.4.0
- scipy: 1.12.0
- seaborn: 0.13.2
- setuptools: 69.0.3
- six: 1.16.0
- sympy: 1.12
- threadpoolctl: 3.2.0
- tokenize-rt: 5.2.0
- torch: 2.2.0
- torch-geometric: 2.4.0
- torchmetrics: 1.3.0.post0
- tqdm: 4.66.1
- triton: 2.2.0
- typing-extensions: 4.9.0
- tzdata: 2023.4
- urllib3: 2.2.0
- wheel: 0.42.0
- yarl: 1.9.4
System:
- OS: Linux
- architecture:
  - 64bit
  - ELF
- processor: x86_64
- python: 3.10.12
- release: 6.5.0-14-generic
- version: #14~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Nov 20 18:15:30 UTC 2

More info

No response

cc @Borda


awaelchli · 2024-02-11T00:06:16Z

@ramon-adalia-lmd Would you be able to provide a code example that produces this error?

ramon-adalia-lmd · 2024-02-11T08:09:08Z

Upon further testing, it does not seem to be specific to 2.2.0, but the bug is still there. Interestingly, the bug happens every other time I run the code. Here is an example script that triggers it:

import torch
from lightning import LightningModule
from lightning import Trainer
from lightning.pytorch.loggers import CSVLogger
from torchmetrics import MeanSquaredError


class Model(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(100, 1)
        self.train_mse = MeanSquaredError()
        self.val_mse = MeanSquaredError()

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = torch.nn.functional.mse_loss(y_hat, y)
        self.train_mse.update(y_hat, y)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        self.val_mse.update(y_hat, y)

    def on_train_epoch_end(self):
        self.log("train_mse", self.train_mse.compute(), prog_bar=True)
        self.train_mse.reset()

    def on_validation_epoch_end(self):
        self.log("val_mse", self.val_mse.compute(), prog_bar=True)
        self.val_mse.reset()

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


def main():
    X = torch.randn(100, 100)
    y = torch.randn(100, 1)
    Z = torch.randn(100, 100)
    t = torch.randn(100, 1)
    train_loader = torch.utils.data.DataLoader(list(zip(X, y)))
    val_loader = torch.utils.data.DataLoader(list(zip(Z, t)))
    model = Model()
    trainer = Trainer(
        max_epochs=5, logger=CSVLogger("test_logs", name="test", version=0)
    )
    trainer.fit(model, train_loader, val_loader)


if __name__ == "__main__":
    main()

Run it the first time: works. The second time: error. The third time: works. And so on...

awaelchli · 2024-02-11T13:58:35Z

@ramon-adalia-lmd Ah ok, this is because you fixed the version to 0. The second time the script is executed, the file is already there, so the logger tries to append to it but sees different keys. In this case, I think the best we can do is delete the file at the start if it exists, since by explicitly passing version=x the user is asking for that version to be overwritten.
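The key mismatch described in this comment can be illustrated with the standard library alone. The following is a minimal sketch, not Lightning's actual implementation: the column names and the `append_rows` helper are hypothetical, and an in-memory buffer stands in for the file left over from the earlier run. It only shows that `csv.DictWriter` raises the `ValueError` from the traceback when a row carries a key missing from the header it was configured with.

```python
import csv
import io


def append_rows(buffer, fieldnames, rows):
    """Append rows using a DictWriter restricted to the given header (hypothetical helper)."""
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writerows(rows)


# Header as an earlier run might have left it (illustrative keys only).
stale_fieldnames = ["epoch", "step", "val_mse"]
buffer = io.StringIO()
csv.DictWriter(buffer, fieldnames=stale_fieldnames).writeheader()

# A later run logs an extra key that the stale header does not contain.
new_rows = [{"epoch": 0, "step": 99, "val_mse": 1.2, "train_mse": 0.9}]

try:
    append_rows(buffer, stale_fieldnames, new_rows)
    error_message = None
except ValueError as err:
    # DictWriter rejects keys outside `fieldnames` unless extrasaction="ignore" is set.
    error_message = str(err)

print(error_message)
```

Passing `extrasaction="ignore"` to `csv.DictWriter` would silence the error, but it would also silently drop the new metric, so it is not a substitute for starting from a clean file.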

ramon-adalia-lmd added the bug and needs triage labels Feb 8, 2024

github-actions bot added the ver: 2.2.x label Feb 8, 2024

awaelchli added the help wanted, logger: csv, and repro needed labels and removed the needs triage label Feb 11, 2024

awaelchli removed the repro needed label Feb 11, 2024

awaelchli added this to the 2.2.x milestone Feb 11, 2024

awaelchli mentioned this issue Feb 11, 2024

Fix CSVLogger trying to append to file from previous run in same version folder #19446

Merged

awaelchli closed this as completed in #19446 Feb 13, 2024
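For anyone still on a release without the fix, the "delete the file if it exists" idea from the comment above can be approximated by hand before starting a run with a pinned version. This is a rough sketch under assumptions, not the change made in #19446: the `remove_stale_metrics` helper is hypothetical, and the `<save_dir>/<name>/version_<version>/metrics.csv` layout is assumed and should be checked against the installed version.

```python
import tempfile
from pathlib import Path


def remove_stale_metrics(save_dir, name, version, filename="metrics.csv"):
    """Delete a leftover metrics file for a pinned version; return True if one was removed.

    Assumes the layout <save_dir>/<name>/version_<version>/<filename>.
    """
    metrics_path = Path(save_dir) / name / f"version_{version}" / filename
    if metrics_path.is_file():
        metrics_path.unlink()
        return True
    return False


# Demonstration in a temporary directory, so Lightning is not required.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "test" / "version_0"
    run_dir.mkdir(parents=True)
    (run_dir / "metrics.csv").write_text("epoch,step,val_mse\n")

    removed_first = remove_stale_metrics(tmp, "test", 0)   # file existed, so it is deleted
    removed_second = remove_stale_metrics(tmp, "test", 0)  # nothing left to delete
```

In the reproduction script, this would correspond to calling `remove_stale_metrics("test_logs", "test", 0)` before constructing the `Trainer`. Leaving `version` unset, so that each run gets its own folder, avoids the collision altogether.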
