Regression: CSVLogger not working on version 2.2.0 #19432

Closed
ramon-adalia-lmd opened this issue Feb 8, 2024 · 3 comments · Fixed by #19446
Labels
bug Something isn't working help wanted Open to be worked on logger: csv ver: 2.2.x

@ramon-adalia-lmd

ramon-adalia-lmd commented Feb 8, 2024

Bug description

CSVLogger throws the following error when used in version 2.2.0:

  File "/usr/lib/python3.10/csv.py", line 157, in writerows
    return self.writer.writerows(map(self._dict_to_list, rowdicts))
  File "/usr/lib/python3.10/csv.py", line 149, in _dict_to_list
    raise ValueError("dict contains fields not in fieldnames: "
ValueError: dict contains fields not in fieldnames: 'train_hit_rate', 'train_precision'

Going back to 2.1.4 solves the issue.

What version are you seeing the problem on?

master

How to reproduce the bug

No response

Error messages and logs

  File "/usr/lib/python3.10/csv.py", line 157, in writerows
    return self.writer.writerows(map(self._dict_to_list, rowdicts))
  File "/usr/lib/python3.10/csv.py", line 149, in _dict_to_list
    raise ValueError("dict contains fields not in fieldnames: "
ValueError: dict contains fields not in fieldnames: 'train_hit_rate', 'train_precision'

Environment

Current environment
  • CUDA:
    • GPU: None
    • available: False
    • version: 12.1
  • Lightning:
    • lightning: 2.2.0
    • lightning-utilities: 0.10.1
    • pytorch-lightning: 2.2.0
    • torch: 2.2.0
    • torch-geometric: 2.4.0
    • torchmetrics: 1.3.0.post0
  • Packages:
    • aiohttp: 3.9.3
    • aiosignal: 1.3.1
    • async-timeout: 4.0.3
    • attrs: 23.2.0
    • certifi: 2024.2.2
    • charset-normalizer: 3.3.2
    • classify-imports: 4.2.0
    • cloudpickle: 3.0.0
    • contourpy: 1.2.0
    • cycler: 0.12.1
    • filelock: 3.13.1
    • fonttools: 4.48.1
    • frozenlist: 1.4.1
    • fsspec: 2024.2.0
    • idna: 3.6
    • jinja2: 3.1.3
    • joblib: 1.3.2
    • kiwisolver: 1.4.5
    • lightning: 2.2.0
    • lightning-utilities: 0.10.1
    • markdown-it-py: 3.0.0
    • markupsafe: 2.1.5
    • matplotlib: 3.8.2
    • mdurl: 0.1.2
    • mpmath: 1.3.0
    • multidict: 6.0.5
    • networkx: 3.2.1
    • numpy: 1.26.4
    • nvidia-cublas-cu12: 12.1.3.1
    • nvidia-cuda-cupti-cu12: 12.1.105
    • nvidia-cuda-nvrtc-cu12: 12.1.105
    • nvidia-cuda-runtime-cu12: 12.1.105
    • nvidia-cudnn-cu12: 8.9.2.26
    • nvidia-cufft-cu12: 11.0.2.54
    • nvidia-curand-cu12: 10.3.2.106
    • nvidia-cusolver-cu12: 11.4.5.107
    • nvidia-cusparse-cu12: 12.1.0.106
    • nvidia-nccl-cu12: 2.19.3
    • nvidia-nvjitlink-cu12: 12.3.101
    • nvidia-nvtx-cu12: 12.1.105
    • overrides: 7.7.0
    • packaging: 23.2
    • pandas: 2.2.0
    • pillow: 10.2.0
    • pip: 24.0
    • psutil: 5.9.8
    • pygments: 2.17.2
    • pynvml: 11.4.1
    • pyparsing: 3.1.1
    • python-dateutil: 2.8.2
    • pytorch-lightning: 2.2.0
    • pytz: 2024.1
    • pyupgrade: 3.15.0
    • pyyaml: 6.0.1
    • rdkit: 2023.9.4
    • reorder-python-imports: 3.12.0
    • requests: 2.31.0
    • rich: 13.7.0
    • ruff: 0.2.1
    • scalene: 1.5.34
    • scikit-learn: 1.4.0
    • scipy: 1.12.0
    • seaborn: 0.13.2
    • setuptools: 69.0.3
    • six: 1.16.0
    • sympy: 1.12
    • threadpoolctl: 3.2.0
    • tokenize-rt: 5.2.0
    • torch: 2.2.0
    • torch-geometric: 2.4.0
    • torchmetrics: 1.3.0.post0
    • tqdm: 4.66.1
    • triton: 2.2.0
    • typing-extensions: 4.9.0
    • tzdata: 2023.4
    • urllib3: 2.2.0
    • wheel: 0.42.0
    • yarl: 1.9.4
  • System:
    • OS: Linux
    • architecture:
      • 64bit
      • ELF
    • processor: x86_64
    • python: 3.10.12
    • release: 6.5.0-14-generic
    • version: #14~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Nov 20 18:15:30 UTC 2

More info

No response

cc @Borda

@ramon-adalia-lmd ramon-adalia-lmd added bug Something isn't working needs triage Waiting to be triaged by maintainers labels Feb 8, 2024
@awaelchli
Contributor

@ramon-adalia-lmd Would you be able to provide a code example that produces this error?

@awaelchli awaelchli added help wanted Open to be worked on logger: csv repro needed The issue is missing a reproducible example and removed needs triage Waiting to be triaged by maintainers labels Feb 11, 2024
@ramon-adalia-lmd
Author

Upon further testing, it does not seem to be specific to 2.2.0, but the bug is still there. Interestingly, the bug happens every other time I run the code. Here is an example script that triggers it:

import torch
from lightning import LightningModule
from lightning import Trainer
from lightning.pytorch.loggers import CSVLogger
from torchmetrics import MeanSquaredError


class Model(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(100, 1)
        self.train_mse = MeanSquaredError()
        self.val_mse = MeanSquaredError()

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = torch.nn.functional.mse_loss(y_hat, y)
        self.train_mse.update(y_hat, y)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        self.val_mse.update(y_hat, y)

    def on_train_epoch_end(self):
        self.log("train_mse", self.train_mse.compute(), prog_bar=True)
        self.train_mse.reset()

    def on_validation_epoch_end(self):
        self.log("val_mse", self.val_mse.compute(), prog_bar=True)
        self.val_mse.reset()

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


def main():
    X = torch.randn(100, 100)
    y = torch.randn(100, 1)
    Z = torch.randn(100, 100)
    t = torch.randn(100, 1)
    train_loader = torch.utils.data.DataLoader(list(zip(X, y)))
    val_loader = torch.utils.data.DataLoader(list(zip(Z, t)))
    model = Model()
    trainer = Trainer(
        max_epochs=5, logger=CSVLogger("test_logs", name="test", version=0)
    )
    trainer.fit(model, train_loader, val_loader)


if __name__ == "__main__":
    main()

Run it the first time: works. The second time: error. The third time: works. And so on...
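
Since the script pins version=0 and only alternate runs fail, one way to sidestep the error while debugging may be to clear the previous run's directory before creating the logger. The helper below is a minimal sketch, not part of Lightning: clear_previous_run is a hypothetical name, and the <save_dir>/<name>/version_<n> layout is an assumption about where CSVLogger writes when given an integer version.

```python
import shutil
from pathlib import Path


def clear_previous_run(save_dir, name, version):
    """Delete the log directory left by an earlier run, if any, and return its path.

    Assumption: for an integer version the logger writes to
    <save_dir>/<name>/version_<version>. Adjust if your layout differs.
    """
    log_dir = Path(save_dir) / name / f"version_{version}"
    if log_dir.exists():
        # Remove the stale metrics file together with everything else in the run folder.
        shutil.rmtree(log_dir)
    return log_dir
```

In the script above this would be called as clear_previous_run("test_logs", "test", 0) just before constructing CSVLogger("test_logs", name="test", version=0). Another option, as far as I understand the documented default, is to leave version unset so the logger picks the next unused version itself.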

@awaelchli awaelchli removed the repro needed The issue is missing a reproducible example label Feb 11, 2024
@awaelchli
Contributor

@ramon-adalia-lmd Ah ok, this is because you fixed the version to 0. The second time the script runs, the file is already there, so the logger tries to append to it but sees different keys. In this case, I think the best we can do is delete the file at the start if it exists, since by passing an explicit version=x the user is asking for that version to be overwritten.
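
The mechanism described here can be reproduced with the standard library alone. The sketch below is not Lightning's implementation: append_rows and demo are hypothetical helpers that mimic a logger which reuses the header of an existing file. It shows csv.DictWriter raising the same ValueError when a later run brings a key the old header lacks, and the write succeeding once the stale file is deleted first.

```python
import csv
import os
import tempfile


def append_rows(path, rows):
    """Append rows to a CSV file, reusing the header already on disk if there is one."""
    exists = os.path.exists(path)
    if exists:
        # Reuse the column names written by the earlier run.
        with open(path, newline="") as f:
            fieldnames = next(csv.reader(f))
    else:
        fieldnames = sorted({key for row in rows for key in row})
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if not exists:
            writer.writeheader()
        # Raises ValueError if any row has a key that is not in fieldnames.
        writer.writerows(rows)


def demo():
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "metrics.csv")
        # First run: the file does not exist yet, so the header matches the rows.
        append_rows(path, [{"step": 0, "val_mse": 1.0}])
        results.append("first write ok")
        # Second run into the same file, now with an extra key.
        try:
            append_rows(path, [{"step": 0, "train_mse": 0.9, "val_mse": 1.0}])
        except ValueError as err:
            results.append(str(err))
        # Suggested remedy: delete the stale file, then write from scratch.
        os.remove(path)
        append_rows(path, [{"step": 0, "train_mse": 0.9, "val_mse": 1.0}])
        results.append("write after delete ok")
    return results


if __name__ == "__main__":
    for line in demo():
        print(line)
```

The middle entry returned by demo() is the "dict contains fields not in fieldnames" message from the traceback in the report, with train_mse as the offending key.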
