
LearningRateMonitor callback causes unexpected changes in step/epoch count with WandBLogger #13016

Closed
rbracco opened this issue May 9, 2022 · 7 comments
Labels
bug (Something isn't working) · callback: lr monitor · logger: wandb (Weights & Biases)

Comments

@rbracco
Contributor

rbracco commented May 9, 2022

🐛 Bug

Using the LearningRateMonitor callback breaks wandb logging by making the logged step count incorrect. The image below shows the varying epoch/step counts while overfitting batches with no LR monitor, LearningRateMonitor(logging_interval="epoch"), and LearningRateMonitor(logging_interval=None):
[Image: W&B charts comparing step/epoch counts across the three runs]

Neat-bee-446 does not use the LRMonitor callback and the ratio of step#:epoch# is 1:1
Devout-forest-447 adds as a callback LearningRateMonitor(logging_interval="epoch"), and the ratio of step#:epoch# becomes 2:1
Woven-dew-448 uses the callback LearningRateMonitor() and the ratio of step#:epoch# becomes 3:1

When not overfitting a batch, LearningRateMonitor() logs the correct number of steps, but LearningRateMonitor(logging_interval="epoch") and LearningRateMonitor(logging_interval="step") still log double what they should.

Also, this occurs only with wandb, not with TensorBoard.

Expected behavior

The logged step count should be correct and not adversely impacted by adding the LearningRateMonitor callback.

Environment

  • PyTorch Lightning Version: 1.6.2
  • WandB Version: 0.12.5
  • PyTorch Version (e.g., 1.11):
  • Python version (e.g., 3.8.10):
  • OS (e.g., Linux): Linux
  • How you installed PyTorch: Pip

Additional context

cc @awaelchli @morganmcg1 @AyushExel @borisdayma @scottire @manangoel99 @rohitgr7

@rbracco rbracco added the needs triage Waiting to be triaged by maintainers label May 9, 2022
@tanmoyio
Contributor

I would like to work on it

@akihironitta akihironitta added the logger: wandb and callback: lr monitor labels and removed the needs triage label May 22, 2022
@akihironitta
Contributor

akihironitta commented May 22, 2022

@rbracco Do you see the issue with other loggers in addition to wandb logger? Would you mind trying the default TensorBoardLogger and seeing if the issue is specific to the logger?

Also, it would be great and very helpful if you could share a script for reproduction.

@akihironitta akihironitta added the bug Something isn't working label May 22, 2022
@manangoel99
Contributor

manangoel99 commented May 22, 2022

Hi @rbracco ! Engineer from W&B here. The likely reason is that pytorch-lightning and wandb each maintain their own step counter. The wandb step counter is incremented whenever wandb.log is called, which can happen multiple times within the same Lightning step.

For example:

```python
def training_step(self, batch):
    self.log("a", 1)  # triggers a wandb.log call -> wandb step += 1
    self.log("b", 2)  # triggers another wandb.log call -> wandb step += 1
```
This is one training step, but it increments the wandb step twice.
Adding the LearningRateMonitor callback adds yet another log call, which increments the wandb step further.
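The drift between the two counters can be sketched with a toy model. Note this is plain Python for illustration only; FakeWandbRun is a hypothetical stand-in, not wandb's actual implementation:

```python
# Toy model (assumption: not wandb's real code) of the mismatch described
# above: wandb bumps its internal step on every log() call, while Lightning
# advances its own global step once per training batch.

class FakeWandbRun:
    """Mimics wandb's auto-incrementing step counter."""
    def __init__(self):
        self.step = 0

    def log(self, metrics):
        self.step += 1  # one increment per log call, regardless of content


run = FakeWandbRun()
trainer_step = 0

for batch in range(3):       # 3 Lightning training steps
    run.log({"a": 1})        # self.log("a", 1)
    run.log({"b": 2})        # self.log("b", 2)
    run.log({"lr": 1e-3})    # extra call added by LearningRateMonitor
    trainer_step += 1

print(trainer_step, run.step)  # prints: 3 9
```

With three log calls per batch, the wandb step runs at three times the trainer step, which matches the 3:1 ratio observed in the Woven-dew-448 run.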

@awaelchli
Contributor

Hey @rbracco, if what @manangoel99 described is the case, you can change the x-axis to "trainer step" in the UI (top right), either globally or per plot.

@manangoel99
Contributor

@rbracco Please let me know if the suggested solution works!

@rbracco
Contributor Author

rbracco commented May 22, 2022

Thank you @awaelchli and @manangoel99, that worked great. Here is the updated chart after changing the x-axis to trainer step. Closing!

[Image: updated W&B chart with the x-axis set to trainer step]

@rbracco rbracco closed this as completed May 22, 2022
@awaelchli
Contributor

yay :) Happy logging!
