Implement a logging callback that unifies multiple loggers #15
Conversation
Callback-based Logger with support for TensorBoard and Wandb, includes:
What's next:
Ready for review. Edit: If you think any of these should be tackled with this PR, let me know. Once this PR is merged, I'll open the mentioned issues.
I overall like it and think it is well thought through. The only thing I am unsure about is the frequent use of sys.exit()
instead of raising errors. This makes try/except impossible, does not allow for tracebacks, and makes testing hard.
Tests are also something I would recommend for this PR. At least initialization and a basic mock of logging to file.
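For illustration (a hypothetical function and error message, not code from this PR), the practical difference between the two styles:
import sys

def check_mode_exit(mode: str) -> None:
    if mode not in ("train", "val", "test"):
        # sys.exit() terminates the process: no traceback, nothing for try/except
        # or a test to catch.
        sys.exit(f"Invalid mode '{mode}'.")

def check_mode_raise(mode: str) -> None:
    if mode not in ("train", "val", "test"):
        # Raising keeps the traceback and lets callers handle the failure with
        # try/except, or lets tests assert on it with pytest.raises.
        raise ValueError(f"Invalid mode '{mode}'.")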
# instead of batch steps, which can be problematic when using gradient accumulation.
self.global_step_counter = {"train": 0, "val": 0, "test": 0}

def setup(self, trainer: Trainer, pl_module: LighterSystem, stage: str) -> None:
pl_module -> PyTorch Lightning Module, but the type is LighterSystem. Maybe call it lighter_system instead?
Unfortunately not possible, as PyTorch Lightning automatically passes the argument under that name. I type hinted it as a LighterSystem (which still is a LightningModule) because this logger requires some parts that are specific to LighterSystem, for example:
lighter/lighter/callbacks/logger.py
Line 239 in 30b15fd
metrics = getattr(pl_module, f"{mode}_metrics")
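For context, an illustrative sketch of why the name is fixed (ExampleLoggerCallback and the attribute lookup are made up for this example; the hook signature is Lightning's):
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import Callback

class ExampleLoggerCallback(Callback):
    def setup(self, trainer: Trainer, pl_module: "LighterSystem", stage: str) -> None:
        # Lightning invokes the hook with these exact parameter names, so the second
        # argument stays pl_module; the annotation can still narrow it to LighterSystem.
        metrics = getattr(pl_module, "train_metrics", None)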
outputs["loss"] = loss | ||
|
||
# Metrics | ||
# Get the torchmetrics. |
Does this work well with MONAI? Other metrics are used there... Or should we specify that Lighter needs torchmetrics?
I think it's a good idea to also support other metrics. But maybe a separate issue for later?
Metrics have to be torchmetrics:
Line 70 in f7ac3d6
train_metrics (Optional[Union[Metric, List[Metric]]], optional): training metric(s).
I wouldn't support any other metric format. I think that accounting for all the possible ways of doing it would quickly create a mess here:
lighter/lighter/callbacks/logger.py
Line 242 in f7ac3d6
outputs["metrics"] = metrics.compute()
If a user wants a metric that's not available in torchmetrics, they should implement it as a torchmetrics.Metric themselves. It's pretty easy: https://torchmetrics.readthedocs.io/en/stable/pages/implement.html.
However, from a quick glance it appears that MONAI metrics have a unified API too - we could write a metric wrapper that turns a MONAI metric into a torchmetrics metric. It'd go into our contrib repo.
What do you guys think?
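To sketch what such a wrapper could look like (hypothetical class, not part of this PR; it assumes the wrapped MONAI metric returns a tensor of per-sample values when called on a batch, as e.g. monai.metrics.DiceMetric does):
import torch
from torchmetrics import Metric
from torchmetrics.utilities import dim_zero_cat

class MonaiMetricWrapper(Metric):
    def __init__(self, monai_metric):
        super().__init__()
        self.monai_metric = monai_metric
        # List state of per-batch results; torchmetrics syncs it across DDP ranks.
        self.add_state("values", default=[], dist_reduce_fx="cat")

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        # Calling a MONAI cumulative metric on a batch returns its per-sample values.
        self.values.append(self.monai_metric(preds, target))

    def compute(self) -> torch.Tensor:
        # Concatenate everything accumulated so far and average it.
        return dim_zero_cat(self.values).mean()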
That sounds perfect then! Since torchmetrics handles so many things internally, I agree that this is the best approach.
Ok, I agree. But then the logger should raise an exception if a metric is passed that is not a torchmetrics.Metric. Once this PR is merged, shall I raise an issue for this?
The check would be in the LighterSystem. But if we're doing it for metrics, we should then type-check everything:
Line 20 in 08cb305
def __init__(self,
That becomes quite a hassle. I guess this is when I wish there was runtime type checking in Python.
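For reference, the metric check itself could be as small as this (an illustrative sketch with a made-up helper name and error message, not code from the PR):
from typing import List, Optional, Union
from torchmetrics import Metric

def validate_metrics(metrics: Optional[Union[Metric, List[Metric]]]) -> None:
    # Hypothetical helper: accept a single Metric, a list of Metrics, or None.
    if metrics is None:
        return
    for metric in metrics if isinstance(metrics, list) else [metrics]:
        if not isinstance(metric, Metric):
            raise TypeError(f"Expected a torchmetrics.Metric, got {type(metric).__name__}.")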
@surajpaib what do you think?
@ibro45 and I had a few discussions about this. However, @ibro45, the ability to get defined values returned from the code might be very useful, especially for testing, since we can build test cases accordingly. Maybe it's a good idea to have a more consistent standard across the entire library that allows handling both well-defined and ill-defined error cases.
@Ibro Testing the branch right now, are the ...
Co-authored-by: Keno <[email protected]>
@surajpaib @kbressem both of you brought this up. Any preferences/guidelines on how we should handle them?
That one was confusing for me too, I remember it logging at first and then it wouldn't. Can you check if the file was populated with some logs at the end? The logs only contain the Lighter logs though, no logs from other libraries. I tried to intercept them all into our loguru logger, but it didn't work correctly with DDP - all ranks ended up logging at the same time. I'd like to resolve this issue together with DDP per-rank log files. Probably best to comment it out for the moment. What do you think?
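For reference, a minimal sketch of what per-rank log files could look like with loguru (hypothetical file paths; assumes the RANK environment variable set by torchrun or Lightning's DDP launcher):
import os
from loguru import logger

rank = int(os.environ.get("RANK", 0))
# One file sink per rank so processes don't interleave writes in the same file;
# enqueue=True makes the sink safer to use across processes.
logger.add(f"logs/rank_{rank}.log", enqueue=True)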
Works for me, let's not have this atm and do it properly all together, including logging for raised errors and exceptions.
Great work overall with this! Checked the logs on ...
Don't mind me not formatting the code, I decided not to do it as it'll be done by Suraj's CI anyway after it's merged. Also, I'll write a docstring for the ...
I would prefer to raise exceptions, maybe even custom exceptions. This has the advantage that you can test whether the code raises correctly. See here for pytest: https://docs.pytest.org/en/7.1.x/reference/reference.html#pytest.raises or here for unittest: https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertRaises Btw. MONAI uses unittest, here I saw pytest. I do not have a preference. I think pytest is sometimes easier, but using unittest would make integration into MONAI easier, in case we want to push some of our stuff to their library.
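As a concrete illustration of the two linked approaches (hypothetical exception class, function, and test names, not code from this repo):
import unittest
import pytest

class InvalidModeError(ValueError):
    """Hypothetical custom exception for an unknown mode."""

def set_mode(mode: str) -> str:
    if mode not in ("train", "val", "test"):
        raise InvalidModeError(f"Unknown mode: {mode}")
    return mode

def test_set_mode_raises():
    # pytest style: assert that the call raises the expected exception.
    with pytest.raises(InvalidModeError):
        set_mode("predict")

class TestSetMode(unittest.TestCase):
    def test_raises(self):
        # unittest style: assertRaises with the callable and its arguments.
        self.assertRaises(InvalidModeError, set_mode, "predict")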
I'll have to read up on what's the most recommended way to do it.
I don't think that anything in ... That said, I'm sure that some of the ... Regarding pytest vs unittest, I have much to learn about testing, so I have no preference or opinion.
Aims to implement a LighterLogger as a PL Callback rather than a PL Logger, as the latter's implementation is very messy.