
Implement a logging callback that unifies multiple loggers #15

Merged
surajpaib merged 13 commits into main from logging-handler on Jan 26, 2023

Conversation

@ibro45 (Collaborator) commented on Jan 18, 2023

Aims to implement a LighterLogger as a PL Callback rather than a PL Logger, as the latter's implementation is very messy.

@ibro45 ibro45 marked this pull request as ready for review January 25, 2023 21:15
@ibro45 (Collaborator Author) commented on Jan 25, 2023

Callback-based logger with support for TensorBoard and Wandb (a minimal sketch of the approach follows this list); it includes:

  • loss
  • metrics
  • input, pred, and target data, either as an image or a single scalar
  • DDP support
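
A minimal sketch of the callback-based approach (illustrative only, not the actual LighterLogger code; class and argument names here are assumptions):

import wandb
from pytorch_lightning import Callback, Trainer, LightningModule
from torch.utils.tensorboard import SummaryWriter

class UnifiedLoggerSketch(Callback):
    def __init__(self, log_dir: str = "logs", project: str = "lighter"):
        self.log_dir = log_dir
        self.project = project
        self.tb_writer = None
        # Count global steps per mode instead of batch steps.
        self.global_step_counter = {"train": 0, "val": 0, "test": 0}

    def setup(self, trainer: Trainer, pl_module: LightningModule, stage: str) -> None:
        # Only rank 0 creates the writers so DDP runs don't log duplicates.
        if trainer.global_rank != 0:
            return
        self.tb_writer = SummaryWriter(log_dir=self.log_dir)
        wandb.init(project=self.project, dir=self.log_dir)

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx) -> None:
        # Assumes training_step returns a dict containing "loss".
        if trainer.global_rank != 0:
            return
        step = self.global_step_counter["train"]
        loss = outputs["loss"].item()
        self.tb_writer.add_scalar("train/loss", loss, global_step=step)
        wandb.log({"train/loss": loss}, step=step)
        self.global_step_counter["train"] += 1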

What's next:

  • support for other input, pred, and target data types, e.g. logging strings, multiple scalars, histograms, etc. This will need projects that deal with such cases; IMO we should tackle it as we encounter them. I will create an issue for it.
  • config (hparams) logging. This requires the YAML config of the run to be dumped before the logger is instantiated. It will probably be dumped by cli.py into a temp file, then moved into the logger's log_dir and logged to TensorBoard, Wandb, etc. This will be a separate PR; I'm opening an issue for this too.
  • implement other loggers (MLflow, etc.). The idea is to eventually have them all, but again, this is something to add as we start using them; it is not a priority right now.
  • in DDP, each rank should write its own log file. This will require getting the logging.Loggers of all libraries (e.g. MONAI or PyTorch Lightning) to work with our loguru.Logger so that everything is logged properly. I will create an issue.

Ready for review.

Edit: If you think any of these should be tackled with this PR, let me know. Once this PR is merged, I'll open the mentioned issues.

@kbressem (Contributor) left a comment

Overall I like it and think it is well thought through. The only thing I am unsure about is the frequent use of sys.exit() instead of raising errors. This makes try/except impossible, does not allow for tracebacks, and makes testing hard.

Tests are also something I would recommend for this PR, at least for initialization and a basic mock of logging to a file.

# instead of batch steps, which can be problematic when using gradient accumulation.
self.global_step_counter = {"train": 0, "val": 0, "test": 0}

def setup(self, trainer: Trainer, pl_module: LighterSystem, stage: str) -> None:
Contributor:

pl_module -> PyTorch Lightning module, but the type here is LighterSystem. Maybe call it lighter_system instead?

Collaborator Author:

Unfortunately that's not possible, as PyTorch Lightning automatically passes the argument under that name. I type-hinted it as a LighterSystem (which is still a LightningModule) because this logger requires some parts that are specific to LighterSystem, for example:

metrics = getattr(pl_module, f"{mode}_metrics")

outputs["loss"] = loss

# Metrics
# Get the torchmetrics.
Contributor:

Does this work well with MONAI? Other metrics are used there... Or should we specify that Lighter needs torchmetrics?

Collaborator:

I think it's a good idea to also support other metrics, but maybe that's a separate issue for later?

@ibro45 (Collaborator Author) commented on Jan 26, 2023

Metrics have to be torchmetrics:

train_metrics (Optional[Union[Metric, List[Metric]]], optional): training metric(s).

I wouldn't support any other format of metric. I think that accounting for all the possible ways of doing it would quickly create a mess here:

outputs["metrics"] = metrics.compute()

If a user wants a metric that's not available in torchmetrics, they should implement it as a torchmetrics.Metric themselves. It's pretty easy: https://torchmetrics.readthedocs.io/en/stable/pages/implement.html.

However, from a quick glance it appears that MONAI metrics have a unified API too, so we could write a wrapper that turns a MONAI metric into a torchmetrics.Metric. It would go into our contrib repo.
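
A rough sketch of what such a wrapper could look like (assuming a MONAI metric that is callable with y_pred/y and returns a tensor of per-sample values, e.g. monai.metrics.DiceMetric; this is not existing Lighter code):

import torch
from torchmetrics import Metric

class MonaiMetricWrapper(Metric):
    # Adapter exposing a MONAI metric through the torchmetrics.Metric API.
    def __init__(self, monai_metric):
        super().__init__()
        self.monai_metric = monai_metric
        # Accumulated sum and count; dist_reduce_fx="sum" makes compute() DDP-safe.
        self.add_state("total", default=torch.tensor(0.0), dist_reduce_fx="sum")
        self.add_state("count", default=torch.tensor(0.0), dist_reduce_fx="sum")

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        # Assumes the MONAI metric returns per-sample values for the batch.
        values = self.monai_metric(y_pred=preds, y=target)
        self.total += values.sum()
        self.count += values.numel()

    def compute(self) -> torch.Tensor:
        return self.total / self.count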

What do you guys think?

Collaborator:

That sounds perfect then! Since torchmetrics handles so many things internally, I agree that this is the best approach.

Contributor:

OK, I agree. But then the logger should raise an exception if a metric that is not a torchmetrics.Metric is passed. Once this PR is merged, shall I open an issue for this?

Collaborator Author:

The check would be in the LighterSystem. But if we do it for metrics, we should then type-check everything:

def __init__(self,

That becomes quite a hassle; I guess this is when I wish Python had runtime type checking.
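
For what it's worth, a check limited to metrics would be small. A hypothetical sketch (function name and placement are assumptions, not part of this PR):

from typing import List, Optional, Union
from torchmetrics import Metric

def ensure_torchmetrics(metrics: Optional[Union[Metric, List[Metric]]]) -> None:
    # Raise early if any provided metric is not a torchmetrics.Metric.
    if metrics is None:
        return
    for metric in metrics if isinstance(metrics, list) else [metrics]:
        if not isinstance(metric, Metric):
            raise TypeError(f"Expected a torchmetrics.Metric, got {type(metric).__name__}.")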

Collaborator Author:

@surajpaib what do you think?

@surajpaib (Collaborator):

@ibro45 and I had a few discussions about the sys.exit strategy before.

In logger.py, since all of the sys.exit() calls cover well-defined error cases, I think having tracebacks might not even be necessary.

However, @ibro45, the ability to get defined values returned from the code might be very useful, especially for testing, where we can build test cases accordingly. Maybe it's a good idea to have a more consistent standard across the entire library that handles both well-defined and ill-defined error cases.

@surajpaib (Collaborator):

@ibro45 Testing the branch right now; are the .log files expected to be empty?

@ibro45 (Collaborator Author) commented on Jan 26, 2023

@surajpaib @kbressem both of you brought up sys.exit() independently, so I'm now pretty convinced we should change it. That would be a separate PR though, as I'll have to go through the whole repo replacing them.

Any preferences/guidelines on how we should handle them?

@ibro45 (Collaborator Author) commented on Jan 26, 2023

@surajpaib wrote:

> @ibro45 Testing the branch right now; are the .log files expected to be empty?

That one was confusing for me too; I remember it logging at first and then it stopped. Can you check whether the file gets populated with some logs at the end?

The logs only contain the Lighter logs though, no logs from other libraries. I tried to intercept them all into our loguru logger, but it didn't work correctly with DDP: all ranks ended up logging at the same time. I'd like to resolve this together with the per-rank DDP log files. It's probably best to comment it out for the moment. What do you think?
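
For context, the standard recipe for redirecting the stdlib logging of other libraries into loguru (adapted from the loguru documentation) looks roughly like this; the DDP duplication problem mentioned above would still need a per-rank filter or sink on top of it:

import logging
from loguru import logger

class InterceptHandler(logging.Handler):
    def emit(self, record: logging.LogRecord) -> None:
        # Map the stdlib level to the corresponding loguru level if it exists.
        try:
            level = logger.level(record.levelname).name
        except ValueError:
            level = record.levelno
        # Walk up the stack so loguru reports the original caller, not this handler.
        frame, depth = logging.currentframe(), 2
        while frame and frame.f_code.co_filename == logging.__file__:
            frame = frame.f_back
            depth += 1
        logger.opt(depth=depth, exception=record.exc_info).log(level, record.getMessage())

# Route every stdlib logger (e.g. MONAI, PyTorch Lightning) through loguru.
logging.basicConfig(handlers=[InterceptHandler()], level=0, force=True)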

@surajpaib (Collaborator):

> It's probably best to comment it out for the moment. What do you think?

Works for me, let's not have this for now and do it all properly together, including logging for raised errors and exceptions.

@surajpaib (Collaborator):

Great work overall with this! Checked the logs on wandb, looks very nice!

@ibro45 (Collaborator Author) commented on Jan 26, 2023

Don't mind me not formatting the code; I decided not to do it since it'll be done by Suraj's CI anyway after the merge.

Also, I'll write a docstring for the __init__ when this PR is ready to merge.

@surajpaib surajpaib merged commit 08cb305 into main Jan 26, 2023
@surajpaib surajpaib deleted the logging-handler branch January 26, 2023 05:48
@kbressem (Contributor):

> @surajpaib @kbressem both of you brought up sys.exit() independently, so I'm now pretty convinced we should change it. That would be a separate PR though, as I'll have to go through the whole repo replacing them.
>
> Any preferences/guidelines on how we should handle them?

I would prefer to raise exceptions, maybe even custom exceptions. This has the advantage that you can test whether the code raises correctly.

See here for pytest: https://docs.pytest.org/en/7.1.x/reference/reference.html#pytest.raises

or here for unittest: https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertRaises
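
For example, with a hypothetical custom exception (the names here are illustrative, not existing Lighter code):

import pytest

class InvalidMetricError(Exception):
    pass

def check_metric(metric) -> None:
    # Hypothetical replacement for a sys.exit() call: raise instead of exiting.
    if not hasattr(metric, "compute"):
        raise InvalidMetricError(f"Unsupported metric: {type(metric).__name__}")

def test_check_metric_raises():
    # pytest.raises makes the failure mode testable, unlike sys.exit().
    with pytest.raises(InvalidMetricError):
        check_metric(metric=42)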

By the way, MONAI uses unittest, while here I saw pytest. I do not have a preference. I think pytest is sometimes easier, but using unittest would make integration into MONAI easier, in case we want to push some of our stuff to their library.

@ibro45 (Collaborator Author) commented on Jan 26, 2023

> I would prefer to raise exceptions, maybe even custom exceptions. This has the advantage that you can test whether the code raises correctly.

I'll have to read up on the most recommended way to do it.

> ... but using unittest would make integration into MONAI easier, in case we want to push some of our stuff to their library.

I don't think anything in Lighter core will go in the MONAI direction, as Lighter is meant to be general-purpose. You can do MONAI-related projects, but also anything else. We rely on their config system mainly because it's excellent and because the devs are great and super responsive, not because we want Lighter to be completely tied to MONAI. The fact that, as a result, we can easily bundle our *medical imaging projects* just sweetens the deal (and hopefully we'll have many projects that rely on that). But the goal of Lighter is to be as agnostic as possible, enabling you to work with all kinds of data.

That said, I'm sure some of lighter.contrib will go in that direction, as we'll have parts related to MONAI, like custom MONAI transforms.

Regarding pytest vs unittest, I have much to learn about testing, so I have no preference or opinion.
