Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LighterWriter - write/save predictions #40

Merged
merged 13 commits into from
Mar 2, 2023
Merged

LighterWriter - write/save predictions #40

merged 13 commits into from
Mar 2, 2023

Conversation

ibro45
Copy link
Collaborator

@ibro45 ibro45 commented Feb 1, 2023

A callback for writing predictions.

I looked into Lightning's BasePredictionWriter. it's a very basic and not really useful Callback, so instead of inheriting from it, I thought that it's cleaner to start from a clean Callback.

Right now, it's only possible to write the predictions according to their dataloader/batch index. Will look into having an optional get_names() method in aDatasets, which, if present, would replace the indices as names.

@ibro45
Copy link
Collaborator Author

ibro45 commented Feb 2, 2023

Right now, write_as allows you to specify only one type - but what if pred is a list or dict of different types? For example, a model that outputs a segmentation mask and a description of the segmented object.

Maybe extend write_as so that it can be:

  • a single type - pred's values, no matter if only one or multiple, will be saved as that type.
  • list of types - positionally correspond to each element of a pred list.
  • dict of types - correspond to a pred dict per key.

@kbressem @surajpaib

@kbressem
Copy link
Contributor

kbressem commented Feb 2, 2023

I'd like it to be a writer base class and multiple sub-classes, such as SegmentationWriter, ClassificationWriter, etc. Cramming all into one class, will be to much. Should a model have multiple outputs, we can chain multiple writers.

@surajpaib
Copy link
Collaborator

I very much agree with @kbressem on this.

I don't know if we want to think about every possible use case independently. I think that's why lightning also provided a base prediction writer.

We can ship our default writer which works for common use cases, as you have implemented. For more specific use cases like you mentioned, I say let the users handle these, there can be way too much variability. For instance, I save my predictions in a csv format, and I would use my custom writer over the lighter writer. I'm sure there would be more such scenarios from users.

@ibro45
Copy link
Collaborator Author

ibro45 commented Feb 2, 2023

Even if we do per-task writers, we still have a problem of mapping them to each output in multi-value preds.
pl.Trainer(callbacks=...) accepts a single Callback or a list of Callbacks. If it's a list you can't just map them to a pred of type list, as there may be other callbacks there, like pl.callbacks.ModelCheckpoint. Since it doesn't accept a dict, you also can't match it to a pred of type dict. And trying to fix that would be extremely hacky.

Moreover, I don't think we should split LighterWriter per task (SegmentationWriter, ClassificationWriter, ...). Reasons:

  1. This diverges from LighterSystem's unified behavior. There is no LighterClassificationSystem or similar. Everything is based on what kind of data is being passed through the model and criterion. Obviously, there are some assumptions, but they're pretty straightforward, and only for some wild use cases, you *may* need to pay closer attention.
    Similarly, LighterLogger unifies multiple loggers; you just select which loggers to use by turning the flags on or off. I don't think it would've made much sense there to have our own WandbLogger and TensorboardLogger when the only difference between them is the log call (tensorboard.add_scalar(), wandb.log()`,...)
  2. LighterWriter itself is pretty simple; separating it feels unnecessary, and a lot of behavior will be shared without adding much on top of it. E.g., For both SegmentationWriter and ClassificationWriter, you'd also like to have the option to save the tensor as-is to a .pt file. The main thing that would be separated into different classes would be the _write() method, and I think it's just too simple and unnecessary to separate. Obviously, it'll be slightly more complex if it supports mapping different write_as types to different outputs in pred, but that's about it.
  3. How many tasks are there? How many combos of those? It'd make more sense to have a LighterWriter that can map the type per output of pred if multiple are passed, essentially covering all kinds of task setups. The data types are finite and straightforward to account for, but the task, their combos, and the kind of preds returned are not.

Regarding the CSV format, that will be supported here, too, and that's going to be very simple. The idea of the LighterWriter is to cover as much as possible most simply, so that the user does not have to write their own Writer Callback. They will still be able to do so if they wish, obviously, as they can with the logger and the system, overriding the existing or writing it from scratch.

I don't know if we want to think about every possible use case independently. I think that's why lightning also provided a base prediction writer.

But that is the essence of lighter; we try to provide as much as possible out-of-the-box and, hopefully, in a simple way. Lightning gives you the building blocks, so you have to code it up. We want to eliminate the coding part as much as possible and let it all/most be config-defined.

@surajpaib
Copy link
Collaborator

Maybe I was a bit muddy in my response and misread what @kbressem mentioned. What I meant to say was we don't need to think of handling every possible case in the LighterWriter. Of course, that would be the default writer and handle a bunch of use cases. lighter won't have any other Writer than the default one, but users can define their own subclasses using the LighterWriter to support any specific use cases they might have, and we should try to support that in the cleanest way possible.

To address your previous comment,
I think having write_as mimic the format of output predictions should work very well, i.e, if output is a dict, then write_as is also a dict with types. Of course, with some simplifications such as broadcasting types.

But that is the essence of lighter; we try to provide as much as possible out-of-the-box and, hopefully, in a simple way. Lightning gives you the building blocks, so you have to code it up. We want to eliminate the coding part as much as possible and let it all/most be config-defined.

100% agree, but this also leads us to some complications

I'm still of the opinion that we shouldn't try to handle all use cases in LighterWriter because this easily leads to a very large and bloated config with too many flags.

  • If I have multi-output predictions, what if I want to write them in a specific folder per item? which is probably more sensible than having a flat output structure. This will need another flag unless we assume that multi-output will also be folder structured.
  • For medical imaging use cases, we want images saved as NRRD/NIFTI. This means more functionality is added to the LighterWriter and gets into more specific behavior, which deviates from the more general approach of lighter.
  • Users may want arbitrary types in the write_as. For example, some may want to save predicted output in a json format which is common for a lot of CV tasks such as detection.

We can ofcourse try to support as many use cases as we think of, but we also need to consider being easily extendable.

@surajpaib surajpaib marked this pull request as ready for review February 8, 2023 00:14
@surajpaib
Copy link
Collaborator

Made some aesthetic comments for now. I will test the PR with some code tomorrow. Overall looks pretty good, seems like you thought of everything that's possible (including 24 FPS video 😉) !

We should add some test cases for such multi-use case features.

  1. It will help the person submitting the PR to test for a range of conditions
  2. Reviewers can easily understand all the functionality offered, and so can the users when needed.

Maybe we make that a requirement from now on? It doesn't have to be something very crafty since people would already have test cases while implementing functionality, it should be easy to have it added to the PR. It will automatically increase coverage as well.
@ibro45 @kbressem Wdy think?

lighter/callbacks/writer.py Outdated Show resolved Hide resolved
lighter/callbacks/writer.py Outdated Show resolved Hide resolved
lighter/callbacks/utils.py Outdated Show resolved Hide resolved
lighter/callbacks/writer.py Outdated Show resolved Hide resolved
lighter/callbacks/writer.py Outdated Show resolved Hide resolved
@surajpaib
Copy link
Collaborator

@ibro45 Are we ready to merge this?

@ibro45
Copy link
Collaborator Author

ibro45 commented Mar 2, 2023

Writing tests

@ibro45
Copy link
Collaborator Author

ibro45 commented Mar 2, 2023

Alright if you agree let's merge it, so that we can merge other PRs, and I'll make another one for tests

@ibro45 ibro45 merged commit 4f3590e into main Mar 2, 2023
@ibro45 ibro45 deleted the prediction branch March 2, 2023 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants