LighterWriter - write/save predictions #40

ibro45 · 2023-02-01T20:57:26Z

A callback for writing predictions.

I looked into Lightning's BasePredictionWriter. it's a very basic and not really useful Callback, so instead of inheriting from it, I thought that it's cleaner to start from a clean Callback.

Right now, it's only possible to write the predictions according to their dataloader/batch index. Will look into having an optional get_names() method in aDatasets, which, if present, would replace the indices as names.

ibro45 · 2023-02-02T04:44:38Z

Right now, write_as allows you to specify only one type - but what if pred is a list or dict of different types? For example, a model that outputs a segmentation mask and a description of the segmented object.

Maybe extend write_as so that it can be:

a single type - pred's values, no matter if only one or multiple, will be saved as that type.
list of types - positionally correspond to each element of a pred list.
dict of types - correspond to a pred dict per key.

@kbressem @surajpaib

kbressem · 2023-02-02T09:34:21Z

I'd like it to be a writer base class and multiple sub-classes, such as SegmentationWriter, ClassificationWriter, etc. Cramming all into one class, will be to much. Should a model have multiple outputs, we can chain multiple writers.

surajpaib · 2023-02-02T14:35:29Z

I very much agree with @kbressem on this.

I don't know if we want to think about every possible use case independently. I think that's why lightning also provided a base prediction writer.

We can ship our default writer which works for common use cases, as you have implemented. For more specific use cases like you mentioned, I say let the users handle these, there can be way too much variability. For instance, I save my predictions in a csv format, and I would use my custom writer over the lighter writer. I'm sure there would be more such scenarios from users.

ibro45 · 2023-02-02T14:48:31Z

Even if we do per-task writers, we still have a problem of mapping them to each output in multi-value preds.
pl.Trainer(callbacks=...) accepts a single Callback or a list of Callbacks. If it's a list you can't just map them to a pred of type list, as there may be other callbacks there, like pl.callbacks.ModelCheckpoint. Since it doesn't accept a dict, you also can't match it to a pred of type dict. And trying to fix that would be extremely hacky.

Moreover, I don't think we should split LighterWriter per task (SegmentationWriter, ClassificationWriter, ...). Reasons:

This diverges from LighterSystem's unified behavior. There is no LighterClassificationSystem or similar. Everything is based on what kind of data is being passed through the model and criterion. Obviously, there are some assumptions, but they're pretty straightforward, and only for some wild use cases, you *may* need to pay closer attention.
Similarly, LighterLogger unifies multiple loggers; you just select which loggers to use by turning the flags on or off. I don't think it would've made much sense there to have our own WandbLogger and TensorboardLogger when the only difference between them is the log call (tensorboard.add_scalar(), wandb.log()`,...)
LighterWriter itself is pretty simple; separating it feels unnecessary, and a lot of behavior will be shared without adding much on top of it. E.g., For both SegmentationWriter and ClassificationWriter, you'd also like to have the option to save the tensor as-is to a .pt file. The main thing that would be separated into different classes would be the _write() method, and I think it's just too simple and unnecessary to separate. Obviously, it'll be slightly more complex if it supports mapping different write_as types to different outputs in pred, but that's about it.
How many tasks are there? How many combos of those? It'd make more sense to have a LighterWriter that can map the type per output of pred if multiple are passed, essentially covering all kinds of task setups. The data types are finite and straightforward to account for, but the task, their combos, and the kind of preds returned are not.

Regarding the CSV format, that will be supported here, too, and that's going to be very simple. The idea of the LighterWriter is to cover as much as possible most simply, so that the user does not have to write their own Writer Callback. They will still be able to do so if they wish, obviously, as they can with the logger and the system, overriding the existing or writing it from scratch.

I don't know if we want to think about every possible use case independently. I think that's why lightning also provided a base prediction writer.

But that is the essence of lighter; we try to provide as much as possible out-of-the-box and, hopefully, in a simple way. Lightning gives you the building blocks, so you have to code it up. We want to eliminate the coding part as much as possible and let it all/most be config-defined.

surajpaib · 2023-02-02T20:35:30Z

Maybe I was a bit muddy in my response and misread what @kbressem mentioned. What I meant to say was we don't need to think of handling every possible case in the LighterWriter. Of course, that would be the default writer and handle a bunch of use cases. lighter won't have any other Writer than the default one, but users can define their own subclasses using the LighterWriter to support any specific use cases they might have, and we should try to support that in the cleanest way possible.

To address your previous comment,
I think having write_as mimic the format of output predictions should work very well, i.e, if output is a dict, then write_as is also a dict with types. Of course, with some simplifications such as broadcasting types.

But that is the essence of lighter; we try to provide as much as possible out-of-the-box and, hopefully, in a simple way. Lightning gives you the building blocks, so you have to code it up. We want to eliminate the coding part as much as possible and let it all/most be config-defined.

100% agree, but this also leads us to some complications

I'm still of the opinion that we shouldn't try to handle all use cases in LighterWriter because this easily leads to a very large and bloated config with too many flags.

If I have multi-output predictions, what if I want to write them in a specific folder per item? which is probably more sensible than having a flat output structure. This will need another flag unless we assume that multi-output will also be folder structured.
For medical imaging use cases, we want images saved as NRRD/NIFTI. This means more functionality is added to the LighterWriter and gets into more specific behavior, which deviates from the more general approach of lighter.
Users may want arbitrary types in the write_as. For example, some may want to save predicted output in a json format which is common for a lot of CV tasks such as detection.

We can ofcourse try to support as many use cases as we think of, but we also need to consider being easily extendable.

surajpaib · 2023-02-08T01:01:24Z

Made some aesthetic comments for now. I will test the PR with some code tomorrow. Overall looks pretty good, seems like you thought of everything that's possible (including 24 FPS video 😉) !

We should add some test cases for such multi-use case features.

It will help the person submitting the PR to test for a range of conditions
Reviewers can easily understand all the functionality offered, and so can the users when needed.

Maybe we make that a requirement from now on? It doesn't have to be something very crafty since people would already have test cases while implementing functionality, it should be easy to have it added to the PR. It will automatically increase coverage as well.
@ibro45 @kbressem Wdy think?

lighter/callbacks/writer.py

lighter/callbacks/utils.py

lighter/callbacks/writer.py

surajpaib · 2023-03-02T00:06:06Z

@ibro45 Are we ready to merge this?

ibro45 · 2023-03-02T00:31:44Z

Writing tests

ibro45 · 2023-03-02T12:28:50Z

Alright if you agree let's merge it, so that we can merge other PRs, and I'll make another one for tests

ibro45 added 4 commits February 1, 2023 15:47

Add first prototype of LighterWriter

87c2baa

style

db4508b

Merge branch 'main' into prediction

744afb2

add DDP support to LighterWriter, improve DDP in LighterLogger

68559d6

ibro45 added 2 commits February 4, 2023 12:23

add support for multi-type write_as

a7078c4

Refactor to Base, File, and Table Writer

3df027a

ibro45 force-pushed the prediction branch from d621a85 to 3df027a Compare February 7, 2023 02:42

improve _parse_write_as, add more docstrings

d9573e9

surajpaib marked this pull request as ready for review February 8, 2023 00:14

small fix

24319ef

surajpaib requested changes Feb 8, 2023

View reviewed changes

ibro45 force-pushed the prediction branch from aa36c9b to 24319ef Compare February 18, 2023 14:28

ibro45 added 2 commits February 28, 2023 12:33

write_to to write_interval

adbfe54

Reorganize writers

cc224fe

Add init to writer package

fe38afa

ibro45 added 2 commits March 2, 2023 09:23

Merge branch 'main' into prediction

2ef324c

format

3ffbff1

ibro45 merged commit 4f3590e into main Mar 2, 2023

ibro45 deleted the prediction branch March 2, 2023 15:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LighterWriter - write/save predictions #40

LighterWriter - write/save predictions #40

ibro45 commented Feb 1, 2023

ibro45 commented Feb 2, 2023 •

edited

Loading

kbressem commented Feb 2, 2023

surajpaib commented Feb 2, 2023

ibro45 commented Feb 2, 2023 •

edited

Loading

surajpaib commented Feb 2, 2023

surajpaib commented Feb 8, 2023

surajpaib commented Mar 2, 2023

ibro45 commented Mar 2, 2023

ibro45 commented Mar 2, 2023

LighterWriter - write/save predictions #40

LighterWriter - write/save predictions #40

Conversation

ibro45 commented Feb 1, 2023

ibro45 commented Feb 2, 2023 • edited Loading

kbressem commented Feb 2, 2023

surajpaib commented Feb 2, 2023

ibro45 commented Feb 2, 2023 • edited Loading

surajpaib commented Feb 2, 2023

surajpaib commented Feb 8, 2023

surajpaib commented Mar 2, 2023

ibro45 commented Mar 2, 2023

ibro45 commented Mar 2, 2023

ibro45 commented Feb 2, 2023 •

edited

Loading

ibro45 commented Feb 2, 2023 •

edited

Loading