Refactor InputTransform and DataModule #1233
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1233 +/- ##
==========================================
- Coverage 91.11% 91.05% -0.06%
==========================================
Files 285 286 +1
Lines 12791 12764 -27
==========================================
- Hits 11654 11622 -32
- Misses 1137 1142 +5
…his PR and not torch 11.
Hello all, I have updated the PR for almost all tasks. I need some help with:
Thanks
Awesome work! Let's do the following:
- revert the changes around moving collation from seq2seq, qa, forecasting, etc. (these can be discussed separately and done in individual PRs if we decide)
- update the examples / docs to use the now recommended API (that is, not using `transform_kwargs`)

Regarding `ServeInput`, it may cause issues having the datamodule own the transforms the way it's currently laid out. But I think we just need to refactor the serving stuff to apply the transforms in the same way as the datamodule.
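A possible shape for that serving refactor, as a minimal sketch (the class layout and hook name are assumptions, not Flash's actual API):

```python
# Hypothetical sketch, not Flash's actual API: serving reuses the
# DataModule-owned InputTransform so both paths share one code path.
from typing import Any


class MyInputTransform:
    def predict_per_sample_transform(self, sample: Any) -> Any:
        # Illustrative per-sample transform applied at predict/serve time.
        return sample


class ServeInput:
    def __init__(self, input_transform: MyInputTransform) -> None:
        # The single transform instance owned by the DataModule.
        self._input_transform = input_transform

    def serve_load_sample(self, raw: Any) -> Any:
        sample = {"input": raw}
        # Apply the same predict-stage transform the dataloaders use.
        return self._input_transform.predict_per_sample_transform(sample)
```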
@dataclass
class SpeechRecognitionInputCollateTransform(InputTransform):
I would prefer to keep this as it was and then consider moving it in a separate PR
I was initially planning to make it a separate PR, but the tests for these tasks were failing. So I ended up implementing them all here.
I guess we just need to resolve the collate function from the models correctly (or there are issues with it). The main reason to leave this for the future is that I'm not sure this is where this functionality should end up. To me it's weird that the transform can include collation since that's not really a transform. I also think we should avoid a situation where users have to provide the backbone in two places (since only the model should know the backbone really).
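To make the "only the model should know the backbone" point concrete, a minimal sketch (all names hypothetical) of resolving collation from the model instead of bundling it into the transform:

```python
# Hypothetical sketch: the model, which already knows its backbone, exposes
# the collate function; the transform stays a pure transform.
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class SpeechModel:
    backbone: str = "facebook/wav2vec2-base-960h"

    def build_collate_fn(self) -> Callable[[List[Any]], Any]:
        # A real implementation would construct the backbone's processor
        # here and pad/stack the samples with it.
        def collate(samples: List[Any]) -> List[Any]:
            return samples  # placeholder

        return collate


# The datamodule would then ask the model for its collate_fn, so the user
# never has to provide the backbone in two places.
```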
pad_to_multiple_of: Optional[int] = None
pad_to_multiple_of_labels: Optional[int] = None

def __post_init__(self):
Would be awesome to investigate AugLy Augmentation for Speech: https://github.com/facebookresearch/AugLy/tree/main/augly/audio
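For reference, a rough sketch of plugging an AugLy audio op into a per-sample training transform (the `pitch_shift` call follows AugLy's functional API as I understand it; the hook name and sample layout are assumptions):

```python
# Sketch only: augment the raw waveform with AugLy before collation.
import numpy as np
from augly import audio as audaugs


def train_per_sample_transform(sample: dict) -> dict:
    waveform: np.ndarray = sample["input"]
    # pitch_shift returns an (augmented_audio, sample_rate) tuple.
    augmented, _ = audaugs.pitch_shift(waveform, sample_rate=16000, n_steps=2.0)
    sample["input"] = augmented
    return sample
```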
Awesome work, @karthikrangasai - lots of important refactoring done in this PR. Thank you! We are getting there, just left a few comments (minor nits, and a few questions for my knowledge). Please let me know if you have any questions.
Just taking a note for the future: we should also update the examples, since the API has now changed (I guess the tests are failing for this reason too). My suggestion would be that we create a PR to fix the examples, and only once that is ready, merge this together with the examples PR. Just to make sure that examples are never out of date. But open to discussion, of course :) cc: @ethanwharris @Borda
if on_device:
    return input_transform._identity, collate
return collate, input_transform._identity
Can you please add a comment in this function on what it does? Also, for the future, we should add a comment on what `on_device` means and does, in the `_InputTransformProcessorV2` class.
cc: @ethanwharris
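As a starting point, the comment could read like the sketch below; the assumption that the returned pair is ordered as (worker-side collate, on-device collate) would need checking against `_InputTransformProcessorV2`, and the function name here is illustrative:

```python
def _resolve_collate_fns(input_transform, collate, on_device: bool):
    # Returns the pair of callables applied (in the DataLoader worker,
    # after batch transfer). When the transform runs on device, the worker
    # applies the identity and collation is deferred to
    # on_after_batch_transfer; otherwise the worker collates and the
    # on-device step is a no-op.
    if on_device:
        return input_transform._identity, collate
    return collate, input_transform._identity
```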
Awesome, LGTM 😃
Awesome work @karthikrangasai!
What does this PR do?
Resolves discussion from #1166
At present, the `InputTransform` for every `DataModule` is being passed for every stage, even though `InputTransform` has specific methods to differentiate the stage at which a given transform runs. This also creates four different instances of the class, one per input (train, val, test, predict), each with different methods to run.

This PR changes that by making the `DataModule` the owner of the `InputTransform`, because all the class does is generate the `collate_fn` for the dataloaders and the implementation for the `on_after_batch_transfer` callback.

Thus a single instance of the `InputTransform` class, present in the `DataModule`, can resolve the required `Callable`s for every stage, and the appropriate dataloader `collate_fn` and `on_after_batch_transfer` functions are created in the `DataLoader`'s `__init__` method.

This also relieves the `Input` class from having to take care of the `InputTransform`.

TL;DR
Previous API
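The snippet below is an illustrative sketch rather than the exact Flash signature: each stage took its own transform argument, and `transform_kwargs` were forwarded to build four separate `InputTransform` instances (the `from_files` argument names are assumptions):

```python
datamodule = MyTaskData.from_files(
    train_files=train_files,
    train_targets=train_targets,
    train_transform=MyInputTransform,
    val_transform=MyInputTransform,
    predict_transform=MyInputTransform,
    transform_kwargs={"image_size": (196, 196)},
)
```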
New API
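With this PR, a single instance is passed once and owned by the `DataModule`, which resolves the stage-specific hooks from it (again a sketch, not the verbatim signature):

```python
datamodule = MyTaskData.from_files(
    train_files=train_files,
    train_targets=train_targets,
    transform=MyInputTransform(image_size=(196, 196)),
)
```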
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃