Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataLoader wrapping, re-instantiation and patching #10329

Closed
awaelchli opened this issue Nov 3, 2021 · 8 comments
Closed

DataLoader wrapping, re-instantiation and patching #10329

awaelchli opened this issue Nov 3, 2021 · 8 comments
Labels
bug Something isn't working fabric lightning.fabric.Fabric help wanted Open to be worked on priority: 2 Low priority task won't fix This will not be worked on

Comments

@awaelchli
Copy link
Contributor

awaelchli commented Nov 3, 2021

🐛 Bug

Recent changes to the Lite DataLoader in #10279 have not been properly reviewed and tested. Several clean ups need to be done and better tests need to be written.

The following issue exists:

To Reproduce

def test_setup_custom_dataloaders_new():

    class DataLoaderSubclass1(DataLoader):
        def __init__(self, attribute1, *args, **kwargs):
            # UNCOMMENT THE LINE BELOW, NO DIFFERENCE (EXPECTED)
            # self.attribute1 = attribute1
            super().__init__(*args, **kwargs)

    class DataLoaderSubclass2(DataLoaderSubclass1):
        def __init__(self, attribute1, attribute2, *args, **kwargs):
            # UNCOMMENT THE LINE BELOW, TEST PASSES
            # self.attribute2 = attribute2
            super().__init__(attribute1, *args, **kwargs)

    class LiteWithCustomDataLoader(LightningLite):
        def run(self):
            dataloader = DataLoaderSubclass2("attribute1", "attribute2", dataset=range(4), batch_size=2)
            lite_dataloader = self.setup_dataloaders(dataloader)
            
            # THIS ASSERTION PASSES
            assert lite_dataloader._dataloader.attribute1 == "attribute1"
            
            # THIS ASSERTION FAILS
            assert lite_dataloader._dataloader.attribute2 == "attribute2"

    LiteWithCustomDataLoader().run()

Expected behavior

According to the changes in the PR #10279, this is supposed to work and was the main motivation of it. But it does not because the tests were insufficient. This issue was found by me in #10297 while trying to clean up the code.

Environment

Lightning v1.5.0!

cc @tchaton @rohitgr7 @carmocca @justusschock @awaelchli

@awaelchli awaelchli added bug Something isn't working help wanted Open to be worked on fabric lightning.fabric.Fabric labels Nov 3, 2021
@awaelchli awaelchli added this to the 1.5.x milestone Nov 3, 2021
@awaelchli awaelchli added the priority: 0 High priority task label Nov 3, 2021
@awaelchli
Copy link
Contributor Author

awaelchli commented Nov 3, 2021

The issue exists because this wrapper here

https://github.com/PyTorchLightning/pytorch-lightning/blob/93caa7cda9277f86f17b1eab0793019aee8d40c2/pytorch_lightning/lite/wrappers.py#L106-L116

pops args and kwargs.
When we get to this line here

https://github.com/PyTorchLightning/pytorch-lightning/blob/93caa7cda9277f86f17b1eab0793019aee8d40c2/pytorch_lightning/lite/lite.py#L242-L244

we are re-instantiating (with the new init) but passing arguments by kwargs, strictly! This means the for loop here runs empty!

https://github.com/PyTorchLightning/pytorch-lightning/blob/93caa7cda9277f86f17b1eab0793019aee8d40c2/pytorch_lightning/lite/wrappers.py#L110-L114

What is the design flaw? As always, it is the fact that we are making too strong assumptions without documenting them and without testing for them.

@awaelchli
Copy link
Contributor Author

awaelchli commented Nov 5, 2021

@awaelchli
Copy link
Contributor Author

awaelchli commented Nov 23, 2021

One minor limitation of the patching is that the dataloader needs to allow write access to its dict, i.e. something like this would be a valid DataLoader but our patching would fail:

class MyDataLoader(DataLoader):

    def __init__(self, whatever, *args, **kwargs):
        # Lightning will not be allowed to set the attribute self.whatever here (read-only)
        super().__init__(*args, **kwargs)
        
    @property
    def whatever(self):
        return "foo"

We may want to handle this error case.

@carmocca
Copy link
Contributor

In practice, it would be instead:

class MyDataLoader(DataLoader):

    def __init__(self, whatever, *args, **kwargs):
        self._whatever = whatever
        super().__init__(*args, **kwargs)
        
    @property
    def whatever(self):
        return self._whatever

@carmocca carmocca removed this from the 1.5.x milestone Nov 24, 2021
@carmocca carmocca added priority: 2 Low priority task and removed priority: 1 Medium priority task labels Nov 24, 2021
@awaelchli
Copy link
Contributor Author

@carmocca @tchaton Do you remember what we wanted to do with "Open PR to PyTorch Geometric to improve support for their DataLoaders"?
Was it about removing these lines?
https://github.com/pyg-team/pytorch_geometric/blob/4bd1821b72039cb6beb96041bc9f18b2c4af36bb/torch_geometric/loader/dataloader.py#L74-L76

But we need to wait until at least 1.6 is out.

@carmocca
Copy link
Contributor

No, it was about addressing the failure which made @tchaton add the TypeError catch. But I'm not sure which implementation caused the error, most already drop the arguments from kwargs.

@tchaton
Copy link
Contributor

tchaton commented Nov 25, 2021

Hey @awaelchli,

I wonder if it would be this one: https://github.com/pyg-team/pytorch_geometric/blob/4bd1821b72039cb6beb96041bc9f18b2c4af36bb/torch_geometric/loader/graph_saint.py#L50 where the dataset isn't being poped from the kwargs, but this is maybe expected.

@stale
Copy link

stale bot commented Dec 26, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix This will not be worked on label Dec 26, 2021
@carmocca carmocca closed this as completed Jan 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fabric lightning.fabric.Fabric help wanted Open to be worked on priority: 2 Low priority task won't fix This will not be worked on
Projects
None yet
Development

No branches or pull requests

3 participants