Dear @cdfox, Most of the logic for the DataLoader interface lives here: https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/data_loading.py#L172 Let me explain what Lightning does under the hood. Using dataloader.__dict__, it extracts the dataset, sampler, etc., and automatically replaces the sampler with a DistributedSampler if the user requested gpus > 1. Here is the pseudo code:
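(A rough sketch of the idea rather than Lightning's exact code; the helper name `replace_sampler` and the attribute-filtering details are illustrative.)

```python
import inspect

from torch.utils.data import DataLoader, DistributedSampler


def replace_sampler(dataloader: DataLoader, num_gpus: int) -> DataLoader:
    """Rebuild `dataloader` with a DistributedSampler when using multiple GPUs."""
    if num_gpus <= 1:
        return dataloader

    # DataLoader stores its constructor arguments as instance attributes,
    # so dataloader.__dict__ has everything needed to rebuild it.
    attrs = {k: v for k, v in dataloader.__dict__.items() if not k.startswith("_")}

    # Swap in a distributed sampler over the same dataset. The old
    # batch_sampler was derived from the old sampler, so drop it.
    attrs["sampler"] = DistributedSampler(dataloader.dataset)
    attrs.pop("batch_sampler", None)

    # Re-instantiate the *same* class, passing only the arguments
    # its __init__ accepts (or everything, if it takes **kwargs).
    dl_cls = type(dataloader)
    params = inspect.signature(dl_cls.__init__).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        kwargs = attrs
    else:
        kwargs = {k: v for k, v in attrs.items() if k in params}
    return dl_cls(**kwargs)
```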
For your custom DataLoader to work with Lightning, you need to make sure we can re-instantiate it from its attributes. Check out this test for inspiration: https://github.com/PyTorchLightning/pytorch-lightning/blob/master/tests/trainer/test_data_loading.py#L100. If you still have issues after investigating the code and test, ping me directly.
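Concretely, that means every extra constructor argument should be stored under the same attribute name it was passed as. A made-up example (MyDataLoader and my_arg are hypothetical names):

```python
from torch.utils.data import DataLoader


class MyDataLoader(DataLoader):
    def __init__(self, dataset, my_arg=0, **kwargs):
        # Keep the custom argument under the same attribute name it was
        # passed as, so the re-instantiation step above can round-trip it.
        self.my_arg = my_arg
        super().__init__(dataset, **kwargs)
```

With that in place, MyDataLoader(dataset, my_arg=3, batch_size=32) survives the sampler swap, because everything needed to rebuild it is discoverable on the instance. Best,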
I'm doing something in Lightning where I've concluded that a good approach is to write my own DataLoader. I think I've nearly got it working, but one roadblock was that there isn't a documented interface for it the way there is for PyTorch's Dataset (at least, not that I know of), so you have to more or less reverse engineer what PTL expects from a dataloader. Maybe it would be a good idea to add a DataLoader interface class that documents PTL's expectations for the objects passed to the Trainer's .fit().