AttributeError: 'SequenceWrapperMapDataPipe' object has no attribute 'prefetch' when using MultiProcessingReadingService #1143

Closed
One-sixth opened this issue Apr 27, 2023 · 2 comments

Comments

@One-sixth

🐛 Describe the bug

from torchdata.datapipes.map import SequenceWrapper
from torchdata.dataloader2 import DataLoader2
from torchdata.dataloader2 import MultiProcessingReadingService


class SimpleDataset:
    def __len__(self):
        return 1000

    def __getitem__(self, i):
        if i >= len(self): raise StopIteration()
        return i


if __name__ == '__main__':
    ds = SimpleDataset()
    warp_ds = SequenceWrapper(ds)

    rs = MultiProcessingReadingService(num_workers=2)
    # Works without a reading service:
    # dl = DataLoader2(warp_ds)
    # Raises the AttributeError when a MultiProcessingReadingService is passed:
    dl = DataLoader2(warp_ds, reading_service=rs)

    for x in dl:
        print(x)

Versions

torchdata nightly

@NivekT
Contributor

NivekT commented Apr 27, 2023

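The issue is that DataLoader2 attempts to apply prefetch to the DataPipe it is given, even though Prefetcher only works with IterDataPipe. A MapDataPipe such as SequenceWrapper has no prefetch method, hence the AttributeError.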

A temporary workaround is to call .shuffle() or .to_iter_datapipe() on the MapDataPipe before passing it into DataLoader2.

I see a few potential solutions to this:

  1. Always convert MapDataPipe to IterDataPipe within DataLoader2
    • I think this will not cause any loss in functionality, because the end goal of DataLoader2 is to return the dataset in some order (e.g. sequential or shuffled). If anyone has concerns about this, feel free to comment.
  2. Add prefetch support for MapDataPipe
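
To make option 1 concrete, here is a toy sketch in plain Python. It does not use torchdata: ToyMapPipe, ToyIterPipe, to_iter and ToyLoader are hypothetical stand-ins, and the prefetch here does no real buffering. The sketch only shows that converting a map-style pipe inside the loader's constructor keeps the sequential order and lets iterable-only steps such as prefetch be attached afterwards.

```python
class ToyMapPipe:
    """Stand-in for a map-style pipe: sized and indexable, no `prefetch`."""

    def __init__(self, data):
        self._data = list(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, idx):
        return self._data[idx]


class ToyIterPipe:
    """Stand-in for an iterable-style pipe that supports `prefetch`."""

    def __init__(self, make_iterator):
        # Zero-argument callable that returns a fresh iterator each time,
        # so the pipe can be iterated more than once.
        self._make_iterator = make_iterator

    def __iter__(self):
        return self._make_iterator()

    def prefetch(self, buffer_size=10):
        # A real prefetcher would buffer up to `buffer_size` items ahead of
        # the consumer; this toy version only wraps the pipe unchanged.
        return ToyIterPipe(lambda: iter(self))


def to_iter(map_pipe):
    # Visit indices 0..len-1 in order, which is what reading a map-style
    # dataset sequentially amounts to.
    return ToyIterPipe(lambda: (map_pipe[i] for i in range(len(map_pipe))))


class ToyLoader:
    def __init__(self, pipe):
        # Option 1: convert map-style input up front, so every later step
        # can rely on iterable-only features.
        if isinstance(pipe, ToyMapPipe):
            pipe = to_iter(pipe)
        self._pipe = pipe.prefetch()

    def __iter__(self):
        return iter(self._pipe)


if __name__ == "__main__":
    for x in ToyLoader(ToyMapPipe(range(5))):
        print(x)
```

Option 2 would instead mean giving the map-style class its own prefetch, which is more work for the same observable order.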

@One-sixth
Author

@NivekT Thank you for the reply. The workaround works for me.
It would be even better if the error message were easier to understand, or if the MapDataPipe were automatically converted to an IterDataPipe.

facebook-github-bot pushed a commit that referenced this issue May 1, 2023
…fault (#1146)

Summary:
Pull Request resolved: #1146

Fixes #1143

I believe that, since `DataLoader2` would always call `iter(map_data_pipe)` at some point anyway, it should make no difference to perform the conversion with `to_iter_datapipe()` within `__init__`.

This will also allow prefetchers to be attached to the DataPipe.

Please let me know if this may be BC-breaking.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D45452068

Pulled By: NivekT

fbshipit-source-id: 608dc000735b16641e41e442553fe9283a1b8829