TorchData #576
Comments
Yes, we definitely plan to support DataPipes in the future. When I first talked to the torchvision devs, they mentioned the plan to rework their datasets to use DataPipes. At the time, the DataPipe stuff seemed too bleeding edge for us to use directly, but things are definitely more stable now. I need to take another look and see just how different things are.
That's awesome, really glad to hear!
Still need to dig deeper into how TorchData works and how torchvision is planning to migrate to TorchData, but I think this will be a good opportunity to refactor. Right now, we have two class/subclass hierarchies:
I think it would make more sense to do something like:
If I understand correctly, this seems to be the intention of TorchData, to create pluggable pipelines for each file format to improve reuse and avoid code duplication.
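To make the "pluggable pipeline per file format" idea concrete, here is a rough sketch using plain Python generators rather than the actual TorchData API. The names `DECODERS`, `decode_json`, `decode_csv`, and `decode_files` are hypothetical and only illustrate the pattern: each format gets one reusable decoder, and datasets share a single decoding stage instead of duplicating parsing code.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterable, Iterator, Tuple

# Hypothetical decoder signature: raw file bytes in, decoded object out.
Decoder = Callable[[bytes], object]


def decode_json(raw: bytes) -> object:
    """Decode a JSON document."""
    return json.loads(raw.decode("utf-8"))


def decode_csv(raw: bytes) -> object:
    """Decode a CSV file into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))


# Hypothetical registry mapping a file extension to its decoder.
# Supporting a new format means adding one entry here.
DECODERS: Dict[str, Decoder] = {".json": decode_json, ".csv": decode_csv}


def decode_files(
    files: Iterable[Tuple[str, bytes]],
) -> Iterator[Tuple[str, object]]:
    """Pipeline stage: decode (name, bytes) pairs by file extension.

    Files without a registered decoder are skipped.
    """
    for name, raw in files:
        for ext, decoder in DECODERS.items():
            if name.endswith(ext):
                yield name, decoder(raw)
                break
```

Any dataset that can produce `(name, bytes)` pairs could then reuse `decode_files` as-is, which is the kind of reuse described above.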
Another area where TorchData may help: we have a lot of datasets that can either be loaded from files on local disk, or streamed from a STAC API like on the Planetary Computer. I believe that was one of the main driving factors behind TorchData, so I'm interested to see if they've found a good way to have a single dataset that optionally loads from different sources like this.
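One way to picture a single dataset with swappable sources is sketched below. This is not how TorchData or TorchGeo does it; `Source`, `LocalSource`, `RemoteSource`, and `ByteCountDataset` are hypothetical names, and the remote side takes injected `list_keys` and `fetch` callables as a stand-in for a real STAC client so the sketch needs no network access.

```python
from pathlib import Path
from typing import Callable, Iterator, List, Protocol, Tuple


class Source(Protocol):
    """Anything that can yield (name, raw bytes) pairs."""

    def items(self) -> Iterator[Tuple[str, bytes]]: ...


class LocalSource:
    """Reads files matching a glob pattern from a directory on local disk."""

    def __init__(self, root: str, pattern: str = "*") -> None:
        self.root = Path(root)
        self.pattern = pattern

    def items(self) -> Iterator[Tuple[str, bytes]]:
        for path in sorted(self.root.glob(self.pattern)):
            if path.is_file():
                yield path.name, path.read_bytes()


class RemoteSource:
    """Hypothetical stand-in for a streaming catalog such as a STAC API.

    list_keys and fetch are supplied by the caller; a real implementation
    would wrap an actual client here.
    """

    def __init__(
        self,
        list_keys: Callable[[], List[str]],
        fetch: Callable[[str], bytes],
    ) -> None:
        self.list_keys = list_keys
        self.fetch = fetch

    def items(self) -> Iterator[Tuple[str, bytes]]:
        for key in self.list_keys():
            yield key, self.fetch(key)


class ByteCountDataset:
    """Toy dataset: the processing logic is identical for every source."""

    def __init__(self, source: Source) -> None:
        self.source = source

    def __iter__(self) -> Iterator[Tuple[str, int]]:
        for name, raw in self.source.items():
            yield name, len(raw)
```

The dataset only depends on the `items()` contract, so switching between local files and a streamed catalog is a constructor argument rather than a separate subclass.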
Looked through the documentation a bit. From what I can tell, my first comment is definitely supported by TorchData. I opened an issue to see if my second comment is/could be supported as well: pytorch/data#672
@adamjstewart One more format that might be good to support down the line is simple tar iterable format (like webdataset, only using torchdata). For your second comment, I wonder if you're looking for something like AIStore with torchdata loaders? https://pytorch.org/data/main/generated/torchdata.datapipes.iter.AISFileLoader.html#torchdata.datapipes.iter.AISFileLoader
Seems like TorchData is dead: pytorch/data#1196
Hi, do you plan to support TorchData iterable-style and map-style datapipes in the future?
I ask since eventually the PyTorch DataLoader V2 will "only be responsible for multiprocessing, distributed, and similar functionalities, not data processing logic. All data processing features, such as the shuffling and batching, will be moved out of DataLoader to DataPipe."
https://github.com/pytorch/data#frequently-asked-questions-faq
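For anyone unfamiliar with what "moving shuffling and batching out of the DataLoader" looks like, here is a minimal sketch of both steps as composable iterator stages. It uses plain Python generators, not the TorchData DataPipe classes; the function names `shuffle` and `batch` and their parameters are illustrative assumptions.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Approximate shuffle for a stream using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Swap a random element to the end and emit it.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Flush whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


def batch(
    items: Iterable[T], batch_size: int, drop_last: bool = False
) -> Iterator[List[T]]:
    """Group a stream into lists of batch_size elements."""
    current: List[T] = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current and not drop_last:
        yield current
```

Chaining them, e.g. `batch(shuffle(range(10), buffer_size=4), batch_size=3)`, keeps the data processing in the pipeline and leaves a loader free to handle only multiprocessing and distribution.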