
[DataPipe] Adding documentation for Prefetcher #835

Closed · wants to merge 1 commit
1 change: 1 addition & 0 deletions docs/source/torchdata.datapipes.iter.rst
@@ -181,6 +181,7 @@ A miscellaneous set of DataPipes with different functionalities.
LengthSetter
MapToIterConverter
OnDiskCacheHolder
Prefetcher
RandomSplitter
ShardingFilter

22 changes: 22 additions & 0 deletions torchdata/datapipes/iter/util/prefetcher.py
@@ -29,6 +29,28 @@ def __init__(self, source_datapipe, buffer_size):

@functional_datapipe("prefetch")
class PrefetcherIterDataPipe(IterDataPipe):
"""
Prefetches elements from the source DataPipe and puts them into a buffer (functional name: ``prefetch``).
Prefetching performs the operations (e.g. I/O, computations) of the DataPipes up to this one ahead of time
    and stores the results in the buffer, ready to be consumed by the subsequent DataPipe. It has no effect aside
from getting the sample ready ahead of time.

This is used by ``PrototypeMultiProcessingReadingService`` when the arguments
``prefetch_worker`` (for prefetching at each worker process) or
    ``prefetch_mainloop`` (for prefetching at the main loop) are greater than 0.

Beyond the built-in use cases, this can be useful to put after I/O DataPipes that have
expensive I/O operations (e.g. takes a long time to request a file from a remote server).

Args:
source_datapipe: IterDataPipe from which samples are prefetched
buffer_size: the size of the buffer which stores the prefetched samples

Example:
>>> from torchdata.datapipes.iter import IterableWrapper
>>> dp = IterableWrapper(file_paths).open_files().prefetch(5)
"""

def __init__(self, source_datapipe, buffer_size: int = 10):
self.source_datapipe = source_datapipe
if buffer_size <= 0:
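
To make the documented usage concrete, here is a minimal sketch (not part of the diff) of placing ``prefetch`` after an I/O-heavy step. The file paths are hypothetical placeholders, and it assumes ``open_files`` yields ``(path, stream)`` pairs as torchdata's FileOpener does.

# A minimal sketch, assuming local placeholder files and the prefetch
# signature shown in this PR (buffer_size defaults to 10).
from torchdata.datapipes.iter import IterableWrapper

file_paths = ["data/part-0.txt", "data/part-1.txt"]  # hypothetical files

# open_files performs the (potentially slow) I/O; prefetch keeps up to 5
# already-opened elements buffered ahead of the consumer.
dp = IterableWrapper(file_paths).open_files().prefetch(buffer_size=5)

for path, stream in dp:
    # Each element is a (path, stream) pair produced by open_files; while it
    # is being consumed, later elements are prepared into the buffer.
    print(path, stream.read()[:80])

The prefetching is transparent to downstream DataPipes: the elements are unchanged, only their preparation is moved ahead of consumption.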