
Implement InProcessReadingService #1139

Closed
wants to merge 11 commits into from

Conversation

Contributor

@ejguan ejguan commented Apr 21, 2023

Fixes #1107
Fixes #720
Fixes #616

Changes

  • Implement InProcessReadingService (Willing to take any suggestion on naming)
    • Control shuffle and sharding (noop)
    • Add support to pause/resume/limit
  • Make InProcessReadingService the default reading_service for DataLoader2.
    • reading_service then always has a value, so the reading_service is None logic can be removed.
  • Modify MultiProcessingReadingService
    • When num_workers=0, raise a warning
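
The defaulting behavior described above can be sketched with stand-in classes. These are illustrations only, not torchdata's actual implementations (whose signatures carry more parameters); the num_workers check follows the assert visible later in this thread rather than a warning.

```python
class InProcessReadingService:
    """Stand-in: runs the datapipe graph in the main process (the num_workers=0 case)."""

class MultiProcessingReadingService:
    """Stand-in: per this PR, num_workers=0 is rejected outright."""
    def __init__(self, num_workers: int = 2) -> None:
        assert num_workers > 0, "Please use `InProcessReadingService` for num_workers=0"
        self.num_workers = num_workers

class DataLoader2:
    """Stand-in: reading_service always ends up with a value."""
    def __init__(self, datapipe, reading_service=None) -> None:
        self.datapipe = datapipe
        # Defaulting here is what lets the `reading_service is None`
        # branches elsewhere be removed.
        self.reading_service = (
            reading_service if reading_service is not None else InProcessReadingService()
        )

dl = DataLoader2(datapipe=range(10))
assert isinstance(dl.reading_service, InProcessReadingService)
```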

@facebook-github-bot facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Apr 21, 2023
@facebook-github-bot

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ejguan ejguan requested a review from NivekT April 21, 2023 17:08
torchdata/dataloader2/reading_service.py (review thread outdated; resolved)
torchdata/dataloader2/dataloader2.py (review thread outdated; resolved)
worker_init_fn: Optional[Callable[[DataPipe, WorkerInfo], DataPipe]] = None,
worker_reset_fn: Optional[Callable[[DataPipe, WorkerInfo, SeedGenerator], DataPipe]] = None,
):
if num_workers == 0:
Contributor
I'm curious: if num_workers == 1, what is the benefit of using this instead of SingleProcessRS? Will it actually be faster (probably slower)?

Contributor Author

Using num_workers=1 creates a single worker process dedicated to loading data, while the main process works on model training. The use case might be preventing data loading from draining CPU resources away from training.

torchdata/dataloader2/reading_service.py (review thread resolved)
torchdata/dataloader2/reading_service.py (review thread outdated; resolved)
multiprocessing_context: Optional[str] = None,
worker_prefetch_cnt: int = 10,
main_prefetch_cnt: int = 10,
worker_init_fn: Optional[Callable[[DataPipe, WorkerInfo], DataPipe]] = None,
worker_reset_fn: Optional[Callable[[DataPipe, WorkerInfo, SeedGenerator], DataPipe]] = None,
) -> None:
assert num_workers > 0, "Please use `InProcessReadingService` for num_workers=0"
Contributor Author

I am making a BC-breaking change here. MPRS won't accept num_workers=0 anymore.

In previous commits, I tried adding a __new__ that returns InProcessReadingService when num_workers=0. However, it doesn't work well with pickling, because of cloning and multiprocessing.
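
The __new__ approach mentioned above can be sketched as follows (hypothetical stand-in classes, not torchdata's real ones). Even before pickling and worker processes enter the picture, it silently changes the constructed type, which is part of why the redirect is surprising:

```python
# Illustrative stand-ins only; not torchdata's actual classes.
class InProcessReadingService:
    pass

class MultiProcessingReadingService:
    def __new__(cls, num_workers: int = 2):
        if num_workers == 0:
            # Redirect to the in-process service instead of erroring.
            return InProcessReadingService()
        return super().__new__(cls)

    def __init__(self, num_workers: int = 2) -> None:
        self.num_workers = num_workers

rs = MultiProcessingReadingService(num_workers=0)
# Because __new__ returned an instance of a different class, __init__ is
# skipped and the caller silently holds the other type:
print(type(rs).__name__)  # InProcessReadingService
```

Clones of such an object sent across process boundaries then round-trip through pickle as the substituted class, not the one that was named at the call site.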

@ejguan ejguan changed the title Implement SingleProcessingReadingService Implement InProcessReadingService Apr 24, 2023

@ejguan ejguan requested a review from NivekT April 24, 2023 16:28
@@ -26,15 +26,15 @@ def _random_fn(data):
Used to validate the randomness of subprocess-local RNGs are set deterministically.
"""
  py_random_num = random.randint(0, 2 ** 32)
- np_random_num = np.random.randint(0, 2 ** 32)
+ np_random_num = np.random.randint(0, 2 ** 32 - 1)
Contributor Author

Somehow this problem wasn't revealed until this PR.
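
As background (my reading, not stated in the PR): Python's random.randint includes both endpoints, while np.random.randint excludes high, and NumPy also validates that the requested range fits the target integer dtype, which is platform-dependent by default. Forcing a 32-bit dtype reproduces the bound check anywhere:

```python
import random
import numpy as np

# random.randint(a, b) samples from [a, b], inclusive of both ends:
assert all(0 <= random.randint(0, 3) <= 3 for _ in range(100))

# np.random.randint(low, high) samples from [low, high), excluding high:
draws = np.random.randint(0, 3, size=1000)
assert draws.min() >= 0 and draws.max() <= 2

# The range must also fit the requested dtype; a 32-bit dtype cannot
# represent values near 2 ** 32, so the bound check rejects it:
try:
    np.random.randint(0, 2 ** 32, dtype=np.int32)
except ValueError:
    print("high is out of bounds for int32")
```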


Contributor

@NivekT NivekT left a comment

Thanks! LGTM! Let's also add it to the documentation table here:

https://pytorch.org/data/beta/dataloader2.html#readingservice


Comment on lines 456 to 486
if hasattr(dp, "resume") and callable(dp.resume):
dp.resume()
Contributor Author

I need to skip QueueWrapper here as well.
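
The guard in the snippet above, extended with the skip described here, might look like the following sketch. All classes are illustrative stand-ins (the real QueueWrapper lives in torchdata's internals and proxies a worker's request queue):

```python
# Sketch of resuming datapipes in a graph while skipping the
# process-boundary wrapper; every class here is a stand-in.
class QueueWrapper:
    """Stand-in for the datapipe that proxies a worker's request queue."""
    def resume(self):
        raise RuntimeError("resume is driven by the worker loop, not here")

class Prefetcher:
    """Stand-in for a datapipe that supports pause/resume."""
    def __init__(self):
        self.paused = True
    def resume(self):
        self.paused = False

def resume_graph(datapipes):
    for dp in datapipes:
        if isinstance(dp, QueueWrapper):
            continue  # skip: its resume is handled on the worker side
        if hasattr(dp, "resume") and callable(dp.resume):
            dp.resume()

graph = [Prefetcher(), QueueWrapper(), Prefetcher()]
resume_graph(graph)  # does not raise: the QueueWrapper is skipped
assert all(not dp.paused for dp in graph if isinstance(dp, Prefetcher))
```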


@facebook-github-bot

@ejguan merged this pull request in 8f9d123.

Labels
CLA Signed · Merged
Projects
None yet
3 participants