Use prefetch() after batching // image_dataset.py #18160
Conversation
Adding @jsimsa for the performance best practice here. From https://www.tensorflow.org/guide/data_performance#prefetching, I didn't see any suggestion about whether prefetch should be applied before or after batching. The current approach will prefetch unbatched data (with autotune); unless the prefetched data can't fill the next batch, I don't see any big difference here.
Quoting from official documentation:
I think it is better to prefetch batches rather than single elements, so the next batch(es) can be ready while the current batch is being processed during training.
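The ordering being discussed can be sketched as follows. This is a minimal tf.data pipeline with illustrative contents (the dataset, the map function, and the batch size are placeholders, not the actual `image_dataset.py` code): batching happens first, then `prefetch()` stages whole batches so batch t+1 is prepared while batch t is being consumed by the training step.

```python
import tensorflow as tf

# Illustrative pipeline: the actual image_dataset.py pipeline decodes
# images, but the ordering of the stages is what matters here.
ds = tf.data.Dataset.range(10)
ds = ds.map(lambda x: x * 2, num_parallel_calls=tf.data.AUTOTUNE)
ds = ds.batch(4)                    # batch first...
ds = ds.prefetch(tf.data.AUTOTUNE)  # ...then prefetch whole batches

first = next(iter(ds)).numpy().tolist()
print(first)  # [0, 2, 4, 6]
```

With `prefetch()` placed before `batch()`, only unbatched elements would be staged ahead, and assembling the next batch would still happen synchronously with the training step.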
Thanks for the reference, let's wait for some input from the tf.data side.
Chatted with @wilsingosti offline for this issue: (copied from the chat) Prefetch is useful in 2 cases:
In this particular case, since the input of batch is a ParallelMap, inserting a Prefetch between batch and ParallelMap will not help case 2 because ParallelMap is an asynchronous op (which means that it has its own buffer). Inserting it after batch can help, especially if the output of batch is another synchronous transform.
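To make the synchronous-vs-asynchronous distinction concrete, here is a minimal pure-Python analogue of a prefetch buffer (these helpers are hypothetical illustrations, not tf.data internals): a background thread pre-produces items into a bounded queue, so a prefetch stage placed after a synchronous stage lets the consumer overlap with production, whereas placing it after an already-asynchronous stage adds little.

```python
import queue
import threading

def prefetch(iterable, buffer_size=2):
    """Yield items from `iterable`, pre-produced by a background thread
    into a bounded queue (an analogue of an asynchronous op's buffer)."""
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def producer():
        for item in iterable:
            q.put(item)
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item

def batch(iterable, batch_size):
    """A synchronous batching transform: groups items into lists."""
    buf = []
    for item in iterable:
        buf.append(item)
        if len(buf) == batch_size:
            yield buf
            buf = []
    if buf:
        yield buf

# Prefetch applied AFTER batching: whole batches are staged ahead,
# so the consumer never waits for a batch to be assembled.
batches = list(prefetch(batch(range(10), 4)))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Because `batch` here is synchronous, putting `prefetch` after it hides the batch-assembly latency; putting `prefetch` before it would only buffer single elements while batch assembly still blocked the consumer.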
Based on the assessment above, I am approving this PR. Thanks @Frightera for the contribution.
Thanks @qlzh727, those statements were also helpful. |
Imported from GitHub PR #18160

I think it is better to use `prefetch()` after batching. In a pipeline we would want the next batch (t+1) ready while processing the current batch (t).

Copybara import of the project:

-- ac4c8ea by Kaan Bıçakcı <[email protected]>:

Move prefetch() to end

Merging this change closes #18160

FUTURE_COPYBARA_INTEGRATE_REVIEW=#18160 from Frightera:update_image_dataset ac4c8ea
PiperOrigin-RevId: 539677136