
Use prefetch() after batching // image_dataset.py #18160

Merged
1 commit merged from Frightera:update_image_dataset into keras-team:master on Jun 12, 2023

Conversation

Frightera (Contributor)

I think it is better to use prefetch() after batching. In a pipeline we would want the next batch (t+1) ready while processing the current batch (t).
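A minimal sketch of the intended ordering (illustrative only, not the exact image_dataset.py code; the toy dataset and `batch_size` are stand-ins):

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE
batch_size = 32

# Toy dataset standing in for the decoded image files.
dataset = tf.data.Dataset.range(1000).map(
    lambda x: tf.cast(x, tf.float32), num_parallel_calls=AUTOTUNE
)

# Before: prefetch individual elements, then batch.
# dataset = dataset.prefetch(AUTOTUNE).batch(batch_size)

# After: batch first, then prefetch whole batches so batch t+1
# is prepared while batch t is being consumed by the training step.
dataset = dataset.batch(batch_size).prefetch(AUTOTUNE)
```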

@gbaned requested a review from fchollet on May 29, 2023
@sachinprasadhs added the keras-team-review-pending label (Pending review by a Keras team member) on Jun 7, 2023
@qlzh727 self-requested a review on June 8, 2023
@sachinprasadhs removed the keras-team-review-pending label on Jun 8, 2023
@qlzh727 (Member) commented on Jun 9, 2023

Adding @jsimsa for the performance best practice here.

Looking at https://www.tensorflow.org/guide/data_performance#prefetching, I didn't see any guidance on whether prefetch should be applied before or after batching.

The current approach prefetches unbatched data (with autotune); unless the prefetched data can't fill the next batch, I don't see a big difference here.

@Frightera (Contributor, Author)

@qlzh727

Quoting from the official documentation:

Most dataset input pipelines should end with a call to prefetch. This allows later elements to be prepared while the current element is being processed. This often improves latency and throughput, at the cost of using additional memory to store prefetched elements.

I think it is better to prefetch batches rather than single elements, so we have the next batch(es) ready while processing the current batch during training.
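In a Keras training loop this lets each step overlap with host-side preparation of the next batch; a minimal end-to-end sketch with toy data (not the image_dataset.py internals):

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

# Toy features/labels standing in for decoded images.
x = tf.random.normal((512, 32, 32, 3))
y = tf.random.uniform((512,), maxval=10, dtype=tf.int32)

ds = (
    tf.data.Dataset.from_tensor_slices((x, y))
    .batch(32)
    .prefetch(AUTOTUNE)  # batch t+1 is prepared while the model runs the step on batch t
)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(ds, epochs=1)
```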

@qlzh727 (Member) commented on Jun 9, 2023

Thanks for the reference; let's wait for some input from the tf.data side.

@qlzh727 (Member) commented on Jun 9, 2023

Chatted with @wilsingosti offline about this issue (copied from the chat):

Prefetch is useful in 2 cases:

  1. If the upstream of Prefetch has variance, inserting Prefetch can help buffer enough elements to reduce that variance.

  2. If there is a long sequence of synchronous transformations (e.g. sequential map, shuffle, batch), inserting a Prefetch somewhere in the sequence can break it up so that the original transformations can be pipelined.

In this particular case, since the input of batch is a ParallelMap, inserting a Prefetch between batch and ParallelMap will not help with case 2, because ParallelMap is an asynchronous op (it has its own buffer). Inserting it after batch can help, especially if the output of batch feeds another synchronous transformation.
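For context, the rough shape of such an image pipeline after this change might look like the following (an illustrative sketch; `load_image` and the file pattern are hypothetical, not the actual Keras internals):

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

def load_image(path):
    # Hypothetical decode step standing in for Keras' internal image loading.
    data = tf.io.read_file(path)
    return tf.image.decode_png(data, channels=3)

paths = tf.data.Dataset.list_files("images/*.png")  # hypothetical file pattern

dataset = (
    paths
    .map(load_image, num_parallel_calls=AUTOTUNE)  # ParallelMap: asynchronous, has its own buffer
    .batch(32)                                      # synchronous transformation
    .prefetch(AUTOTUNE)                             # overlaps preparing batch t+1 with consuming batch t
)
```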

@qlzh727 (Member) commented on Jun 9, 2023

Based on the assessment above, I am approving this PR. Thanks @Frightera for the contribution.

@google-ml-butler bot added the kokoro:force-run and ready to pull (Ready to be merged into the codebase) labels on Jun 9, 2023
@Frightera (Contributor, Author)

Thanks @qlzh727, those statements were also helpful.

copybara-service bot pushed a commit that referenced this pull request Jun 12, 2023
Imported from GitHub PR #18160

I think it is better to use `prefetch()` after batching. In a pipeline we would want the next batch (t+1) ready while processing the current batch (t).
Copybara import of the project:

--
ac4c8ea by Kaan Bıçakcı <[email protected]>:

Move prefetch() to end

Merging this change closes #18160

FUTURE_COPYBARA_INTEGRATE_REVIEW=#18160 from Frightera:update_image_dataset ac4c8ea
PiperOrigin-RevId: 539677136
@copybara-service bot merged commit f1eca72 into keras-team:master on Jun 12, 2023
@Frightera deleted the update_image_dataset branch on June 13, 2023