Make _prepare_sample non blocking and pin memory of CPU input buffers #2207
Looks good. Unfortunately, we cannot use pinned memory when running in WSL. Can we add a check for that?
Just added. Please check.
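For readers following along, a check like the one requested above could look roughly like the sketch below. It assumes WSL kernels report "microsoft" somewhere in their uname fields, which is a common heuristic and not a guarantee. The helper names `in_wsl` and `should_pin_memory` are illustrative and not necessarily what the PR uses.

```python
import platform


def in_wsl(uname_fields=None):
    """Heuristic: WSL kernels usually mention "microsoft" in their uname output.

    `uname_fields` is a hypothetical parameter added here so the check can be
    exercised without actually running under WSL.
    """
    if uname_fields is None:
        uname_fields = platform.uname()
    return "microsoft" in " ".join(uname_fields).lower()


def should_pin_memory(cuda_available, uname_fields=None):
    """Pin CPU input buffers only when CUDA is present and we are not in WSL."""
    return bool(cuda_available) and not in_wsl(uname_fields)
```

Because the heuristic scans every uname field, a host whose name happens to contain "microsoft" would also match; a stricter variant could look only at the kernel release string.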
@hanzhi713 A quick question: did you happen to evaluate the performance impact of this change? Just wondering, because the input preparation part only takes 1~3% of the overall running time in our profiling results.
About 1% for 70B tp4 bs=64. Just a minor optimization. Merge at your discretion 😃
Co-authored-by: Antoni Baum <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>
@hanzhi713 LGTM! Thanks for the PR!
Hide _prepare_sample latency with model execution, since it looks like it doesn't depend on the model forward pass. We can use a copy stream for the h2d copies in _prepare_sample, but we probably don't really need to because these copies are very short.
cc @Yard1
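To make the idea in this thread concrete, here is a minimal sketch of the pattern being discussed: build the input on the CPU in pinned memory, then issue a non-blocking host-to-device copy, optionally on a separate copy stream. It is not the PR's actual code. The function names and the CPU fallback are assumptions made so the example also runs on machines without CUDA.

```python
import torch


def to_device_async(values, dtype=torch.long, device=None, pin_memory=None):
    """Create a CPU tensor (pinned if possible) and copy it without blocking the host."""
    cuda_ok = torch.cuda.is_available()
    if device is None:
        device = "cuda" if cuda_ok else "cpu"
    if pin_memory is None:
        # Pinning needs a CUDA runtime; a real check would also exclude WSL.
        pin_memory = cuda_ok
    cpu_tensor = torch.tensor(values, dtype=dtype, device="cpu",
                              pin_memory=pin_memory)
    # With a pinned source, non_blocking=True lets the copy run asynchronously.
    return cpu_tensor.to(device=device, non_blocking=True)


def h2d_on_copy_stream(cpu_tensor, copy_stream=None):
    """Optionally issue the copy on a side stream, then make the current stream wait."""
    if not torch.cuda.is_available():
        return cpu_tensor  # nothing to overlap on a CPU-only machine
    if copy_stream is None:
        copy_stream = torch.cuda.Stream()
    with torch.cuda.stream(copy_stream):
        gpu_tensor = cpu_tensor.to("cuda", non_blocking=True)
    # Work queued later on the current stream will see the finished copy.
    current = torch.cuda.current_stream()
    current.wait_stream(copy_stream)
    # Tell the caching allocator the tensor is also used on the current stream.
    gpu_tensor.record_stream(current)
    return gpu_tensor
```

As noted above, the copies in question are short, so the side stream is likely optional; the pinned buffer plus `non_blocking=True` is the part that keeps the host from stalling.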