Preserve ordering of ava_async APIs between multiple threads #2
I don't understand how this is possible. The communication channel is ordered and the dispatch is ordered, so reordering can only happen between operations on different threads, and those reorderings are allowed for async calls: async calls are by definition allowed to return before synchronizing with the underlying library. What am I missing in my understanding?
The mistake happens when the multi-threaded program introduces some other assumptions. This error occurred while I was improving support for more complicated TensorFlow workloads. The work is on another private repo and I'll cc you on a related commit email.
Ok, I see. From a simplistic Lapis semantic perspective this means that cudaLaunchKernel should NOT be `ava_async`, since it performs some action before returning.

It sounds like you are saying that we can get the ordering we need by enforcing some additional ordering on the execution of functions in the server. What ordering are you planning to enforce? I worry that enforcing an ordering will have a surprisingly high performance cost. I suspect that we should find a way to encode the need for this ordering in the spec and only enforce the ordering in cases where it is needed.
I agree that tracking and enforcing a *complete* ordering is a major performance challenge.

However, the basic idea I suggested to Hangchen is to keep a counter per stream. Every guest API call involving that stream increments the counter, and every forwarded API call includes a snapshot of the counter value. The API server then enforces the same order the guest observed, delaying the dispatch of any API call whose counter value is discontiguous.
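The per-stream counter idea above can be sketched as follows. This is a minimal model, not AvA's actual implementation: the class and method names are hypothetical, and a real API server would hand the call off to a per-stream worker rather than execute it under the lock.

```python
import threading

class InOrderDispatcher:
    """Server-side sketch of the per-stream counter scheme (hypothetical
    names): the guest stamps each forwarded call with a snapshot of its
    stream's counter; the server delays dispatch of any call whose
    sequence number is discontiguous, i.e. not the next one expected
    for that stream."""

    def __init__(self):
        self._cv = threading.Condition()
        self._next_seq = {}  # stream id -> next expected sequence number

    def dispatch(self, stream, seq, call):
        with self._cv:
            self._next_seq.setdefault(stream, 0)
            # Park this worker until every earlier call on the stream
            # has been dispatched.
            while seq != self._next_seq[stream]:
                self._cv.wait()
            result = call()  # simplification: runs under the lock here
            self._next_seq[stream] += 1
            self._cv.notify_all()  # wake workers holding later calls
        return result
```

Even if the calls arrive at the server in a scrambled order (e.g. sequence 2 before 0 and 1), the dispatcher executes them in guest order; the cost is that a worker holding a later call blocks until its predecessors run, which is why the comment thread worries about enforcing this everywhere rather than only where the spec requires it.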
In the current prototype, `ava_async` means the API returns as soon as the call is sent to the API server, which may not have received or executed it yet. In a multi-threaded scenario with inter-thread synchronization, the order in which `ava_async` APIs execute in the API server can differ from the order the guest issued them, producing incorrect behavior. To enforce execution correctness, `ava_async` should preserve the ordering of those APIs between the guest library and the API server. Related issue: [wait for merge from ava-serverless].
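The reordering hazard can be modeled with plain FIFO queues (a toy model, not AvA's transport): each guest thread has its own in-order channel to the server, but nothing orders the channels against each other, so a cross-thread happens-before edge in the guest is invisible to the server.

```python
from queue import Queue

# Hypothetical model: one FIFO command channel per guest thread.
chan_t1, chan_t2 = Queue(), Queue()

# Guest: thread 1 issues an async call, then synchronizes with thread 2,
# which issues a dependent call. The guest-observed order is copy first,
# launch second.
chan_t1.put("cudaMemcpyAsync")   # thread 1, happens-before ...
chan_t2.put("cudaLaunchKernel")  # ... thread 2's dependent launch

# Server: each channel is FIFO, but nothing forces channel 1 to drain
# first. This legal interleaving executes the launch before the copy,
# violating the guest's inter-thread ordering.
executed = [chan_t2.get(), chan_t1.get()]
```

Stamping both calls with a shared per-stream counter, as proposed above, is exactly what rules this interleaving out: the server would see the launch carrying sequence 1 and delay it until the call carrying sequence 0 has run.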