-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Unity][Op] Introduce call_inplace_packed
as a counterpart to call_tir_inplace
#15878
Conversation
if (call->op == call_inplace_packed_op_) { | ||
// call_inplace_packed has its own attrs so we don't pass those down | ||
auto ret = Call(call->args[0], Array<Expr>(call->args.begin() + 1, call->args.end()), | ||
tvm::Attrs(), call->sinfo_args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, we never use the attrs for call_pure_packed
, so we could simply drop those and combine those cases
assert result[0] == tvm_arr_a | ||
assert (result[0].numpy() == sum).all() | ||
assert result[1] != tvm_arr_a and result[1] != tvm_arr_b | ||
assert (result[1].numpy() == sum).all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want, we can check reference equality/disequality at run time as part of the implementation of this op. I'm not sure it would be worth it, though.
Very strong +1 for this need. This week I integrated a KV cache update kernel, and faced a problem of how to call this op when the module being defined is already under a dataflow scope. To workaround this, I ended up returning in-place updated arguments from the kernel to make it look like a pure operation from outside, and call it via I wished that I could simply call |
Were you getting a well-formed error? |
No, assuming that such check occurs automatically during |
Good to know. I'm glad it gets caught, though I'm curious how LLaMa is able to build and run despite also having packed calls in dataflow blocks. |
I mean, the KV cache update operation is disguised as a pure operation in the model. So it does build and run, that's no problem. But semantically it is not quite right and I also need to return the updated cache from the model (even though it is in-place updated) so that passes that remove unused bindings in a df block don't accidentally remove the KV cache update op. |
Glad to know that this op sounds like it will address that problem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
As discussed in the October 3, 2023, community meeting, for cases like the
KVCache
in LLaMa in mlc-llm, it might be desirable to have a way to have packed calls inside dataflow blocks where the packed calls simply "consume" their output and mutate it in-place (performing no other side effects). While this is not pure, if the other conditions for doing an in-place call are met (no other live aliases of the consumed value), this is safe, though users should use caution if applying it. (The in-place update analyses could also be used to make sure the conditions are met.) This PR introduces acall_inplace_packed
operator to signal this intent, though unlikecall_tir_inplace
, this operator really only conveys the intent—it is up to the implementation of the packed call to behave as it advertises.@psrivas2 raised the idea of having a different name, which may be worth discussing. We could consider using the word "consume" in it or referring to linear typing if we do not feel "inplace" conveys the intent well enough.