Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Unity][Op] Introduce call_inplace_packed as a counterpart to call_tir_inplace #15878

Merged
merged 2 commits into from
Oct 9, 2023

Conversation

slyubomirsky
Copy link
Contributor

As discussed in the October 3, 2023, community meeting, for cases like the KVCache in LLaMa in mlc-llm, it might be desirable to have a way to have packed calls inside dataflow blocks where the packed calls simply "consume" their output and mutate it in-place (performing no other side effects). While this is not pure, if the other conditions for doing an in-place call are met (no other live aliases of the consumed value), this is safe, though users should use caution if applying it. (The in-place update analyses could also be used to make sure the conditions are met.) This PR introduces a call_inplace_packed operator to signal this intent, though unlike call_tir_inplace, this operator really only conveys the intent—it is up to the implementation of the packed call to behave as it advertises.

@psrivas2 raised the idea of having a different name, which may be worth discussing. We could consider using the word "consume" in it or referring to linear typing if we do not feel "inplace" conveys the intent well enough.

if (call->op == call_inplace_packed_op_) {
// call_inplace_packed has its own attrs so we don't pass those down
auto ret = Call(call->args[0], Array<Expr>(call->args.begin() + 1, call->args.end()),
tvm::Attrs(), call->sinfo_args);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, we never use the attrs for call_pure_packed, so we could simply drop those and combine those cases

Comment on lines +330 to +333
assert result[0] == tvm_arr_a
assert (result[0].numpy() == sum).all()
assert result[1] != tvm_arr_a and result[1] != tvm_arr_b
assert (result[1].numpy() == sum).all()
Copy link
Contributor Author

@slyubomirsky slyubomirsky Oct 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want, we can check reference equality/disequality at run time as part of the implementation of this op. I'm not sure it would be worth it, though.

@masahi
Copy link
Member

masahi commented Oct 5, 2023

it might be desirable to have a way to have packed calls inside dataflow blocks

Very strong +1 for this need. This week I integrated a KV cache update kernel, and faced a problem of how to call this op when the module being defined is already under a dataflow scope. To workaround this, I ended up returning in-place updated arguments from the kernel to make it look like a pure operation from outside, and call it via call_pure_packed. Although it gets the job done, it feels very awkward to return in-place updated arguments.

I wished that I could simply call _ = call_packed(...) inside a dataflow block and have transform passes not remove such dummy binding.

@slyubomirsky
Copy link
Contributor Author

Were you getting a well-formed error?

@masahi
Copy link
Member

masahi commented Oct 5, 2023

No, assuming that such check occurs automatically during relax.build().

@slyubomirsky
Copy link
Contributor Author

slyubomirsky commented Oct 5, 2023

Good to know. I'm glad it gets caught, though I'm curious how LLaMa is able to build and run despite also having packed calls in dataflow blocks.

@masahi
Copy link
Member

masahi commented Oct 5, 2023

I mean, the KV cache update operation is disguised as a pure operation in the model. So it does build and run, that's no problem. But semantically it is not quite right and I also need to return the updated cache from the model (even though it is in-place updated) so that passes that remove unused bindings in a df block don't accidentally remove the KV cache update op.

@slyubomirsky
Copy link
Contributor Author

Glad to know that this op sounds like it will address that problem.

Copy link
Member

@yongwww yongwww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@tqchen tqchen merged commit dd57556 into apache:unity Oct 9, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants