[Runtime] Allow parameter sharing between modules #3489
Conversation
@tqchen @icemelon9 Could you please help to review this change? Thanks!
Thanks @yongsun. Given this changes the spec and API, it would be helpful to hold a small RFC discussion. Sorry for the delayed action on this PR. Can you also open an RFC thread to discuss the proposed API?
@tqchen Thanks so much for your reply. I just created an RFC thread here: https://discuss.tvm.ai/t/new-graphruntime-api-to-share-parameter-between-modules/3284
I think this can be built on top of #3416. You can set up data_entry_'s for the inputs as before and continue to set up the TVM ops. Then free up the storage in data_entry_[i] with NDArray() for each i in inputs. After that you can probably just use set_input_zero_copy as usual.
When deploying the model on an embedded device, we are trying to avoid the unnecessary memory allocation in the first place. Allocating an additional 7~8MB is not affordable in our case, even if it can be freed later.
@tqchen How could I access the CI logs? Thanks!
You can click on the details link to get access to the CI logs.
It turns out that the CI site is not accessible from our corporate network :( while I can access it from a public network.
But you have to allocate those inputs outside anyway? Your peak memory shouldn't change, as you free them during initialization of the runtime. And normally inputs won't comprise 7~8MB. Have you tried and tested this for your case?
In our use case, the RNN model was split into two parts; the output layer in part 2 will share the weights of the embedding layer in part 1. We will not allocate memory for the output layer, but directly reference the weight matrix of the embedding layer.
OK. I think you can have data_entry_ as NDArray() and set up the TVM ops with correct shape info (which is available). Then use set_input_zero_copy to set the input.
Thanks for your suggestion, it makes a lot of sense to me. I'll update the PR later.
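The agreed approach can be sketched in isolation. The following is a hedged, self-contained model of the idea, not the real TVM code: `Tensor` and `MiniRuntime` are hypothetical stand-ins for the runtime's `NDArray` entries in `data_entry_` and for the `set_input_zero_copy` binding. A lazy entry keeps its shape info but no storage until the caller binds external memory.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an NDArray entry: the real GraphRuntime keeps
// tvm::runtime::NDArray objects in data_entry_, and set_input_zero_copy
// swaps the underlying DLTensor data pointer.
struct Tensor {
  float* data = nullptr;        // nullptr == storage not allocated yet
  std::vector<int64_t> shape;   // shape info is known even for lazy entries
};

class MiniRuntime {
 public:
  // Entries named in "lazy_init_input" keep shape info but no storage.
  void SetupStorage(const std::string& name,
                    std::vector<int64_t> shape, bool lazy) {
    index_[name] = entries_.size();
    Tensor t;
    t.shape = std::move(shape);
    if (!lazy) {
      owned_.emplace_back(NumElems(t.shape));  // allocate eagerly
      t.data = owned_.back().data();
    }
    entries_.push_back(t);
  }

  // Zero-copy binding: point the entry at caller-owned memory.
  void SetInputZeroCopy(const std::string& name, float* external) {
    entries_[index_.at(name)].data = external;
  }

  bool AllInputsAllocated() const {
    for (const auto& t : entries_) {
      if (t.data == nullptr) return false;
    }
    return true;
  }

 private:
  static std::size_t NumElems(const std::vector<int64_t>& shape) {
    std::size_t n = 1;
    for (int64_t d : shape) n *= static_cast<std::size_t>(d);
    return n;
  }
  std::unordered_map<std::string, std::size_t> index_;
  std::vector<Tensor> entries_;
  std::vector<std::vector<float>> owned_;
};
```

The point of the design is that no storage is ever allocated for the lazy entry, so the 7~8MB is never spent, rather than allocated and then freed.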
@yinghai sorry for the late update, just got some time on this, please help to review. Thanks! |
@yinghai I'm sorry that the revised pull request seems to have overwritten your review comments. Could you add your comments again? Thanks! BTW, the CI failed on the doc-gpu task, which should not be impacted by this change.
I get the lazy init part and it makes sense to me. But where is the parameter sharing part?
The sharing will be done by using set_input_zero_copy.
I see. LGTM then. Could you add some unit tests? You can take https://github.com/dmlc/tvm/blob/master/tests/cpp/relay_build_module_test.cc as an example.
Sure, I'll need to learn how to build a relay module via the C++ API, and bind it to a JSON graph with lazy_init_input.
@yinghai unit test case added and passing, please help to review. Thanks!
@tqchen it's looking good to me. Could you do a pass? |
@tqchen could you please help to review? Thanks!
@yinghai is it possible for you to give the approval? Thanks!
Could you check whether all NDArrays have been allocated before run in the GraphRuntime?
src/runtime/graph/graph_runtime.cc
Outdated
std::vector<uint32_t> lazy_init_entries;
for (auto const& name : attrs_.lazy_init_input) {
  int in_idx = GetInputIndex(name);
  CHECK_GE(in_idx, 0) << "input \"" << name << "\"does not exist!";
- CHECK_GE(in_idx, 0) << "input \"" << name << "\"does not exist!";
+ CHECK_GE(in_idx, 0) << "input \"" << name << "\" does not exist!";
Will fix in next rev.
In terms of checking whether all NDArrays have been allocated before "run": since set_input_zero_copy will directly modify the DLTensor's data in op_args_, the original unallocated NDArray instance in data_entry_ will remain unallocated.
set_input_zero_copy doesn't mean that the NDArray data ptr is nullptr. You need to change set_input_zero_copy to set the allocate flag to true in the function.
I do think this check is necessary. Otherwise it will cause a seg fault when you run the graph runtime and some NDArray is not allocated.
I think we can do the following check; please help to review.
void GraphRuntime::Run() {
// setup the array and requirements.
for (size_t i = 0; i < op_execs_.size(); ++i) {
- if (op_execs_[i]) op_execs_[i]();
+ if (op_execs_[i]) {
+ auto& op_arg = op_args_[i];
+ if (op_arg) {
+ for (auto& arg : op_arg->args) {
+ CHECK(arg.data != nullptr) << "Un-initialized input!";
+ }
+ }
+ op_execs_[i]();
+ }
}
}
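The guard proposed in the diff above can be modeled as a stand-alone function. This is a hedged sketch, not the real runtime: `OpArgs` is a hypothetical stand-in for the op's DLTensor argument list, and the function returns false where the patch would `CHECK`-fail with "Un-initialized input!". A null argument pointer aborts before the op runs, instead of segfaulting inside it.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the per-op argument list held in op_args_.
struct OpArgs {
  std::vector<void*> args;  // data pointers the op will dereference
};

// Mirrors the guarded dispatch loop: before an op executes, every argument
// data pointer must be non-null, so an input left lazily un-initialized
// fails loudly instead of crashing inside the op.
bool RunOps(const std::vector<std::function<void()>>& op_execs,
            const std::vector<const OpArgs*>& op_args) {
  for (std::size_t i = 0; i < op_execs.size(); ++i) {
    if (!op_execs[i]) continue;
    if (op_args[i] != nullptr) {
      for (void* a : op_args[i]->args) {
        if (a == nullptr) return false;  // "Un-initialized input!"
      }
    }
    op_execs[i]();
  }
  return true;
}
```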
reader->BeginArray();
CHECK(reader->NextArrayItem());
reader->Read(&type);
CHECK_EQ(type, "list_str");
Could you use "list_int", just to be compatible with "arg_nodes"?
I thought it should be aligned with the type defined in the JSON file?
"attrs": {
  ... ...
  "lazy_init_input": [
    "list_str",
    [
      "p0"
    ]
  ]
}
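The way those names could be resolved to input indices can be sketched as follows. This is a hedged, self-contained model, not the TVM source: `ResolveLazyInputs` is a hypothetical function mirroring the `GetInputIndex` + `CHECK_GE` pattern in the patch, with the `CHECK` replaced by a thrown exception so the sketch stands alone.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Resolve each "lazy_init_input" name from the graph JSON to its input
// index; a name missing from the graph's inputs is an error, matching the
// CHECK_GE(in_idx, 0) in the patch.
std::vector<int> ResolveLazyInputs(const std::vector<std::string>& inputs,
                                   const std::vector<std::string>& lazy) {
  std::vector<int> ids;
  for (const std::string& name : lazy) {
    int idx = -1;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
      if (inputs[i] == name) {
        idx = static_cast<int>(i);
        break;
      }
    }
    if (idx < 0) {
      throw std::runtime_error("input \"" + name + "\" does not exist!");
    }
    ids.push_back(idx);
  }
  return ids;
}
```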
No, I mean you should use "list_int" in json file as well.
It is very likely a customer will manually edit the JSON file to specify lazy_init_input entries. It would be much easier to use the parameter name rather than its numeric id.
As GraphRuntime does not provide control-flow logic, we have to split our model into two parts, while sharing parameters between them to save memory. Solution:
1) add "lazy_init_input" to the graph's attributes:
   "attrs": { ... ... "lazy_init_input": [ "list_str", [ "p0" ] ] }
2) allow un-allocated NDArray entries in SetupStorage
3) utilize the "set_input_zero_copy" function to set parameters
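The end-to-end scenario in the commit message can be sketched like this. It is a hedged stand-alone model, not the real API: `InputSlot` and `ModulePart` are hypothetical stand-ins for a runtime input entry and a runtime instance, and `BindWeightZeroCopy` stands in for the `set_input_zero_copy` call. Part 1 owns the embedding matrix; part 2 declares the same weights lazy and binds the same buffer, so no second copy is ever allocated.

```cpp
#include <vector>

// Hypothetical stand-in for a graph runtime input entry: it stays nullptr
// for a "lazy_init_input" entry until external memory is bound.
struct InputSlot {
  const float* data = nullptr;
};

// Hypothetical stand-in for one of the two runtime instances.
struct ModulePart {
  InputSlot weight;
  // Zero-copy binding: point the slot at memory owned elsewhere.
  void BindWeightZeroCopy(const float* shared) { weight.data = shared; }
};
```

Two `ModulePart` instances bound to the same buffer then share one physical copy of the weights, which is the memory saving the PR is after.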
lgtm
I've restarted CI; we can merge after it passes.
@@ -53,7 +53,15 @@ inline size_t GetDataAlignment(const DLTensor& arr) {
 void GraphRuntime::Run() {
   // setup the array and requirements.
   for (size_t i = 0; i < op_execs_.size(); ++i) {
-    if (op_execs_[i]) op_execs_[i]();
+    if (op_execs_[i]) {
+      auto& op_arg = op_args_[i];
@yongsun @icemelon9 where is this op_args_? there is only op_args in the code, and my build failed because op_args_ cannot be found. How does this even get through CI?
This PR is temporarily reverted in https://github.com/dmlc/tvm/pull/3884 due to a problem in master (and the CI test was against an older version). @yongsun please send a new PR that adds the change back. Sorry for the inconvenience.
…" (apache#3884) This reverts commit 224cc24.