
[Runtime] Allow parameter sharing between modules #3489

Merged
merged 1 commit into from
Sep 3, 2019
Conversation

yongsun (Contributor) commented Jul 3, 2019

As GraphRuntime does not provide control-flow logic, we have to split
our model into two parts, while still sharing parameters between them
to reduce memory usage.

Solution:

  1. add "lazy_init_input" in graph's attributes
   "attrs": {
     ... ...
     "lazy_init_input": [
       "list_str",
       [
         "p0"
       ]
     ]
    }
  2. allow un-allocated NDArray entries in SetupStorage
  3. utilize the "set_input_zero_copy" function to set parameters
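To make step 1 concrete, here is a hedged sketch (plain Python, not TVM code; `parse_lazy_init_inputs` is a made-up helper) of reading the proposed "lazy_init_input" attribute out of a graph JSON string:

```python
import json

# Hypothetical graph-JSON fragment following the "lazy_init_input"
# layout proposed above: a "list_str" type tag followed by the names.
graph_json = """
{
  "attrs": {
    "lazy_init_input": [
      "list_str",
      ["p0"]
    ]
  }
}
"""

def parse_lazy_init_inputs(graph_str):
    """Return the list of input names marked for lazy initialization."""
    attrs = json.loads(graph_str)["attrs"]
    entry = attrs.get("lazy_init_input")
    if entry is None:
        return []          # attribute is optional; absence means no lazy inputs
    type_tag, names = entry
    # Mirror the runtime-side check: the attribute must be a list of strings.
    assert type_tag == "list_str", "unexpected type tag: %s" % type_tag
    return names

print(parse_lazy_init_inputs(graph_json))  # ['p0']
```

The runtime would then skip allocating storage for exactly these entries until set_input_zero_copy binds them.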


yongsun (Contributor, Author) commented Jul 10, 2019

@tqchen @icemelon9 Could you please help to review this change? Thanks!

tqchen (Member) commented Jul 10, 2019

Thanks @yongsun. Given that this changes the spec and API, it would be helpful to hold a small RFC discussion. Sorry for the delayed action on this PR. Can you also open an RFC thread to discuss the changes to the graph and API?

cc @ajtulloch @yinghai @hlu1 @yzhliu @srkreddy1238

@tqchen tqchen added the "status: need RFC" label Jul 10, 2019
yongsun (Contributor, Author) commented Jul 10, 2019

@tqchen Thanks so much for your reply. I just created an RFC thread here: https://discuss.tvm.ai/t/new-graphruntime-api-to-share-parameter-between-modules/3284

yinghai (Contributor) left a comment

I think this can be built on top of #3416. You can set up the data_entry_'s for the inputs as before and continue to set up the TVM ops. Then free the storage in data_entry_[i] with NDArray() for each i in inputs. After that you can probably just use set_input_zero_copy as usual.

yongsun (Contributor, Author) commented Jul 19, 2019

When deploying the model on an embedded device, we are trying to avoid the unnecessary memory allocation in the first place. Allocating an additional 7–8 MB is not affordable in our case, even if it can be freed later.

yongsun (Contributor, Author) commented Jul 19, 2019

@tqchen How could I access the CI logs? Thanks!

tqchen (Member) commented Jul 19, 2019

You can click on the Details link to access the CI logs.

yongsun (Contributor, Author) commented Jul 19, 2019

It turns out that the CI site is not accessible from our corp net :( though I can access it from a public network.

yinghai (Contributor) commented Jul 19, 2019

But you have to allocate those inputs outside anyway? Your peak memory shouldn't change, as you free them during initialization of the runtime. And normally inputs won't take up 7–8 MB. Have you tried and tested this for your case?

yongsun (Contributor, Author) commented Jul 19, 2019

In our use case, the RNN model was split into two parts; the output layer in part 2 shares the weights of the embedding layer in part 1. We will not allocate memory for the output layer, but instead point its weight matrix at the NDArray from the embedding layer. The output weight matrix size for an RNN language model is vocab_size * embedding_dim; even for a small/compact RNN model, 7–8 MB is very common.
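For scale, a quick back-of-envelope calculation (with assumed, illustrative sizes, not numbers from the actual model in this thread) shows how a tied output/embedding matrix reaches the quoted range:

```python
# Back-of-envelope check of the 7-8 MB figure above. vocab_size and
# embedding_dim are illustrative assumptions.
vocab_size = 50_000
embedding_dim = 40        # small/compact RNN language model
bytes_per_float = 4       # float32

weight_bytes = vocab_size * embedding_dim * bytes_per_float
print(round(weight_bytes / 2**20, 1))  # 7.6 (MiB) -- duplicated if not shared
```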

yinghai (Contributor) commented Jul 19, 2019

OK. I think you can have data_entry_ as NDArray() and set up the TVM ops with the correct shape info (which is available). Then use set_input_zero_copy to set the input.

yongsun (Contributor, Author) commented Jul 19, 2019

Thanks for your suggestion, it makes a lot of sense to me. I'll update the PR later.

yongsun (Contributor, Author) commented Aug 12, 2019

@yinghai sorry for the late update, I just got some time for this; please help review. Thanks!

yongsun (Contributor, Author) commented Aug 13, 2019

@yinghai I'm sorry that the revised pull request seems to have overwritten your review comments. Could you add your comments again? Thanks! BTW, the CI failed on the doc-gpu task, which should not be impacted by this change.

yinghai (Contributor) left a comment

I get the lazy init part and it makes sense to me. But where is the parameter sharing part?

yongsun (Contributor, Author) commented Aug 14, 2019

The sharing will be done by using set_input_zero_copy, something like this:

decoder.set_input_zero_copy("p0", encoder.get_input("p0"))
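The aliasing semantics behind that call can be mimicked outside TVM. The following is a toy plain-Python sketch (ToyModule and its methods are made-up stand-ins, not TVM API): both modules hold a reference to the same buffer, so no copy is made and a write through one is visible through the other.

```python
# Toy stand-ins for two runtime modules that alias one parameter buffer.
# Nothing here is TVM code; it only illustrates zero-copy sharing.

class ToyModule:
    def __init__(self):
        self.params = {}

    def set_input_zero_copy(self, name, buf):
        # Store a reference to the caller's buffer instead of copying it.
        self.params[name] = buf

    def get_input(self, name):
        return self.params[name]

encoder, decoder = ToyModule(), ToyModule()
encoder.set_input_zero_copy("p0", bytearray(4))

# Share the same buffer with the second module, as in the line above.
decoder.set_input_zero_copy("p0", encoder.get_input("p0"))

# A write through one module is visible through the other: same memory.
encoder.get_input("p0")[0] = 42
print(decoder.get_input("p0")[0])  # 42
```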

yinghai (Contributor) commented Aug 14, 2019

I see. LGTM then. Could you add some unit tests? You can take https://github.com/dmlc/tvm/blob/master/tests/cpp/relay_build_module_test.cc as an example.

yongsun (Contributor, Author) commented Aug 14, 2019

Sure, I'll need to learn how to build a relay module with the C++ API and bind it to a JSON graph with lazy_init_input attributes.

yongsun (Contributor, Author) commented Aug 14, 2019

@yinghai unit test case added and passing, please help review. Thanks!

yinghai (Contributor) commented Aug 15, 2019

@tqchen it's looking good to me. Could you do a pass?

yongsun (Contributor, Author) commented Aug 23, 2019

@tqchen could you please help review? Thanks!

yongsun (Contributor, Author) commented Aug 25, 2019

@yinghai is it possible for you to give approval? Thanks!

icemelon (Member) left a comment

Could you check whether all NDArrays have been allocated before run in the GraphRuntime?

std::vector<uint32_t> lazy_init_entries;
for (auto const& name : attrs_.lazy_init_input) {
int in_idx = GetInputIndex(name);
CHECK_GE(in_idx, 0) << "input \"" << name << "\"does not exist!";
Review comment (Member):

Suggested change
CHECK_GE(in_idx, 0) << "input \"" << name << "\"does not exist!";
CHECK_GE(in_idx, 0) << "input \"" << name << "\" does not exist!";

yongsun (Contributor, Author) replied:

Will fix in next rev.

yongsun (Contributor, Author) commented Aug 28, 2019

Regarding checking whether all NDArrays have been allocated before "run": since set_input_zero_copy directly modifies the DLTensor's data pointer in op_args_, the originally unallocated NDArray instance in data_entry_ will remain unallocated.

Member replied:

set_input_zero_copy doesn't mean that the NDArray data pointer is nullptr. You need to change set_input_zero_copy to set allocated to true in the function.
I do think this check is necessary. Otherwise it will cause a seg fault when you run the graph runtime and some NDArray is not allocated.

yongsun (Contributor, Author) replied:

I think we can do the following check; please help review.

 void GraphRuntime::Run() {
   // setup the array and requirements.
   for (size_t i = 0; i < op_execs_.size(); ++i) {
-    if (op_execs_[i]) op_execs_[i]();
+    if (op_execs_[i]) {
+      auto& op_arg = op_args_[i];
+      if (op_arg) {
+        for (auto& arg : op_arg->args) {
+          CHECK(arg.data != nullptr) << "Un-initialized input!";
+        }
+      }
+      op_execs_[i]();
+    }
   }
 }
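The intent of that guard can be illustrated with a short plain-Python sketch (toy names, not the runtime's actual code): execution is refused for any op whose argument buffer was never bound, with None standing in for a null DLTensor data pointer.

```python
# Toy model of the pre-execution check above. Illustrative only; not TVM code.

def run(op_execs, op_args):
    for exec_fn, args in zip(op_execs, op_args):
        if exec_fn is None:
            continue  # mirror `if (op_execs_[i])`
        for arg in args:
            if arg is None:
                raise RuntimeError("Un-initialized input!")
        exec_fn()

results = []
run([lambda: results.append("ran")], [[b"weights"]])  # bound input: executes

try:
    run([lambda: results.append("ran")], [[None]])    # lazy input never set
except RuntimeError as e:
    print(e)  # Un-initialized input!

print(results)  # ['ran']
```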

src/runtime/graph/graph_runtime.h — two review threads resolved
reader->BeginArray();
CHECK(reader->NextArrayItem());
reader->Read(&type);
CHECK_EQ(type, "list_str");
Review comment (Member):

Could you use "list_int", just to be compatible with "arg_nodes"?

yongsun (Contributor, Author) commented Aug 28, 2019

I thought it should be aligned with the type defined in the JSON file?

"attrs": {
     ... ...
     "lazy_init_input": [
       "list_str",
       [
         "p0"
       ]
     ]
    }

Member replied:

No, I mean you should use "list_int" in the JSON file as well.

yongsun (Contributor, Author) commented Aug 28, 2019

It is very likely that customers will manually edit the JSON file to specify lazy_init_input entries. It would be much easier to use the parameter name rather than its numeric id.

icemelon (Member) left a comment

LGTM.
I've restarted CI and we can merge after CI passes.

@icemelon icemelon merged commit 224cc24 into apache:master Sep 3, 2019
icemelon (Member) commented Sep 3, 2019

Thanks @yongsun @yinghai

@icemelon icemelon removed the "status: need RFC" label Sep 3, 2019
@@ -53,7 +53,15 @@ inline size_t GetDataAlignment(const DLTensor& arr) {
 void GraphRuntime::Run() {
   // setup the array and requirements.
   for (size_t i = 0; i < op_execs_.size(); ++i) {
-    if (op_execs_[i]) op_execs_[i]();
+    if (op_execs_[i]) {
+      auto& op_arg = op_args_[i];
A Contributor commented:

@yongsun @icemelon9 where is this op_args_? There is only op_args in the code, and my build failed because op_args_ cannot be found. How did this even get through CI?

tqchen (Member) commented Sep 3, 2019

This PR is temporarily reverted in https://github.com/dmlc/tvm/pull/3884 due to a problem in master (and the CI test was against an older version). @yongsun please send a new PR that adds the change back. Sorry for the inconvenience.

yongsun (Contributor, Author) commented Sep 3, 2019

@tqchen @icemelon9 @yinghai
Created a new PR based on the latest master branch: #3887
Please help review, thanks!

MarisaKirisame pushed a commit to MarisaKirisame/tvm that referenced this pull request Sep 7, 2019
MarisaKirisame pushed a commit to MarisaKirisame/tvm that referenced this pull request Sep 7, 2019
wweic pushed a commit to wweic/tvm that referenced this pull request Sep 16, 2019
wweic pushed a commit to wweic/tvm that referenced this pull request Sep 16, 2019
wweic pushed a commit to wweic/tvm that referenced this pull request Sep 16, 2019
wweic pushed a commit to wweic/tvm that referenced this pull request Sep 16, 2019
wweic pushed a commit to neo-ai/tvm that referenced this pull request Sep 16, 2019
wweic pushed a commit to neo-ai/tvm that referenced this pull request Sep 16, 2019