[BYOC] Enhance partitioning and external codegen #5310

zhiics · 2020-04-11T20:59:59Z

This PR:

* Removes the duplicated outputs passed to the subgraphs that follow. Previously, there would be two outputs from A to (B, C) if both B and C used the output of A, when one is enough.
* Enables the MobileNet test on DNNL by allocating the large constant arrays in the static data section, instead of allocating them on the heap and then assigning the elements one by one. Note that compilation time is still slightly longer (< 2 minutes on a P2 instance).
* Fixes the way intermediate outputs are saved: each one is now cached and returned directly during the post-order traversal.
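
As a rough illustration of the first item (this is not the C++ in partition_graph.cc), the deduplication can be sketched in Python. The function name `dedup_outputs` and the edge-list representation are assumptions made for this example; the only point is that a producer consumed by several subgraphs yields a single output entry.

```python
def dedup_outputs(edges):
    """Group region-crossing edges by producer.

    `edges` is a list of (producer, consumer) pairs (hypothetical
    representation). The result maps each producer to the list of its
    consumers, so a producer appears as an output exactly once, in
    first-seen order.
    """
    outputs = {}
    for producer, consumer in edges:
        # setdefault creates the entry on first sight and reuses it afterwards
        outputs.setdefault(producer, []).append(consumer)
    return outputs


# A feeds both B and C: one output entry for A, two consumers.
edges = [("A", "B"), ("A", "C")]
outputs = dedup_outputs(edges)
```

With the pre-PR behavior described above, A would have contributed two separate outputs here; after grouping, `outputs` holds one entry for A that both B and C read from.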

@comaniac @masahi @soiferj @manupa-arm @trevor-m

src/relay/backend/contrib/dnnl/codegen.cc

tests/python/relay/test_pass_partition_graph.py

masahi · 2020-04-11T23:51:47Z

Great! Can you also enable mobilenet exec in the dnnl fuse test, by uncommenting this?
https://github.com/apache/incubator-tvm/blob/master/tests/python/relay/test_pass_partition_graph.py#L971-L973

zhiics · 2020-04-11T23:57:07Z

@masahi Thanks for the review. I uncommented those lines. It worked fine.

src/relay/backend/contrib/dnnl/codegen.cc

comaniac

LGTM. Just minor comments.

src/relay/backend/contrib/codegen_c/codegen.cc

masahi · 2020-04-12T00:50:35Z

@zhiics Do you know why compiling manually inlined constants is massively faster and more memory-efficient for a static array than for a heap one? For my education :)

I guess it is because, for an inlined array on the heap, the compiler needs to generate an instruction for every value?

zhiics · 2020-04-12T01:20:51Z

@masahi Yeah, I only have some hypothetical explanations, which motivated me to make this change. I think the heap version was slow because we generated a huge number of statements for array initialization. That makes the code very large, and even though each statement is simple, every optimization and codegen pass over them is still very slow (particularly because they go through pointers). After we move the array to the data segment, we can remove these statements and keep only one pointer, so compilation should be much faster. In the latter case, linking and data loading take more time, but that is still insignificant compared to compilation. Does this make sense?

Update: yeah, I didn't see your updated paragraph, but it is right.
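
The hypothesis above can be illustrated with a small Python sketch that emits C source in both styles. The emitter names are hypothetical and this is not the DNNL codegen itself; it only shows that the heap style produces one statement per element, while the static style produces a single declaration whose values the C compiler can place in the data segment. Values are assumed to be finite floats.

```python
def emit_heap_init(name, values):
    """Heap style: one allocation plus one assignment statement per element."""
    lines = [f"float* {name} = (float*)malloc({len(values)} * sizeof(float));"]
    for i, v in enumerate(values):
        lines.append(f"{name}[{i}] = {float(v)!r}f;")
    return lines


def emit_static_init(name, values):
    """Static style: a single declaration with an initializer list."""
    body = ", ".join(f"{float(v)!r}f" for v in values)
    return [f"static float {name}[{len(values)}] = {{{body}}};"]


values = [1.0, 2.0, 3.0]
heap_lines = emit_heap_init("const_1", values)      # 1 + len(values) statements
static_lines = emit_static_init("const_1", values)  # always 1 declaration
```

For a constant with N elements the heap emitter returns N + 1 statements for the C compiler to parse and optimize, whereas the static emitter always returns one declaration, which matches the explanation given in the comment above.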

src/relay/transforms/partition_graph.cc

masahi · 2020-04-12T02:00:43Z

Let's keep it open for a couple of days so that ARM people can have a look next week. @mbaret @lhutton1 @manupa-arm

src/relay/backend/contrib/codegen_c/codegen.cc

zhiics · 2020-04-13T15:19:51Z

@mbaret @lhutton1 @manupa-arm Could any of you take a look? I'd like to merge this enhancement soon.

zhiics · 2020-04-13T21:05:55Z

Let's bring this in since ARM folks haven't really changed the DNNL codegen code and the fix in partitioning is simple.

zhiics · 2020-04-13T21:06:28Z

Thanks @masahi @comaniac

soiferj · 2020-04-14T00:14:39Z

Hi all, sorry for being late to the party, it's been a really busy last couple of months. I was just pulling these changes and trying them out, and I have a question on the handling of constants. @zhiics, is there a reason why you copy the entire content of the constants instead of just using the value of the pointer?

In other words, instead of float const_1[1000] = {......}, we could have float* const_1 = dl_tensor.data. We can pass that pointer around directly.

zhiics · 2020-04-14T00:53:20Z

The reason is that we need to serialize it. We cannot save the pointer; we need the data itself.
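
A minimal sketch of why the data, rather than the pointer, has to be saved: an address is only meaningful inside the process that allocated the tensor, whereas the bytes can be written out and reloaded later. The length-prefixed float32 layout and the function names below are assumptions for illustration, not TVM's serialization format.

```python
import struct


def serialize_constant(values):
    """Pack a little-endian uint32 element count followed by float32 data."""
    return struct.pack(f"<I{len(values)}f", len(values), *values)


def deserialize_constant(blob):
    """Read the count, then that many float32 values after the 4-byte header."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return list(struct.unpack_from(f"<{count}f", blob, 4))


# These values are exactly representable in float32, so the round trip is lossless.
blob = serialize_constant([1.0, 2.5, -3.0])
restored = deserialize_constant(blob)
```

The blob can be stored in a file or embedded in generated source and still be decoded in another process, which is not true of a raw address such as dl_tensor.data.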

soiferj · 2020-04-14T01:33:11Z

You’re right, sorry about that. So if our target was CUDA, we would need to first memcpy back to the host?

It turns out the constant propagation in the PartitionGraph pass changed this behavior quite a bit.

Update: I think I see what’s happening - the value of the constant will be on the host. Nvm!

zhiics · 2020-04-14T01:52:53Z

They weren't really constants before; they were variables. Therefore, you wouldn't have constants in the external codegen. With constant propagation, we now have them and leave them for the external codegen to handle. I think there is a thread about this on the discussion forum.

lhutton1 · 2020-04-14T08:59:40Z

Apologies, it was a bank holiday in the UK :) Although this has now been merged, I've looked into the constant tensor issue previously, and this method of writing out the constant tensor causes very long compilation times for models like VGG16. I couldn't find a nice way to solve this when I was looking into it, but I just wanted to raise it here so people are aware. Is this a pattern that we want people to follow in the future?

zhiics · 2020-04-14T15:47:16Z

This is what I think we can do for CSourceModule-style external codegen. I have some initial thoughts/work on a different runtime, so that we only need to serialize a Relay program and interpret it (i.e. the one used as the minimal example). I don't have many cycles at the moment; I will send an RFC once I have bandwidth for it. But anyway, let's not continue talking about it here, as it is a different topic.

* Remove duplicated output args
* address comment
* fix codegen c
* improve comment
* VisitExprDefault_
* deduce type

Remove duplicated output args

069aea8

zhiics force-pushed the partition branch from 6dc8b77 to 069aea8 Compare April 11, 2020 21:06

masahi self-assigned this Apr 11, 2020

masahi reviewed Apr 11, 2020

src/relay/backend/contrib/dnnl/codegen.cc

masahi reviewed Apr 11, 2020

tests/python/relay/test_pass_partition_graph.py

address comment

f65411c

masahi reviewed Apr 12, 2020

src/relay/backend/contrib/dnnl/codegen.cc

comaniac requested changes Apr 12, 2020

src/relay/backend/contrib/codegen_c/codegen.cc (3 review threads)

fix codegen c

986a668

masahi reviewed Apr 12, 2020

src/relay/transforms/partition_graph.cc

improve comment

bb3f8de

masahi approved these changes Apr 12, 2020

VisitExprDefault_

5e6cb13

comaniac approved these changes Apr 12, 2020

masahi reviewed Apr 12, 2020

src/relay/backend/contrib/codegen_c/codegen.cc

deduce type

8c6f8dd

trevor-m mentioned this pull request Apr 13, 2020

[BYOC] Prevent duplicate outputs in subgraph Tuple #5320

Merged

zhiics merged commit 5958d60 into apache:master Apr 13, 2020

zhiics deleted the partition branch April 13, 2020 21:06

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Apr 16, 2020

[BYOC] Enhance partitioning and external codegen (apache#5310)

841f9b3

zhiics added a commit to neo-ai/tvm that referenced this pull request Apr 17, 2020

[BYOC] Enhance partitioning and external codegen (apache#5310)

d4cb8c4

dpankratz pushed a commit to dpankratz/incubator-tvm that referenced this pull request Apr 24, 2020

[BYOC] Enhance partitioning and external codegen (apache#5310)

84c8ac5

masahi mentioned this pull request May 28, 2020

[PatternLang] Add ConstantPattern #5689

Merged

ZihengJiang mentioned this pull request Sep 25, 2020

TVM v0.7 Release Note Candidate #6486

Closed
