[ConvertLayout] Support QNN ops. #5066
Conversation
The branch was force-pushed from 235c079 to a4c5092.
Overall looks good to me.
src/relay/qnn/op/convolution.cc (comment on an outdated diff):

```cpp
// Fill the layouts of remaining input tensors - scales and zero points. The layouts of these
// tensors can be ignored as they don't go through any transformation.
Layout ignore_layout = Layout("I");
```
Are they always input channel?
They can be scalar or per-output-channel. I initially thought of using "C", but chose "I" to be more specific. I am open to discussion.
maybe "C" is better. I don't have strong opinion though
LGTM
* [ConvertLayout] Support QNN ops.
* Changing layouts to C.
* Fixing dilation.
* Empty commit.

Co-authored-by: Ubuntu <[email protected]>
The recently introduced op strategy has disabled the conversion from NHWC to NCHW in AlterOpLayout (which is the correct thing to do). We can solve this problem by calling ConvertLayout in the parser if needed; however, this only works for FP32 models.
For quantized models, the parsers produce a QNN graph, and this QNN graph goes to relay.build. Relay build internally calls the QNN Legalize passes to convert it to Relay-only ops. The problem is that ConvertLayout does not work on QNN ops, so even if we call ConvertLayout after the parser, the layouts will not change.
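For illustration, here is a minimal sketch of running ConvertLayout on a parsed module from C++. The map-based desired-layouts signature reflects current TVM and is an assumption for this PR's timeframe, when the pass took a single layout string; the helper name ConvertToNCHW is hypothetical.

```cpp
#include <tvm/ir/module.h>
#include <tvm/relay/transform.h>

namespace transform = tvm::relay::transform;

// Sketch: convert a parsed (possibly QNN) module to NCHW before relay.build,
// so the later QNN Legalize passes already see the desired layout.
tvm::IRModule ConvertToNCHW(tvm::IRModule mod) {
  // "default" keeps the operator's default kernel layout for the new data layout.
  tvm::Map<tvm::String, tvm::Array<tvm::String>> desired_layouts = {
      {"qnn.conv2d", {"NCHW", "default"}}};
  transform::Pass pass = transform::ConvertLayout(desired_layouts);
  return pass(mod);
}
```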
This PR implements ConvertLayout for QNN ops. In addition, I have changed the interface of FInferCorrectLayout to take an array of Relay types instead of shapes. This is helpful for operators like concatenate, where we need to know the number of input data tensors.
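A hedged sketch of the revised hook, based on the description above; the parameter names and header paths are assumptions rather than the exact PR diff.

```cpp
#include <tvm/ir/attrs.h>
#include <tvm/relay/type.h>
#include <tvm/runtime/packed_func.h>
#include <tvm/tir/data_layout.h>

namespace tvm {
namespace relay {

// Before this PR, the hook received only the old input shapes:
//   Array<Array<tir::Layout>>(const Attrs& attrs,
//                             const Array<tir::Layout>& new_in_layouts,
//                             const Array<tir::Layout>& old_in_layouts,
//                             const Array<Array<IndexExpr>>& old_in_shapes);
//
// After this PR, it receives the full Relay types, so an op such as
// concatenate can inspect its tuple type to count the input data tensors.
using FInferCorrectLayout = runtime::TypedPackedFunc<Array<Array<tir::Layout>>(
    const Attrs& attrs, const Array<tir::Layout>& new_in_layouts,
    const Array<tir::Layout>& old_in_layouts, const Array<Type>& old_in_types)>;

}  // namespace relay
}  // namespace tvm
```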
@icemelon9 @zhiics @yzhliu