[Relay] Option to select which convolution layers are quantized. #3173
Conversation
Not sure why caffe2 tests are failing. They pass when run locally and these changes shouldn't affect anything but quantization.
@vinx13 @ZihengJiang @kazum @eqy can you help review this?
@@ -105,7 +105,9 @@ def conv2d_cuda(cfg, data, kernel, strides, padding, dilation, layout='NCHW', ou
         return winograd_cuda(cfg, data, kernel, strides, padding, dilation, layout, out_dtype,
                              pre_computed=False)
     if cfg.template_key == 'int8':
-        return conv2d_NCHWc_int8(cfg, data, kernel, strides, padding, dilation, layout, out_dtype)
+        if (data.dtype == 'int8' or data.dtype == 'uint8'):
Does it work with uint8? conv2d compute and dp4a need some changes since they are hardcoded as int8 (although it should work with uint8).
You're right that it only supports int8 now but could be extended to uint8. I added this check because I was doing some quantization without autotuning, which caused the incorrect convolution algorithm to be chosen in some cases. Since autotuning is the intended workflow I think dropping this would be fine.
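For illustration, here is a condensed sketch of what the guarded template selection amounts to. The helper name and string return values are hypothetical; this is not the actual TVM source, just a restatement of the dispatch logic in the diff above.

```python
# Hypothetical condensation of the dispatch logic shown in the diff above:
# the NCHWc int8 template is only taken when the data really is an 8-bit
# tensor, otherwise the default NCHW implementation is used.
def pick_conv2d_template(template_key, data_dtype):
    if template_key == 'int8' and data_dtype in ('int8', 'uint8'):
        return 'conv2d_NCHWc_int8'
    return 'conv2d_nchw'  # fall back to the default schedule

# Without autotuning, a float32 workload no longer lands on the int8 template.
assert pick_conv2d_template('int8', 'float32') == 'conv2d_nchw'
assert pick_conv2d_template('int8', 'int8') == 'conv2d_NCHWc_int8'
```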
@tqchen Shouldn't we move this logic outside of quantization and instead rely on pre-passes to annotate regions for quantization? I thought that was the whole point of the discussion we had about VTA's quantization annotations.
@jwfromm CI error is unrelated. Please rebase and get CI green.
Force-pushed from ea4e99f to 3868fc5.
CI is now green. @vinx13 please manage the PR.
Also added a small check that prints a warning when annotating a layer that doesn't have input channels divisible by 4 and so may not be quantizable.
Force-pushed from e8c1a9b to f06693e.
# Check if the kernel is suitable for quantization, post warning if not.
in_channels = new_args[1].data.shape[1]
if in_channels % 4 != 0:
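The diff above is cut off before the warning itself. Below is a minimal sketch of what the complete check could look like; the helper name, the use of Python's logging module, and the message text are my assumptions, not necessarily the PR's exact code.

```python
import logging

def warn_if_not_quantizable(kernel_shape, layer_index):
    """Warn when a conv layer's input channels are not divisible by 4 and
    therefore may not be usable with the NCHWc int8 / dp4a path."""
    in_channels = kernel_shape[1]  # assumes an OIHW kernel layout
    if in_channels % 4 != 0:
        logging.warning(
            "Conv layer %d has %d input channels (not divisible by 4); "
            "consider adding it to skip_conv_layers.", layer_index, in_channels)

warn_if_not_quantizable((16, 3, 3, 3), layer_index=0)  # triggers the warning
```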
Can you explain why the input channels need to be divisible by 4?
If the input channels aren't divisible by 4, Relay fails to convert from NCHW to NCHWc and throws an error. This is fine if not using dp4a but annoying otherwise. Printing a simple warning and count in these cases makes it easier to use the skip_conv_layers option if needed, and it is easy to ignore otherwise.
I would suggest removing the warning here since it is unrelated to quantization. The default conv2d implementation can work with different types, so if the input channels aren't divisible by 4 (or another factor) we should use the NCHW one. We have some assertions like this in the NCHWc conv2d. However, we don't expect any errors to be raised, since there is no way to obtain an AutoTVM log for an invalid template, so the NCHWc one shouldn't be chosen in this case.
That's fair, I've reverted the warning.
@jroesch I'm gonna merge this if you have no objection.
Force-pushed from f06693e to bbf128a.
ping @vinx13
@jwfromm Thanks, this is merged.
…che#3173) * Stashing for later maybe. * Added new option to leave specific layers unquantized. * Better error checking. * remove unneeded import * tab to spaces * pylint fixes * more pylint fixes
The current Qconfig only allows layers at the beginning of a network to be left at full precision. However, many architectures might require layers near the output to remain at high precision as well. I've added an option to explicitly skip specific convolutions anywhere in the network.
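A minimal usage sketch of the new option, assuming `mod` and `params` come from a Relay frontend importer; the skipped layer indices here are purely illustrative.

```python
from tvm import relay

# `mod` and `params` are assumed to come from a frontend importer,
# e.g. relay.frontend.from_mxnet(...).
# Leave selected convolutions (identified by their order in the network) at
# full precision while quantizing the rest of the model.
with relay.quantize.qconfig(skip_conv_layers=[0, 51]):
    quantized = relay.quantize.quantize(mod, params)
```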