Add FP requantize flow. Set float32 flow by default for llvm x86 targets with sse4.1 support. #9637

Merged
merged 1 commit into from
Jan 26, 2022

@Icemist Icemist commented Dec 2, 2021

Added a new compute_dtype parameter to relay.qnn.op.requantize. This parameter controls which implementation the function uses. Valid values: "int64", "float32", "float64".
The basic idea is that, for some targets, implementations other than "int64" (currently the only one) will be faster.
The measurements below were taken on an AMD Ryzen 7 5800H with TVM_NUM_THREADS=1.

Performance with "llvm -mcpu=core-avx2" target:
image

Performance with "llvm" target:
image

Accuracy with "llvm -mcpu=core-avx2" target:

| requantize flow | accuracy |
| --- | --- |
| UPWARD_float64 | 75.39% |
| UPWARD_float32 | 75.32% |
| UPWARD_int64 | 75.39% |
| TONEAREST_float64 | 75.39% |
| TONEAREST_float32 | 75.32% |
| TONEAREST_int64 | 75.39% |

Accuracy with "llvm" target:

| requantize flow | accuracy |
| --- | --- |
| UPWARD_float64 | 75.38% |
| UPWARD_float32 | 75.30% |
| UPWARD_int64 | 75.38% |
| TONEAREST_float64 | 75.38% |
| TONEAREST_float32 | 75.30% |
| TONEAREST_int64 | 75.38% |
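
To make the difference between the flows concrete, here is a minimal pure-Python sketch of requantizing one value with a floating-point scale versus an integer fixed-point multiplier. It is not the TVM implementation: the function names, the 31-bit multiplier, and the int8 clamp range are assumptions for illustration, and Python floats are 64-bit, so the precision loss of a real float32 flow is not modelled.

```python
import math


def requantize_float(x, in_scale, in_zp, out_scale, out_zp, qmin=-128, qmax=127):
    """Floating-point flow (sketch): scale in floating point, round half upward, clamp."""
    scaled = (x - in_zp) * (in_scale / out_scale)
    rounded = math.floor(scaled + 0.5)  # ties go toward +infinity
    return max(qmin, min(qmax, rounded + out_zp))


def requantize_fixed_point(x, in_scale, in_zp, out_scale, out_zp, qmin=-128, qmax=127):
    """Integer flow (sketch): approximate the scale ratio by multiplier / 2**shift."""
    # ratio == mantissa * 2**exponent with 0.5 <= mantissa < 1
    mantissa, exponent = math.frexp(in_scale / out_scale)
    multiplier = round(mantissa * (1 << 31))  # assumed 31-bit fixed-point mantissa
    shift = 31 - exponent                     # assumes ratio < 2**30, so shift >= 1
    product = (x - in_zp) * multiplier        # would need 64-bit integers in C++
    # Adding half of 2**shift before the arithmetic right shift rounds half upward.
    rounded = (product + (1 << (shift - 1))) >> shift
    return max(qmin, min(qmax, rounded + out_zp))


if __name__ == "__main__":
    for x in (-7, 0, 10, 100):
        print(x,
              requantize_float(x, 0.5, 0, 0.25, 0),
              requantize_fixed_point(x, 0.5, 0, 0.25, 0))
```

When the scale ratio is exactly representable the two functions agree; otherwise they can differ by one quantization step for inputs that land very close to a rounding boundary.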

Additional changes:

  • Added relay.qnn.op.requantize_config for use in a Python "with" statement. It lets users control the behavior of the requantize function indirectly. It accepts two parameters: rounding and compute_dtype. Values passed directly to the requantize function take priority over it. Note: compute_dtype defaults to "float32" for llvm x86 targets with sse4.1 support. For example:
mod, params = relay.frontend.from_pytorch(scripted_model, shape_list)
with tvm.transform.PassContext(opt_level=3):
    with relay.qnn.op.requantize_config(rounding="UPWARD", compute_dtype="float64"):
        lib = relay.build_module.build(mod, target=target, params=params)
  • Added target_has_sse41 and target_is_x86 functions to tvm.topi.x86.utils python namespace
  • Registered target_has.* functions from tvm.topi.x86.utils to call them from C++ code
  • Added Floor, LogicalOr, Equal, Less and IsFinite relay operations in C++ tvm::relay namespace
  • Added requantize_config validation tests
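
The priority rule described for requantize_config (a scoped setting that explicit arguments override) can be sketched in plain Python. This is only an illustration of the pattern, not the TVM code: toy_requantize_config, resolve, and the default values below are hypothetical.

```python
import contextlib

# Placeholder defaults for this sketch; the real defaults depend on the target.
_DEFAULTS = {"rounding": "UPWARD", "compute_dtype": "int64"}
_config_stack = [dict(_DEFAULTS)]


@contextlib.contextmanager
def toy_requantize_config(**overrides):
    """Hypothetical scoped config: overrides apply only inside the `with` block."""
    unknown = set(overrides) - set(_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    _config_stack.append({**_config_stack[-1], **overrides})
    try:
        yield
    finally:
        _config_stack.pop()  # restore the enclosing configuration


def resolve(rounding=None, compute_dtype=None):
    """Explicit arguments win; otherwise fall back to the innermost active config."""
    cfg = _config_stack[-1]
    return (
        rounding if rounding is not None else cfg["rounding"],
        compute_dtype if compute_dtype is not None else cfg["compute_dtype"],
    )


if __name__ == "__main__":
    with toy_requantize_config(rounding="TONEAREST", compute_dtype="float64"):
        print(resolve())                       # values taken from the config
        print(resolve(compute_dtype="int64"))  # explicit argument overrides it
    print(resolve())                           # defaults again outside the block
```

Keeping the configuration on a stack means nested `with` blocks restore the outer settings on exit, even if the body raises.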

@masahi masahi self-assigned this Jan 9, 2022
@Icemist Icemist force-pushed the avoronov/float_requantize branch from 18824b4 to da09e5e Compare January 10, 2022 02:04
@Icemist Icemist changed the title Add FP requantize flow for llvm target Add FP requantize flow. Set this by default for llvm x86 targets Jan 10, 2022
@Icemist Icemist force-pushed the avoronov/float_requantize branch 2 times, most recently from 66e0220 to 5225f48 Compare January 10, 2022 02:29

Icemist commented Jan 10, 2022

> Have you benchmarked on ARM? I think we should enable this only for x86 for now.

I turned it off for non-x86 targets. It is also now possible to choose which requantize implementation is used.

@Icemist Icemist force-pushed the avoronov/float_requantize branch from 5225f48 to 457711e Compare January 10, 2022 11:13

masahi commented Jan 13, 2022

cc @jwfromm

@masahi masahi left a comment

Looks good, only minor comments

@Icemist Icemist force-pushed the avoronov/float_requantize branch 2 times, most recently from 2eb8658 to b958076 Compare January 24, 2022 15:25

masahi commented Jan 24, 2022

Please go through your change and remove all uses of the term calculation flow

@Icemist Icemist force-pushed the avoronov/float_requantize branch from b958076 to 81458dc Compare January 24, 2022 22:03
"amdfam10",
"athlon-4",
"athlon-xp",
"c3-2",

@masahi masahi Jan 25, 2022

Do we need this level of detail? I'd prefer dropping them. I don't think people would ever specify these targets...

I think sse4.1 - vnni are enough.

Contributor Author

I agree, sse4.1 looks good. Users can always use requantize_config to change the default behavior.
Done.
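
The default selection discussed in this thread can be sketched as a small target-string check. This is a hypothetical illustration, not the code in the PR: the helper names and the short CPU list are assumptions, the sketch ignores -mtriple, and it treats a bare "llvm" target (no -mcpu or -mattr) as not having sse4.1.

```python
# Illustrative subset of LLVM CPU names that imply SSE4.1 (assumed, not exhaustive).
SSE41_CPUS = {"nehalem", "haswell", "core-avx2", "skylake", "skylake-avx512", "cascadelake"}


def parse_target(target):
    """Split a target string such as 'llvm -mcpu=core-avx2' into kind, mcpu, mattr."""
    tokens = target.split()
    kind = tokens[0] if tokens else ""
    mcpu, mattr = None, []
    for tok in tokens[1:]:
        if tok.startswith("-mcpu="):
            mcpu = tok[len("-mcpu="):]
        elif tok.startswith("-mattr="):
            mattr = tok[len("-mattr="):].split(",")
    return kind, mcpu, mattr


def default_compute_dtype(target):
    """Hypothetical default: float32 for llvm targets that advertise SSE4.1, else int64."""
    kind, mcpu, mattr = parse_target(target)
    has_sse41 = kind == "llvm" and (mcpu in SSE41_CPUS or "+sse4.1" in mattr)
    return "float32" if has_sse41 else "int64"


if __name__ == "__main__":
    for t in ("llvm -mcpu=core-avx2", "llvm -mattr=+sse4.1", "llvm", "cuda"):
        print(t, "->", default_compute_dtype(t))
```

Whatever default such a check picks, users can still override it through requantize_config, as noted above.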

@masahi masahi left a comment

Very nice, just more minor comments and I'll merge this.

@Icemist Icemist force-pushed the avoronov/float_requantize branch from 81458dc to 5b07e4c Compare January 25, 2022 11:54
@Icemist Icemist changed the title Add FP requantize flow. Set this by default for llvm x86 targets Add FP requantize flow. Set float32 flow by default for llvm x86 targets with sse4.1 support Jan 25, 2022

masahi commented Jan 26, 2022

Please kick another CI job.

@Icemist Icemist changed the title Add FP requantize flow. Set float32 flow by default for llvm x86 targets with sse4.1 support Add FP requantize flow. Set float32 flow by default for llvm x86 targets with sse4.1 support. Jan 26, 2022
@masahi masahi merged commit ffff8dd into apache:main Jan 26, 2022
sunggg pushed a commit to sunggg/tvm that referenced this pull request Jan 29, 2022
ylc pushed a commit to ylc/tvm that referenced this pull request Feb 16, 2022
4 participants