Add FP requantize flow. Set float32 flow by default for llvm x86 targets with sse4.1 support. #9637
cc @jwfromm
Looks good, only minor comments
Please go through your change and remove all uses of the term |
python/tvm/topi/x86/utils.py
Outdated
"amdfam10",
"athlon-4",
"athlon-xp",
"c3-2",
Do we need this level of detail? I would prefer to drop them. I don't think people would ever specify these targets...
I think sse4.1 through vnni is enough.
I agree, sse4.1 looks good. Users can always use requantize_config to change the default behavior.
Done.
Very nice, just more minor comments and I'll merge this.
…ets with sse4.1 support
Please kick another CI job.
…ets with (apache#9637) sse4.1 support
Added a new calculation_flow_type parameter to relay.qnn.op.requantize. This parameter controls which implementation flow the function uses. Valid values: "int64", "float32", "float64".
The basic idea is that, on some targets, implementations other than "int64" (the only one available at the moment) will perform better.
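To illustrate the difference between the flows, here is a minimal pure-Python sketch of requantizing one value two ways: an integer-only fixed-point flow (in the spirit of "int64") and a float32 flow. This is not the TVM implementation. The function names, the round-half-up rule, the int8 clamp range, and scalar scales are all assumptions made for illustration.

```python
import math
import struct


def to_f32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def clamp(v, lo=-128, hi=127):
    """Saturate to the (assumed) int8 output range."""
    return max(lo, min(hi, v))


def requantize_fixed_point(q, in_scale, in_zp, out_scale, out_zp):
    """Integer-only flow: the scale ratio becomes a 31-bit multiplier and a shift."""
    # ratio = frac * 2**exp with 0.5 <= frac < 1
    frac, exp = math.frexp(in_scale / out_scale)
    multiplier = round(frac * (1 << 31))
    if multiplier == (1 << 31):  # frac rounded up to 1.0; renormalize
        multiplier >>= 1
        exp += 1
    shift = 31 - exp  # assumes ratio < 2**30, so shift >= 1
    # For int32 inputs this product fits in a signed 64-bit integer.
    acc = (q - in_zp) * multiplier
    rounded = (acc + (1 << (shift - 1))) >> shift  # round half up
    return clamp(rounded + out_zp)


def requantize_float32(q, in_scale, in_zp, out_scale, out_zp):
    """Float flow: multiply by the scale ratio in float32, then round."""
    ratio = to_f32(in_scale / out_scale)
    scaled = to_f32(to_f32(float(q - in_zp)) * ratio)
    return clamp(math.floor(scaled + 0.5) + out_zp)  # round half up


if __name__ == "__main__":
    for q in (-100, -7, 0, 7, 100):
        print(q,
              requantize_fixed_point(q, 0.1, 0, 0.3, 0),
              requantize_float32(q, 0.1, 0, 0.3, 0))
```

The float flow replaces a 64-bit multiply, add, and shift with a single float multiply and a rounding step, which is the kind of operation that vectorizes well on SIMD targets. The trade-off is that float32 has a 24-bit significand, so results can differ from the fixed-point flow for large accumulator values or exact rounding ties.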
The measurements below were taken on an AMD Ryzen 7 5800H with TVM_NUM_THREADS=1.
Performance with "llvm -mcpu=core-avx2" target:
Performance with "llvm" target:
Accuracy with "llvm -mcpu=core-avx2" target:
Accuracy with "llvm" target:
Additional changes: