Add FP requantize flow. Set float32 flow by default for llvm x86 targets with sse4.1 support. #9637
cc @jwfromm
masahi
left a comment
Looks good, only minor comments
Please go through your change and remove all uses of the term |
python/tvm/topi/x86/utils.py (outdated)
    "amdfam10",
    "athlon-4",
    "athlon-xp",
    "c3-2",
Do we need this level of detail? I'd prefer dropping them. I don't think people would ever specify these targets...
I think sse4.1 through vnni are enough.
I agree, sse4.1 looks good. Users can always use requantize_config to change the default behavior.
Done.
masahi
left a comment
Very nice, just more minor comments and I'll merge this.
Please kick another CI job.
Added a new calculation_flow_type parameter to relay.qnn.op.requantize. This parameter controls which implementation flow the function uses. Valid values: "int64", "float32", "float64".
The basic idea is that on some targets a flow other than "int64" (currently the only one available) will be faster.
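To make the difference between the flows concrete, here is a minimal NumPy sketch, not TVM's actual implementation. It assumes per-tensor scales, an int8 output range, and simple round-to-nearest behaviour. The function names `requantize_fixed_point` and `requantize_float` are hypothetical and exist only for this illustration. The first emulates an "int64"-style flow with a fixed-point multiplier and shift; the second does the same rescaling in floating point, which is the kind of arithmetic a "float32" flow relies on.

```python
import math

import numpy as np


def requantize_fixed_point(x, in_scale, in_zp, out_scale, out_zp):
    """Integer-only rescale: the scale ratio becomes a Q31 multiplier plus a shift."""
    ratio = in_scale / out_scale
    # ratio == mantissa * 2**exponent with mantissa in [0.5, 1)
    mantissa, exponent = math.frexp(ratio)
    multiplier = int(round(mantissa * (1 << 31)))
    if multiplier == (1 << 31):  # mantissa rounded up to 1.0
        multiplier //= 2
        exponent += 1
    shift = 31 - exponent
    assert 0 < shift < 63, "sketch only supports moderate scale ratios"
    acc = (x.astype(np.int64) - in_zp) * multiplier
    # Add half of the divisor, then arithmetic right shift: round to nearest.
    acc = (acc + (1 << (shift - 1))) >> shift
    return np.clip(acc + out_zp, -128, 127).astype(np.int8)


def requantize_float(x, in_scale, in_zp, out_scale, out_zp, dtype=np.float32):
    """Floating-point rescale: one multiply and one round per element."""
    ratio = dtype(in_scale / out_scale)
    y = np.round((x.astype(dtype) - dtype(in_zp)) * ratio) + out_zp
    return np.clip(y, -128, 127).astype(np.int8)


if __name__ == "__main__":
    data = np.arange(-500, 501, dtype=np.int32)
    a = requantize_fixed_point(data, 0.3, 3, 1.0, -5)
    b = requantize_float(data, 0.3, 3, 1.0, -5)
    print("max abs difference:", np.max(np.abs(a.astype(np.int32) - b.astype(np.int32))))
```

The two versions can disagree by one quantization step near rounding ties, which is why accuracy is reported next to performance. The floating-point variant needs no 64-bit integer multiply, which is plausibly what makes it attractive on x86 targets with vector float support.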
The measurements below were taken on an AMD Ryzen 7 5800H with TVM_NUM_THREADS=1.
Performance with "llvm -mcpu=core-avx2" target: (figure not preserved)
Performance with "llvm" target: (figure not preserved)
Accuracy with "llvm -mcpu=core-avx2" target: (figure not preserved)
Accuracy with "llvm" target: (figure not preserved)
Additional changes: