-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib: add hint to generate more pipeline friendly code #3138
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The UNLIKELY()
tag looks good, please just comment why the loop needs to be aligned, and can you add a PR comment with the benchmarks for that alignment?
#if defined(__aarch64__) | ||
__asm__ volatile(".p2align 4"); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment explaining why this alignment is here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for reviewing this, @terrelln . The original alignment was added because:
- this loop is the hottest one in ZSTD_buildFSETable_body with small codegen and simple logic (no branch). A proper code alignment might get extra uplift.
- to ensure it has similar uplift across different Arm micro-architectures (may not be the best for a certain one).
It was tested with silesia on three micro-architectures I had. But with more tests of other corpuses it is found that this value (16B aligned) may not be a good choice, and could impact the performance stability of tests, like the variance of the lowest number and the highest number could be ~8% for some files. So I'd drop this code alignment part before I find a proper way to meet the goal.
I've updated the patch with only the UNLIKELY tag. Here is the uplift with UNLIKELY only on Arm N1:
file | L2 | L8 | L15 |
---|---|---|---|
dickens | 0.845% | 0.763% | 0.657% |
mozilla | 0.966% | 1.163% | 0.977% |
mr | 0.655% | 0.941% | 0.876% |
nci | 0.345% | 0.404% | 0.370% |
ooffice | 0.436% | 1.062% | 0.921% |
osdb | 0.514% | 0.528% | 0.381% |
reymont | 0.920% | 0.797% | 0.556% |
samba | 0.632% | 0.621% | 0.567% |
sao | 0.571% | 0.940% | 0.885% |
webster | 0.866% | 0.577% | 0.568% |
x-ray | 0.102% | 0.886% | 0.950% |
xml | 0.335% | 0.328% | 0.140% |
Thanks.
With statistic data of test data files of silesia the chance of position beyond highThreshold is very low (~1.3%@L8 in most cases, all <2.5%), and is in "lowprob area". Add the branch hint so compiler can get better pipiline codegen. With this change it is observed ~1% of mozilla and xml, and slight (0.3%~0.8%) but consistent uplift on other files on Arm N1. Signed-off-by: Jun He <[email protected]> Change-Id: Id9ba1d5c767e975290b5c1bf0ecce906544f4ade
With statistic data of test data files of silesia
the chance of position beyond highThreshold is very
low (~1.3%@L8 in most cases, all <2.5%), and is in
"lowprob area". Add the branch hint so compiler can
get better pipiline codegen.
With this change it is observed ~1% of mozilla and
xml, and slight (0.3%~0.8%) but consistent uplift on
other files on Arm N1.
Signed-off-by: Jun He [email protected]
Change-Id: Id9ba1d5c767e975290b5c1bf0ecce906544f4ade