Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: add hint to generate more pipeline friendly code #3138

Merged
merged 1 commit into from
Jul 29, 2022

Conversation

JunHe77
Copy link
Contributor

@JunHe77 JunHe77 commented May 22, 2022

With statistic data of test data files of silesia
the chance of position beyond highThreshold is very
low (~1.3%@L8 in most cases, all <2.5%), and is in
"lowprob area". Add the branch hint so compiler can
get better pipiline codegen.
With this change it is observed ~1% of mozilla and
xml, and slight (0.3%~0.8%) but consistent uplift on
other files on Arm N1.

Signed-off-by: Jun He [email protected]
Change-Id: Id9ba1d5c767e975290b5c1bf0ecce906544f4ade

@embg embg self-assigned this May 24, 2022
@terrelln terrelln self-assigned this Jun 2, 2022
@embg embg removed their assignment Jun 9, 2022
Copy link
Contributor

@terrelln terrelln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UNLIKELY() tag looks good, please just comment why the loop needs to be aligned, and can you add a PR comment with the benchmarks for that alignment?

Comment on lines 552 to 554
#if defined(__aarch64__)
__asm__ volatile(".p2align 4");
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment explaining why this alignment is here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for reviewing this, @terrelln . The original alignment was added because:

  1. this loop is the hottest one in ZSTD_buildFSETable_body with small codegen and simple logic (no branch). A proper code alignment might get extra uplift.
  2. to ensure it has similar uplift across different Arm micro-architectures (may not be the best for a certain one).

It was tested with silesia on three micro-architectures I had. But with more tests of other corpuses it is found that this value (16B aligned) may not be a good choice, and could impact the performance stability of tests, like the variance of the lowest number and the highest number could be ~8% for some files. So I'd drop this code alignment part before I find a proper way to meet the goal.

I've updated the patch with only the UNLIKELY tag. Here is the uplift with UNLIKELY only on Arm N1:

file L2 L8 L15
dickens 0.845% 0.763% 0.657%
mozilla 0.966% 1.163% 0.977%
mr 0.655% 0.941% 0.876%
nci 0.345% 0.404% 0.370%
ooffice 0.436% 1.062% 0.921%
osdb 0.514% 0.528% 0.381%
reymont 0.920% 0.797% 0.556%
samba 0.632% 0.621% 0.567%
sao 0.571% 0.940% 0.885%
webster 0.866% 0.577% 0.568%
x-ray 0.102% 0.886% 0.950%
xml 0.335% 0.328% 0.140%

Thanks.

With statistic data of test data files of silesia
the chance of position beyond highThreshold is very
low (~1.3%@L8 in most cases, all <2.5%), and is in
"lowprob area". Add the branch hint so compiler can
get better pipiline codegen.
With this change it is observed ~1% of mozilla and
xml, and slight (0.3%~0.8%) but consistent uplift on
other files on Arm N1.

Signed-off-by: Jun He <[email protected]>
Change-Id: Id9ba1d5c767e975290b5c1bf0ecce906544f4ade
@terrelln terrelln merged commit ec5fdcd into facebook:dev Jul 29, 2022
@Cyan4973 Cyan4973 mentioned this pull request Feb 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants