Optimize AVX intrinsics for .NET8 #597
Draft
bitfaster wants to merge 9 commits into main from users/alexpeck/avx2
bitfaster commented Jun 18, 2024
Vector128<int> offset = Avx2.And(h, Vector128.Create(1));
Vector128<int> blockOffset = Avx2.Add(Vector128.Create(block), offset); // i - table index
blockOffset = Avx2.Add(blockOffset, Vector128.Create(0, 2, 4, 6)); // + (i << 1)
Vector256<ulong> indexLong = Avx2.PermuteVar8x32(Vector256.Create(index, Vector128<int>.Zero), Vector256.Create(0, 4, 1, 5, 2, 5, 3, 7)).AsUInt64();
Can this become a ReadOnlySpan, as suggested here for the permute lookup table?
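A minimal sketch of what that suggestion could look like, not the PR's actual code. The names `PermuteTableSketch`, `InterleaveMask` and `WidenToUInt64` are hypothetical. It assumes .NET 7 or later for `Vector256.LoadUnsafe`, and a C# 11 / .NET 7+ toolchain for the compiler to store the `int` table as static data; on older targets the property would likely allocate an array on each call.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class PermuteTableSketch
{
    // Hypothetical lookup table holding the same control values as the
    // Vector256.Create(0, 4, 1, 5, 2, 5, 3, 7) call in the diff.
    // Assumption: compiled with C# 11 / .NET 7+, where this form is emitted
    // as a span over static data rather than a fresh array.
    private static ReadOnlySpan<int> InterleaveMask => new int[] { 0, 4, 1, 5, 2, 5, 3, 7 };

    // Zero-extends four 32-bit indexes into four 64-bit lanes.
    public static Vector256<ulong> WidenToUInt64(Vector128<int> index)
    {
        if (Avx2.IsSupported)
        {
            // Load the permute control directly from the table.
            Vector256<int> control = Vector256.LoadUnsafe(ref MemoryMarshal.GetReference(InterleaveMask));

            // Upper half is zero, so control values 4..7 all select a zero lane.
            Vector256<int> combined = Vector256.Create(index, Vector128<int>.Zero);
            return Avx2.PermuteVar8x32(combined, control).AsUInt64();
        }

        // Scalar fallback so the sketch also runs on hardware without AVX2.
        return Vector256.Create(
            (ulong)(uint)index.GetElement(0),
            (ulong)(uint)index.GetElement(1),
            (ulong)(uint)index.GetElement(2),
            (ulong)(uint)index.GetElement(3));
    }
}
```

Whether this beats the inline `Vector256.Create` constant would need to be measured; the JIT may already fold the constant into a single load.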
| Method | Runtime | Size | Mean | Error | StdDev | Ratio | Code Size | Allocated |
|---|---|---:|---:|---:|---:|---:|---:|---:|
| IncFlat | .NET 6.0 | 512 | 23.32 ns | 0.057 ns | 0.051 ns | 1.00 | 852 B | - |
| IncBlock | .NET 6.0 | 512 | 17.89 ns | 0.052 ns | 0.048 ns | 0.77 | 813 B | - |
| IncBlockAvxNotPinned | .NET 6.0 | 512 | 15.91 ns | 0.040 ns | 0.033 ns | 0.68 | NA | - |
| IncBlockAvxPinned | .NET 6.0 | 512 | 15.37 ns | 0.036 ns | 0.032 ns | 0.66 | 856 B | - |
| IncFlat | .NET 8.0 | 512 | 10.31 ns | 0.011 ns | 0.009 ns | 1.00 | 1,019 B | - |
| IncBlock | .NET 8.0 | 512 | 14.51 ns | 0.016 ns | 0.015 ns | 1.41 | 863 B | - |
| IncBlockAvxNotPinned | .NET 8.0 | 512 | 14.22 ns | 0.010 ns | 0.009 ns | 1.38 | NA | - |
| IncBlockAvxPinned | .NET 8.0 | 512 | 12.52 ns | 0.011 ns | 0.010 ns | 1.21 | 859 B | - |
| IncFlat | .NET 9.0 | 512 | 10.07 ns | 0.012 ns | 0.010 ns | 1.00 | 1,003 B | - |
| IncBlock | .NET 9.0 | 512 | 15.67 ns | 0.074 ns | 0.066 ns | 1.56 | 852 B | - |
| IncBlockAvxNotPinned | .NET 9.0 | 512 | 13.64 ns | 0.006 ns | 0.006 ns | 1.35 | NA | - |
| IncBlockAvxPinned | .NET 9.0 | 512 | 12.50 ns | 0.010 ns | 0.009 ns | 1.24 | 844 B | - |
| IncFlat | .NET 6.0 | 1024 | 21.20 ns | 0.052 ns | 0.048 ns | 1.00 | 852 B | - |
| IncBlock | .NET 6.0 | 1024 | 17.88 ns | 0.065 ns | 0.061 ns | 0.84 | 813 B | - |
| IncBlockAvxNotPinned | .NET 6.0 | 1024 | 15.88 ns | 0.037 ns | 0.035 ns | 0.75 | NA | - |
| IncBlockAvxPinned | .NET 6.0 | 1024 | 15.38 ns | 0.031 ns | 0.029 ns | 0.73 | 856 B | - |
| IncFlat | .NET 8.0 | 1024 | 10.04 ns | 0.010 ns | 0.009 ns | 1.00 | 1,019 B | - |
| IncBlock | .NET 8.0 | 1024 | 14.22 ns | 0.029 ns | 0.027 ns | 1.42 | 863 B | - |
| IncBlockAvxNotPinned | .NET 8.0 | 1024 | 14.38 ns | 0.008 ns | 0.007 ns | 1.43 | NA | - |
| IncBlockAvxPinned | .NET 8.0 | 1024 | 12.41 ns | 0.011 ns | 0.011 ns | 1.24 | 863 B | - |
| IncFlat | .NET 9.0 | 1024 | 10.36 ns | 0.050 ns | 0.046 ns | 1.00 | 1,003 B | - |
| IncBlock | .NET 9.0 | 1024 | 15.41 ns | 0.125 ns | 0.111 ns | 1.49 | 852 B | - |
| IncBlockAvxNotPinned | .NET 9.0 | 1024 | 13.69 ns | 0.012 ns | 0.011 ns | 1.32 | NA | - |
| IncBlockAvxPinned | .NET 9.0 | 1024 | 12.40 ns | 0.010 ns | 0.009 ns | 1.20 | 844 B | - |
| IncFlat | .NET 6.0 | 32768 | 21.18 ns | 0.047 ns | 0.044 ns | 1.00 | 852 B | - |
| IncBlock | .NET 6.0 | 32768 | 18.01 ns | 0.039 ns | 0.036 ns | 0.85 | 813 B | - |
| IncBlockAvxNotPinned | .NET 6.0 | 32768 | 15.95 ns | 0.051 ns | 0.048 ns | 0.75 | NA | - |
| IncBlockAvxPinned | .NET 6.0 | 32768 | 15.28 ns | 0.030 ns | 0.027 ns | 0.72 | 856 B | - |
| IncFlat | .NET 8.0 | 32768 | 11.31 ns | 0.011 ns | 0.010 ns | 1.00 | 1,019 B | - |
| IncBlock | .NET 8.0 | 32768 | 15.74 ns | 0.023 ns | 0.019 ns | 1.39 | 863 B | - |
| IncBlockAvxNotPinned | .NET 8.0 | 32768 | 14.08 ns | 0.014 ns | 0.011 ns | 1.24 | NA | - |
| IncBlockAvxPinned | .NET 8.0 | 32768 | 12.54 ns | 0.011 ns | 0.010 ns | 1.11 | 863 B | - |
| IncFlat | .NET 9.0 | 32768 | 10.99 ns | 0.013 ns | 0.012 ns | 1.00 | 1,003 B | - |
| IncBlock | .NET 9.0 | 32768 | 17.17 ns | 0.025 ns | 0.023 ns | 1.56 | 852 B | - |
| IncBlockAvxNotPinned | .NET 9.0 | 32768 | 13.74 ns | 0.023 ns | 0.021 ns | 1.25 | NA | - |
| IncBlockAvxPinned | .NET 9.0 | 32768 | 12.57 ns | 0.017 ns | 0.016 ns | 1.14 | 844 B | - |
| IncFlat | .NET 6.0 | 524288 | 27.79 ns | 0.467 ns | 0.390 ns | 1.00 | 852 B | - |
| IncBlock | .NET 6.0 | 524288 | 26.43 ns | 0.186 ns | 0.145 ns | 0.95 | 813 B | - |
| IncBlockAvxNotPinned | .NET 6.0 | 524288 | 24.87 ns | 0.115 ns | 0.090 ns | 0.90 | NA | - |
| IncBlockAvxPinned | .NET 6.0 | 524288 | 23.26 ns | 0.135 ns | 0.120 ns | 0.84 | 856 B | - |
| IncFlat | .NET 8.0 | 524288 | 18.46 ns | 0.189 ns | 0.177 ns | 1.00 | 1,017 B | - |
| IncBlock | .NET 8.0 | 524288 | 23.63 ns | 0.114 ns | 0.107 ns | 1.28 | 863 B | - |
| IncBlockAvxNotPinned | .NET 8.0 | 524288 | 40.75 ns | 0.423 ns | 0.395 ns | 2.21 | NA | - |
| IncBlockAvxPinned | .NET 8.0 | 524288 | 20.95 ns | 0.092 ns | 0.086 ns | 1.13 | 863 B | - |
| IncFlat | .NET 9.0 | 524288 | 18.29 ns | 0.284 ns | 0.266 ns | 1.00 | 1,004 B | - |
| IncBlock | .NET 9.0 | 524288 | 23.77 ns | 0.148 ns | 0.139 ns | 1.30 | 855 B | - |
| IncBlockAvxNotPinned | .NET 9.0 | 524288 | 22.06 ns | 0.136 ns | 0.127 ns | 1.21 | NA | - |
| IncBlockAvxPinned | .NET 9.0 | 524288 | 21.16 ns | 0.107 ns | 0.100 ns | 1.16 | 844 B | - |
| IncFlat | .NET 6.0 | 8388608 | 83.25 ns | 0.347 ns | 0.324 ns | 1.00 | 482 B | - |
| IncBlock | .NET 6.0 | 8388608 | 66.31 ns | 0.238 ns | 0.211 ns | 0.80 | 443 B | - |
| IncBlockAvxNotPinned | .NET 6.0 | 8388608 | 51.84 ns | 0.437 ns | 0.409 ns | 0.62 | NA | - |
| IncBlockAvxPinned | .NET 6.0 | 8388608 | 49.40 ns | 0.377 ns | 0.353 ns | 0.59 | 486 B | - |
| IncFlat | .NET 8.0 | 8388608 | 56.88 ns | 0.444 ns | 0.415 ns | 1.00 | 655 B | - |
| IncBlock | .NET 8.0 | 8388608 | 59.86 ns | 0.252 ns | 0.210 ns | 1.05 | 499 B | - |
| IncBlockAvxNotPinned | .NET 8.0 | 8388608 | 50.07 ns | 0.210 ns | 0.197 ns | 0.88 | NA | - |
| IncBlockAvxPinned | .NET 8.0 | 8388608 | 42.44 ns | 0.178 ns | 0.149 ns | 0.75 | 499 B | - |
| IncFlat | .NET 9.0 | 8388608 | 57.79 ns | 0.586 ns | 0.548 ns | 1.00 | 655 B | - |
| IncBlock | .NET 9.0 | 8388608 | 59.96 ns | 0.387 ns | 0.362 ns | 1.04 | 504 B | - |
| IncBlockAvxNotPinned | .NET 9.0 | 8388608 | 49.38 ns | 0.290 ns | 0.257 ns | 0.85 | NA | - |
| IncBlockAvxPinned | .NET 9.0 | 8388608 | 42.13 ns | 0.323 ns | 0.302 ns | 0.73 | 496 B | - |
| IncFlat | .NET 6.0 | 134217728 | 111.60 ns | 0.761 ns | 0.712 ns | 1.00 | 482 B | - |
| IncBlock | .NET 6.0 | 134217728 | 86.40 ns | 1.305 ns | 1.220 ns | 0.77 | 443 B | - |
| IncBlockAvxNotPinned | .NET 6.0 | 134217728 | 63.80 ns | 0.369 ns | 0.346 ns | 0.57 | NA | - |
| IncBlockAvxPinned | .NET 6.0 | 134217728 | 61.29 ns | 0.568 ns | 0.531 ns | 0.55 | 486 B | - |
| IncFlat | .NET 8.0 | 134217728 | 84.00 ns | 0.551 ns | 0.516 ns | 1.00 | 655 B | - |
| IncBlock | .NET 8.0 | 134217728 | 78.41 ns | 0.522 ns | 0.488 ns | 0.93 | 499 B | - |
| IncBlockAvxNotPinned | .NET 8.0 | 134217728 | 62.13 ns | 0.821 ns | 0.728 ns | 0.74 | NA | - |
| IncBlockAvxPinned | .NET 8.0 | 134217728 | 51.84 ns | 0.313 ns | 0.278 ns | 0.62 | 499 B | - |
| IncFlat | .NET 9.0 | 134217728 | 84.03 ns | 1.088 ns | 1.017 ns | 1.00 | 655 B | - |
| IncBlock | .NET 9.0 | 134217728 | 78.25 ns | 0.985 ns | 0.921 ns | 0.93 | 504 B | - |
| IncBlockAvxNotPinned | .NET 9.0 | 134217728 | 61.43 ns | 0.626 ns | 0.586 ns | 0.73 | NA | - |
| IncBlockAvxPinned | .NET 9.0 | 134217728 | 51.50 ns | 0.265 ns | 0.248 ns | 0.61 | 496 B | - |
| Method | Runtime | Size | Mean | Error | StdDev | Ratio | Code Size | Allocated |
|---|---|---:|---:|---:|---:|---:|---:|---:|
| FrequencyFlat | .NET 6.0 | 512 | 25.43 ns | 0.100 ns | 0.089 ns | 1.00 | 314 B | - |
| FrequencyBlock | .NET 6.0 | 512 | 21.17 ns | 0.063 ns | 0.059 ns | 0.83 | 342 B | - |
| FrequencyBlockAvxNotPinned | .NET 6.0 | 512 | 27.04 ns | 0.024 ns | 0.023 ns | 1.06 | 620 B | - |
| FrequencyBlockAvxPinned | .NET 6.0 | 512 | 25.57 ns | 0.012 ns | 0.011 ns | 1.01 | 521 B | - |
| FrequencyFlat | .NET 8.0 | 512 | 14.95 ns | 0.015 ns | 0.014 ns | 1.00 | 482 B | - |
| FrequencyBlock | .NET 8.0 | 512 | 17.10 ns | 0.014 ns | 0.012 ns | 1.14 | 460 B | - |
| FrequencyBlockAvxNotPinned | .NET 8.0 | 512 | 24.96 ns | 0.012 ns | 0.012 ns | 1.67 | 721 B | - |
| FrequencyBlockAvxPinned | .NET 8.0 | 512 | 23.30 ns | 0.020 ns | 0.019 ns | 1.56 | 614 B | - |
| FrequencyFlat | .NET 9.0 | 512 | 14.45 ns | 0.012 ns | 0.011 ns | 1.00 | 472 B | - |
| FrequencyBlock | .NET 9.0 | 512 | 17.43 ns | 0.015 ns | 0.014 ns | 1.21 | 463 B | - |
| FrequencyBlockAvxNotPinned | .NET 9.0 | 512 | 24.68 ns | 0.009 ns | 0.008 ns | 1.71 | 718 B | - |
| FrequencyBlockAvxPinned | .NET 9.0 | 512 | 23.29 ns | 0.016 ns | 0.015 ns | 1.61 | 611 B | - |
| FrequencyFlat | .NET 6.0 | 1024 | 25.47 ns | 0.105 ns | 0.098 ns | 1.00 | 314 B | - |
| FrequencyBlock | .NET 6.0 | 1024 | 21.15 ns | 0.018 ns | 0.017 ns | 0.83 | 342 B | - |
| FrequencyBlockAvxNotPinned | .NET 6.0 | 1024 | 27.07 ns | 0.041 ns | 0.039 ns | 1.06 | 620 B | - |
| FrequencyBlockAvxPinned | .NET 6.0 | 1024 | 25.57 ns | 0.020 ns | 0.019 ns | 1.00 | 521 B | - |
| FrequencyFlat | .NET 8.0 | 1024 | 14.94 ns | 0.012 ns | 0.011 ns | 1.00 | 482 B | - |
| FrequencyBlock | .NET 8.0 | 1024 | 17.05 ns | 0.012 ns | 0.010 ns | 1.14 | 460 B | - |
| FrequencyBlockAvxNotPinned | .NET 8.0 | 1024 | 24.83 ns | 0.015 ns | 0.014 ns | 1.66 | 753 B | - |
| FrequencyBlockAvxPinned | .NET 8.0 | 1024 | 23.29 ns | 0.011 ns | 0.010 ns | 1.56 | 614 B | - |
| FrequencyFlat | .NET 9.0 | 1024 | 14.45 ns | 0.007 ns | 0.007 ns | 1.00 | 472 B | - |
| FrequencyBlock | .NET 9.0 | 1024 | 17.46 ns | 0.008 ns | 0.007 ns | 1.21 | 463 B | - |
| FrequencyBlockAvxNotPinned | .NET 9.0 | 1024 | 24.69 ns | 0.014 ns | 0.012 ns | 1.71 | 718 B | - |
| FrequencyBlockAvxPinned | .NET 9.0 | 1024 | 23.30 ns | 0.002 ns | 0.001 ns | 1.61 | 611 B | - |
| FrequencyFlat | .NET 6.0 | 32768 | 26.17 ns | 0.286 ns | 0.253 ns | 1.00 | 314 B | - |
| FrequencyBlock | .NET 6.0 | 32768 | 22.31 ns | 0.013 ns | 0.011 ns | 0.85 | 342 B | - |
| FrequencyBlockAvxNotPinned | .NET 6.0 | 32768 | 27.10 ns | 0.005 ns | 0.004 ns | 1.04 | 620 B | - |
| FrequencyBlockAvxPinned | .NET 6.0 | 32768 | 25.60 ns | 0.017 ns | 0.015 ns | 0.98 | 521 B | - |
| FrequencyFlat | .NET 8.0 | 32768 | 15.83 ns | 0.008 ns | 0.006 ns | 1.00 | 482 B | - |
| FrequencyBlock | .NET 8.0 | 32768 | 17.96 ns | 0.010 ns | 0.009 ns | 1.13 | 460 B | - |
| FrequencyBlockAvxNotPinned | .NET 8.0 | 32768 | 24.84 ns | 0.021 ns | 0.019 ns | 1.57 | 753 B | - |
| FrequencyBlockAvxPinned | .NET 8.0 | 32768 | 23.28 ns | 0.005 ns | 0.004 ns | 1.47 | 614 B | - |
| FrequencyFlat | .NET 9.0 | 32768 | 15.66 ns | 0.012 ns | 0.011 ns | 1.00 | 472 B | - |
| FrequencyBlock | .NET 9.0 | 32768 | 18.34 ns | 0.022 ns | 0.019 ns | 1.17 | 463 B | - |
| FrequencyBlockAvxNotPinned | .NET 9.0 | 32768 | 24.77 ns | 0.025 ns | 0.022 ns | 1.58 | 718 B | - |
| FrequencyBlockAvxPinned | .NET 9.0 | 32768 | 23.38 ns | 0.015 ns | 0.013 ns | 1.49 | 611 B | - |
| FrequencyFlat | .NET 6.0 | 524288 | 42.61 ns | 0.183 ns | 0.162 ns | 1.00 | 314 B | - |
| FrequencyBlock | .NET 6.0 | 524288 | 31.42 ns | 0.493 ns | 0.461 ns | 0.74 | 342 B | - |
| FrequencyBlockAvxNotPinned | .NET 6.0 | 524288 | 31.10 ns | 0.480 ns | 0.449 ns | 0.73 | 620 B | - |
| FrequencyBlockAvxPinned | .NET 6.0 | 524288 | 25.79 ns | 0.032 ns | 0.027 ns | 0.61 | 521 B | - |
| FrequencyFlat | .NET 8.0 | 524288 | 25.08 ns | 0.244 ns | 0.228 ns | 1.00 | 482 B | - |
| FrequencyBlock | .NET 8.0 | 524288 | 24.91 ns | 0.094 ns | 0.083 ns | 0.99 | 460 B | - |
| FrequencyBlockAvxNotPinned | .NET 8.0 | 524288 | 27.71 ns | 0.546 ns | 0.536 ns | 1.10 | 753 B | - |
| FrequencyBlockAvxPinned | .NET 8.0 | 524288 | 23.41 ns | 0.018 ns | 0.015 ns | 0.93 | 614 B | - |
| FrequencyFlat | .NET 9.0 | 524288 | 24.64 ns | 0.478 ns | 0.424 ns | 1.00 | 472 B | - |
| FrequencyBlock | .NET 9.0 | 524288 | 24.60 ns | 0.188 ns | 0.167 ns | 1.00 | 463 B | - |
| FrequencyBlockAvxNotPinned | .NET 9.0 | 524288 | 25.83 ns | 0.144 ns | 0.128 ns | 1.05 | 681 B | - |
| FrequencyBlockAvxPinned | .NET 9.0 | 524288 | 23.44 ns | 0.011 ns | 0.010 ns | 0.95 | 611 B | - |
| FrequencyFlat | .NET 6.0 | 8388608 | 142.28 ns | 0.667 ns | 0.624 ns | 1.00 | 314 B | - |
| FrequencyBlock | .NET 6.0 | 8388608 | 102.97 ns | 0.343 ns | 0.321 ns | 0.72 | 342 B | - |
| FrequencyBlockAvxNotPinned | .NET 6.0 | 8388608 | 85.47 ns | 0.596 ns | 0.557 ns | 0.60 | 620 B | - |
| FrequencyBlockAvxPinned | .NET 6.0 | 8388608 | 57.11 ns | 0.198 ns | 0.175 ns | 0.40 | 521 B | - |
| FrequencyFlat | .NET 8.0 | 8388608 | 84.30 ns | 0.549 ns | 0.513 ns | 1.00 | 477 B | - |
| FrequencyBlock | .NET 8.0 | 8388608 | 69.33 ns | 0.131 ns | 0.116 ns | 0.82 | 460 B | - |
| FrequencyBlockAvxNotPinned | .NET 8.0 | 8388608 | 72.32 ns | 0.129 ns | 0.101 ns | 0.86 | 753 B | - |
| FrequencyBlockAvxPinned | .NET 8.0 | 8388608 | 50.19 ns | 0.171 ns | 0.160 ns | 0.60 | 614 B | - |
| FrequencyFlat | .NET 9.0 | 8388608 | 83.38 ns | 0.610 ns | 0.541 ns | 1.00 | 472 B | - |
| FrequencyBlock | .NET 9.0 | 8388608 | 68.75 ns | 0.159 ns | 0.141 ns | 0.82 | 463 B | - |
| FrequencyBlockAvxNotPinned | .NET 9.0 | 8388608 | 69.98 ns | 0.238 ns | 0.223 ns | 0.84 | 718 B | - |
| FrequencyBlockAvxPinned | .NET 9.0 | 8388608 | 50.57 ns | 0.380 ns | 0.356 ns | 0.61 | 611 B | - |
| FrequencyFlat | .NET 6.0 | 134217728 | 186.33 ns | 0.907 ns | 0.804 ns | 1.00 | 314 B | - |
| FrequencyBlock | .NET 6.0 | 134217728 | 146.52 ns | 2.765 ns | 2.586 ns | 0.79 | 342 B | - |
| FrequencyBlockAvxNotPinned | .NET 6.0 | 134217728 | 103.13 ns | 1.148 ns | 1.074 ns | 0.55 | 620 B | - |
| FrequencyBlockAvxPinned | .NET 6.0 | 134217728 | 70.37 ns | 0.525 ns | 0.491 ns | 0.38 | 521 B | - |
| FrequencyFlat | .NET 8.0 | 134217728 | 115.60 ns | 0.415 ns | 0.388 ns | 1.00 | 477 B | - |
| FrequencyBlock | .NET 8.0 | 134217728 | 87.00 ns | 0.971 ns | 0.908 ns | 0.75 | 460 B | - |
| FrequencyBlockAvxNotPinned | .NET 8.0 | 134217728 | 86.52 ns | 1.106 ns | 1.034 ns | 0.75 | 753 B | - |
| FrequencyBlockAvxPinned | .NET 8.0 | 134217728 | 62.39 ns | 0.426 ns | 0.399 ns | 0.54 | 614 B | - |
| FrequencyFlat | .NET 9.0 | 134217728 | 114.19 ns | 0.759 ns | 0.673 ns | 1.00 | 472 B | - |
| FrequencyBlock | .NET 9.0 | 134217728 | 86.99 ns | 0.496 ns | 0.439 ns | 0.76 | 463 B | - |
| FrequencyBlockAvxNotPinned | .NET 9.0 | 134217728 | 83.31 ns | 1.528 ns | 1.429 ns | 0.73 | 718 B | - |
| FrequencyBlockAvxPinned | .NET 9.0 | 134217728 | 62.77 ns | 0.410 ns | 0.363 ns | 0.55 | 611 B | - |
…tFaster.Caching into users/alexpeck/avx2
We saw good gains from AVX on .NET 6, but not on .NET 8/9. It is not clear whether this is due to dynamic PGO, or whether the benchmarks are really representative of real-world performance.
Two changes here:
MethodImpl(MethodImplOptions.AggressiveInlining)
As tested, this is always faster on AMD, only faster at large sizes on Intel.
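For readers unfamiliar with the attribute, a minimal sketch of how `MethodImplOptions.AggressiveInlining` is applied. The class and method names (`InliningSketch`, `CounterIndex`, `SumIndexes`) and the hash mixing are hypothetical illustrations, not code from this PR; the attribute is only a hint, and the JIT may still decline to inline.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class InliningSketch
{
    // Hypothetical small helper on a hot path. The attribute asks the JIT to
    // inline it into callers where possible, avoiding call overhead.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int CounterIndex(int hash, int tableMask)
    {
        // Fold the upper 16 bits into the lower bits, then mask to the table size.
        int mixed = hash ^ (int)((uint)hash >> 16);
        return mixed & tableMask;
    }

    // Hypothetical caller: a tight loop where the helper is a good
    // inlining candidate because it is tiny and called per element.
    public static long SumIndexes(ReadOnlySpan<int> hashes, int tableMask)
    {
        long sum = 0;
        foreach (int h in hashes)
        {
            sum += CounterIndex(h, tableMask);
        }
        return sum;
    }
}
```

Because inlining interacts with dynamic PGO and tiered compilation, any gain would need to be confirmed per runtime, as the tables above suggest.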
TODO: