Add benchmark for the number of minimum cpu cores #5127

alexggh · 2024-07-24T14:50:10Z

Fixes: #5122.

This PR extends the existing single core benchmark_cpu to also build a score of the entire processor by spawning EXPECTED_NUM_CORES(8) threads and averaging their throughput.
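A minimal sketch of the idea described above, not the code from this PR: run the same single-core workload on several threads at once and average the per-thread throughput. The function names are hypothetical, and FNV-1a stands in for BLAKE2-256 only so that the example needs no external crates.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the real hashing workload (the PR benchmarks BLAKE2-256).
/// FNV-1a is used here only to keep the sketch dependency-free.
fn fnv1a(data: &[u8], mut state: u64) -> u64 {
    for &byte in data {
        state ^= byte as u64;
        state = state.wrapping_mul(0x0000_0100_0000_01b3);
    }
    state
}

/// Hashes a buffer repeatedly for roughly `limit` and returns the throughput in MiB/s.
fn single_thread_score(limit: Duration) -> f64 {
    let buffer = vec![0xA5u8; 64 * 1024];
    let mut state = 0xcbf2_9ce4_8422_2325u64;
    let mut bytes = 0u64;
    let start = Instant::now();
    while start.elapsed() < limit {
        state = fnv1a(&buffer, state);
        bytes += buffer.len() as u64;
    }
    // Keep the optimizer from discarding the hashing work.
    std::hint::black_box(state);
    bytes as f64 / (1024.0 * 1024.0) / start.elapsed().as_secs_f64()
}

/// Runs the workload on `num_threads` threads concurrently and averages the per-thread scores.
fn parallel_score(limit: Duration, num_threads: usize) -> f64 {
    assert!(num_threads > 0, "at least one thread is required");
    let handles: Vec<_> = (0..num_threads)
        .map(|_| thread::spawn(move || single_thread_score(limit)))
        .collect();
    let total: f64 = handles
        .into_iter()
        .map(|handle| handle.join().expect("benchmark thread panicked"))
        .sum();
    total / num_threads as f64
}

fn main() {
    let limit = Duration::from_millis(200);
    println!("single core:        {:.2} MiB/s", single_thread_score(limit));
    println!("parallel-8 average: {:.2} MiB/s", parallel_score(limit, 8));
}
```

On a machine with at least 8 free cores the two numbers should be close; when fewer cores are really available, the threads compete for CPU time and the average drops.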

This is better than simply checking the number of cores, because it also covers multi-tenant environments where the OS sees a high number of available CPUs but, because the machine has to share them with its neighbours, its total throughput does not satisfy the minimum requirements.

TODO

Obtain reference values on the reference hardware.

Signed-off-by: Alexandru Gheorghe <[email protected]>

paritytech-cicd-pr · 2024-07-24T15:27:01Z

The CI pipeline was cancelled due to the failure of one of the required jobs.
Job name: test-linux-stable 2/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6802189

sandreim

A PRDoc is needed for operators. Otherwise LGTM!

polkadot/node/service/src/lib.rs

substrate/client/sysinfo/src/sysinfo.rs

ggwpez · 2024-07-25T10:42:34Z

I ran this 5 times on ref hardware and it is very consistent on 1022 MiBs for the new BLAKE2 parallel metric. Kind of expected on a homogeneous system 😆

ggwpez · 2024-07-25T10:56:40Z

PS: Apparently we did not update these metrics after devops moved the servers from GCP to Scaleway, so these numbers should still be fine for GCP servers:

If we update them now, then we basically raise the requirement from GCP default to something better. Personally i think we can keep it for now since node operators rely on this and use it quite often.

alexggh · 2024-07-25T11:15:23Z

> PS: Apparently we did not update these metrics after devops moved the servers from GCP to Scaleway, so these numbers should still be fine for GCP servers:

You mean our current ref hardware is not the same as the one we generated reference_hardware.json from?

> If we update them now, then we basically raise the requirement from GCP default to something better. Personally i think we can keep it for now since node operators rely on this and use it quite often.

Yeah, I concur with you that we probably don't want to change the values for existing benchmarks, although I'm not sure where I should generate the new one I want to add.

Signed-off-by: Alexandru Gheorghe <[email protected]>

ggwpez · 2024-07-25T13:04:36Z

> You mean our current ref hardware is not the same as the one we generated reference_hardware.json from?

Yea. We migrated the server in the meantime. But the definition of the ref HW in the wiki is still the same, so I think we should keep the values.

> Yeah, I concur with you that we probably don't want to change the values for existing benchmarks, although I'm not sure where I should generate the new one I want to add.

Given that the multi-core score was pretty much identical to the single-thread score on the ref hardware, I think it's fine to use the same value in the JSON config file. Any concerns about that?

alexggh · 2024-07-25T13:47:01Z

> Yeah, I concur with you that we probably don't want to change the values for existing benchmarks, although I'm not sure where I should generate the new one I want to add.

> Given that the multi-core score was pretty much identical to the single-thread score on the ref hardware, I think it's fine to use the same value in the JSON config file. Any concerns about that?

Yeah, that should be fine as well, although I will try to see if I can get my hands on this type of machine.

ggwpez · 2024-07-25T13:55:12Z

You can request access by opening an issue here: https://github.com/paritytech/devops/issues. This is the machine: https://github.com/paritytech/devops/pull/3210

Signed-off-by: Alexandru Gheorghe <[email protected]>

alexggh · 2024-07-30T13:59:07Z

> PS: Apparently we did not update these metrics after devops moved the servers from GCP to Scaleway, so these numbers should still be fine for GCP servers:

> If we update them now, then we basically raise the requirement from GCP default to something better. Personally i think we can keep it for now since node operators rely on this and use it quite often.

I did some measurements and spelunking on the following hardware types, and what I can tell is that the speed-up in the benchmarks actually comes from the fact that we introduced SIMD optimisations in the benchmarked functions (Blake2 and SR25519-Verify) without updating the reference performance.

GCP: n2-standard: recommended here: https://wiki.polkadot.network/docs/maintain-guides-how-to-validate-polkadot#reference-hardware and it seems to be the machine type we previously used for benchmarking.

Master

+----------+-----------------------+-------------+-------------+-------------------+
| Category | Function              | Score       | Minimum     | Result            |
+==================================================================================+
| CPU      | BLAKE2-256            | 1.00 GiBs   | 783.27 MiBs | ✅ Pass (131.3 %) |
|----------+-----------------------+-------------+-------------+-------------------|
| CPU      | BLAKE2-256-Parallel-8 | 1.00 GiBs   | 783.27 MiBs | ✅ Pass (131.1 %) |
|----------+-----------------------+-------------+-------------+-------------------|
| CPU      | SR25519-Verify        | 701.33 KiBs | 560.67 KiBs | ✅ Pass (125.1 %) |

Without SIMD optimisations

+----------+-----------------------+-------------+-------------+-------------------+
| Category | Function              | Score       | Minimum     | Result            |
+==================================================================================+
| CPU      | BLAKE2-256            | 805.98 MiBs | 783.27 MiBs | ✅ Pass (102.9 %) |

Scaleway machine type that we currently use for reference hardware, from here https://github.com/paritytech/devops/pull/3210

Master

+----------+-----------------------+-------------+-------------+-------------------+
| Category | Function              | Score       | Minimum     | Result            |
+==================================================================================+
| CPU      | BLAKE2-256            | 969.47 MiBs | 783.27 MiBs | ✅ Pass (123.8 %) |
|----------+-----------------------+-------------+-------------+-------------------|
| CPU      | BLAKE2-256-Parallel-8 | 966.19 MiBs | 783.27 MiBs | ✅ Pass (123.4 %) |
|----------+-----------------------+-------------+-------------+-------------------|
| CPU      | SR25519-Verify        | 583.56 KiBs | 560.67 KiBs | ✅ Pass (104.1 %) |
|----------+-----------------------+-------------+-------------+-------------------|

Without SIMD optimizations

+----------+-----------------------+-------------+-------------+-------------------+
| Category | Function              | Score       | Minimum     | Result            |
+==================================================================================+
| CPU      | BLAKE2-256            | 782.02 MiBs | 783.27 MiBs | ✅ Pass ( 99.8 %) |

AWS: c6i-4xlarge: recommended here: https://wiki.polkadot.network/docs/maintain-guides-how-to-validate-polkadot#reference-hardware

Master

+----------+-----------------------+-------------+-------------+-------------------+
| Category | Function              | Score       | Minimum     | Result            |
+==================================================================================+
| CPU      | BLAKE2-256            | 1.03 GiBs   | 783.27 MiBs | ✅ Pass (134.3 %) |
|----------+-----------------------+-------------+-------------+-------------------|
| CPU      | BLAKE2-256-Parallel-8 | 1.02 GiBs   | 783.27 MiBs | ✅ Pass (133.8 %) |
|----------+-----------------------+-------------+-------------+-------------------|
| CPU      | SR25519-Verify        | 655.97 KiBs | 560.67 KiBs | ✅ Pass (117.0 %) |
|----------+-----------------------+-------------+-------------+-------------------|
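The percentage in the Result column appears to be the score divided by the minimum. The sketch below recomputes it for the rows whose scores are given in MiBs or KiBs (the GiBs rows are rounded too coarsely to reproduce exactly). The helper name is hypothetical and this is not the tool's own code.

```rust
/// Returns `score` as a percentage of `minimum`, rounded to one decimal place.
/// Both values must use the same unit.
fn percent_of_minimum(score: f64, minimum: f64) -> f64 {
    (score / minimum * 1000.0).round() / 10.0
}

fn main() {
    // (machine, function, score, minimum), copied from the tables above.
    let rows = [
        ("GCP, no SIMD", "BLAKE2-256", 805.98, 783.27),
        ("GCP, master", "SR25519-Verify", 701.33, 560.67),
        ("Scaleway, master", "BLAKE2-256", 969.47, 783.27),
        ("Scaleway, master", "SR25519-Verify", 583.56, 560.67),
        ("Scaleway, no SIMD", "BLAKE2-256", 782.02, 783.27),
    ];
    for (machine, function, score, minimum) in rows {
        println!(
            "{machine:<18} {function:<15} {:>6.1} %",
            percent_of_minimum(score, minimum)
        );
    }
}
```

Read this way, the SIMD optimisations account for roughly 20 percentage points of headroom on BLAKE2-256 on both the GCP and the Scaleway machine.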

Consequences

Every weight update we did after the Blake2 optimisation (March 2023) got merged gets the 20-30% CPU speed-up, but every validator using the benchmarks to determine whether their machine is within parameters gets a false OK, because the reference values have not been increased. That means the weights are potentially underestimated by around 20-30%.
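One way to make this estimate concrete, under the simplifying assumption that execution time scales inversely with the benchmark score: if the weight-generation machine is `speedup` times faster than a validator that just barely passes, a weight measured on it covers only `1 / speedup` of the time that validator actually needs. The function name is hypothetical.

```rust
/// Fraction of a barely-passing validator's real execution time that is NOT
/// covered by a weight measured on a machine `speedup` times faster.
/// Assumes execution time is inversely proportional to the benchmark score.
fn weight_underestimation(speedup: f64) -> f64 {
    1.0 - 1.0 / speedup
}

fn main() {
    for speedup in [1.2, 1.3] {
        println!(
            "machine {:.0} % faster -> weights cover {:.1} % too little",
            (speedup - 1.0) * 100.0,
            weight_underestimation(speedup) * 100.0
        );
    }
}
```

Under this assumption a 20-30% speed-up corresponds to weights that are roughly 17-23% too low for a validator sitting exactly at the old minimum.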

What next?

From my perspective we have a few options here:

1. We increase the reference hardware benchmarks to reflect the optimisations. The unfortunate immediate consequence is that every validator that is around the baseline will fail the check, and from https://telemetry.polkadot.io/ these numbers don't seem to be negligible; people will probably not be happy to get this all of a sudden. [Screenshots: Kusama, Polkadot]
2. Regenerate the weights that got the speed-up, and generate future weights, on slower hardware closer to the baseline.
3. Do nothing now, since this might not be a problem yet (it's been like that for 1.5y), and just use the values with the speed-up for the newly introduced parallel benchmark BLAKE2-256-Parallel-8, which isn't planned to be enforced right away. This effectively forces validators to slowly converge to hardware where the single-core BLAKE2-256 is also in sync with the reference hardware we use for generating the weights.

@ggwpez @koute @PierreBesson, thoughts ?

ggwpez · 2024-07-30T14:21:15Z

Then I think we should bump the numbers. Otherwise we have silently and accidentally reduced the single-core requirements by merging these dependency updates without updating the reference values.

Good find, thanks for investigating!

Signed-off-by: Alexandru Gheorghe <[email protected]>

alvicsam · 2024-08-01T10:10:51Z

For transparency, CI is still using GCP machines and we are not planning to change that, at least until we finish the CI migration.

alexggh · 2024-08-01T10:49:30Z

> For transparency, CI is still using GCP machines and we are not planning to change it at least until we finish the ci migration.

@alvicsam The updated numbers in #5196 are pretty similar between GCP and Scaleway. Overall, the conclusion is that the speed-up did not come from changing cloud providers, but simply from the optimisations the code received since the reference numbers were last updated.

…5196)

Since `May 2023`, after the paritytech/substrate#13548 optimization, `Blake2256` is about 30% faster. That means there is a difference of ~30% between the benchmark values we ask validators to run against and the machine we use for generating the weights. So if all validators just barely pass the benchmarks, our weights are potentially underestimated by about ~20%, so let's bring these two in sync.

The same thing happened when we merged #2524 in `Nov 2023`: SR25519-Verify became about 10-15% faster.

## Results

Generated on machine from here: paritytech/devops#3210

```
+----------+----------------+--------------+-------------+-------------------+
| Category | Function       | Score        | Minimum     | Result            |
+============================================================================+
| CPU      | BLAKE2-256     | 1.00 GiBs    | 783.27 MiBs | ✅ Pass (130.7 %) |
|----------+----------------+--------------+-------------+-------------------|
| CPU      | SR25519-Verify | 637.62 KiBs  | 560.67 KiBs | ✅ Pass (113.7 %) |
|----------+----------------+--------------+-------------+-------------------|
| Memory   | Copy           | 12.19 GiBs   | 11.49 GiBs  | ✅ Pass (106.1 %) |
```

Discovered and discussed here: #5127 (comment)

## Downsides

Machines that barely passed the benchmark will suddenly find themselves below the benchmark, but since that is just a warning and everything else continues as before, it shouldn't be too impactful. It should also give validators the information they need to become compliant, since they actually aren't when compared with the weights in use.

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>

…u_score

Signed-off-by: Alexandru Gheorghe <[email protected]>

polkadot/node/service/src/lib.rs

substrate/client/sysinfo/src/lib.rs

substrate/client/sysinfo/src/sysinfo.rs

ggwpez

Going to approve now so feel free to merge.

alexggh · 2024-08-15T10:43:42Z

> Going to approve now so feel free to merge.

Thanks, my plan is to merge this only after https://polkadot.subsquare.io/referenda/1051 passes and I do the necessary updates in: https://wiki.polkadot.network/docs/maintain-guides-how-to-validate-polkadot#reference-hardware.


Signed-off-by: Alexandru Gheorghe <[email protected]>

…u_score

substrate/client/sysinfo/src/lib.rs

substrate/client/sysinfo/src/sysinfo.rs

substrate/utils/frame/benchmarking-cli/src/machine/reference_hardware.json

Signed-off-by: Alexandru Gheorghe <[email protected]>

…u_score

Signed-off-by: Alexandru Gheorghe <[email protected]>

alexggh · 2024-09-05T12:08:20Z

Referendum passed: https://polkadot.subsquare.io/referenda/1051, wiki page updated with w3f/polkadot-wiki#6202, merging it ...

(cherry picked from commit a947cb8)

…5613) This backports #5127 to the stable branch. Unfortunately, https://polkadot.subsquare.io/referenda/1051 passed after the cut-off deadline and I missed the window for getting this PR merged. The change itself is super low-risk: it just prints a new message telling validators that, starting with January 2025, the required minimum number of hardware cores will be 8. I see value in getting this in front of the validators as soon as possible. Since we have not released anything yet and it does not invalidate any QA we already did, it should be painless to include it in the current release. (cherry picked from commit a947cb8)
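A hedged sketch of the "warn, don't fail" behaviour described above. The type, function, message wording, and the 783.27 MiBs minimum used in `main` are illustrative assumptions, not the identifiers or values used in polkadot-sdk.

```rust
/// Hypothetical outcome of a hardware check; not the types used in polkadot-sdk.
#[derive(Debug, PartialEq)]
enum CheckOutcome {
    Pass,
    /// Below the minimum, but the metric is only advisory for now.
    Warn(String),
}

/// Compares the parallel score against the minimum and, instead of failing,
/// returns a warning that announces the upcoming requirement.
fn check_parallel_score(score_mibs: f64, minimum_mibs: f64, expected_cores: u32) -> CheckOutcome {
    if score_mibs >= minimum_mibs {
        CheckOutcome::Pass
    } else {
        CheckOutcome::Warn(format!(
            "Parallel CPU score {score_mibs:.2} MiBs is below the minimum {minimum_mibs:.2} MiBs; \
             starting with January 2025 the required minimum number of cores will be {expected_cores}."
        ))
    }
}

fn main() {
    // Illustrative scores only: one above and one below the assumed minimum.
    for score in [1024.0, 500.0] {
        match check_parallel_score(score, 783.27, 8) {
            CheckOutcome::Pass => println!("{score:.2} MiBs: pass"),
            CheckOutcome::Warn(message) => println!("warning: {message}"),
        }
    }
}
```

Keeping the advisory case as a separate variant, rather than an error, mirrors the intent stated in the backport message: the node keeps running and the operator is only informed.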

Add benchmark for the number of the minimum cpu cores

9662ded

Signed-off-by: Alexandru Gheorghe <[email protected]>

alexggh requested a review from koute as a code owner July 24, 2024 14:50

alexggh requested review from sandreim and eskimor July 24, 2024 14:50

alexggh added the T0-node This PR/Issue is related to the topic “node”. label Jul 24, 2024

alexggh requested a review from ggwpez July 24, 2024 14:58

Make clippy happy

213953c

Signed-off-by: Alexandru Gheorghe <[email protected]>

sandreim approved these changes Jul 24, 2024


koute reviewed Jul 25, 2024


Make benchmark parametrizable

62f00f6

Signed-off-by: Alexandru Gheorghe <[email protected]>

Fix comment

05c9874

Signed-off-by: Alexandru Gheorghe <[email protected]>

alexggh mentioned this pull request Jul 31, 2024

Bring reference_hardware.json inline with machine used for weights #5196

Merged

Update parallel with the reference_hw value

0df211c

Signed-off-by: Alexandru Gheorghe <[email protected]>

alexggh mentioned this pull request Aug 14, 2024

Update polkadot CPU score to reflect 8 cores are minimum required #5122

Closed

3 tasks

alexggh added 2 commits August 14, 2024 17:26

Merge remote-tracking branch 'origin/master' into alexggh/increase_cp…

ad7a0be

…u_score

Update warning tests

48f3d0f

Signed-off-by: Alexandru Gheorghe <[email protected]>

ggwpez reviewed Aug 14, 2024


ggwpez approved these changes Aug 15, 2024


alexggh added 2 commits August 28, 2024 18:01

Address review findings

2892f94

Signed-off-by: Alexandru Gheorghe <[email protected]>

Merge remote-tracking branch 'origin/master' into alexggh/increase_cp…

caff938

…u_score

koute reviewed Aug 28, 2024


alexggh added 3 commits August 29, 2024 12:15

Address review findings

d3d5bab

Signed-off-by: Alexandru Gheorghe <[email protected]>

Merge remote-tracking branch 'origin/master' into alexggh/increase_cp…

aa3213e

…u_score

Add prdoc

4d71718

Signed-off-by: Alexandru Gheorghe <[email protected]>

koute approved these changes Sep 2, 2024


alexggh and others added 3 commits September 3, 2024 13:10

Print the number of expected cores

db943bc

Signed-off-by: Alexandru Gheorghe <[email protected]>

Merge branch 'master' into alexggh/increase_cpu_score

a9cc919

Update pr_5127.prdoc

6f57715

alexggh enabled auto-merge September 5, 2024 12:08

alexggh added this pull request to the merge queue Sep 5, 2024

Merged via the queue into master with commit a947cb8 Sep 5, 2024
162 of 198 checks passed

alexggh deleted the alexggh/increase_cpu_score branch September 5, 2024 13:09

alexggh added a commit that referenced this pull request Sep 6, 2024

[backport] Add benchmark for the number of minimum cpu cores (#5127)

d955a77

(cherry picked from commit a947cb8)

alexggh mentioned this pull request Sep 6, 2024

[backport] Add benchmark for the number of minimum cpu cores (#5127) #5613

Merged
