
approval-voting: implement parallel processing of signature checks. #731

Open
sandreim opened this issue Jan 23, 2023 · 7 comments
Labels
I4-refactor Code needs refactoring.

Comments

@sandreim
Contributor

I've been experimenting with using a thread pool to handle VRF signature checks, which appear to be the most expensive operation we do in approval voting. After running some benchmarks, I got these results on an AMD EPYC 7601 32-Core Processor:

check/no-pool           time:   [208.94 ms 209.10 ms 209.31 ms]
                        thrpt:  [4.7777 Kelem/s 4.7825 Kelem/s 4.7861 Kelem/s]
check/pool_size_1       time:   [267.76 ms 271.14 ms 276.34 ms]
                        thrpt:  [3.6187 Kelem/s 3.6881 Kelem/s 3.7346 Kelem/s]
check/pool_size_2       time:   [162.28 ms 163.93 ms 165.20 ms]
                        thrpt:  [6.0532 Kelem/s 6.1001 Kelem/s 6.1621 Kelem/s]
check/pool_size_4       time:   [111.01 ms 112.44 ms 113.99 ms]
                        thrpt:  [8.7728 Kelem/s 8.8934 Kelem/s 9.0084 Kelem/s]
check/pool_size_8       time:   [84.792 ms 85.514 ms 85.961 ms]
                        thrpt:  [11.633 Kelem/s 11.694 Kelem/s 11.794 Kelem/s]
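A minimal sketch of the pool approach benchmarked above, using `std::thread` and channels. `check_vrf` is a hypothetical stand-in: the real code would run schnorrkel VRF verification on the assignment payload.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for the real check; production code would run
// schnorrkel VRF verification here.
fn check_vrf(msg: &[u8]) -> bool {
    !msg.is_empty()
}

// Verify `messages` on up to `pool_size` worker threads, preserving the
// original ordering of results.
fn parallel_check(messages: Vec<Vec<u8>>, pool_size: usize) -> Vec<bool> {
    let n = messages.len();
    let pool = pool_size.max(1);
    let chunk = ((n + pool - 1) / pool).max(1);
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for (w, batch) in messages.chunks(chunk).enumerate() {
        let tx = tx.clone();
        let batch = batch.to_vec();
        handles.push(thread::spawn(move || {
            for (i, msg) in batch.iter().enumerate() {
                // Report (original index, result) back to the collector.
                tx.send((w * chunk + i, check_vrf(msg))).unwrap();
            }
        }));
    }
    drop(tx);
    let mut results = vec![false; n];
    for (idx, ok) in rx {
        results[idx] = ok;
    }
    for h in handles {
        h.join().unwrap();
    }
    results
}
```

In the real subsystem a long-lived pool (or tokio's blocking-thread facility) would replace the per-batch spawns, but the fan-out/fan-in shape is the same.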

I expect this change to work very well with #732, because it will allow us to multiplex all the CPU-intensive work of the subsystem across multiple CPU cores, improving on our current single-threaded design.

Important note: the number of blocking threads used needs to be bounded, and we would also need an upper limit at which we apply backpressure.
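The backpressure point can be illustrated with a bounded queue in front of the verification pool. This is a toy sketch, not subsystem code: `sync_channel(limit)` blocks the producer once `limit` unverified items are queued, which is exactly the upper-limit behaviour described above.

```rust
use std::sync::mpsc;
use std::thread;

// The consumer stands in for the pool of signature-check workers; returns
// the number of items it drained.
fn submit_with_backpressure(items: Vec<u64>, limit: usize) -> usize {
    let (tx, rx) = mpsc::sync_channel::<u64>(limit);
    let consumer = thread::spawn(move || rx.iter().count());
    for item in items {
        // Blocks here whenever the bounded queue is full, applying
        // backpressure to the upstream producer.
        tx.send(item).unwrap();
    }
    drop(tx);
    consumer.join().unwrap()
}
```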

@burdges

burdges commented Jan 24, 2023

We do have batch verification for VRFs in https://github.com/w3f/schnorrkel/blob/master/src/vrf.rs#L536, which likely saves 40% and works across multiple signers, but slightly increases gossip overhead by 32 bytes per message. I even have an unimplemented variant that avoids this 32-byte overhead.

We could merge all the tranche zero VRFs by the same signer too. We've two options:

  1. We keep the individual outputs for RelayVrfModulo but produce only one signature/proof using https://github.com/w3f/schnorrkel/blob/master/src/vrf.rs#L433. This permits authorities to semi-secretly refuse assignments, but each additional assignment has some marginal cost, maybe 75% savings. These marginal costs do not stack with batch verification.
  2. We just do only one RelayVrfModulo per authority and derive multiple assignments from the output. We'll need an explicit bitfield if we want authorities to refuse some assignments, but this saves us maybe 95%, and this stacks with batch verification.

I doubt being secretive about refused assignments matters much. I doubt either 1 or 2 helps RelayVrfDelay much, but we should tune parameters so that RelayVrfModulo represents maybe 90% or 85% of assignments. Batch verification helps RelayVrfDelay just fine.

All told, we should save over 80% by doing 2, double checking parameters, and maybe doing batch verifications.
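Option 2 can be sketched roughly as follows. Everything here is hypothetical: the real code would expand the VRF output via its transcript machinery, not via `DefaultHasher`, and `samples`/`n_cores` are illustrative parameter names. The point is only that one verified output yields several assignment indices, so a single signature check covers all of them.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Derive `samples` candidate-core indices from a single 32-byte VRF output.
// DefaultHasher is a stand-in for the VRF's real output-expansion step.
fn derive_assignments(vrf_output: &[u8; 32], samples: u32, n_cores: u32) -> Vec<u32> {
    (0..samples)
        .map(|i| {
            let mut h = DefaultHasher::new();
            vrf_output.hash(&mut h);
            i.hash(&mut h);
            // Map each sample into the core index space.
            (h.finish() % n_cores as u64) as u32
        })
        .collect()
}
```

The derivation is deterministic, so any checker that has verified the single VRF output can recompute the same set of assignments; a bitfield (as mentioned above) would then mark which of them the authority refuses.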

@sandreim
Contributor Author

sandreim commented Jan 24, 2023

This issue (as well as #732) is focused on improving performance from an engineering point of view, e.g. solving the bottleneck of the single-threaded approach to processing distribution and import of assignments and votes.

> We do have batch verification for VRFs in https://github.com/w3f/schnorrkel/blob/master/src/vrf.rs#L536, which likely saves 40% and works across multiple signers, but slightly increases gossip overhead by 32 bytes per message. I even have an unimplemented variant that avoids this 32-byte overhead.

IIUC, in our case we have a single signer, and this would mean that we could batch its own RelayVrfDelay assignments for the same tranche (different candidates). Is my understanding correct?

> We could merge all the tranche zero VRFs by the same signer too. We've two options:
>
>   1. We keep the individual outputs for RelayVrfModulo but produce only one signature/proof using https://github.com/w3f/schnorrkel/blob/master/src/vrf.rs#L433. This permits authorities to semi-secretly refuse assignments, but each additional assignment has some marginal cost, maybe 75% savings. These marginal costs do not stack with batch verification.
>   2. We just do only one RelayVrfModulo per authority and derive multiple assignments from the output. We'll need an explicit bitfield if we want authorities to refuse some assignments, but this saves us maybe 95%, and this stacks with batch verification.
>
> I doubt being secretive about refused assignments matters much. I doubt either 1 or 2 helps RelayVrfDelay much, but we should tune parameters so that RelayVrfModulo represents maybe 90% or 85% of assignments. Batch verification helps RelayVrfDelay just fine.
>
> All told, we should save over 80% by doing 2, double checking parameters, and maybe doing batch verifications.

2 sounds very good to me, but I am not a cryptography guy. Can you detail a bit the pros and cons of having RelayVrfModulo represent 85% of assignments in tranche 0?

I will create a ticket for further discussion of these improvements.

@burdges

burdges commented Jan 24, 2023

Answered in the other thread.

@sandreim
Contributor Author

sandreim commented Mar 6, 2023

FWIW, we could go even further by sharding the state and input by (BlockHash, CandidateIndex) and having 2-4 workers that truly work in parallel for importing assignments/votes. We would need to query each worker to be able to respond to GetApprovalSignaturesForCandidate and ApprovedAncestor subsystem messages.
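The routing step of such a scheme is straightforward. A hypothetical sketch, with made-up type aliases and `DefaultHasher` as a placeholder hash: every message for the same candidate under the same fork deterministically lands on the same worker.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative aliases; the real types live in the approval-voting subsystem.
type BlockHash = [u8; 32];
type CandidateIndex = u32;

// Route a (BlockHash, CandidateIndex) key to one of `n_workers` import
// workers. Deterministic, so assignments and votes for the same candidate
// always reach the same worker's state and DB shard.
fn shard_for(block: &BlockHash, candidate: CandidateIndex, n_workers: usize) -> usize {
    let mut h = DefaultHasher::new();
    block.hash(&mut h);
    candidate.hash(&mut h);
    (h.finish() as usize) % n_workers.max(1)
}
```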

@rphmeier
Contributor

Yeah, I think something along those lines is possible. I don't remember all the details, but I think candidates have to be specifically approved under each fork, right? If so, we can shard by (BlockHash, CandidateIndex) without any issues, except contended DB access.

@sandreim
Contributor Author

sandreim commented Mar 10, 2023

Since assignments can also claim more than one candidate, (BlockHash, ValidatorIndex) makes sense to use. Yes, we track approval of candidates across forks. The DB is structured into BlockEntries and CandidateEntries, the latter containing ApprovalEntries. To handle multiple core assignments (as of paritytech/polkadot#6782), assignments from the same validator are duplicated into all the CandidateEntries they claim, so we cannot really shard these per candidate. IMO it should be easy for each worker to have its own DB storage, so I assume there is no additional contention.

I expect more latency when handling ApprovedAncestor and GetApprovalSignaturesForCandidate messages, but we would just use unbounded sends to the workers to prioritise these over imports.

@burdges

burdges commented Mar 10, 2023

We approve candidates under each fork because assignment VRFs are seeded by relay chain VRFs. We could move assignments and votes across forks when relay chain block producers equivocate though, which may be useful.

You might've bigger fish to fry after you merge the tranche 0 assignments, but conversely all those delay assignments add up quickly whenever many no-shows happen.

At a high level, we process gossip messages containing assignments and votes, which result in database writes and deduplication checks, and then our approvals loop reads this database. AFAIK we should not spend too much time in the approvals loop itself, so assignment VRF signatures could be checked by workers which then push valid assignments into a queue for insertion into the database. At the extreme this could be made non-blocking, no?
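The worker-queue shape described above can be sketched as below. All names are hypothetical: `Assignment` carries just an id, and `vrf_valid` stands in for the signature check; only the single writer touches the (simulated) database, so the approvals loop never blocks on verification.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Hypothetical assignment; the real one would carry the VRF and claimed
// candidates.
struct Assignment(u64);

// Stand-in for the assignment VRF signature check.
fn vrf_valid(a: &Assignment) -> bool {
    a.0 % 2 == 0
}

// Workers pull assignments off a shared queue, verify them, and forward
// only the valid ones to the single "database writer".
fn run_pipeline(input: Vec<Assignment>, workers: usize) -> Vec<u64> {
    let (work_tx, work_rx) = mpsc::channel::<Assignment>();
    let work_rx = Arc::new(Mutex::new(work_rx));
    let (valid_tx, valid_rx) = mpsc::channel::<u64>();
    let mut handles = Vec::new();
    for _ in 0..workers.max(1) {
        let rx = Arc::clone(&work_rx);
        let tx = valid_tx.clone();
        handles.push(thread::spawn(move || loop {
            let a = match rx.lock().unwrap().recv() {
                Ok(a) => a,
                Err(_) => break, // queue closed and drained
            };
            if vrf_valid(&a) {
                tx.send(a.0).unwrap();
            }
        }));
    }
    drop(valid_tx);
    for a in input {
        work_tx.send(a).unwrap();
    }
    drop(work_tx);
    // The writer drains the queue of valid assignments in one place.
    let mut db: Vec<u64> = valid_rx.iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    db.sort();
    db
}
```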

@Sophia-Gold Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023