
approval-voting: implement parallel processing of signature checks. #731

Open
sandreim opened this issue Jan 23, 2023 · 7 comments
Labels
I4-refactor Code needs refactoring.

Comments

@sandreim
Contributor

I've been experimenting with using a thread pool to handle VRF signature checks, which appear to be the most expensive operation we do in approval voting. After running some benchmarks, I got these results on an AMD EPYC 7601 32-Core Processor:

check/no-pool           time:   [208.94 ms 209.10 ms 209.31 ms]
                        thrpt:  [4.7777 Kelem/s 4.7825 Kelem/s 4.7861 Kelem/s]
check/pool_size_1       time:   [267.76 ms 271.14 ms 276.34 ms]
                        thrpt:  [3.6187 Kelem/s 3.6881 Kelem/s 3.7346 Kelem/s]
check/pool_size_2       time:   [162.28 ms 163.93 ms 165.20 ms]
                        thrpt:  [6.0532 Kelem/s 6.1001 Kelem/s 6.1621 Kelem/s]
check/pool_size_4       time:   [111.01 ms 112.44 ms 113.99 ms]
                        thrpt:  [8.7728 Kelem/s 8.8934 Kelem/s 9.0084 Kelem/s]
check/pool_size_8       time:   [84.792 ms 85.514 ms 85.961 ms]
                        thrpt:  [11.633 Kelem/s 11.694 Kelem/s 11.794 Kelem/s]
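A minimal sketch of the pool approach benchmarked above, using `std::thread` and channels. `check_vrf` is a hypothetical stand-in: the real code would run schnorrkel VRF verification on the assignment payload.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for the real check; production code would run
// schnorrkel VRF verification here.
fn check_vrf(msg: &[u8]) -> bool {
    !msg.is_empty()
}

// Verify `messages` on up to `pool_size` worker threads, preserving the
// original ordering of results.
fn parallel_check(messages: Vec<Vec<u8>>, pool_size: usize) -> Vec<bool> {
    let n = messages.len();
    let pool = pool_size.max(1);
    let chunk = ((n + pool - 1) / pool).max(1);
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for (w, batch) in messages.chunks(chunk).enumerate() {
        let tx = tx.clone();
        let batch = batch.to_vec();
        handles.push(thread::spawn(move || {
            for (i, msg) in batch.iter().enumerate() {
                // Report (original index, result) back to the collector.
                tx.send((w * chunk + i, check_vrf(msg))).unwrap();
            }
        }));
    }
    drop(tx);
    let mut results = vec![false; n];
    for (idx, ok) in rx {
        results[idx] = ok;
    }
    for h in handles {
        h.join().unwrap();
    }
    results
}
```

In the real subsystem a long-lived pool (or tokio's blocking-thread facility) would replace the per-batch spawns, but the fan-out/fan-in shape is the same.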

I expect this change to work very well with #732, because it will allow us to multiplex all the CPU-intensive work of the subsystem across multiple CPU cores, improving on our current single-threaded design.

Important note: the number of blocking threads used needs to be bounded, and we would also need an upper limit at which we apply backpressure.
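The backpressure point can be illustrated with a bounded queue in front of the verification pool. This is a toy sketch, not subsystem code: `sync_channel(limit)` blocks the producer once `limit` unverified items are queued, which is exactly the upper-limit behaviour described above.

```rust
use std::sync::mpsc;
use std::thread;

// The consumer stands in for the pool of signature-check workers; returns
// the number of items it drained.
fn submit_with_backpressure(items: Vec<u64>, limit: usize) -> usize {
    let (tx, rx) = mpsc::sync_channel::<u64>(limit);
    let consumer = thread::spawn(move || rx.iter().count());
    for item in items {
        // Blocks here whenever the bounded queue is full, applying
        // backpressure to the upstream producer.
        tx.send(item).unwrap();
    }
    drop(tx);
    consumer.join().unwrap()
}
```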

@burdges

burdges commented Jan 24, 2023

We do have batch verification for VRFs in https://github.com/w3f/schnorrkel/blob/master/src/vrf.rs#L536, which likely saves 40% and works across multiple signers, but slightly increases gossip overhead by 32 bytes per message. I even have an unimplemented variant that avoids this 32-byte overhead.

We could merge all the tranche zero VRFs by the same signer too. We've two options:

  1. We keep the individual outputs for RelayVrfModulo but produce only one signature/proof using https://github.com/w3f/schnorrkel/blob/master/src/vrf.rs#L433. This permits authorities to semi-secretly refuse assignments, but each additional assignment has some marginal cost, maybe 75% savings. These marginal costs do not stack with batch verification.
  2. We just do only one RelayVrfModulo per authority and derive multiple assignments from the output. We'll need an explicit bitfield if we want authorities to refuse some assignments, but this saves us maybe 95%, and this stacks with batch verification.

I doubt being secretive about refused assignments matters much. I doubt either 1 or 2 helps RelayVrfDelay much, but we should tune parameters so that RelayVrfModulo represents maybe 90% or 85% of assignments. Batch verification helps RelayVrfDelay just fine.

All told, we should save over 80% by doing 2, double checking parameters, and maybe doing batch verifications.
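Option 2 can be sketched roughly as follows. Everything here is hypothetical: the real code would expand the VRF output via its transcript machinery, not via `DefaultHasher`, and `samples`/`n_cores` are illustrative parameter names. The point is only that one verified output yields several assignment indices, so a single signature check covers all of them.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Derive `samples` candidate-core indices from a single 32-byte VRF output.
// DefaultHasher is a stand-in for the VRF's real output-expansion step.
fn derive_assignments(vrf_output: &[u8; 32], samples: u32, n_cores: u32) -> Vec<u32> {
    (0..samples)
        .map(|i| {
            let mut h = DefaultHasher::new();
            vrf_output.hash(&mut h);
            i.hash(&mut h);
            // Map each sample into the core index space.
            (h.finish() % n_cores as u64) as u32
        })
        .collect()
}
```

The derivation is deterministic, so any checker that has verified the single VRF output can recompute the same set of assignments; a bitfield (as mentioned above) would then mark which of them the authority refuses.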

@sandreim
Contributor Author

sandreim commented Jan 24, 2023

This issue (as well as #732) is focused on improving performance from an engineering point of view, e.g. solving the bottleneck of the single-threaded approach to processing distribution and import of assignments and votes.

> We do have batch verification for VRFs in https://github.com/w3f/schnorrkel/blob/master/src/vrf.rs#L536, which likely saves 40% and works across multiple signers, but slightly increases gossip overhead by 32 bytes per message. I even have an unimplemented variant that avoids this 32-byte overhead.

IIUC, in our case we have a single signer, and this would mean that we could batch its own RelayVrfDelay assignments for the same tranche (different candidates). Is my understanding correct?

> We could merge all the tranche zero VRFs by the same signer too. We've two options:
>
>   1. We keep the individual outputs for RelayVrfModulo but produce only one signature/proof using https://github.com/w3f/schnorrkel/blob/master/src/vrf.rs#L433. This permits authorities to semi-secretly refuse assignments, but each additional assignment has some marginal cost, maybe 75% savings. These marginal costs do not stack with batch verification.
>   2. We just do only one RelayVrfModulo per authority and derive multiple assignments from the output. We'll need an explicit bitfield if we want authorities to refuse some assignments, but this saves us maybe 95%, and this stacks with batch verification.
>
> I doubt being secretive about refused assignments matters much. I doubt either 1 or 2 helps RelayVrfDelay much, but we should tune parameters so that RelayVrfModulo represents maybe 90% or 85% of assignments. Batch verification helps RelayVrfDelay just fine.
>
> All told, we should save over 80% by doing 2, double checking parameters, and maybe doing batch verifications.

2 sounds very good to me, but I am not a cryptography guy. Can you detail a bit the pros and cons of having RelayVrfModulo represent 85% of assignments in tranche 0?

I will create a ticket for further discussion of these improvements.

@burdges

burdges commented Jan 24, 2023

Answered in the other thread.

@sandreim
Contributor Author

sandreim commented Mar 6, 2023

FWIW, we could go even further by sharding the state and input by (BlockHash, CandidateIndex) and having 2-4 workers that truly work in parallel for importing assignments/votes. We would need to query each worker to be able to respond to GetApprovalSignaturesForCandidate and ApprovedAncestor subsystem messages.
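The routing step of such a scheme is straightforward. A hypothetical sketch, with made-up type aliases and `DefaultHasher` as a placeholder hash: every message for the same candidate under the same fork deterministically lands on the same worker.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative aliases; the real types live in the approval-voting subsystem.
type BlockHash = [u8; 32];
type CandidateIndex = u32;

// Route a (BlockHash, CandidateIndex) key to one of `n_workers` import
// workers. Deterministic, so assignments and votes for the same candidate
// always reach the same worker's state and DB shard.
fn shard_for(block: &BlockHash, candidate: CandidateIndex, n_workers: usize) -> usize {
    let mut h = DefaultHasher::new();
    block.hash(&mut h);
    candidate.hash(&mut h);
    (h.finish() as usize) % n_workers.max(1)
}
```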

@rphmeier
Contributor

Yeah, I think something along those lines is possible. I don't remember all the details, but I think candidates have to be specifically approved under each fork, right? If so, we can shard by (BlockHash, CandidateIndex) without any issues, except contended DB access.

@sandreim
Contributor Author

sandreim commented Mar 10, 2023

Since assignments can also claim more than one candidate, (BlockHash, ValidatorIndex) makes sense to use. Yes, we track approval of candidates across forks. The DB is structured into BlockEntries and CandidateEntries, the latter containing ApprovalEntries. To handle multiple core assignments (as of paritytech/polkadot#6782), assignments from the same validator are duplicated into all the CandidateEntries they claim, so we cannot really shard these per candidate. IMO it should be easy for each worker to have its own DB storage, so I assume there is no additional contention.

I expect more latency when handling ApprovedAncestor and GetApprovalSignaturesForCandidate messages, but we would just use unbounded sends to the workers to prioritise these over imports.

@burdges

burdges commented Mar 10, 2023

We approve candidates under each fork because assignment VRFs are seeded by relay chain VRFs. We could move assignments and votes across forks when relay chain block producers equivocate though, which may be useful.

You might've bigger fish to fry after you merge the tranche 0 assignments, but conversely all those delay assignments add up quickly whenever many no-shows happen.

At a high level, we process gossip messages containing assignments and votes, which result in database writes and deduplication checks, and then our approvals loop reads this database. AFAIK we should not spend too much time in the approvals loop itself, so assignment VRF signatures could be checked by workers which then push valid assignments into a queue for insertion into the database. At the extreme this could be made non-blocking, no?
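The worker-queue shape described above can be sketched as below. All names are hypothetical: `Assignment` carries just an id, and `vrf_valid` stands in for the signature check; only the single writer touches the (simulated) database, so the approvals loop never blocks on verification.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Hypothetical assignment; the real one would carry the VRF and claimed
// candidates.
struct Assignment(u64);

// Stand-in for the assignment VRF signature check.
fn vrf_valid(a: &Assignment) -> bool {
    a.0 % 2 == 0
}

// Workers pull assignments off a shared queue, verify them, and forward
// only the valid ones to the single "database writer".
fn run_pipeline(input: Vec<Assignment>, workers: usize) -> Vec<u64> {
    let (work_tx, work_rx) = mpsc::channel::<Assignment>();
    let work_rx = Arc::new(Mutex::new(work_rx));
    let (valid_tx, valid_rx) = mpsc::channel::<u64>();
    let mut handles = Vec::new();
    for _ in 0..workers.max(1) {
        let rx = Arc::clone(&work_rx);
        let tx = valid_tx.clone();
        handles.push(thread::spawn(move || loop {
            let a = match rx.lock().unwrap().recv() {
                Ok(a) => a,
                Err(_) => break, // queue closed and drained
            };
            if vrf_valid(&a) {
                tx.send(a.0).unwrap();
            }
        }));
    }
    drop(valid_tx);
    for a in input {
        work_tx.send(a).unwrap();
    }
    drop(work_tx);
    // The writer drains the queue of valid assignments in one place.
    let mut db: Vec<u64> = valid_rx.iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    db.sort();
    db
}
```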

@Sophia-Gold Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023