approval-distribution: process messages while waiting after approval-voting #7393
…voting

In the current implementation, every time we process an assignment or an approval that needs checking in approval-voting, we wait until approval-voting answers the message.

Given that approval-voting executes signature checks that take significant time (between 400us and 1ms per message), that is where approval-distribution spends most of its time; see https://github.com/paritytech/polkadot/issues/6608#issuecomment-1590942235 for the numbers.

So, modify approval-distribution so that it picks another message from the queue while approval-voting is busy doing its work.

This will have a few benefits:
1. Better pipelining of the messages: approval-voting will always have work to do and won't have to wait for approval-distribution to send it a message. Additionally, some of the work of approval-distribution will be executed in parallel with the work in approval-voting instead of serially.
2. By allowing approval-distribution to process messages from its queue while approval-voting confirms that a message is valid, we give approval-distribution the ability to build a better view of which messages other peers already know, so it won't decide to gossip a message to peers that already have it once we confirm that message as correct.
3. It opens the door for other optimizations in the approval-voting subsystem, which would still be the bottleneck.

Note! I still expect the amount of work the combination of these two subsystems can do to be bounded by the number of signature checks it has to do, so we would have to stack this with the other optimizations we have in the queue.

- https://github.com/paritytech/polkadot/issues/6608
- https://github.com/paritytech/polkadot/issues/6831

[] Evaluate impact in versi
[] Clean up code and make CI happy to make the PR mergeable.

Signed-off-by: Alexandru Gheorghe <[email protected]>
This is the same approach as in #6247, but it is missing a back-pressure mechanism or a bound on the futures unordered.
@@ -837,6 +880,26 @@ impl State {
            return
        }

        // The approval is in process of being verified, nothing to do here, we don't want to check it multiple times
        // just mark that the peer knew about it, so we don't send it to him again
        if entry.waiting_to_be_verified.contains(&message_subject, message_kind) {
message_kind is assignment and assignment comes before the approval. This should be done in import approval.
Actually, the comment is wrong; it should say: "The assignment is in the process of being verified."
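To make the intent of the check above concrete, here is a minimal sketch of tracking messages that are still being verified. It is not the PR's code: `PendingVerification`, `ImportAction`, and the `PeerId` and `MessageSubject` aliases are hypothetical stand-ins, and only the idea (do not verify the same message twice, just remember that the peer already knows it) comes from the diff.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the subsystem's real types.
type PeerId = u32;
type MessageSubject = (u64, u32); // illustrative key only

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum MessageKind {
    Assignment,
    Approval,
}

#[derive(Debug, PartialEq, Eq)]
enum ImportAction {
    /// First sighting: the caller should send the message off for verification.
    Verify,
    /// Verification is already in progress: only the peer's knowledge was recorded.
    AlreadyPending,
}

#[derive(Default)]
struct PendingVerification {
    // Maps each in-flight message to the peers known to have it.
    in_flight: HashMap<(MessageSubject, MessageKind), HashSet<PeerId>>,
}

impl PendingVerification {
    /// Records that `peer` sent the message and says whether it still needs checking.
    fn on_message(
        &mut self,
        peer: PeerId,
        subject: MessageSubject,
        kind: MessageKind,
    ) -> ImportAction {
        match self.in_flight.get_mut(&(subject, kind)) {
            Some(peers) => {
                peers.insert(peer);
                ImportAction::AlreadyPending
            }
            None => {
                self.in_flight.insert((subject, kind), HashSet::from([peer]));
                ImportAction::Verify
            }
        }
    }

    /// Called once verification finishes; returns the peers that already know
    /// the message, so the caller can skip them when gossiping.
    fn on_verified(&mut self, subject: MessageSubject, kind: MessageKind) -> HashSet<PeerId> {
        self.in_flight.remove(&(subject, kind)).unwrap_or_default()
    }
}
```

Keying on both the subject and the kind keeps an assignment and an approval for the same subject separate, which is the distinction the two comments above are about.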
    }
    .boxed();
    self.answers_from_approval_voting.push(await_future);
This essentially creates an unbounded subsystem-internal channel.
We'd need to limit the number of futures we can have at one time.
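One way to picture such a limit is the sketch below. It uses only the standard library, and every name in it (`InFlight`, `process_with_limit`) is hypothetical; the PR itself stores boxed futures, so this only illustrates the bounding logic, not how it would be wired into the subsystem.

```rust
use std::collections::VecDeque;

/// Tracks verification requests that were sent but not yet answered.
struct InFlight<T> {
    limit: usize,
    pending: VecDeque<T>,
}

impl<T> InFlight<T> {
    fn new(limit: usize) -> Self {
        assert!(limit > 0, "a zero limit could never make progress");
        Self { limit, pending: VecDeque::new() }
    }

    fn is_full(&self) -> bool {
        self.pending.len() >= self.limit
    }

    /// Hands the item back to the caller when the limit is reached.
    fn try_push(&mut self, item: T) -> Result<(), T> {
        if self.is_full() {
            return Err(item);
        }
        self.pending.push_back(item);
        Ok(())
    }

    /// Models one answer arriving (oldest first, for simplicity).
    fn complete_one(&mut self) -> Option<T> {
        self.pending.pop_front()
    }
}

/// Drains `incoming` while never holding more than `limit` requests in flight.
/// Returns the items in completion order and the peak number in flight.
fn process_with_limit(incoming: Vec<u32>, limit: usize) -> (Vec<u32>, usize) {
    let mut in_flight = InFlight::new(limit);
    let mut done = Vec::new();
    let mut peak = 0;
    for msg in incoming {
        // Back pressure: when full, wait for an answer before accepting more work.
        if in_flight.is_full() {
            if let Some(finished) = in_flight.complete_one() {
                done.push(finished);
            }
        }
        in_flight.try_push(msg).expect("capacity was just freed");
        peak = peak.max(in_flight.pending.len());
    }
    // Collect the answers that are still outstanding.
    while let Some(finished) = in_flight.complete_one() {
        done.push(finished);
    }
    (done, peak)
}
```

With a limit of 2, `process_with_limit(vec![1, 2, 3, 4, 5], 2)` never has more than two requests outstanding, which is the property the review asks for.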
Indeed, it is not finished yet; in its current state it is mostly meant to assess the behaviour in versi.
#6285 is actually the latest work on the subject
The CI pipeline was cancelled due to the failure of one of the required jobs.
With the fixed implementation, the behaviour of a node that processes things in parallel versus one that doesn't looks like this.
Comparing the time spent awaiting approval-voting in the parallelised version with the ToF for approval-distribution-subsystem in master gives us a sense of which system processes assignments/approvals faster. Looking at the two pictures above, it seems clear that the parallelised version is significantly faster at processing approvals/assignments, with only rare occasions where 1-2 messages land in the 400ms-1.6s bucket. With this data, I tend to concur that regardless of the direction we take with coalescing approvals (paritytech/polkadot-sdk#701) to reduce the amount of work, we would still benefit from implementing a mechanism that pushes more work towards approval-voting and processes the CPU-intensive work (signature checks for approvals and assignments) in parallel. Note!:
In the current implementation, every time we process an assignment or an approval that needs checking in approval-voting, we wait until approval-voting answers the message.
Given that approval-voting executes signature checks that take significant time (between 400us and 1ms per message), that is where approval-distribution spends most of its time; see paritytech/polkadot-sdk#732 for the numbers.
So, modify approval-distribution so that it picks another message from the queue while approval-voting is busy doing its work.
This will have a few benefits:
1. Better pipelining of the messages, with part of approval-distribution's work running in parallel with approval-voting instead of serially.
2. A better view of which messages other peers already know, built while approval-voting confirms that a message is valid.
3. It opens the door for other optimizations in the approval-voting subsystem, which would still be the bottleneck.
Note!
I still expect the amount of work the combination of these two subsystems can do to be bounded by the number of signature checks it has to do, so we would have to stack this with the other optimizations we have in the queue.
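The pipelining idea described above can be sketched with two threads and two channels. This is a simplified model under stated assumptions, not the subsystem code: `check_signature` and `run_pipelined` are hypothetical names, the "signature check" is a placeholder rule, and the channels are left unbounded for brevity.

```rust
use std::sync::mpsc;
use std::thread;

/// Placeholder for the signature check done by approval-voting
/// (hypothetical rule: even ids count as valid).
fn check_signature(id: u32) -> bool {
    id % 2 == 0
}

/// Sends every queued message for verification without waiting for each
/// answer, and collects answers as they become available.
fn run_pipelined(queue: Vec<u32>) -> Vec<(u32, bool)> {
    let (req_tx, req_rx) = mpsc::channel::<u32>();
    let (ans_tx, ans_rx) = mpsc::channel::<(u32, bool)>();

    // Stand-in for approval-voting: verifies requests one by one on its own thread.
    let voting = thread::spawn(move || {
        for id in req_rx {
            if ans_tx.send((id, check_signature(id))).is_err() {
                break;
            }
        }
    });

    let mut answers = Vec::new();
    for id in queue {
        // Hand the message over and move on instead of blocking on the answer.
        req_tx.send(id).expect("voting thread is alive");
        // Pick up whatever answers are already available, without waiting.
        while let Ok(answer) = ans_rx.try_recv() {
            answers.push(answer);
        }
    }
    // Closing the request channel lets the voting thread finish its loop.
    drop(req_tx);
    // Collect the answers that are still outstanding.
    for answer in ans_rx {
        answers.push(answer);
    }
    voting.join().expect("voting thread panicked");
    answers
}
```

Because there is a single verifier thread and the channels are FIFO, answers arrive in the order the requests were sent; the difference from the serial version is that the sender never sits idle waiting for one answer before submitting the next request.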
approval-distribution: process assignments and votes in parallel polkadot-sdk#732
TODO:
[] Evaluate impact in versi
[] Clean up code and make CI happy to make the PR mergeable.