Cache request protocol version in availability-recovery #3127

Closed
alindima opened this issue Jan 30, 2024 · 1 comment
Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. T0-node This PR/Issue is related to the topic “node”. T8-polkadot This PR/Issue is related to/affects the Polkadot network.

Comments

@alindima (Contributor)

Prerequisite: #1644

See #1644 (comment) for details of the improvements

@alindima alindima added T0-node This PR/Issue is related to the topic “node”. I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. T8-polkadot This PR/Issue is related to/affects the Polkadot network. labels Jan 30, 2024
@alindima (Contributor, Author)

There are a couple of ways to implement this:

  1. Record the protocol used for receiving the response from our peer and cache it in the subsystem. This gets quite messy to implement: since we run recoveries in parallel for multiple candidates, we'd need a shared cache that all recovery tasks can read and mutate (see the sketch after this list).
  2. Expose the responses of the Identify protocol and record them in the subsystem. These contain the list of protocols our peer supports and are fetched on each new connection. This needs some extra support in Substrate.
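
To make option 1 concrete, here is a minimal sketch of what the shared cache could look like. `ValidatorIndex`, `ChunkProtocolVersion` and `ProtocolVersionCache` are hypothetical stand-ins for illustration, not the real subsystem types:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for the real validator index type.
type ValidatorIndex = u32;

/// The two request/response protocol versions a peer may speak.
#[derive(Clone, Copy, PartialEq, Eq)]
enum ChunkProtocolVersion {
    V1,
    V2,
}

/// Per-validator record of which protocol version last answered us,
/// wrapped in `Arc<Mutex<..>>` so the recovery tasks running in parallel
/// for different candidates can share and mutate it.
#[derive(Clone, Default)]
struct ProtocolVersionCache {
    inner: Arc<Mutex<HashMap<ValidatorIndex, ChunkProtocolVersion>>>,
}

impl ProtocolVersionCache {
    /// Remember which protocol a peer responded on.
    fn record(&self, validator: ValidatorIndex, version: ChunkProtocolVersion) {
        self.inner.lock().unwrap().insert(validator, version);
    }

    /// Which protocol to try first for this validator; default to the
    /// newest one when we have no information yet.
    fn preferred(&self, validator: ValidatorIndex) -> ChunkProtocolVersion {
        self.inner
            .lock()
            .unwrap()
            .get(&validator)
            .copied()
            .unwrap_or(ChunkProtocolVersion::V2)
    }
}
```

The `Arc<Mutex<..>>` here is exactly the shared mutable state between parallel recovery tasks that makes option 1 awkward.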

Now, the caveat of both approaches is that they are an optimisation that's only effective while not all validators are upgraded. Once they're all upgraded, the code becomes redundant and may send/record unnecessary events.
Moreover, production networks rarely do chunk recovery for now. Most of the time they simply fetch the full data from backers (since most PoVs are less than 128 KiB in compressed size).

In the worst case, with a mixed validator set (half upgraded, half unupgraded), the upgraded nodes will make an extra round trip when fetching chunks from unupgraded nodes.
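
As a rough sketch of where that extra round trip comes from, building on the hypothetical cache above (the request functions and error type below are illustrative stand-ins, not the real network API):

```rust
/// Illustrative stand-ins for the availability chunk and request error types.
struct Chunk;

enum RequestError {
    UnsupportedProtocol,
    Other,
}

/// Placeholder for issuing a v2 chunk request over the network.
async fn request_chunk_v2(_validator: ValidatorIndex) -> Result<Chunk, RequestError> {
    Err(RequestError::UnsupportedProtocol)
}

/// Placeholder for issuing a v1 chunk request over the network.
async fn request_chunk_v1(_validator: ValidatorIndex) -> Result<Chunk, RequestError> {
    Ok(Chunk)
}

/// Fetch a chunk, preferring the newer protocol and paying the extra
/// round trip at most once per unupgraded peer.
async fn fetch_chunk(
    validator: ValidatorIndex,
    cache: &ProtocolVersionCache,
) -> Result<Chunk, RequestError> {
    match cache.preferred(validator) {
        ChunkProtocolVersion::V2 => match request_chunk_v2(validator).await {
            Ok(chunk) => Ok(chunk),
            // The peer does not speak v2: remember that so subsequent
            // requests to this validator go straight to v1.
            Err(RequestError::UnsupportedProtocol) => {
                cache.record(validator, ChunkProtocolVersion::V1);
                request_chunk_v1(validator).await
            }
            Err(e) => Err(e),
        },
        ChunkProtocolVersion::V1 => request_chunk_v1(validator).await,
    }
}
```

With such a cache, the extra round trip would only be paid on the first request to each unupgraded peer; without it, every chunk request to such a peer pays it.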

I measured this in practice and the cost is negligible compared to the total PoV recovery time.

Measuring this with subsystem-bench (with an extra latency of 100ms for the second request):

[Screenshot: subsystem-bench PoV recovery time comparison, 2024-05-22]

The first half simulates all nodes making 2 round trips for all chunk requests.

I also measured this on Versi, with 50 validators, 9 glutton parachains and PoVs of 2.5 MiB.

The average PoV recovery time with all unupgraded nodes is 528 ms.
The average PoV recovery time with half upgraded and half unupgraded nodes is 674 ms, i.e. an overhead of roughly 146 ms (about 28%) per recovery.

As you can see, the largest consumer of recovery time is the Reed-Solomon reconstruction.

Considering all of the above, I'll close this issue and conclude that this small optimisation is not worth implementing.
