Increase max_pov_size to 10MB #5334

Open
Tracked by #5444
sandreim opened this issue Aug 13, 2024 · 31 comments · May be fixed by #5884
@sandreim
Contributor

sandreim commented Aug 13, 2024

Currently we run the relay chain with a max PoV size of 5 MB on Polkadot/Kusama. We've recently discovered during benchmarking that the storage proof overhead increases significantly with the number of keys in storage, such that parachain throughput drops to 50% with, for example, 1 million accounts.

Based on the numbers in #4399, we should be able to double the max PoV size and still only require 50% of the hardware spec bandwidth in the worst case.
I'd expect the CPU cost of erasure encoding/decoding to double at worst; we should determine it using subsystem benchmarks and see how it fits with the upcoming new hardware specs.
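As a back-of-envelope sketch of why bandwidth scales linearly with PoV size (the ~3x erasure-coding blow-up is how Polkadot availability works in broad strokes; the 500-validator count here is an illustrative assumption, not a figure from this issue):

```rust
// Rough availability sketch: each PoV is erasure-coded into one chunk per
// validator such that roughly any third of the chunks reconstructs it,
// i.e. about a 3x blow-up of the original data.
fn chunk_size_kb(pov_size_mb: f64, n_validators: f64) -> f64 {
    pov_size_mb * 1024.0 * 3.0 / n_validators
}

fn main() {
    // Assumed 500-validator set, purely for illustration.
    println!("5 MB PoV:  ~{:.0} KB per chunk", chunk_size_kb(5.0, 500.0));
    println!("10 MB PoV: ~{:.0} KB per chunk", chunk_size_kb(10.0, 500.0));
    // Doubling the PoV size doubles both the per-chunk bandwidth and the
    // encoding/decoding work, matching the at-worst-2x CPU expectation above.
}
```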

CC @eskimor @burdges

@burdges

burdges commented Aug 13, 2024

I do not have strong feelings on increasing PoV sizes near-term, but certainly if we put gluttons on Kusama then it's nice to be able to pressure them properly.

> We've recently discovered during benchmarking that the storage proof overhead increases significantly with the number of keys in storage such that the parachain throughput drops to 50% with 1 mil accounts for example.

Yes, we make storage proofs 4x larger than necessary by inheriting the stupid radix-16 trie from ETH.

@s0me0ne-unkn0wn
Contributor

So, a rough roadmap could look like this. In the next SDK release, we bump MAX_POV_SIZE to 10 MB. Runtimes stay with their old 5 MB limit compiled in and will not allow building a block larger than 5 MB. We start preparing runtimes based on that new release while people upgrade their nodes. Once a supermajority has upgraded, we're ready to enact the new runtimes. After the runtimes are enacted, we get parablocks with 5 MB proofs (10 MB / 2) limited by collators (so all 5 MB may be used by user transactions, without the 25% reservation). Then we can lift that halving limitation pointwise where we need it.

The other option is to make the proof size configurable, but I'm not sure how much sense that makes, given that it's a complication anyway and changing this constant is such a rare event.

CC @dmitry-markin any concerns from the networking layer side?

CC @eskimor @bkchr @skunert

Feel free to CC other relevant people.

@bkchr
Member

bkchr commented Aug 21, 2024

Maximum code size is already controlled by the host configuration.

I strongly hope that we have not hardcoded this 5 MiB limit anywhere.

@s0me0ne-unkn0wn
Contributor

> I strongly hope that we have not hardcoded this 5 MiB limit anywhere.

As I see from the code, the MAX_POV_SIZE constant in primitives rules everything. Do you think bringing it into the configuration makes sense? I mean, if we want to change it every six months, then it definitely makes sense, but right now we're seeing it as a big move and a one-shot change.

@sandreim
Contributor Author

The MAX_POV_SIZE constant is also present in the req/response protocol config.

I think we should remove all the constant references and use the configuration value.

In the future we might want to raise it above 10 MB; I don't see any reason why not, if the validators have enough bandwidth and CPU. We'd still be capped at 16 MB by the maximum response size of the parachain block request protocol.

@bkchr
Member

bkchr commented Aug 21, 2024

> Do you think bringing it to the configuration makes sense?

Otherwise there is no consensus on this number, which leads to the situation that we need all validators to upgrade. This is not really how a decentralized network should work. Using some upper bound on the node makes sense, but it should give some leeway over the on-chain value.

@alexggh
Contributor

alexggh commented Aug 22, 2024

> Do you think bringing it to the configuration makes sense?
>
> Otherwise there is no consensus on this number. Which leads to the situation that we need to please all validators to upgrade. This is not really how a decentralized network should work. Using some upper number on the node makes sense, but it should give some leeway to the on chain value.

The runtime set code does something like this, so I guess that was the leeway which we now want to increase:

```rust
if self.max_pov_size > MAX_POV_SIZE {
    return Err(MaxPovSizeExceedHardLimit { max_pov_size: self.max_pov_size })
}
```

@eskimor eskimor mentioned this issue Aug 23, 2024
@eskimor
Member

eskimor commented Aug 27, 2024

There are two values here:

  1. The limit on networking req/response protocols.
  2. The actual runtime configuration.

(1) poses an upper limit on (2). Ideally we would derive the network limit from the runtime configuration, but that would require some refactoring, as we currently configure that value at node startup when setting up the protocol.

It is worth double-checking that (2), and not the constant, is used everywhere in validation. Other than that, the process is as follows:

  1. Bump the networking limit ... this is backwards compatible and can be back ported.
  2. Once enough validators have upgraded we can change the runtime configuration.

Assuming we use the runtime configuration correctly (and the persisted validation data, which should be derived from it), there can be no consensus issue. We nevertheless need the majority of validators upgraded to the higher networking limit; otherwise honest nodes would not be able to fetch a large PoV, which could cause a finality stall.

@eskimor
Member

eskimor commented Aug 27, 2024

> > Do you think bringing it to the configuration makes sense?
> >
> > Otherwise there is no consensus on this number. Which leads to the situation that we need to please all validators to upgrade. This is not really how a decentralized network should work. Using some upper number on the node makes sense, but it should give some leeway to the on chain value.
>
> The runtime set code does something like this, so I guess that was the leeway which we now want to increase:
>
> ```rust
> if self.max_pov_size > MAX_POV_SIZE {
>     return Err(MaxPovSizeExceedHardLimit { max_pov_size: self.max_pov_size })
> }
> ```

Indeed. First the node side limit, then the runtime value.

@dmitry-markin
Contributor

One thing to keep in mind regarding the networking req/resp protocol limit is that the block response limit was set for the minimal supported network bandwidth, i.e., it should not time out even on the slowest supported connections.

I don't have the numbers at hand, but a 16 MB block response should not time out within 20 seconds, so the minimum bandwidth is presumably 1 MB/sec in the spec. If we raise this limit, we should also either increase the request-response protocol timeout (but this can introduce more latency with unresponsive peers) or raise the bandwidth requirements.
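As a quick sanity check on the timeout math, using the 16 MB response cap and 20 s timeout quoted above:

```rust
// Minimum sustained bandwidth so a full block response arrives before
// the request times out.
fn min_bandwidth_mb_s(response_mb: f64, timeout_s: f64) -> f64 {
    response_mb / timeout_s
}

fn main() {
    println!("{} MB/s", min_bandwidth_mb_s(16.0, 20.0)); // 0.8 MB/s
    // Doubling the response size at the same timeout doubles the
    // requirement; alternatively, the timeout could be raised instead.
}
```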

@alexggh
Contributor

alexggh commented Aug 27, 2024

From https://wiki.polkadot.network/docs/maintain-guides-how-to-validate-polkadot#reference-hardware:

> The minimum symmetric networking speed is set to 500 Mbit/s (= 62.5 MB/s). This is required to support a large number of parachains and allow for proper congestion control in busy network situations.

So at least from a theoretical point of view, we should have the bandwidth for validators.

@dmitry-markin
Contributor

> From https://wiki.polkadot.network/docs/maintain-guides-how-to-validate-polkadot#reference-hardware:
>
> > The minimum symmetric networking speed is set to 500 Mbit/s (= 62.5 MB/s). This is required to support a large number of parachains and allow for proper congestion control in busy network situations.
>
> So at least from a theoretical point of view, we should have the bandwidth for validators.

Also, regular full nodes must be able to follow the chain.

@aaron2048

+1 for this. Moonbeam has adopted asynchronous backing and increased the EVM gas per block, and we're running into difficulties due to this limitation.

@AndreiEres
Contributor

Moving comments from #5753

We should consider networking speed limitations if we want to increase the maximum PoV size to 10 MB. The current PoV request timeout is set to 1.2s to handle 5 consecutive requests during a 6s block. With the number of parallel requests set to 10, validators will need the following networking speeds:

  • 5 MB PoV: at least 42 MB/s, ideally 50 MB/s.
  • 10 MB PoV: at least 84 MB/s, ideally 100 MB/s.

The current required speed of 50 MB/s aligns with the 62.5 MB/s specified in the reference hardware requirements. Increasing the PoV size to 10 MB may require a higher networking speed. This is the worst-case scenario, when all the blocks you need to recover are full.
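These figures follow directly from the parameters stated above (1.2 s request timeout, 10 parallel requests); a minimal check:

```rust
// Sustained bandwidth needed so that `parallel` simultaneous PoV
// requests each complete within the request timeout.
fn required_mb_s(pov_mb: f64, parallel: f64, timeout_s: f64) -> f64 {
    pov_mb * parallel / timeout_s
}

fn main() {
    println!("5 MB PoV:  {:.1} MB/s", required_mb_s(5.0, 10.0, 1.2));  // ~41.7
    println!("10 MB PoV: {:.1} MB/s", required_mb_s(10.0, 10.0, 1.2)); // ~83.3
    // Matching the "at least 42 MB/s" and "at least 84 MB/s" figures above.
}
```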

@sandreim
Contributor Author

sandreim commented Sep 19, 2024

> Moving comments from #5753
>
> We should consider networking speed limitations if we want to increase the maximum PoV size to 10 MB. The current PoV request timeout is set to 1.2s to handle 5 consecutive requests during a 6s block.
>
> With the number of parallel requests set to 10, validators will need the following networking speeds:

Where does this number come from?

> * 5 MB PoV: at least 42 MB/s, ideally 50 MB/s.

If we have around 6-7 approvals per validator, then we'd need to recover 5 * 7 = 35 MB per relay chain block, which is almost 6 MB/s required to keep up with finality.

For backing, assuming the async backing limits (max_candidate_depth 3, max ancestry len 2), there should be at most 5 * 3 * 2 = 30 MB per relay chain block, or 5 MB/s.

Which means a total of, let's say, 11 MB/s for all PoV fetches.

> * 10 MB PoV: at least 84 MB/s, ideally 100 MB/s.

This would mean 22 MB/s based on the above math, which is well under the spec, leaving room for all other gossip and relay chain block fetching.

@burdges

burdges commented Sep 19, 2024

We think 100 MB/s = 1 Gb/s has good odds of handling 15 approvals per validator then? Or do we think there needs to be a lot more slack somehow?

@sandreim
Contributor Author

> We think 100 MB/s = 1 Gb/s has good odds of handling 15 approvals per validator then? Or do we think there needs to be a lot more slack somehow?

Networking should probably be enough even now for 15. I'd be worried about CPU usage: you'd have 15 * 2 = 30s of PVF execution every 6s on top of backing duties. With the updated hw specs that means 5 cores are busy, leaving 3 more for the relay chain, networking and parachain consensus. Out of those 3 cores, we'd easily be using one just for erasure coding.
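The CPU estimate above as plain arithmetic:

```rust
fn main() {
    // 15 approval checks per relay chain block, each with up to 2 s of
    // PVF execution, arriving every 6 s.
    let busy_cores = 15.0 * 2.0 / 6.0;
    println!("{} cores fully busy on execution", busy_cores); // 5
}
```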

@albertov19

albertov19 commented Sep 24, 2024

Hey, are there timelines for when we might bump the PoV to 10 MB?

One of the current issues is that with Async Backing, we bumped the execution time to 2 seconds, but the PoV was kept the same, so we are not reaping the full benefits of increasing the execution time from 0.5 to 2 seconds.

@sandreim
Contributor Author

sandreim commented Sep 26, 2024

The plan is to get it done this year. In terms of code changes this is very little work, but requires significant testing before we go to Polkadot.

@s0me0ne-unkn0wn
Contributor

@bkchr After some research, this is what the situation looks like.

I was pleased to discover that max_pov_size is already part of both the host configuration and the persisted validation data. However, this value is never used in the runtime (though the original issue paritytech/polkadot#1572 presumed such a usage). It is used in the candidate validation subsystem to check that the PoV is not too large before performing the actual validation, and it is also used in the collator code to prevent the collator from building over this limit (though the check in the collator obviously ignores per-dispatch-class limits).

In the runtime, the hardcoded MAX_POV_SIZE constant is always used. I believe this is due to a chicken-and-egg situation: both the persisted validation data and the abridged host configuration, from which the limit could be derived, are set by the set_validation_data inherent, but by the time we execute the inherents we must already have block length and PoV size limits in place.

One thing that could be done is to set the MAX_POV_SIZE constant to the maximum possible value and use it as a bootstrap value when starting to build a parablock, and then, when the set_validation_data inherent arrives, overwrite the limits with the new ones. The implication is that frame_system::BlockWeights and BlockLength could no longer be pallet::constants, as we would have to change them at runtime.
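A minimal sketch of that bootstrap-then-overwrite flow. All names here are invented for illustration and are not the actual frame_system or cumulus API; the real change would turn BlockWeights/BlockLength into mutable values fed by the set_validation_data inherent:

```rust
// Hypothetical illustration only.
struct BlockLimits {
    max_pov_size: u32,
}

impl BlockLimits {
    /// At the start of block building no validation data is available
    /// yet, so begin from the hard upper bound baked into the runtime.
    fn bootstrap() -> Self {
        const MAX_POSSIBLE_POV_SIZE: u32 = 16 * 1024 * 1024;
        Self { max_pov_size: MAX_POSSIBLE_POV_SIZE }
    }

    /// When the `set_validation_data` inherent arrives, tighten the
    /// limit to the value the relay chain actually enforces.
    fn apply_validation_data(&mut self, relay_max_pov_size: u32) {
        self.max_pov_size = self.max_pov_size.min(relay_max_pov_size);
    }
}

fn main() {
    let mut limits = BlockLimits::bootstrap();
    limits.apply_validation_data(10 * 1024 * 1024);
    assert_eq!(limits.max_pov_size, 10 * 1024 * 1024);
}
```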

Are you okay with such an approach? Or do you have any better ideas?

@bkchr
Member

bkchr commented Sep 28, 2024

Yeah, the situation is not the best. I would assume that at least the runtime side on the relay chain uses the value from the HostConfiguration, and at best the backing etc. subsystems also use the variable PoV size. That way, this value can be changed without requiring coordinated upgrades of all validators.

For the parachain runtime, using a const sounds sensible. This value doesn't change that often, and the assumption that it only goes up is also fine. So the only downside would be that a parachain may not be able to use the full PoV size until it has done a runtime upgrade. However, I think this is reasonable.

@s0me0ne-unkn0wn
Contributor

> I would assume that at least the runtime side on the relay chain is using the value from the HostConfiguration

I'm not sure I'm following your thoughts here. When a validator needs to validate a parablock, be it for backing or approvals or whatever, it goes through the candidate validation subsystem, which, before instantiating the PVF, checks that the PoV is not larger than the limit noted in the persisted validation data. But that's an offchain check.

Still, we need to change the MAX_POV_SIZE constant and perform a relay chain runtime upgrade before we can increase the limit, because the host configuration pallet will not allow us to go over that hardcoded limit. With this limit being a Polkadot primitive, I'm not sure if we need an RFC or something to bump it.

@bkchr
Member

bkchr commented Sep 30, 2024

> I'm not sure I'm following your thoughts here. When a validator needs to validate a parablock, be it backing or approvals or whatever, it comes through the candidate validation subsystem, which, before instantiating the PVF, surely checks that the PoV is not larger than the limit noted in the persisted validation data, but that's an offchain check.

Perfect! That is what I meant! max_pov_size is fetched from the on-chain state.

> Still, we need to change MAX_POV_SIZE constant and perform a relay chain runtime upgrade before we can increase the limit because the host configuration pallet will not allow us to go over that hardcoded limit.

That the hardcoded limit is used there again is IMO not correct. The max_pov_size should be controlled by the HostConfiguration and not limited by some "random" constant.

> With this limit being a Polkadot primitive, I'm not sure if we need an RFC or something to bump it.

I would argue that we don't need any RFC. Or better: it depends a little bit on whether the maximum request/response size for the individual protocols is specced. In a perfect world, I would assume these values are not specced.

github-merge-queue bot pushed a commit that referenced this issue Oct 2, 2024
Quoting @bkchr (from
[here](#5334 (comment))):

> That the hardcoded limit is used there again, is IMO not correct. The
`max_pov_size` should be controlled by the `HostConfiguration` and not
limited by some "random" constant.

This PR aims to change the hard limit to a not-so-random constant,
allowing more room for maneuvering in the future.
@aaron2048

> The plan is to get it done this year. In terms of code changes this is very little work, but requires significant testing before we go to Polkadot.

Does that mean the target is to have it live on Polkadot by year end? Since fully adopting asynchronous backing on Moonbeam, we expanded from 15M gas per block to 60M, but since the PoV stayed the same, the cost of transactions that are bound by proof size has quadrupled, so we're hoping to get this change as soon as possible.

@sandreim
Contributor Author

sandreim commented Oct 2, 2024

> Does that mean target is to have it live on Polkadot by year end?

Yes.

> Since fully adopting asynchronous backing on Moonbeam, we expanded from 15M gas per block to 60M but since the POV is the same, cost for some tx that are bound by proof size have quadrupled so we're hoping to get this change as soon as possible.

I was looking at some metrics for Moonbeam proof size and it doesn't seem to be a bottleneck; see below (source: https://www.polkadot-weigher.com/history).

[Screenshot: Moonbeam proof size history from polkadot-weigher.com, 2024-10-02]

@albertov19

@sandreim, thanks for providing this. I'll forward it internally. However, the issue is that if there is a block with 60M gas of PoV-heavy transactions, you might see these numbers go up to 100%.

Our blocks were 15M gas with a 5 MB PoV. Due to the increased execution time, we bumped them to 60M gas, but with the same 5 MB PoV. Hence, we had to penalize PoV-heavy Ethereum transactions from a gas perspective by bumping the gas estimation 4x. Only those transactions are affected. We estimate gas for execution, storage growth, and PoV, and use whichever estimation is the worst case.

@bkchr
Member

bkchr commented Oct 9, 2024

@albertov19 I still think that you should build a reclaim like functionality that also pays back fees.

@crystalin

Giving an update from the Moonbeam side.
We have implemented the PoV refund, which helps in some situations, but not by much. As we offer smart contract execution, the contract bytecode has to be included in the PoV, so even simple calls doing little computation produce a substantial PoV.

I think the idea of supporting storage on the relay chain for PoV is also a good direction to investigate (considering the challenges, I don't expect it to be that quick).

In the meantime, I also think we could start considering an increase, maybe in small steps (7.5 MB and then 10 MB).

@burdges

burdges commented Nov 25, 2024

The contract should only be included once per PoV, because PoV state proofs should always work like that, so the collator would produce denser blocks if it could select many executions of the same contract.

In principle, relay chain storage should have costs identical to PVF size, so common contracts could be inlined into the PVF, but once the PVF gets large this becomes wasteful, since you re-upload the whole PVF.

We should run some asymptotic numbers on relay chain storage costs.

The naive raw costs wind up being 33 x + 32 log m for PoV bandwidth, where x is the contract size in bytes and m is your total storage size, and 1000 x for relay chain storage. We multiply the PoV bandwidth cost by the number f of blocks that require the contract between PVF updates, or per day, or something like that.

AWS pricing has bandwidth costing 2-5 times what storage costs, but relay chain storage depends upon validator specs, so if we imagine AWS pricing then 500 x to 200 x yields comparable financial costs. AWS pricing fits us poorly; likely our bandwidth winds up statically capped, making it almost priceless eventually, but storage also costs more, since every validator must upgrade. I'll take 1000 x as the near-term estimate, but if Polkadot becomes popular this changes.

We've this dumb radix 4 tree, but assuming a saner binary tree like NOMT, you need m > 2^x before the f 32 log m term dominates the f 33 x term. We're curious when this happens:

1000 x < f (33 x + 32 log m)
(31.25 / f - 1.031) x < log m

If we take the daily model here, then relay chain storage makes sense if you're accessing the contract at least once every 460 blocks. If, on the other hand, you access the contract only once every, say, 920 blocks, then you need m > 2^x to justify using relay chain state. In particular, if the contract is 30 bytes, then your state needs 1 billion entries, more than three times the size of Ethereum's.
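Plugging numbers into the break-even condition (the 14400-blocks-per-day figure and the per-day framing are my reading of the "daily model"; f is uses per day, x the contract size in bytes):

```rust
// Relay chain storage wins when 1000*x < f*(33*x + 32*log2(m)),
// i.e. log2(m) > (31.25/f - 33/32) * x.
fn log2_m_threshold(x: f64, uses_per_day: f64) -> f64 {
    (31.25 / uses_per_day - 33.0 / 32.0) * x
}

fn main() {
    // Used once every 460 blocks (14400 six-second blocks per day): the
    // coefficient goes negative, so relay chain storage pays off
    // regardless of state size.
    assert!(log2_m_threshold(30.0, 14400.0 / 460.0) < 0.0);
    // Once every 920 blocks, a 30-byte value needs roughly 2^29 state
    // entries (order of a billion) before the trie-depth term justifies it.
    println!("{:.1}", log2_m_threshold(30.0, 14400.0 / 920.0)); // ~29.0
}
```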

All this suggests "Would you roll this contract into your PVF?" provides a good litmus test for using relay chain state.

We could improve many things here: Allow PVFs to be split into multiple files, initially just data, but later dynamically linked. Allow these files to be shared across parachains. Spend money improving dynamic linking in Rust ala crABI.

@crystalin

I agree, it needs to make sense economically. (Bandwidth is already very expensive for Polkadot/Kusama, not so much the PoV part but all the rest of the p2p broadcasting; with the default number of peers we are already at 50 Mb/s on Kusama.)

To give numbers, I'll take the simple case of a marketplace that updates its price regularly on Moonbeam. Their design is not optimized for PoV, because the limiting factor (especially for Ethereum-based contracts) was on the execution side. They update their price every 30s (too often in my opinion, but it's not my project :p). The transaction simply updates around 10 storage values in total but has to access multiple contracts, so its PoV is 240 kB (around 4% of a block). That is expensive for something that doesn't compute much.

An ideal solution would be to refactor the projects to be PoV-aware (but that costs a lot, carries risks, and usually pushes projects away from the ecosystem), or to allow the relay chain to maintain those contracts (usually between 10-24 kB each). There could also be a temporary "cache" allowing a parachain to reuse the same storage/contracts across multiple blocks.

If 10 MB is too much for a PoV, we definitely need to search for a way to optimize somewhere.

@burdges

burdges commented Nov 25, 2024

Afaik 10 MB is on our roadmap, but maybe snags occurred.

It's modifying 10-ish contracts every 30s? Could you send them a PR that makes it pull the prices from one central location in state? That'd cost 32 log_2 moonbeam_trie_entries bytes per user transaction in NOMT, or maybe 4x that right now. That's 1 kB per user tx in NOMT, or 4 kB right now.

At some level, we're all middleware so we must fix out customers code for them sometimes when the customer is important enough. That's how our advertising budgets should be spent. ;)

As contracts are byte arrays, in principle they could be hashed using blake3, and then you could do a blake3 edit proof into the middle of the contract, because blake3 is internally a Merkle tree. I'm unsure whether anybody implements blake3 edit proofs yet, but they're possible. This isn't going to work too well for contracts, but it blurs the line between data and subtries in a way that seems useful.
