Time Disputes #742
It occurs to me that time disputes are somewhat likely to be inconclusive, as honest validators may be split so that neither side achieves the necessary 2/3. This likelihood increases if some number of malicious validators withhold their votes in disputes. Keeping many disputes alive through their entire timeout period seems like a potential DOS vector. Perhaps the large number of simultaneous legitimate disputes could be leveraged to spam consistent dispute statements from malicious validators. Those dispute statements would at least bypass spam slots since they reference disputes for included candidates. Perhaps the total message volume produced this way would still be insignificant? This seems less dangerous if backers are still charged era points for validation time approvers take beyond the backing timeout regardless of whether disputes conclude. As long as the cost to malicious backers is sufficiently high they can't spam time disputes like I described. Is my thinking on the right track here? |
We might have some misconceptions here, so maybe you want to chat with @eskimor and/or me, but roughly speaking: we should call these time overruns, not time disputes, because "disputes" causes confusion. Time overruns are not disputes, do not take dispute slots, do not trigger validators to do extra work, and do not need 2/3rds of anything. We just need some mechanism by which we compute the median execution time declared by the approval votes. Anyone not declaring a time de facto votes 2s. We do not rerun the approval logic on-chain, so any approval vote counts in the median, which sucks since whales could cheat, but if they do so then we need governance to manually slash them. In future, we could likely move this whole penalties system off-chain with the off-chain rewards system. I've not thought much about doing so yet, but it'll hopefully solve the whales issue. We might "bill" the backers' nominators' stake, not just take era points, if we foresee that time overrun costs could exceed what era points pay, but doing so should not count as a slash. It really depends upon how much parathread blocks cost. We do also have real time disputes, in which we claim a block is invalid for taking way, way too long. We've three possibilities here:
Assuming 2/3rd honest, we judge 3 to be a serious code bug, which requires a bug fix & host upgrade, not slashing. In particular, our validators who'd raise time dispute were all replaced as no-shows already, so approvals being secure means we de facto escalated already in case 1, due to all honest nodes being no-shows, or else we approved in case 1 or 2, so likely this dispute comes after the block was already finalized, and governance sorts it out anyways. |
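The median rule described above (every approval vote counts, and a vote without a declared time counts as 2s) can be sketched as follows. This is only an illustration: the function name and the use of `None` for "no time declared" are assumptions, not part of any existing implementation.

```python
from statistics import median

# Assumption: a vote that declares no time is treated as a 2s execution,
# mirroring "anyone not declaring a time de facto votes 2s".
DEFAULT_DECLARED_SECONDS = 2.0


def median_declared_time(declared_times):
    """Median execution time over approval votes.

    declared_times: list of seconds (float), or None where the vote
    carried no declared time.
    """
    if not declared_times:
        raise ValueError("need at least one approval vote")
    filled = [DEFAULT_DECLARED_SECONDS if t is None else t for t in declared_times]
    return median(filled)
```

With two silent votes and one vote declaring 9s, the median stays at 2s, so a single inflated report does not move the bill on its own.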
With regards to the question posted by @eskimor: Question? One potential way of exploiting this I see is a Assuming we use an average of time reports:
Assuming we use a median as pointed by @burdges (better but still seems exploitable):
Is there anything that protects us from that? Real time disputes issue:
Also @burdges you mention "We've four possibilities here:" but only list 3 options. |
Yes, adversaries could unjustly bill people anytime they've enough approval checkers for a block (and wish to target the backers). We'd need to fix this in governance, aka victims must convince others to run the block, report faster times, and eventually pass some refund motion. I've previously mentioned a percentile higher than median here too, which reduces risks but requires correspondingly higher fees, not really sure there. We do not slash incorrect voters 100% in that case, only the backers. I suggested a tiered slashing for invalid blocks, so 100% for backers, 10% for approval checkers, and 1% for voters. We didn't choose that option I think, but now I've forgotten why. |
Thanks to @Overkillus for clarifying what I meant. I was indeed thinking of attacks exploiting disputes caused by extreme time overruns, which you described better. @burdges Mind explaining the reasoning that brings you to this judgement? |
@burdges The chance that we had more than 50% honest approvers for all 42 parablocks is: And during an hour we have 25200 attempts instead of 42 which practically guarantees (e-237 that it doesn't) that at least once there will be an approval-checker group with more than 50% malicious actors making the attack a possibility. If the attack is basically guaranteed to occur every hour how can we reasonably say that governance can manually handle that? If the situation was expected to occur once every 10 years maybe you could say the governance can step in and make an adjustment, but at this point those are too frequent for manual intervention. With regards to the slashing response in Real Time Disputes: Also what justifies the slashing of the backer in that case?:
In the above example it seems the backer shouldn't get slashed, and in the change proposed by @eskimor the backer will only be fined based on the time overrun reports. It should also mean that the total fine should cover the costs that normally would be applied to the 25% honest validators that were tricked into raising the dispute. Is that correct? The edge case in that regard is a block that splits the validator set into 33.3% voting invalid and 66.7% voting valid (including the backer). Based on the approval-checkers timeout constant and the expected execution time variance, one would need to calculate what execution time splits the community in such a way, then calibrate the expected time overrun fees for that particular execution time to be higher than the costs normally incurred by the 33.3% group of validators voting invalid in a dispute that concludes valid (unsuccessful dispute). In the analogous case where the backer picks a block such that 75% of the time it takes MORE than the approval-checkers timeout, the backer should be directly slashed, as he will lose the dispute when it comes to it.
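The frequency argument above can be reproduced with a small calculation. This is a rough sketch under stated assumptions: checkers are drawn independently (a binomial approximation), and the checker count and malicious fraction in the usage note are hypothetical inputs, since the thread does not state the values behind its figures.

```python
from math import comb


def p_malicious_majority(n_checkers, malicious_fraction):
    """P(strictly more than half of n_checkers are malicious).

    Binomial approximation: each checker is malicious independently
    with probability malicious_fraction (an assumption).
    """
    threshold = n_checkers // 2 + 1
    return sum(
        comb(n_checkers, k)
        * malicious_fraction ** k
        * (1.0 - malicious_fraction) ** (n_checkers - k)
        for k in range(threshold, n_checkers + 1)
    )


def p_at_least_one(p_single, attempts):
    """P(the event happens at least once in `attempts` independent tries)."""
    return 1.0 - (1.0 - p_single) ** attempts


# 42 parablocks per relay-chain block; 25200 attempts per hour corresponds
# to 600 relay-chain blocks per hour.
ATTEMPTS_PER_HOUR = 42 * 600
```

For example, `p_at_least_one(p_malicious_majority(30, 1 / 3), ATTEMPTS_PER_HOUR)` gives the hourly chance under a hypothetical 30 checkers per candidate and one third malicious validators.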
@BradleyOlson64 We need higher time overrun fees if 3 ever happens, but yes, it's more a parameterization error than a code bug. We fix it if it happens, and it's still our mistake, so no penalties. @Overkillus We're not limited to refunding in governance. We should be able to manually slash from governance too, which sounds very appropriate for the griefing attack you describe. We've other flavors where 1/3 targets a small-ish number of validators, avoids harming themselves, avoids bystanders, etc., which all make the attack less unrealistic than pure griefing but also offer less frequent attack windows. We do not much care though, because governance refunds and governance slashing remain the overall solution. We've many places where exceptional behavior falls upon governance, and we'll add more this year by removing some complexity from slashing, so really we need some "polkadot constitution" document that says how our design expects governance to react to various exceptional situations. We need code to enforce soundness, safety, and liveness each within some reasonable parameters, deal with fast attacks, etc. We never thought code could cover every case correctly though, because afaik all other peer-to-peer networks require human intervention sometimes. We won't slash the backer in your case: At some point someone escalates by saying the block runs longer than the approval checker timeout, which actually does not matter much because.. We've already de facto escalated long before this happens however, since every 12s or soon 24s we'll have a new batch of no-shows, which includes everyone honest. In any case, we now have 2/3rds who claim the block runs insanely slow, even if they claim valid, so we're already charging the dishonest backer enough to pay for the dispute, so either valid or invalid results sound fine. We do however punish one honest node here for raising the time dispute.
We've discussed this no longer being slashing per se, but merely fees like the time overrun fees the backer pays or related. We could, and likely should, reduce this fee by whatever the backers pay, which requires yet more careful balancing of course. Anyways.. Yes, there are a bunch of easy implementation mistakes, like say ignoring the invalid votes when computing the median, not having the backers pay enough, etc., so yes we should spell all this out as much as possible. In my mind, we've basically one really tricky question: Can or should the off-chain rewards protocol handle time overruns? We'll discuss this in future I think. |
Yes this is what I would suggest. If we know that the execution time was at the high end, the voting invalid validators will pay nothing. (The backers already paid)
One thing to consider here is that the no-show timeout should be much lower than the worst case execution timeout. Therefore providing such large numbers does mean escalation and we would be getting much more checkers. Consequences:
With 2) adversaries' effectiveness should be greatly reduced. But I agree with @Overkillus - potential frequency of such an attack matters. We should do some proper worst case calculations and design it so, that resorting to governance/human intervention is feasible. E.g. if we assume execution time of up to 12 seconds will never/rarely cause a no-show, we could simply not charge anything up to that time (no escalation - everything is fine). Therefore with approval voters reporting times less than those 12 seconds, no harm is done - if they go above, we cover them as if they were a no-show and thus get more time values in. For frequencies that are acceptable: I don't think we need to go to 10 years, any time frame that allows humans to react will do. E.g. once per week should be totally acceptable:
As long as those 2 are maintained, potential frequency of such an attack matters not too much - although it is a good idea to limit it for defense in depth. As long as 1 is maintained the security of the network is not at risk, therefore this might also be one of the few occasions where it would be acceptable to loosen our byzantine assumptions:
-> Highly unlikely to find 1/3 of validators trying to do this. Anyhow, I actually don't think this will be necessary.
As long as f+1 validators voted valid we can assume it is fine (at least one honest validator voted in favor of the candidate). Who has to pay what can then be determined by reported timing information.
This would conclude the dispute as "candidate invalid". We do not currently slash approval voters, so this would only slash backers which have not been honest (they obviously ignored the backing timeout).
Yes.
Correct. Thanks @Overkillus ! Very useful input. |
Interesting, we'd ensure that reporting time overruns creates more checkers who perhaps contradict your report. It ain't so simple however, because our approvals counter loop actually un-counts no-shows once they finally vote. We could complicate its logic of course, like by counting and not un-counting your declared no-shows, but.. We do make various compromises all over the place, but we do prioritize comparatively higher priority design like soundness over comparatively lower priority design like correct billing. I'm thus hesitant to even slightly complicate the approvals counter unless it's really the right solution, as it's soundness code. Also, if an off-chain solution works in future then we could likely count the minimum of message arrival time and the declared time, which likely fixes this cleanly without touching the approvals counter. That's many future ifs, but it's a reason not to do this now. Anyways, do we really need this? Case 1. An honest backer makes a block that runs under 2s but haters report it ran slow. I still think governance could handle this case all by itself. In other words, the human backer should report the haters by sharing the PoV and the approval signatures with someone in governance, who then runs the block and reports under 2s. After this, more humans in governance run the block, and then finally they vote to refund the backer and slash the haters. Yes, your suggestion automates this somewhat, and is maybe simple enough to justify, but again maybe not if off-chain ever works. Case 2. A dishonest backer makes a borderline block. Can adversarial nodes inflating runtimes help the adversary? It perhaps increases their own fees, but afaik has minimal other effect. We could delay doing this until after we discover if punishments fit into the off-chain rewards system?
Once per minute is kinda acceptable if governance eventually manually slashes the haters. Yeah, they can make us look bad by bringing everything to a halt, but only like once, and then they'll wind up gone for good. Yes, longer is better however. Again the real problem here is that anybody can vote, not just the approval checkers, which happens because we're doing this on-chain instead of doing it off-chain. I should do a proper write-up for the new slashing and punishments system soon, so maybe the off-chain rewards system makes a good companion write-up.
We do however abandon a disputed fork that never achieves 2f+1 though, right? |
This issue has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/polkadot-dispute-storm-the-postmortem/2550/1 |
I've been adding a time overruns section to the implementer's guide and found it helpful to try to model what the actual charge would be. One way to think about it is in terms of an unlikely worst case scenario: a dispute that concludes valid, but where 33% of approval checkers voted invalid due to reaching the execution timeout. Assuming a 0.1% slash for disputes where the supermajority concludes the candidate is valid, 1000 paravalidators, and equal amounts bonded between validators; then the maximum amount collectively slashed from approval checkers disputing a valid candidate is equal to 33% of the backer's bond. We can ask how long the median should be in order for us to assume the approval checkers who voted invalid actually timed out and therefore shouldn't be slashed. A few questions I'm unclear about:
Other than that we could use something like:
Then a median overrun of 58% of the approval checking timeout (7.8s with a 12s timeout or 24s with a 40s timeout) will result in the backer being charged 33% and the timed out approval checkers not being slashed at all. On the other hand, a median overrun of 10% (3s with a 12s timeout or 5.8s with a 40s timeout) would result in a charge to the backer of 1%. |
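The figures quoted above are consistent with measuring the overrun as a fraction of the span between a 2s backing timeout and the approval checking timeout. That reading is an inference from the numbers, not something the thread states outright, and the helper names below are illustrative only. The sketch also reproduces the "33% of the backer's bond" worst case from the stated 0.1% slash, 1000 paravalidators, and equal bonds.

```python
# Assumption: the backing timeout is 2s, matching the "<2s execution"
# baseline used elsewhere in the thread.
BACKING_TIMEOUT_S = 2.0


def overrun_fraction(median_s, approval_timeout_s, backing_timeout_s=BACKING_TIMEOUT_S):
    """Median overrun as a fraction of (approval timeout - backing timeout)."""
    span = approval_timeout_s - backing_timeout_s
    if span <= 0:
        raise ValueError("approval timeout must exceed backing timeout")
    return max(0.0, (median_s - backing_timeout_s) / span)


def median_for_overrun(fraction, approval_timeout_s, backing_timeout_s=BACKING_TIMEOUT_S):
    """Inverse of overrun_fraction: the median time for a given overrun."""
    return backing_timeout_s + fraction * (approval_timeout_s - backing_timeout_s)


def max_collective_slash(slash_rate, n_validators, invalid_share):
    """Total slashed from invalid voters, in units of one validator's bond
    (assumes equal bonds)."""
    return slash_rate * n_validators * invalid_share
```

Under this reading, a 58% overrun maps to about 7.8s with a 12s timeout and about 24s with a 40s timeout, and a 10% overrun maps to 3s and 5.8s respectively. The mapping from overrun to the charged percentage is not reproduced here.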
Some results of our latest discussion:

Slashing Logic
Slashing resolution on concluding valid disputes is going to be changed based on time information:

Use of previous logic
If time report variations are within reasonable limits from the calculated median (of all reported votes), then the slashing logic is the same as it was before: the validators voting invalid will get slashed some amount that is supposed to make up for the wasted effort of the network.

Time information based slashes
On the flip side, if there are time reports that differ a lot from the median, who voted invalid will be irrelevant; instead we charge whoever deviates too much from the median. Invalid votes should also contain time information - we should not just assume max time; see the explanation in the section "Backer raising a dispute". This means that if the median is rather low, voters (valid or invalid) with high values will get slashed - they are either unacceptably slow or have been dishonest with their values. If the median is on the high end, then we will assume backers have been messing with us and they get slashed (together with all approval checkers also reporting such low times). If the median is somewhere in the middle, it can happen that we will end up slashing people on both ends. The likely explanation for such a situation would be that there are people on both ends trying to mess with us at the same time.

Interaction with charging backers without a dispute
In case of an actual dispute, any charging of backers due to approval votes will be dropped/replaced with the actual dispute resolution as described above. In case of a dispute, we get time information from everyone, so with our 2/3 honest assumption we actually get a reliable median, and the one from the tranche 0 approval checkers is superseded.
Backer raising a dispute
Solution to the above problem that malicious approval checkers could be messing with backers, reporting blown-up times to get backers paying based on tranche 0 time information: what a backer can do to defend himself is raise a dispute! If the backer notices approval votes with reported times that result in a median that would result in him having to pay, he can equivocate and send an additional explicit invalid vote, with the same time as the backing timeout (or the actual time it took him to validate the candidate). Then, assuming the dispute resolves for the candidate (which should be the case if the backer is honest), the above slashing logic kicks in and the backer will not have to pay anything; instead the approval voters who tried messing with the backer will. The equivocation does not matter either, because we are not slashing based on valid/invalid.

For this to work, values have to be picked carefully: it should not be possible to have a backer charged (a significant amount/at all), but at the same time have dispute resolution result in case one, where there are no votes deviating too much from the median. So thresholds for charging have to be in sync with this, which seems only logical, but is worth mentioning anyway.

Summary: With this simple mechanism a 2% chance is totally acceptable, because it would no longer be risk free for the attackers; in fact having to pay the bill is virtually guaranteed. Together with the fact that there is not even a direct incentive to do the attack in the first place, it is hard to imagine that with this in place people would even try. Which is disputes serving their purpose: "Being there for them not ever being needed to run, because of their mere existence." And if they tried anyway, they would pay the bill.

Open Questions
Exact numbers. E.g. we assume "normal" fluctuations in time to be maxing out at around 6; this results in something like an accepted sqrt(6) for expected deviations from the median (or not?) - in any case it should be smaller. If reports are within that window, nobody should be charged based on time information and dispute resolution would be option 1.
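The two resolution modes for a dispute that concludes valid can be sketched as below. This is a minimal illustration, not the agreed design: the tolerance is treated as a multiplicative window around the median, and the default of sqrt(6) is only the tentative figure floated above, which the thread leaves as an open question. The function and mode names are hypothetical.

```python
from math import sqrt
from statistics import median


def resolve_valid_dispute(reports, tolerance_factor=sqrt(6)):
    """Decide who is charged after a dispute that concluded valid.

    reports: dict mapping validator id -> (voted_valid: bool, seconds: float)
    tolerance_factor: assumed multiplicative window around the median;
        the real threshold is still an open question.

    Returns (mode, charged_validators).
    """
    if not reports:
        raise ValueError("need at least one report")
    med = median(seconds for _, seconds in reports.values())
    low, high = med / tolerance_factor, med * tolerance_factor

    # Time-based mode: anyone far from the median pays, whatever they voted.
    outliers = sorted(v for v, (_, s) in reports.items() if s < low or s > high)
    if outliers:
        return "time_based", outliers

    # Previous logic: all times are plausible, so invalid voters pay.
    invalid_voters = sorted(v for v, (valid, _) in reports.items() if not valid)
    return "previous_logic", invalid_voters
```

With a large honest majority, a handful of inflated reports lands outside the window and is charged, while the backer is not.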
We do want the backer who disputes to set some "valid" flag in his dispute. In fact, we'll want them to quote a vote with a very different time. It'll help debug the system obviously, but also. Any dispute should leave the backer on-the-hook for whatever fees other voters do not pay. If not, the backer can dispute themselves, but since everyone agrees on the time, then everybody checks but nobody pays. We'll know someone pays if the backer must quote some vote with a very different time. I think backers should not raise these time disputes just because a few nodes have very different times, but only do so when they risk being charged, so maybe |
Yes, that's what I wrote:
The backer would only dispute if the median would result in him getting charged. I also don't understand how a dispute can be raised without a slash: There are two options:
Hence if there is no attack on the backer, raising a dispute will get the backer slashed. Same as for any other unjustified dispute. |
What we discussed is actually changing this to 1% of the minimum bond of the backer and all approval checkers, split between everyone who votes against a valid candidate and the backer according to the time overrun curve. So in this scenario where there's no overrun charge it's 1% split between all the dishonest approval checkers vs. .1% each we've previously suggested. |
I'm not sure using the standard deviation is correct. In my draft of the guide section I'm proposing just 3x, so starting charging at 3x backing timeout, maxing out at 1/3 approval checking timeout, and detecting inflation when
Initial implementation for the plan discussed here: #701
Built on top of #1178
v0: paritytech/polkadot#7554

## Overall idea
When approval-voting checks a candidate and is ready to advertise the approval, defer it in a per-relay chain block until we either have MAX_APPROVAL_COALESCE_COUNT candidates to sign or a candidate has stayed MAX_APPROVALS_COALESCE_TICKS in the queue; in both cases we sign what candidates we have available.

This should allow us to reduce the number of approval messages we have to create/send/verify. The parameters are configurable, so we should find some values that balance:
- Security of the network: Delaying broadcasting of an approval shouldn't put finality at risk, and to make sure that never happens we won't delay sending a vote if we are past 2/3 of the no-show time.
- Scalability of the network: MAX_APPROVAL_COALESCE_COUNT = 1 & MAX_APPROVALS_COALESCE_TICKS = 0 is what we have now, and we know from the measurements we did on versi that it bottlenecks approval-distribution/approval-voting when we significantly increase the number of validators and parachains.
- Block storage: In case of disputes we have to import these votes on chain, and that increases the necessary storage by MAX_APPROVAL_COALESCE_COUNT * CandidateHash per vote. Given that disputes are not the normal way of the network functioning and we will limit MAX_APPROVAL_COALESCE_COUNT to single digits, this should be good enough. Alternatively, we could try to create a better way to store this on-chain through indirection, if that's needed.

## Other fixes:
- Fixed the fact that we were sending random assignments to non-validators. That was wrong because those won't do anything with them and they won't gossip them either, because they do not have a grid topology set, so we would waste the random assignments.
- Added metrics to be able to debug potential no-shows and mis-processing of approvals/assignments.

## TODO:
- [x] Get feedback that this is moving in the right direction. @ordian @sandreim @eskimor @burdges, let me know what you think.
- [x] More and more testing.
- [x] Test in versi.
- [x] Make MAX_APPROVAL_COALESCE_COUNT & MAX_APPROVAL_COALESCE_WAIT_MILLIS a parachain host configuration.
- [x] Make sure the backwards compatibility works correctly
- [x] Make sure this direction is compatible with other streams of work: #635 & #742
- [x] Final versi burn-in before merging

---------
Signed-off-by: Alexandru Gheorghe <[email protected]>
Latest design here.
Basic Idea/Design Considerations
New objectives:
time, because it is taking long - approval voting is secure regardless.
finalize.
to a <2s execution.
the backing timeout then this is put on chain.
exponential and reaches the cost of raising a concluding valid dispute. Thus
we can actually have very long approval voting timeouts.
if that bill has already been paid by the backers via 5. (Getting honest
validators slashed is thus no longer possible.) Basically validators voting
invalid only need to get charged whatever the backers were not charged yet.
approval checkers (and dispute participants) only if there is no (small
enough) time report of approval checkers. This way there is an incentive to
actually validate (and not just always vote valid -which would be the case if we always only slashed backers).
Questions?
charged? Assuming enough honest approval checkers, this should not be
possible.
Threats/Complications:
Getting those approval votes (time) on chain:
TL;DR: By charging backers for excess time proportional to the amount, we can afford very long approval checking timeouts. This way natural fluctuations in load/performance should no longer realistically cause a dispute, and even if that timeout is triggered by malicious backers, they would end up paying the bill. We would slash backers on concluding invalid disputes, but also other nodes voting valid if the timing reported by approval checkers is low enough that it can be deemed unlikely that the nodes who voted invalid did so because of the timeout. This way approval checkers keep having an incentive to actually do the validation and not just always vote valid (which would be risk free if we only ever slashed backers).