Jailing circuit breaker #404
Comments
Nice writeup, I'll throw out some ideas, but your throttling mechanism or some variation of it seems like a reasonable way forward.

**Throttling**: Limit the time it takes for a high volume of slash packets to actually affect validator voting power. This is what you've described above.

**Panic threshold**: Alternatively, set a threshold at which a certain number of slash packets, or a certain percentage of slashed voting power (within some time window), triggers the provider chain to panic and therefore halt. Validators could then evaluate the situation and take steps from there.

**TL;DR**: Do we want the provider to be halted autonomously under certain conditions? Or do we want throttling, which gives provider validators enough time to react to the attack manually?
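The panic-threshold idea could be sketched roughly as follows. This is a hypothetical illustration, not a proposed implementation; the class name, window length, and threshold are all made up for the example:

```python
from collections import deque

class PanicThreshold:
    """Hypothetical sketch: signal a chain halt if the voting power slashed
    within a sliding time window exceeds a threshold fraction of total power."""

    def __init__(self, threshold_frac, window_secs):
        self.threshold_frac = threshold_frac
        self.window_secs = window_secs
        self.events = deque()  # (timestamp, slashed_power)

    def record_slash(self, now, slashed_power, total_power):
        # Drop slash events that have aged out of the window.
        while self.events and self.events[0][0] <= now - self.window_secs:
            self.events.popleft()
        self.events.append((now, slashed_power))
        slashed = sum(power for _, power in self.events)
        # True means the provider should panic and halt.
        return slashed / total_power > self.threshold_frac
```

The open question above remains: whether such an autonomous halt is preferable to throttling at all.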
I'd like to avoid the possibility of malicious consumer code halting the provider.

@smarshall-spitzbart Not sure that I follow. A slash event will result in the validator getting jailed, which means removing its entire stake from the total stake of the validator set. Why would we do it in multiple smaller steps over multiple endblocks?
That was just an idea; it turned out not to be feasible. cosmos/ibc#869 is implemented in such a way that jailing always happens atomically, but a validator with a large percentage of voting power would cause the slash meter to go negative, meaning no more slash packets will be handled until the meter is replenished to a positive value.
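The slash-meter behavior described above can be sketched as follows. This is an illustrative model only (names and the replenish policy are assumptions, not the actual cosmos/ibc#869 implementation):

```python
class SlashMeter:
    """Hypothetical sketch of the slash-meter idea: jailing is atomic, but a
    large validator drives the meter negative, blocking further slash packets
    until periodic replenishment brings it back above zero."""

    def __init__(self, allowance):
        self.allowance = allowance  # power the meter allows per replenish period
        self.meter = allowance

    def can_handle_slash(self):
        # Packets are handled only while the meter is positive.
        return self.meter > 0

    def handle_slash(self, validator_power):
        # Jailing happens atomically; the meter may go negative.
        self.meter -= validator_power

    def replenish(self):
        # Called once per period; the meter is capped at the allowance.
        self.meter = min(self.allowance, self.meter + self.allowance)
```

The key property is that a single large jailing is never split up, but it consumes future budget, throttling subsequent slash packets.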
@danwt just brought up a great point: if we're talking about the scenario where many slash packets are being received by the provider in a short period of time, we should consider the potential size of the packet queue. Is there a param we could set such that we start to drop slash packets when an unreasonable amount are being received from a certain chain? Unreasonable being defined as: it would be infeasible to store all those packets on chain.

Edit: Just talked through some solutions with Jehan. The way we're going to alleviate the issue is by adjusting the protocol to drop (not queue) slash packets which are relevant to a validator that is already jailed/tombstoned. This way, there is a limit (per consumer chain) to the amount of slash packets that can actually clog up the queue. We'll have to think about this issue deeper when we start talking about a large number of consumers. See https://github.com/smarshall-spitzbart/ibc/blob/main/spec/app/ics-028-cross-chain-validation/methods.md?plain=1#L1660
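The drop-instead-of-queue rule is simple enough to sketch. The function name and packet shape here are hypothetical, chosen only to illustrate the bound on queue growth:

```python
def enqueue_slash_packet(queue, jailed_or_tombstoned, packet):
    """Hypothetical sketch: drop (don't queue) slash packets targeting a
    validator that is already jailed or tombstoned, so each consumer chain
    can enqueue at most one effective packet per live validator."""
    if packet["validator"] in jailed_or_tombstoned:
        return False  # dropped, queue does not grow
    queue.append(packet)
    return True  # queued for processing
```

Because a validator can only be jailed once, the queue size per consumer is bounded by the validator set size rather than by how many packets an attacker sends.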
Just posting this here for awareness: there is a plan and intent to have a more general circuit breaker built natively into the SDK: cosmos/cosmos-sdk#926
Problem
I'd like to find a way to mitigate the worst-case scenario possible in ICS. This is a scenario where an attacker sneaks code into a consumer chain that is able to send many downtime or double-signing packets at once. The attacker then creates 175 validators just below the Hub's active set and slashes every real validator at once. These validators are then jailed, and control of the chain passes over to the attacker's 175 validators, enabling them to steal all tokens bridged to the Hub over IBC.
To mitigate this scenario, it would be good to put a circuit breaker into the slash-packet receiving code on the provider. This circuit breaker would make it impossible to jail more than x% (probably between 1-5% would be good) of the power on the provider per hour. This would make the provider takeover attack take around a day, allowing the remaining validators to be alerted and halt the chain.
The design of this feature is not too difficult but will require some thought. Here's a naive design:
I'm not sure if this is correct/optimal tbh
Closing criteria
When this feature is implemented.
TODOs