Burst Denial of Service in Proof of Time #2142

Closed
vanhauser-thc opened this issue Oct 20, 2023 · 8 comments · Fixed by #2320
Labels: audit (Audit results) · node (Node: service library/node app) · performance (Related to performance measurement or improvement)

@vanhauser-thc (Collaborator)

[Medium] Burst Denial of Service in Proof of Time

Summary

An attacker can abuse the high cost associated with verifying Proof of Time (PoT).

Issue details

Based on the AES Latency Report, an ASIC is estimated to perform PoT generation 2.4x faster than COTS hardware.

When an ASIC produces PoTs every second, COTS hardware takes 2.4 seconds to generate it independently and 300ms to verify a PoT message, assuming it can execute 8 parallel tasks, one for each checkpoint.

Attack scenario 1: If up to seven received PoT messages are in the queue for an iteration
let correct_proof = if potentially_matching_proofs.len() < EXPECTED_POT_VERIFICATION_SPEEDUP { [1]
then one PoT message after another is checked, and the first that is successfully verified is taken. An attacker then sends a handful of faulty PoT messages from different nodes early into the network (e.g. so that the first messages are the faulty ones).
If e.g. the 6th message is the valid one, then the node has spent 5 * 300ms = 1.5 seconds on verification.

As the iterations to be performed are allowed to be 50% above the current iteration number on the chain, this becomes 2.25 seconds (proof.slot_iterations.get() <= current_slot_iterations.get() * 3 / 2).

Attack scenario 2: If eight or more PoT messages were received, the node itself does the PoT generation which will cost 2.4 seconds.
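
To make the two scenarios concrete, here is a minimal Rust sketch of the decision path described above. It is an illustration, not the actual subspace implementation: only EXPECTED_POT_VERIFICATION_SPEEDUP and the timings in the comments come from the code referenced in [1] and the estimates in this report; all other names are placeholders.

```rust
// Hypothetical sketch of the verify-vs-generate decision described above;
// PotProof, verify() and generate_proof_locally() are placeholders.
const EXPECTED_POT_VERIFICATION_SPEEDUP: usize = 8;

struct PotProof;

fn verify(_proof: &PotProof) -> bool {
    // stand-in for the ~300 ms checkpoint verification on COTS hardware
    true
}

fn generate_proof_locally() -> PotProof {
    // stand-in for the ~2.4 s local PoT generation on COTS hardware
    PotProof
}

fn resolve_proof(potentially_matching_proofs: Vec<PotProof>) -> PotProof {
    if potentially_matching_proofs.len() < EXPECTED_POT_VERIFICATION_SPEEDUP {
        // Scenario 1: candidates are checked one after another, so an attacker
        // who gets faulty messages queued first burns up to
        // (len - 1) * ~300 ms before the valid proof is reached.
        for proof in potentially_matching_proofs {
            if verify(&proof) {
                return proof;
            }
        }
    }
    // Scenario 2: with eight or more candidates (or none valid), the node
    // generates the proof itself at a cost of ~2.4 s.
    generate_proof_locally()
}
```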

A mitigation mechanism already in place bans nodes that send a faulty PoT. However, an attacker can easily spawn new instances with fresh IP addresses, especially when using IPv6.

Risk

A node, while busy with verification, can miss its opportunity to create a block or perform other actions on the chain.

Mitigation

Changes to the iteration count can only occur at the beginning of an era, which is currently not checked when allowing the 50% increase in the number of iterations. Checking this would then allow EXPECTED_POT_VERIFICATION_SPEEDUP to be increased by 1, with the worst case having the same speed as calculating the PoT itself.

Additionally, if fewer than EXPECTED_POT_VERIFICATION_SPEEDUP messages are present, a random selection mechanism for the next PoT to be verified could also help reduce the impact of such a burst DoS attack.
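
As an illustration of the random-selection idea, the sketch below shuffles the pending candidates before the sequential check so an attacker cannot rely on their faulty messages being verified first. It assumes the rand crate is available; PotProof and verify() are placeholders, not the real types.

```rust
// Hypothetical sketch of randomised candidate ordering; requires the `rand`
// crate, and PotProof/verify() are placeholders rather than the real types.
use rand::seq::SliceRandom;
use rand::thread_rng;

struct PotProof;

fn verify(_proof: &PotProof) -> bool {
    true // stand-in for the ~300 ms verification
}

fn pick_proof(mut candidates: Vec<PotProof>) -> Option<PotProof> {
    // Shuffling removes the attacker's control over which message is
    // verified first, so the expected number of wasted verifications drops.
    candidates.shuffle(&mut thread_rng());
    candidates.into_iter().find(|proof| verify(proof))
}
```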

Lastly, this could operate in two modes: the existing implementation and a trusted mode. In the trusted mode, if no faulty PoT message has been received so far, the EXPECTED_POT_VERIFICATION_SPEEDUP could be doubled.
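
A minimal sketch of the trusted-mode idea (only EXPECTED_POT_VERIFICATION_SPEEDUP comes from the referenced code; the function and flag are hypothetical):

```rust
// Hypothetical sketch of the proposed trusted mode: as long as no faulty PoT
// message has been seen, twice as many candidates are verified before the
// node falls back to generating the proof itself.
const EXPECTED_POT_VERIFICATION_SPEEDUP: usize = 8;

fn verification_budget(faulty_pot_seen: bool) -> usize {
    if faulty_pot_seen {
        EXPECTED_POT_VERIFICATION_SPEEDUP
    } else {
        EXPECTED_POT_VERIFICATION_SPEEDUP * 2
    }
}
```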

[1] https://github.com/subspace/subspace/blob/d31fe47911151223d4750bf740524265118e4241/crates/sc-proof-of-time/src/source/gossip.rs#L420

@nazar-pc (Member) commented Oct 23, 2023

> When an ASIC produces PoTs every second, COTS hardware takes 2.4 seconds to generate it independently and 300ms to verify a PoT message, assuming it can execute 8 parallel tasks, one for each checkpoint.

I'm not sure you understand it quite the same way, so I will clarify. It is not 8 parallel tasks that are being used, it is instruction-level parallelism in modern processors. Modern processors can verify 8 checkpoints (16 effective independent computations, since we can do both AES encoding and decoding in the opposite direction, because we know inputs and outputs, can go both ways, and check whether the middle value matches) on a single core ~8 times faster than the prover; we described it in https://forum.subspace.network/t/proof-of-time-optimistic-vs-pessimistic-verification-analysis/1623 (sections "Energy" and "Commodity Hardware"). By the time we have ASICs that are faster, we expect AVX512 (or AES10 or whatever they end up calling it) will also have wider AES instructions (VAES) enabled on all modern processors (currently it is "complicated"), and verification time will shrink further while efficiency improves. I recommend you read the linked forum post in full, it contains a lot of insights.

> Attack scenario 1: If up to seven received PoT messages are in the queue for an iteration
> let correct_proof = if potentially_matching_proofs.len() < EXPECTED_POT_VERIFICATION_SPEEDUP { [1]
> then one PoT message after another is checked, and the first that is successfully verified is taken. An attacker then sends a handful of faulty PoT messages from different nodes early into the network (e.g. so that the first messages are the faulty ones).
> If e.g. the 6th message is a valid, then the node has spent 5 * 300ms = 1.5 seconds for verification.
>
> As the iterations to be performed are allowed to be 50% above the current iteration number on the chain, this becomes 2.25 seconds (proof.slot_iterations.get() <= current_slot_iterations.get() * 3 / 2).
>
> Attack scenario 2: If eight or more PoT messages were received, the node itself does the PoT generation which will cost 2.4 seconds.

First of all, proof.slot_iterations.get() <= current_slot_iterations.get() * 3 / 2 is only a quick check, it doesn't lead to immediate verification. This is a sanity check; essentially we don't expect the number of iterations to change by more than 50% during a short period of time.

Potential proofs that pass all of the superficial (but cheap) checks of PotGossipValidator are stored for later, when they become candidates for the future proof. When that happens, in PotGossipWorker::handle_next_slot_input we do some more cheap checks, including the check that the inputs used for generating the potential proofs match.

Only if all of those checks succeeded (if any of them do not, the reputation of the peer is decreased a bit) and we have several candidates do we check whether it is cheaper computation-wise to generate the proof ourselves or to verify the candidates. Either way, those peers that sent us an incorrect proof will be banned on the network level and will not be able to send us proofs going forward.

> A mitigation mechanism already in place bans nodes that send a faulty PoT. However, an attacker can easily spawn new instances with fresh IP addresses, especially when using IPv6.

Yes, the bigger question of identifying "under attack" conditions and becoming more conservative with connections going forward is still applicable, but that is more of a network-related topic. I believe there are more cases like this in the protocol, and it is not just Subspace; you can attack any blockchain node this way, even though the cost here is much larger.

@dariolina we should probably prioritize work in this direction.

Additional proposed mitigation

I'm not sure the focus on EXPECTED_POT_VERIFICATION_SPEEDUP is important; the cost is high either way, and +50% doesn't change the situation drastically, it is still bad if this attack happens.

One of the mitigations I can think of for PoT specifically is to always default to verification, but sort peers by reputation and pick a smaller number of those that have the highest reputation first. Maybe even check the quorum: if most of those with high reputation have sent us the same set of checkpoints, chances are those are the checkpoints we should verify before checking anything else.
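
A rough sketch of what this could look like (all types, the reputation lookup, and the quorum size of 5 are placeholders, not actual node code):

```rust
// Hypothetical sketch of reputation-ordered verification with a quorum
// shortcut; PeerId, Reputation and PotCheckpoints stand in for whatever the
// node actually tracks.
use std::collections::HashMap;

type PeerId = u64;
type Reputation = i64;

#[derive(Clone, PartialEq, Eq, Hash)]
struct PotCheckpoints(Vec<u8>);

struct Candidate {
    sender: PeerId,
    checkpoints: PotCheckpoints,
}

fn order_candidates(
    mut candidates: Vec<Candidate>,
    reputation: &HashMap<PeerId, Reputation>,
) -> Vec<Candidate> {
    // Highest-reputation senders first: an attacker joining with fresh IP
    // addresses starts at default reputation and is therefore checked last.
    candidates.sort_by_key(|c| std::cmp::Reverse(*reputation.get(&c.sender).unwrap_or(&0)));

    // Quorum shortcut: if most of the top senders agree on one set of
    // checkpoints, move that set to the front of the verification queue.
    let top = candidates.len().min(5);
    let mut counts: HashMap<PotCheckpoints, usize> = HashMap::new();
    for candidate in &candidates[..top] {
        *counts.entry(candidate.checkpoints.clone()).or_insert(0) += 1;
    }
    let majority = counts
        .into_iter()
        .filter(|(_, n)| *n * 2 > top)
        .max_by_key(|(_, n)| *n)
        .map(|(checkpoints, _)| checkpoints);
    if let Some(majority) = majority {
        // `false` sorts before `true`, so matching candidates move to the front
        // while the stable sort preserves the reputation order within groups.
        candidates.sort_by_key(|c| c.checkpoints != majority);
    }
    candidates
}
```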

@vanhauser-thc (Collaborator, Author)

> When an ASIC produces PoTs every second, COTS hardware takes 2.4 seconds to generate it independently and 300ms to verify a PoT message, assuming it can execute 8 parallel tasks, one for each checkpoint.
>
> I'm not sure you understand it quite the same way, so I will clarify. It is not 8 parallel tasks that we are being used, it is instruction-level parallelism in modern processors. Modern processors can verify 8 checkpoints ([16 effective independent computations since we can do both AES encoding and decoding in opposite direction due to the fact that we know inputs and outputs and can go both ways and check if middle value matches]

Yes, we understood it like that; however, it is difficult to put into exact words :)

> One of the mitigations I can think of for PoT specifically is to always default to verification, but sort peers by reputation and pick smaller number of those that have the highest reputation first. Maybe even check the quorum and if most of those with high reputation have sent us the same set of checkpoints, chances are those are the checkpoints we should verify before checking anything else.

IMHO that is the better way to handle this. This way an attacker first has to play nice for quite some time for just a short attack span.

@dariolina (Member)

> for PoT specifically is to always default to verification, but sort peers by reputation and pick smaller number of those that have the highest reputation first

I agree this sounds promising. Since we already have to store information about the sender, we might as well prioritize verification of PoT from our "usual" Timekeepers.
However, I can't entirely agree with removing the limit on verifications, as this makes verification time theoretically unbounded. We can do a mixed approach: if we have many PoT messages that passed all preliminary checks, we can try to verify the top 3/5/7, and if that fails, fall back to local evaluation.
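
A sketch of that mixed approach might look like the following (hypothetical names; MAX_CANDIDATE_VERIFICATIONS stands in for the chosen 3/5/7 cap, and the candidates are assumed to already be ordered best-reputation-first):

```rust
// Hypothetical sketch of the mixed approach: verify at most a few of the
// best-ranked candidates, then fall back to local proving. PotProof,
// verify() and generate_proof_locally() are placeholders.
const MAX_CANDIDATE_VERIFICATIONS: usize = 5;

struct PotProof;

fn verify(_proof: &PotProof) -> bool {
    true // stand-in for the ~300 ms verification
}

fn generate_proof_locally() -> PotProof {
    PotProof // stand-in for the ~2.4 s local evaluation
}

/// `candidates` is assumed to already be sorted best-reputation-first.
fn resolve_with_cap(candidates: Vec<PotProof>) -> PotProof {
    // Capping the number of verification attempts keeps the worst case at
    // MAX_CANDIDATE_VERIFICATIONS * ~300 ms instead of being unbounded.
    for proof in candidates.into_iter().take(MAX_CANDIDATE_VERIFICATIONS) {
        if verify(&proof) {
            return proof;
        }
    }
    generate_proof_locally()
}
```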

@nazar-pc (Member)

I think we can even reduce it to one: just pick the sender with the highest reputation, it should be the most likely to send us a correct proof. We can fall back to more candidates or to proving in case that fails, which will not happen in most cases.

@dariolina (Member)

In the R&D sync we have discussed the need for a generally more robust "under attack" mode. The approach suggested here, to verify from the "best" peer first, surely doesn't hurt and prevents some attacks on the lower-effort end of the spectrum. @nazar-pc I will add this to the spec, if you are up to it.

@nazar-pc (Member)

Please do, it is a non-breaking protocol change we can introduce any time for those who upgrade. I'll implement it in the near future.

@nazar-pc (Member) commented Nov 6, 2023

There is a lack of a Substrate API for reputation querying right now, see paritytech/polkadot-sdk#2185

I'll look into exposing it in Substrate once I get some feedback on what it might look like upstream.

Un-assigning Dariia now, since the immediate fix for the scope of this issue is implementation-only.

@nazar-pc (Member)

Fix implemented in #2320
