
feat: blacklist peers sharing more than x IPs #1749

Closed · Tracked by #1353 · Fixed by #1765
alrevuelta opened this issue May 19, 2023 · 7 comments
@alrevuelta
Contributor

alrevuelta commented May 19, 2023

Problem

Each node has a maximum number of in/out connections. If an attacker creates multiple peers under the same IP and connects to us, we can run out of in slots to serve other honest peers. This attack can be extended from one node to the whole network, and will end up leaving very few available in slots for honest peers to connect. It has a huge impact on service protocols like store, because the network (depending on its size) can run out of slots to serve these protocols.

There is no single solution to prevent this attack, but rather a combination of measures:

    1. churn: rotating peers from time to time (not implemented)
    2. scoring: keeping peers with a high score and disconnecting the ones with a low score. Trickier in service protocols, since their altruistic nature makes it impossible to score the client (as it is not giving anything in return) (not implemented)
    3. limit ips: limiting the number of peers we accept from each IP (not implemented)
    4. enforce in/out: leaving some slots for out peers, so that the node controls some of the peers it is connected to (implemented)

Suggested solution

This issue suggests implementing 3) as follows. It aims to mitigate the attack explained above by relying on the fact that a sybil attack with IPs is way harder than one with peerIds (which is trivial). The idea is to limit the number of peers that we see for each IP, so that if a given IP has multiple peers behind it, we ignore the ones exceeding a threshold:

Solution:

  • Define a collocationFactor that limits the number of peers we allow from each IP. Example: 5.
  • Add a new table to the peerstore to track the number of peers that we see behind each IP. If we detect that IP ip1 is shared among peers p1..p5, then this table will store ip1 -> 5.
  • Use this metric in the peerstore so that:
    • If a new p6 is discovered and has ip1, don't add it to the peerstore and skip it (since collocationFactor = 5). This protects our peerstore from being filled with possible spammers.
    • Reject incoming connections from peers if we already have > collocationFactor connections from the same IP (unsure if this can be done in nim-libp2p; in go-libp2p there is ConnectionGater, which allows defining a handler that is executed before completing the connection).
    • Don't attempt connections to peers whose IP is tracked and shared by > collocationFactor peers. This shouldn't happen if they are never added into the peerstore.

Note: this can be enforced in two ways (a minimal sketch follows below):

  • Just for connections, so that we never connect to more than collocationFactor peers sharing the same IP.
  • For the peerstore in general, so that we don't even track/store peers that share the same IP (just up to collocationFactor).
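A minimal sketch of this bookkeeping, written here in plain Go rather than Nim and independent of any nwaku or nim-libp2p API; all names (ipTracker, canAdmit, the colocationFactor value) are illustrative assumptions:

```go
package main

import "fmt"

// colocationFactor is the illustrative per-IP peer limit discussed above.
const colocationFactor = 5

// ipTracker counts how many known peers sit behind each IP, mirroring the
// proposed "ip -> count" table in the peerstore.
type ipTracker struct {
	peersPerIP map[string]int
}

func newIPTracker() *ipTracker {
	return &ipTracker{peersPerIP: make(map[string]int)}
}

// canAdmit reports whether a newly discovered peer (or an incoming connection)
// from ip should be accepted, i.e. whether the IP is still below the threshold.
func (t *ipTracker) canAdmit(ip string) bool {
	return t.peersPerIP[ip] < colocationFactor
}

// admit records one more peer behind ip; callers check canAdmit first.
func (t *ipTracker) admit(ip string) { t.peersPerIP[ip]++ }

// release is called when a peer is removed from the peerstore or disconnects.
func (t *ipTracker) release(ip string) {
	if t.peersPerIP[ip] > 0 {
		t.peersPerIP[ip]--
	}
}

func main() {
	tr := newIPTracker()
	for i := 1; i <= 7; i++ {
		if tr.canAdmit("203.0.113.7") {
			tr.admit("203.0.113.7")
			fmt.Printf("p%d admitted\n", i)
		} else {
			fmt.Printf("p%d rejected: colocation limit reached for this IP\n", i)
		}
	}
}
```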

Inspiration:

Alternatives considered

Gossipsub scoring takes this into account. This parameter is given a weight and is part of a final score together with other parameters. imho, limiting the peers we see from each IP should be binary: either ok or ban (disconnect). It should be possible to tweak the weights to simulate this behaviour, so that if the number of peers behind a given IP is > x, then the score automatically drops below the threshold and a disconnection is triggered.
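For reference, gossipsub v1.1 scoring applies its IP-colocation penalty quadratically on the surplus of peers above a threshold. The sketch below (plain Go, with purely illustrative numbers rather than a real configuration) shows how a sufficiently negative weight makes a single surplus peer already cross a disconnect threshold, approximating the "ok or ban" behaviour:

```go
package main

import "fmt"

func main() {
	// Illustrative numbers only: a disconnect/graylist threshold of -100
	// and an IP-colocation threshold of 5 peers per IP.
	const disconnectThreshold = -100.0
	const ipColocationThreshold = 5

	// The colocation term is weight * surplus^2, where
	// surplus = peersBehindIP - threshold (when positive).
	// Choosing weight <= disconnectThreshold means a single surplus peer
	// already crosses the threshold.
	const ipColocationWeight = -100.0

	for peers := 4; peers <= 8; peers++ {
		surplus := peers - ipColocationThreshold
		penalty := 0.0
		if surplus > 0 {
			penalty = ipColocationWeight * float64(surplus*surplus)
		}
		fmt.Printf("peers behind IP=%d penalty=%.0f disconnected=%v\n",
			peers, penalty, penalty <= disconnectThreshold)
	}
}
```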

The main problem with gossipsub scoring is that it only applies to gossipsub, and we need protection also for service protocols. Gossipsub scoring can be integrated later on, but I would consider adding this layer on top that restricts the peers per IP for all connections.

@arnetheduck
Contributor

Per discussion:

A strategy that works well is to introduce two limits: a soft limit and a hard limit. The hard limit is set high and acts as an absolute point beyond which no more connections are accepted.

The soft limit is used as a target for the number of peers or resources that are allowed and periodically the software takes action to reach that limit either by connecting to more peers or disconnecting poor peers. Critically, the software continues to make outgoing connections and accept incoming connections (usually with a token bucket-controlled limit to incoming connection rate) in this state.

Disconnection is done according to a varied set of criteria that focuses on certain metrics, primarily diversity - diversity can roughly be defined as covering as many bases as possible: geographic, utility, in/out, old/new etc.

In this strategy, "connecting from the same IP" doesn't get a point for "additional IP diversity" and therefore scores lower.

Looking at an example, with soft limit 100, hard limit 200, connection rate 10 peers/s and one "cleanup" per second, it's easy to see that even if the spammer creates 200 connections, these will get closed quickly down to 100 meaning there are 100 free "slots" for others to use to get an initial connection - even if the spammer keeps connecting, there are enough free "slots" in the queue for others to connect as well because of the rate limiting - in each round of cleanup, the connection of the spammer keeps going down whereas others get a fair chance based on their diversity contribution.
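A rough sketch of such a periodic cleanup pass (plain Go, standard library only; the diversity scoring, limits and peer mix are placeholder assumptions, not nwaku behaviour):

```go
package main

import (
	"fmt"
	"sort"
)

const (
	softLimit = 100 // target to prune down to on every cleanup pass
	hardLimit = 200 // absolute cap, enforced at accept time (not modelled here)
)

// conn is a stand-in for a connection plus the attributes used for diversity.
type conn struct {
	id        string
	ip        string
	outbound  bool
	ageRounds int
}

// diversityScore is a placeholder for "covering as many bases as possible":
// connections score higher if their IP is rare, if they are outbound, or if
// they are long-lived. Real criteria (geography, utility, ...) would extend it.
func diversityScore(c conn, perIP map[string]int) float64 {
	score := 1.0 / float64(perIP[c.ip]) // peers sharing an IP dilute each other
	if c.outbound {
		score += 0.5
	}
	return score + 0.1*float64(c.ageRounds)
}

// cleanup keeps the softLimit highest-scoring connections; it would run
// periodically, e.g. once per second as in the example above.
func cleanup(conns []conn) []conn {
	if len(conns) <= softLimit {
		return conns
	}
	perIP := make(map[string]int)
	for _, c := range conns {
		perIP[c.ip]++
	}
	sort.Slice(conns, func(i, j int) bool {
		return diversityScore(conns[i], perIP) > diversityScore(conns[j], perIP)
	})
	return conns[:softLimit]
}

func main() {
	var conns []conn
	// 150 connections from a single spammer IP plus 50 diverse honest peers.
	for i := 0; i < 150; i++ {
		conns = append(conns, conn{id: fmt.Sprintf("spam-%d", i), ip: "203.0.113.7"})
	}
	for i := 0; i < 50; i++ {
		conns = append(conns, conn{id: fmt.Sprintf("honest-%d", i), ip: fmt.Sprintf("198.51.100.%d", i)})
	}
	kept := cleanup(conns)
	spam := 0
	for _, c := range kept {
		if c.ip == "203.0.113.7" {
			spam++
		}
	}
	fmt.Printf("kept %d connections, %d of them from the spammer IP\n", len(kept), spam)
}
```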

This strategy can further be "strengthened" by applying a two-level "token-bucket" strategy with a per-IP token bucket and a global one.

The two-level token bucket strategy also applies to all kinds of other scenarios for rate limiting and spam control: an individual limit to balance resources across peers without significantly affecting burst performance and a global limit to protect the node itself.
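A hedged sketch of the two-level token bucket (again plain Go; bucket capacities and refill rates are arbitrary illustrative values, not recommended settings):

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a classic token bucket: up to capacity tokens, refilled at rate/s.
type bucket struct {
	capacity float64
	tokens   float64
	rate     float64
	last     time.Time
}

func newBucket(capacity, rate float64) *bucket {
	return &bucket{capacity: capacity, tokens: capacity, rate: rate, last: time.Now()}
}

// allow refills based on elapsed time and consumes one token if available.
func (b *bucket) allow(now time.Time) bool {
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// connLimiter combines a global bucket (protects the node as a whole) with
// one bucket per IP (balances resources across peers without hurting bursts).
type connLimiter struct {
	global *bucket
	perIP  map[string]*bucket
}

func newConnLimiter() *connLimiter {
	return &connLimiter{
		global: newBucket(20, 10), // illustrative: ~10 accepts/s, burst of 20
		perIP:  make(map[string]*bucket),
	}
}

func (l *connLimiter) allow(ip string, now time.Time) bool {
	if !l.global.allow(now) {
		return false
	}
	b, ok := l.perIP[ip]
	if !ok {
		b = newBucket(3, 1) // illustrative: ~1 conn/s per IP, burst of 3
		l.perIP[ip] = b
	}
	return b.allow(now)
}

func main() {
	l := newConnLimiter()
	now := time.Now()
	for i := 1; i <= 6; i++ {
		fmt.Printf("attempt %d from 203.0.113.7 allowed: %v\n", i, l.allow("203.0.113.7", now))
	}
}
```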

@alrevuelta
Contributor Author

Looking at an example, with soft limit 100, hard limit 200, connection rate 10 peers/s and one "cleanup" per second, it's easy to see that even if the spammer creates 200 connections, these will get closed quickly down to 100 meaning there are 100 free "slots" for others to use to get an initial connection - even if the spammer keeps connecting, there are enough free "slots" in the queue for others to connect as well because of the rate limiting - in each round of cleanup, the connection of the spammer keeps going down whereas others get a fair chance based on their diversity contribution.

But isn't it better to not even allow these connections in the first place, rather than allowing them and then having to prune? If instead of 1 spammer we have 5, isn't this an attack vector? Why have this "optimistic connection" instead of "strict blacklisting"?

usually with a token bucket-controlled limit to incoming connection rate

This is indeed interesting. I can see that Prysm has a leaky bucket for IPs (see their implementation).

@arnetheduck
Contributor

Why having this "optimistic connection" instead of a "strict blacklisting"?

Because being permissive is more useful in general, especially for serving nodes - i.e. it's more common that there are legitimate users than that there are spammers, when the balance of cost for the two is "neutral" in terms of resource usage.

A spammer that connects without negatively affecting the service isn't achieving their goal, whereas with this strategy you are achieving your goal of serving legitimate users even in the presence of a multi-connection spammer. This explains why being permissive is not worse for legitimate users, but is worse for the spammer.

To understand why it's better to be permissive, focus on the fact that you don't know that 5 connections from the same IP are bad or wrong: you're trying to catch spammers, not 5-connections-from-the-same-IP, and the two are not the same. One case of malicious use seen right now in the network does not generalize to "all cases of 5 connections from the same IP are bad". 10 students in the same university classroom behind a NAT will exhibit this pattern too, so your proposed solution catches some spammers and some legitimate users.

In short, blacklisting is sometimes bad and sometimes good. Being permissive is never bad and sometimes good. The cost of accepting a connection is assumed to be negligible here: it is made negligible by the rate limiter which ensures that the majority of resources are spent on legitimate connected users, over time.

@Ivansete-status
Collaborator

Thanks for raising the issue @alrevuelta !

imo, I would follow point 3 (limit ips) and set it as a hard limit, collocationFactor, configurable and 20 by default.

We could carry on with the permissive approach as advised by @arnetheduck, using the soft limit as an implicit blacklist.
However, I think that the soft limit shouldn't be configurable because it may have complex factors that might be difficult to understand from a user point of view.

On the other hand, regardless of whether the connections are legit or spammers, that hard limit is very important as a means of protection. For example, there could be a legit user running a client app that inadvertently tries to establish as many connections as possible due to a particular bug in the client's app.

@arnetheduck
Contributor

One more reason why soft limits work better: actually allowing the spammer to connect often slows down their connect/drop cycle - the technique is generally known as tarpitting and works well in particular against buggy clients that sit in a tight connect loop by mistake - the "spammer" in this case has to deal with not knowing whether their connection attempt is slow on their side or the target side.

Regarding hard limits, these are actually there mainly to protect against bugs in nwaku itself - in the example I used, it would never be hit because the "cleanup" procedure would run frequently enough to always stay clear of it when combined with rate limiting - it's simply there in case everything goes wrong at once, to not crash.

Keep in mind that the OS already deals with a lot of spam if you just let it - ie it has its own incoming connection queue strategy which works well if you only let it do its work (and don't prematurely accept connections for example).

@jm-clius jm-clius added this to Waku May 23, 2023
@alrevuelta alrevuelta self-assigned this May 23, 2023
@alrevuelta alrevuelta added this to the Release 0.18.0 milestone May 23, 2023
@alrevuelta alrevuelta moved this to In Progress in Waku May 23, 2023
@alrevuelta
Contributor Author

Tracking the inbound connection rate limit here.
#1757

connection rate 10 peers/s
@arnetheduck Out of curiosity, is this 10 peers/s a reasonable number? Seems a bit high, but of course I guess it will depend on the total number of allowed inbound peers. Any rule of thumb? Like some ratio of the max number of inbound peers?

@arnetheduck
Contributor

10 peers/s a reasonable number?

I don't think this matters greatly so long as some limit exists, and 10 is as good a starting point as any - consider the numbers: a connection attempt is more or less a few packets at the TCP/IP level, while the node handles thousands of valid network packets per second, so as long as the ratio between the two is kept under control, it's fine.

The key is to not call the OS accept function too often and to keep the listen backlog more or less in tune with the rate; the OS will take care of the rest.

One thing to note is that if the queue is too short or the number is too low, it will be slightly easier for a spammer to fill the queue - this is where the local/global limit comes in where the global limit, set at a higher rate, should control the accept call while the per-ip limit should be used to delay (or abort) per-ip connection negotiation after accept.
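A sketch of how this wiring could look (plain Go, standard library only; the rates, per-IP limit and accept-loop structure are illustrative assumptions, not nwaku code): the global limit paces the Accept call itself, while the per-IP check runs only after accept to delay or abort negotiation.

```go
package main

import (
	"net"
	"time"
)

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	// Global limit: pace calls to Accept so the OS listen backlog absorbs
	// bursts instead of the node accepting everything immediately.
	globalRate := time.NewTicker(100 * time.Millisecond) // ~10 accepts/s
	defer globalRate.Stop()

	perIP := make(map[string]int) // active connections per remote IP
	const perIPLimit = 3          // illustrative per-IP budget

	for {
		<-globalRate.C
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		ip, _, _ := net.SplitHostPort(conn.RemoteAddr().String())
		// Per-IP limit applied only after accept: over-budget IPs get their
		// negotiation aborted here (or delayed, i.e. tarpitted).
		if perIP[ip] >= perIPLimit {
			conn.Close()
			continue
		}
		perIP[ip]++ // a real implementation would also decrement on close
		go func(c net.Conn) {
			defer c.Close()
			// ... protocol negotiation would happen here ...
		}(conn)
	}
}
```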
