
feat: blacklist peers sharing more than x IPs #1749

Closed · Tracked by #1353 · Fixed by #1765
alrevuelta opened this issue May 19, 2023 · 7 comments
@alrevuelta
Contributor

alrevuelta commented May 19, 2023

Problem

Each node has a maximum number of in/out connections. If an attacker creates multiple peers under the same IP and connects to us, we can run out of in slots to serve other honest peers. This attack can be extended from one node to the whole network, and will end up leaving very few available in slots for honest peers to connect. It has a huge impact on service protocols like store, because the network (depending on its size) can run out of slots to serve these protocols.

There is no single solution to prevent this attack, but rather a combination of measures:

    1. churn: rotating peers from time to time (not implemented)
    2. scoring: keeping peers with a high score and disconnecting the ones with a low score. Trickier in service protocols, since their altruistic nature makes it impossible to score the client (as it is not giving anything in return) (not implemented)
    3. limit ips: limiting the number of peers we accept from each IP (not implemented)
    4. enforce in/out: leaving some slots for out peers, so that the node controls some of the peers it is connected to (implemented)

Suggested solution

This issue suggests implementing 3) as follows. It aims to mitigate the attack explained above by relying on the fact that a sybil attack with IPs is way harder than one with peerIds (which is trivial). The idea is to limit the number of peers that we see for each IP, so that if a given IP has multiple peers behind it, we ignore the ones exceeding a threshold:

Solution:

  • Define a collocationFactor that limits the number of peers we allow from each IP. Example: 5.
  • Add a new table to the peerstore to track the number of peers that we see behind each IP. If we detect that IP ip1 is shared among peers p1..p5, then this table will store ip1 -> 5.
  • Use this metric in the peerstore so that:
    • If a new p6 is discovered and has ip1, don't add it to the peerstore and skip it (since collocationFactor = 5). This protects our peerstore from being filled with possible spammers.
    • Reject incoming connections from peers if we already have > collocationFactor connections from the same IP (unsure if this can be done in nim-libp2p; in go-libp2p there is ConnectionGater, which allows defining a handler that is executed before completing the connection).
    • Don't attempt connections to peers whose IP is tracked and shared by > collocationFactor peers. This shouldn't happen if they are never added into the peerstore.

Note: this can be enforced in two ways (a minimal sketch follows below):

  • Just for connections, so that we never connect to more than collocationFactor peers sharing the same IP.
  • For the peerstore in general, so that we don't even track/store peers that share the same IP (just up to collocationFactor).
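A minimal sketch of this bookkeeping, written here in plain Go rather than Nim and independent of any nwaku or nim-libp2p API; all names (ipTracker, canAdmit, the colocationFactor value) are illustrative assumptions:

```go
package main

import "fmt"

// colocationFactor is the illustrative per-IP peer limit discussed above.
const colocationFactor = 5

// ipTracker counts how many known peers sit behind each IP, mirroring the
// proposed "ip -> count" table in the peerstore.
type ipTracker struct {
	peersPerIP map[string]int
}

func newIPTracker() *ipTracker {
	return &ipTracker{peersPerIP: make(map[string]int)}
}

// canAdmit reports whether a newly discovered peer (or an incoming connection)
// from ip should be accepted, i.e. whether the IP is still below the threshold.
func (t *ipTracker) canAdmit(ip string) bool {
	return t.peersPerIP[ip] < colocationFactor
}

// admit records one more peer behind ip; callers check canAdmit first.
func (t *ipTracker) admit(ip string) { t.peersPerIP[ip]++ }

// release is called when a peer is removed from the peerstore or disconnects.
func (t *ipTracker) release(ip string) {
	if t.peersPerIP[ip] > 0 {
		t.peersPerIP[ip]--
	}
}

func main() {
	tr := newIPTracker()
	for i := 1; i <= 7; i++ {
		if tr.canAdmit("203.0.113.7") {
			tr.admit("203.0.113.7")
			fmt.Printf("p%d admitted\n", i)
		} else {
			fmt.Printf("p%d rejected: colocation limit reached for this IP\n", i)
		}
	}
}
```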

Inspiration:

Alternatives considered

Gossipsub scoring takes this into account. This parameter is given a weight and is part of a final score together with other parameters. imho, limiting the peers we see from each IP should be binary: either ok or ban (disconnect). It should be possible to tweak the weights to simulate this behaviour, so that if the number of peers behind a given IP is > x, then the score automatically drops below the threshold and a disconnection is triggered.
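For reference, gossipsub v1.1 scoring applies its IP-colocation penalty quadratically on the surplus of peers above a threshold. The sketch below (plain Go, with purely illustrative numbers rather than a real configuration) shows how a sufficiently negative weight makes a single surplus peer already cross a disconnect threshold, approximating the "ok or ban" behaviour:

```go
package main

import "fmt"

func main() {
	// Illustrative numbers only: a disconnect/graylist threshold of -100
	// and an IP-colocation threshold of 5 peers per IP.
	const disconnectThreshold = -100.0
	const ipColocationThreshold = 5

	// The colocation term is weight * surplus^2, where
	// surplus = peersBehindIP - threshold (when positive).
	// Choosing weight <= disconnectThreshold means a single surplus peer
	// already crosses the threshold.
	const ipColocationWeight = -100.0

	for peers := 4; peers <= 8; peers++ {
		surplus := peers - ipColocationThreshold
		penalty := 0.0
		if surplus > 0 {
			penalty = ipColocationWeight * float64(surplus*surplus)
		}
		fmt.Printf("peers behind IP=%d penalty=%.0f disconnected=%v\n",
			peers, penalty, penalty <= disconnectThreshold)
	}
}
```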

The main problem with gossipsub scoring is that it only applies to gossipsub, and we need protection also for service protocols. Gossipsub scoring can be integrated later on, but I would consider adding this layer on top that restricts the peers per IP for all connections.

@arnetheduck
Contributor

Per discussion:

A strategy that works well is to introduce two limits: a soft limit and a hard limit. The hard limit is set high and acts as an absolute point beyond which no more connections are accepted.

The soft limit is used as a target for the number of peers or resources that are allowed and periodically the software takes action to reach that limit either by connecting to more peers or disconnecting poor peers. Critically, the software continues to make outgoing connections and accept incoming connections (usually with a token bucket-controlled limit to incoming connection rate) in this state.

Disconnection is done according to a varied set of criteria that focuses on certain metrics, primarily diversity - diversity can roughly be defined as covering as many bases as possible: geographic, utility, in/out, old/new etc.

In this strategy, "connecting from the same IP" doesn't get a point for "additional IP diversity" and therefore scores lower.

Looking at an example, with soft limit 100, hard limit 200, connection rate 10 peers/s and one "cleanup" per second, it's easy to see that even if the spammer creates 200 connections, these will get closed quickly down to 100 meaning there are 100 free "slots" for others to use to get an initial connection - even if the spammer keeps connecting, there are enough free "slots" in the queue for others to connect as well because of the rate limiting - in each round of cleanup, the connection of the spammer keeps going down whereas others get a fair chance based on their diversity contribution.
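A rough sketch of such a periodic cleanup pass (plain Go, standard library only; the diversity scoring, limits and peer mix are placeholder assumptions, not nwaku behaviour):

```go
package main

import (
	"fmt"
	"sort"
)

const (
	softLimit = 100 // target to prune down to on every cleanup pass
	hardLimit = 200 // absolute cap, enforced at accept time (not modelled here)
)

// conn is a stand-in for a connection plus the attributes used for diversity.
type conn struct {
	id        string
	ip        string
	outbound  bool
	ageRounds int
}

// diversityScore is a placeholder for "covering as many bases as possible":
// connections score higher if their IP is rare, if they are outbound, or if
// they are long-lived. Real criteria (geography, utility, ...) would extend it.
func diversityScore(c conn, perIP map[string]int) float64 {
	score := 1.0 / float64(perIP[c.ip]) // peers sharing an IP dilute each other
	if c.outbound {
		score += 0.5
	}
	return score + 0.1*float64(c.ageRounds)
}

// cleanup keeps the softLimit highest-scoring connections; it would run
// periodically, e.g. once per second as in the example above.
func cleanup(conns []conn) []conn {
	if len(conns) <= softLimit {
		return conns
	}
	perIP := make(map[string]int)
	for _, c := range conns {
		perIP[c.ip]++
	}
	sort.Slice(conns, func(i, j int) bool {
		return diversityScore(conns[i], perIP) > diversityScore(conns[j], perIP)
	})
	return conns[:softLimit]
}

func main() {
	var conns []conn
	// 150 connections from a single spammer IP plus 50 diverse honest peers.
	for i := 0; i < 150; i++ {
		conns = append(conns, conn{id: fmt.Sprintf("spam-%d", i), ip: "203.0.113.7"})
	}
	for i := 0; i < 50; i++ {
		conns = append(conns, conn{id: fmt.Sprintf("honest-%d", i), ip: fmt.Sprintf("198.51.100.%d", i)})
	}
	kept := cleanup(conns)
	spam := 0
	for _, c := range kept {
		if c.ip == "203.0.113.7" {
			spam++
		}
	}
	fmt.Printf("kept %d connections, %d of them from the spammer IP\n", len(kept), spam)
}
```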

This strategy can further be "strengthened" by applying a two-level "token-bucket" strategy with a per-IP token bucket and a global one.

The two-level token bucket strategy also applies to all kinds of other scenarios for rate limiting and spam control: an individual limit to balance resources across peers without significantly affecting burst performance and a global limit to protect the node itself.
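A hedged sketch of the two-level token bucket (again plain Go; bucket capacities and refill rates are arbitrary illustrative values, not recommended settings):

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a classic token bucket: up to capacity tokens, refilled at rate/s.
type bucket struct {
	capacity float64
	tokens   float64
	rate     float64
	last     time.Time
}

func newBucket(capacity, rate float64) *bucket {
	return &bucket{capacity: capacity, tokens: capacity, rate: rate, last: time.Now()}
}

// allow refills based on elapsed time and consumes one token if available.
func (b *bucket) allow(now time.Time) bool {
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// connLimiter combines a global bucket (protects the node as a whole) with
// one bucket per IP (balances resources across peers without hurting bursts).
type connLimiter struct {
	global *bucket
	perIP  map[string]*bucket
}

func newConnLimiter() *connLimiter {
	return &connLimiter{
		global: newBucket(20, 10), // illustrative: ~10 accepts/s, burst of 20
		perIP:  make(map[string]*bucket),
	}
}

func (l *connLimiter) allow(ip string, now time.Time) bool {
	if !l.global.allow(now) {
		return false
	}
	b, ok := l.perIP[ip]
	if !ok {
		b = newBucket(3, 1) // illustrative: ~1 conn/s per IP, burst of 3
		l.perIP[ip] = b
	}
	return b.allow(now)
}

func main() {
	l := newConnLimiter()
	now := time.Now()
	for i := 1; i <= 6; i++ {
		fmt.Printf("attempt %d from 203.0.113.7 allowed: %v\n", i, l.allow("203.0.113.7", now))
	}
}
```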

@alrevuelta
Contributor Author

Looking at an example, with soft limit 100, hard limit 200, connection rate 10 peers/s and one "cleanup" per second, it's easy to see that even if the spammer creates 200 connections, these will get closed quickly down to 100 meaning there are 100 free "slots" for others to use to get an initial connection - even if the spammer keeps connecting, there are enough free "slots" in the queue for others to connect as well because of the rate limiting - in each round of cleanup, the connection of the spammer keeps going down whereas others get a fair chance based on their diversity contribution.

But isn't it better to not even allow these connections in the first place, rather than allowing them and then having to prune? If instead of 1 spammer we have 5, isn't this an attack vector? Why have this "optimistic connection" instead of "strict blacklisting"?

usually with a token bucket-controlled limit to incoming connection rate

This is indeed interesting. I can see that Prysm has a leaky bucket for IPs (see their implementation).

@arnetheduck
Contributor

Why having this "optimistic connection" instead of a "strict blacklisting"?

Because being permissive is more useful in general, especially for serving nodes - i.e. it's more common that there are legitimate users than that there are spammers, when the balance of cost for the two is "neutral" in terms of resource usage.

A spammer that connects without negatively affecting the service isn't achieving their goal, whereas with this strategy you are achieving your goal of serving legitimate users even in the presence of a multi-connection spammer. This explains why being permissive is not worse for legitimate users, but is worse for the spammer.

To understand why it's better to be permissive, focus on the fact that you don't know that 5 connections from the same IP are bad or wrong: you're trying to catch spammers, not 5-connections-from-the-same-IP, and the two are not the same. One case of malicious use seen right now in the network does not generalize to "all cases of 5 connections from the same IP are bad". 10 students in the same university classroom behind a NAT will exhibit this pattern too, so your proposed solution catches some spammers and some legitimate users.

In short, blacklisting is sometimes bad and sometimes good. Being permissive is never bad and sometimes good. The cost of accepting a connection is assumed to be negligible here: it is made negligible by the rate limiter which ensures that the majority of resources are spent on legitimate connected users, over time.

@Ivansete-status
Collaborator

Thanks for raising the issue @alrevuelta !

imo, I would follow point 3 (limit ips) and set it as a hard limit, collocationFactor, configurable and 20 by default.

We could carry on with the permissive approach as advised by @arnetheduck, using the soft limit as an implicit blacklist.
However, I think that the soft limit shouldn't be configurable because it may have complex factors that might be difficult to understand from a user point of view.

On the other hand, regardless of whether the connections are legit or spammers, that hard limit is very important as a means of protection. For example, there could be a legit user running a client app that inadvertently tries to establish as many connections as possible due to a particular bug in the client's app.

@arnetheduck
Contributor

One more reason why soft limits work better: actually allowing the spammer to connect often slows down their connect/drop cycle - the technique is generally known as tarpitting and works well in particular against buggy clients that sit in a tight connect loop by mistake - the "spammer" in this case has to deal with not knowing whether their connection attempt is slow on their side or the target side.

Regarding hard limits, these are actually there mainly to protect against bugs in nwaku itself - in the example I used, it would never be hit because the "cleanup" procedure would run frequently enough to always stay clear of it when combined with rate limiting - it's simply there in case everything goes wrong at once, to not crash.

Keep in mind that the OS already deals with a lot of spam if you just let it - ie it has its own incoming connection queue strategy which works well if you only let it do its work (and don't prematurely accept connections for example).

@jm-clius jm-clius added this to Waku May 23, 2023
@alrevuelta alrevuelta self-assigned this May 23, 2023
@alrevuelta alrevuelta added this to the Release 0.18.0 milestone May 23, 2023
@alrevuelta alrevuelta moved this to In Progress in Waku May 23, 2023
@alrevuelta
Contributor Author

Tracking the inbound connection rate limit here.
#1757

connection rate 10 peers/s
@arnetheduck Out of curiosity, is this 10 peers/s a reasonable number? Seems a bit high, but of course I guess it will depend on the total number of allowed inbound peers. Any rule of thumb? Like some ratio of the max number of inbound peers?

@arnetheduck
Contributor

10 peers/s a reasonable number?

I don't think this matters greatly so long as some limit exists, and 10 is as good a starting point as any - consider the numbers: a connection attempt is more or less a few packets at the TCP/IP level, while the node handles thousands of valid network packets per second, so as long as the ratio between the two is kept under control, it's fine.

The key is to not call the OS accept function too often and to keep the listen backlog more or less in tune with the rate; the OS will take care of the rest.

One thing to note is that if the queue is too short or the number is too low, it will be slightly easier for a spammer to fill the queue - this is where the local/global limit comes in where the global limit, set at a higher rate, should control the accept call while the per-ip limit should be used to delay (or abort) per-ip connection negotiation after accept.
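A sketch of how this wiring could look (plain Go, standard library only; the rates, per-IP limit and accept-loop structure are illustrative assumptions, not nwaku code): the global limit paces the Accept call itself, while the per-IP check runs only after accept to delay or abort negotiation.

```go
package main

import (
	"net"
	"time"
)

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	// Global limit: pace calls to Accept so the OS listen backlog absorbs
	// bursts instead of the node accepting everything immediately.
	globalRate := time.NewTicker(100 * time.Millisecond) // ~10 accepts/s
	defer globalRate.Stop()

	perIP := make(map[string]int) // active connections per remote IP
	const perIPLimit = 3          // illustrative per-IP budget

	for {
		<-globalRate.C
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		ip, _, _ := net.SplitHostPort(conn.RemoteAddr().String())
		// Per-IP limit applied only after accept: over-budget IPs get their
		// negotiation aborted here (or delayed, i.e. tarpitted).
		if perIP[ip] >= perIPLimit {
			conn.Close()
			continue
		}
		perIP[ip]++ // a real implementation would also decrement on close
		go func(c net.Conn) {
			defer c.Close()
			// ... protocol negotiation would happen here ...
		}(conn)
	}
}
```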
