Discussion: preventing P2P proxy nodes. #126

Open
Boog900 opened this issue Oct 31, 2024 · 3 comments

Comments

@Boog900
Boog900 commented Oct 31, 2024

Background

While investigating monero-project/monero#9496, I noticed in one of the log snippets posted (monero-project/monero#9496 (comment)) that a peer sent a message that wouldn't normally be sent if both nodes were running default monerod:

2024-10-04 22:14:12.000 I [162.218.65.219:11095 INC] 227 bytes received for category command-1001 initiated by peer
2024-10-04 22:14:12.001 I [162.218.65.219:11095 INC] 10 bytes sent for category command-1007 initiated by us
2024-10-04 22:14:12.002 I [162.218.65.219:11095 INC] 15520 bytes sent for category command-1001 initiated by us

Command 1001 is a handshake request & response. Command 1007 is a support flag request & response. Support flags are contained in the handshake message; currently the only flag is for fluffy blocks. monerod always sets this flag, even if you enable the no_fluffy_blocks arg (which only disables sending fluffy blocks).

Support flag requests are only sent if you leave the support flags in the handshake empty, so 162.218.65.219 either compiled their own monerod unsetting the fluffy blocks support flag or, more likely, they are running completely custom software.
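As a rough illustration of how the log snippet above could be screened for this behavior, here is a minimal Python sketch. It assumes the log line layout shown in the snippet and, following the reading above, treats a `command-1007` line marked `sent ... initiated by us` as the local node asking the peer for its support flags. The function name and regex are illustrative and are not part of monerod.

```python
import re

# Layout assumed from the snippet above, e.g.:
# 2024-10-04 22:14:12.001 I [162.218.65.219:11095 INC] 10 bytes sent for category command-1007 initiated by us
LOG_LINE = re.compile(
    r"\[(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+) (?P<direction>\w+)\] "
    r"(?P<size>\d+) bytes (?P<action>sent|received) "
    r"for category command-(?P<command>\d+) initiated by (?P<initiator>us|peer)"
)

SUPPORT_FLAGS_COMMAND = 1007  # support flag request & response


def peers_asked_for_support_flags(lines):
    """Return the (ip, port) pairs we sent a support flag request to."""
    peers = set()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not a traffic line in the assumed layout
        if (
            int(match["command"]) == SUPPORT_FLAGS_COMMAND
            and match["action"] == "sent"
            and match["initiator"] == "us"
        ):
            peers.add((match["ip"], int(match["port"])))
    return peers
```

A peer showing up in the result only means its handshake left the support flags empty; as noted above, the same node had the flag set on a later connection, so this check alone does not find every proxy.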

I tried to find all the nodes displaying this behavior, but sadly when I connected to this node it had the fluffy blocks support flag set. Knowing that they are probably running custom software, I decided to try to find another difference in behavior from monerod, and luckily I managed to find one. I am going to keep the exact method private to prevent them from fixing it. The method will not give false positives: monerod would never do what these nodes do. I am certain these nodes are proxies to real nodes.

Scanning the network for this behavior I found 1900 IP addresses running these bad nodes: https://github.com/Boog900/monero-ban-list/blob/main/ban_list.txt, the majority were from these 6 subnets:

91.198.115.0/24
100.42.27.0/24
162.218.65.0/24
193.142.4.0/24
199.116.84.0/24
209.222.252.0/24

This overlaps with LinkingLion: https://b10c.me/observations/06-linkinglion/.
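For anyone who wants to check a peer address against the six subnets listed above, a minimal Python sketch using the standard `ipaddress` module could look like this. The function name is illustrative, and the full ban list linked above contains more entries than these six ranges.

```python
import ipaddress

# The six /24 subnets named in the post above.
SUSPECT_SUBNETS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "91.198.115.0/24",
        "100.42.27.0/24",
        "162.218.65.0/24",
        "193.142.4.0/24",
        "199.116.84.0/24",
        "209.222.252.0/24",
    )
]


def in_suspect_subnet(address: str) -> bool:
    """True if the address falls inside one of the six listed subnets."""
    ip = ipaddress.ip_address(address)
    return any(ip in subnet for subnet in SUSPECT_SUBNETS)
```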

In total while scanning the network I found around 4900 active IP addresses, which is less than half the count of nodes on https://monero.fail/map (currently 12941). I decided to run monero.fail's tool myself, with a slight modification to check nodes are reachable before adding them to the list. With this tool I got 10175 "Recent Peers". monero.fail counts the same IP running nodes on different ports as different peers, which is fine, but it explains the difference in the count.

The spy nodes run multiple proxies behind the same IP. In the list of 10175 IP:Port combinations, the number of nodes with an IP in the 6 main subnets, plus some 23.92.36.* addresses that the proxies use, is 7584.

This means that, from the data I have, it looks like ~40% of the IPs running Monero nodes are not real nodes, and ~75% of the "Recent Peers" from my scan using monero.fail's tool were from an IP in the 6 main subnets + some 23.92.36.* that the proxy nodes were using.
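The two percentages follow directly from the counts quoted above. A short Python check of the arithmetic (variable names are illustrative):

```python
# Counts quoted in the post above.
bad_ips = 1900          # IP addresses found running the proxy software
active_ips = 4900       # active IP addresses found while scanning (approximate)
suspect_peers = 7584    # IP:Port entries in the 6 main subnets + some 23.92.36.*
recent_peers = 10175    # reachable "Recent Peers" from the modified monero.fail tool

ip_share = bad_ips / active_ips            # about 0.388, i.e. ~40% of IPs
peer_share = suspect_peers / recent_peers  # about 0.745, i.e. ~75% of IP:Port peers

print(f"share of IPs: {ip_share:.1%}, share of IP:Port peers: {peer_share:.1%}")
```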

With such a large chunk of the network, each tx sent over clearnet is highly likely to be routed through one of their nodes at least once in the stem stage. That allows them to hold onto it and use one of the few information leaks from monerod to try to find the peers in the stem path before it, example: monero-project/monero#9496 (comment). Even if a node does not accept incoming connections, it is still highly likely to be connected to at least one node run by this entity, which can be used to exploit an information leak.

With the number of nodes run by this entity, IMO network security is also an issue.

Ideas

Banning these IP addresses

Banning the IP addresses is a good solution for individual nodes; however, not all node operators will do it, and the proxy operator could just switch IPs.

Hardening the addressbook

Although monerod currently prefers to connect to peers in different /16 subnets, the addressbook does not limit the number of peers stored per subnet. It will even store the same IP with different ports, and it could store an IPv4-mapped IPv6 address and the canonical IPv4 address at the same time in different entries.

We could limit the number of addresses stored per subnet, like bitcoin, and prevent the same IP from being stored multiple times.
Although these nodes could still be active, they would be less likely to be connected to.
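To make the idea concrete, here is a minimal Python sketch of an addressbook with both restrictions. It is not monerod's or bitcoin's implementation: the class, the cap of 2 entries, and the choice of /16 (IPv4) and /32 (IPv6) as the grouping are all assumptions made for illustration.

```python
import ipaddress
from collections import defaultdict

MAX_PER_SUBNET = 2  # hypothetical cap, chosen only for this example


def canonical_ip(address):
    """Collapse an IPv4-mapped IPv6 address to its plain IPv4 form."""
    ip = ipaddress.ip_address(address)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        return ip.ipv4_mapped
    return ip


def subnet_key(ip):
    """Group by /16 for IPv4; the /32 used for IPv6 is an assumption."""
    prefix = 16 if ip.version == 4 else 32
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


class AddressBook:
    def __init__(self, max_per_subnet=MAX_PER_SUBNET):
        self.max_per_subnet = max_per_subnet
        self._buckets = defaultdict(dict)  # subnet -> {ip: port}

    def add(self, address, port):
        """Store a peer; return False if it is a duplicate IP or the subnet is full."""
        ip = canonical_ip(address)
        bucket = self._buckets[subnet_key(ip)]
        if ip in bucket or len(bucket) >= self.max_per_subnet:
            return False
        bucket[ip] = port
        return True
```

With a cap like this, an entity that fills a /24 with proxies would occupy at most `max_per_subnet` slots in each honest node's addressbook, no matter how many ports it opens per IP.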

Proof of Storage

There are schemes for proving that a node is storing the blockchain: https://ieeexplore.ieee.org/document/10174897 although this would require "encrypting" the blockchain, which isn't ideal.

@SyntheticBird45
SyntheticBird45 commented Oct 31, 2024

At the same time, I've been investigating these IPs on auxiliary channels, and I'm able to attest that they all exhibit signs of running the exact same software. With contributions from plowsof, I've also been able to identify other methods to detect them, which are going to stay private but can be shared with other trusted MRL members.

@RamoBrumen
Can this help? monero-project/monero#7935. We need to add this to have more diverse peer connectivity.

@Rucknium
Rucknium commented Nov 6, 2024

TL;DR: An analysis of node log data provides further supporting evidence that the IP addresses on the banlist are controlled by an eavesdropping adversary. At any given time, an average of 15 percent of outbound connections of the honest logging nodes were made to IP addresses on the banlist, which reduced the effectiveness of the Dandelion++ privacy protocol.

I used fluff-phase transaction relay log data to analyze peer-to-peer connections from honest nodes to IP addresses on the banlist: https://github.com/Boog900/monero-ban-list/blob/main/ban_list.txt

The log data was collected from about ten nodes between April 14, 2024 and May 23, 2024. The data is described in Section 9 of my "March 2024 Suspected Black Marble Flooding Against Monero: Privacy, User Experience, and Countermeasures".

IPs on banlist do not initiate connections (consistent with optimal attack strategy against Dandelion++)

IP addresses in the banlist make up about 11 percent of all 13,600 unique node IP addresses in the dataset. For outbound connections only, IP addresses in the banlist make up about 25 percent of all 6,200 unique node IP addresses. For outbound connections only and counting each port as a distinct node, IP addresses in the banlist make up about 50 percent of all 9,400 distinct nodes.

The banlist IP addresses almost never initiate connections. They only wait for honest nodes to establish outbound connections to them. Only a single IP address on the banlist appeared as an inbound connection to the logging nodes.

Outbound connections are the privacy-sensitive connection type. Dandelion++ relays stem-phase transactions to outbound connections only. The Dandelion++ threat model in Fanti et al. (2018) assumes that

Spies can generate as many outbound edges as they want, to whichever nodes they choose; however, they cannot force honest nodes to create outbound edges to spies.

and

Dandelion is naturally robust to nodes that create a disproportionate number of edges, because spies can only create outbound edges to honest nodes. This matters because in the stem phase, honest nodes only forward messages on outbound edges.

The operator of the banlist IP addresses is encouraging honest nodes to establish outbound connections to the malicious proxy nodes. When honest nodes have many outbound connections to spy nodes, the privacy protections of Dandelion++ do not work very well. Outbound connections are the only type of connections that a node with closed ports can have, so closed-port nodes are at higher risk of eclipse attacks in this circumstance.

Banlist IP addresses within the /24 subnet ranges "saturate" their subnets

The banlist includes six IP address ranges. Each range contains 254 usable IP addresses, from xxx.xxx.xxx.1 to xxx.xxx.xxx.254. For each of the six subnets, between 240 and 254 unique IP addresses appear in my dataset. This subnet saturation suggests that a single entity controls every IP address in the subnet and that the entity uses the IP addresses to accept connections from honest peer nodes.

Besides the IP addresses on the banlist, the greatest subnet saturation in the dataset was 49.12.239.0/24 with about 30 unique IP addresses. This subnet is associated with a Hetzner Autonomous System Network (ASN), which leases servers to companies and individuals. The banlist IP address ranges are probably malicious, given that similar saturation behavior is not observed for any other IP address ranges.
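The saturation measure described above is simply a count of distinct IP addresses seen per /24. The analysis code is not published in this thread, so the following Python sketch is only one plausible way to compute it; the function name and input format (a list of IPv4/IPv6 address strings) are assumptions.

```python
import ipaddress
from collections import defaultdict


def subnet_saturation(addresses, prefix=24):
    """Count distinct IPv4 addresses observed in each subnet of the given prefix length."""
    seen = defaultdict(set)
    for address in addresses:
        ip = ipaddress.ip_address(address)
        if ip.version != 4:
            continue  # this sketch only looks at IPv4 ranges
        subnet = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        seen[subnet].add(ip)
    return {subnet: len(ips) for subnet, ips in seen.items()}
```

Sorting the result by count would put a fully used /24 (up to 254 hosts) far ahead of an ordinary hosting range with a few dozen unrelated nodes.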

Empirical privacy impact

The share, $p$, of an honest node's outbound connections that are made to spy nodes determines the honest node's privacy risk at any given time. Higher $p$ means greater privacy risk.

The dataset contains data on the duration of each logging node's connection to peers. (The data is actually a log of fluff-phase transactions received from peer nodes, but the first and last time receiving transactions from a node is roughly the same as the duration of the connection.)

Some of the logging node operators apparently already manually enabled a banlist that prevented connections to most of the IP addresses on @Boog900 's banlist. Data from these logging nodes was excluded from the following analysis. The share of malicious nodes in the logging nodes' outbound connections at any given time can be weighted by the duration of each connection to produce a $p$ for an average period of time.

The weighted average $p$ of the logging nodes was about 0.15, much less than the 25 percent share of distinct IP addresses on the banlist (if using the unique IP address metric) and the 50 percent share of distinct nodes (if using the distinct IP/port combination metric). Possible reasons for the discrepancy are discussed in the next section.
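As a rough sketch of what duration weighting means here (not the actual analysis code, which is described as forthcoming), the share for one logging node can be approximated as connection time spent on banlist peers divided by total outbound connection time. This assumes the node keeps a roughly constant number of outbound connections; the tuple layout `(ip, start, end)` and the function name are hypothetical.

```python
def duration_weighted_share(connections, banlist):
    """Approximate p as banlist connection time over total outbound connection time.

    connections: iterable of (ip, start, end) with start/end in seconds (assumed layout).
    banlist: set of IP address strings.
    """
    connections = list(connections)
    total = sum(end - start for _, start, end in connections)
    if total <= 0:
        raise ValueError("no connection time recorded")
    spy = sum(end - start for ip, start, end in connections if ip in banlist)
    return spy / total
```

Because each connection counts in proportion to how long it lasted, short-lived connections to banlist peers pull the share below the raw fraction of banlist peers among all peers seen.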

How much of a threat to privacy do these suspected spy proxy nodes pose to users? The Dandelion++ paper (Fanti et al. 2018) chose recall and precision to measure the privacy of the protocol:

As discussed in [Venkatakrishnan, Fanti & Viswanath (2017)], precision and recall are a superset of the metrics typically studied in this space; in particular, recall is equivalent (in expectation) to probability of detection. On the other hand, precision can be interpreted as a measure of a node’s plausible deniability; the more transactions get mapped to a single node, the lower the adversary’s precision.

Recall and precision have these definitions in terms of true positives, false positives, and false negatives:

recall = true_positives / (true_positives + false_negatives)

precision = true_positives / (true_positives + false_positives)
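Written as code, the two definitions above are as follows. This is a direct transcription in Python; the counts in the usage note are made up for illustration.

```python
def recall(true_positives, false_negatives):
    """Share of transactions whose true source the adversary identified."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives, false_positives):
    """Share of the adversary's source guesses that were correct."""
    return true_positives / (true_positives + false_positives)
```

For example, an adversary that correctly attributes 15 of 100 transactions (and misses the other 85) has `recall(15, 85)` equal to 0.15.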

Venkatakrishnan, Fanti & Viswanath (2017) prove that the lowest recall and precision that any clearnet transaction relay protocol can achieve is $p$ and $p^2$, respectively. Fanti et al. (2018) prove that the Dandelion++ protocol can achieve the $p$ lower bound on recall and nearly achieve the $p^2$ lower bound on precision in realistic circumstances.

Assuming the IP addresses on the banlist are malicious spy nodes controlled by a single adversary, the mean recall and precision that the adversary achieved would be approximately the empirical mean of $p$ and $p^2$, respectively. The weighted average $p$ was already estimated to be 0.15 above. Due to Jensen's inequality the mean of $p^2$ is not $0.15^2 = 0.023$, but a Riemann–Stieltjes integral of the weighted empirical cumulative distribution function (WECDF) of $p$ can be used instead. Integrating $f(x) = x^2$ with respect to the WECDF gives an estimated mean precision of 0.035.
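The Jensen's inequality point can be seen with a small numeric example. Integrating $x^2$ against a weighted empirical CDF (a step function) reduces to the weighted mean of the squared observations, which is at least as large as the square of the weighted mean. The sample values and weights below are made up and are not taken from the dataset.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean; weights need not sum to 1."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical observations of p and how long each value was in effect.
p_samples = [0.0, 0.1, 0.2, 0.5]
weights = [4, 3, 2, 1]

mean_p = weighted_mean(p_samples, weights)                       # mean of p
mean_p_squared = weighted_mean([p * p for p in p_samples], weights)  # mean of p^2

# Jensen's inequality for the convex function x^2: E[p^2] >= (E[p])^2.
print(mean_p, mean_p ** 2, mean_p_squared)
```

In this toy case the mean of $p$ is 0.12, its square is about 0.014, and the mean of $p^2$ is about 0.036, so squaring the average would understate the adversary's expected precision.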

Share of outbound connections to malicious nodes is lower than the share of malicious nodes on the network

In the previous section I noted that the share of outbound connections that logging nodes made to the banlist IP addresses is lower than the share of banlist IP addresses on the network. At least two factors could explain the discrepancy.

First, connections to banlist IP addresses are about 25 percent shorter in time duration than connections to IP addresses that are not on the banlist. The shorter connection durations would mean less weight is given to banlist IP addresses in the weighted mean of $p$.

Second, Monero nodes prefer to connect to nodes that are not within the same /16 IP address ranges. Since most of the banlist IP addresses are in just six /24 ranges (each of which sits inside a single /16 range), a Monero node that is already connected to an IP address on the banlist would likely skip many of the banlist IP addresses when it chooses its next outbound peer connection.
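The /16 effect can be sketched in a few lines of Python. This models the preference as a hard filter, which is a simplification and an assumption; monerod's actual peer selection logic is not reproduced here, and the function names are illustrative.

```python
import ipaddress


def slash16(address):
    """Return the /16 network that contains an IPv4 address."""
    return ipaddress.ip_network(f"{address}/16", strict=False)


def eligible_candidates(connected, candidates):
    """Drop candidates that share a /16 with any already-connected peer."""
    used = {slash16(ip) for ip in connected}
    return [ip for ip in candidates if slash16(ip) not in used]
```

Under this simplified model, a node already connected to one address in 162.218.65.0/24 would skip every other address in that range, but could still pick an address from one of the other five banlist subnets, since they sit in different /16 ranges.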

Suggestion

An honest node that does not establish outbound connections to IP addresses on the banlist would provide better privacy to users who use the node as a local or remote node to construct and broadcast transactions. Future versions of the Monero node software could hard-code some or all of the banlist IP addresses to avoid establishing outbound connections (which are the most privacy-sensitive type of connection), but still allow inbound connections from those IP addresses. The software modification would not exclude those IP addresses from the network. Instead, nodes from those IP addresses would just have to establish their own outbound connections to nodes on the network. The hard-coded behavior could remain in effect until a more universal solution presents itself.
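A minimal Python sketch of the suggested policy, where banlist ranges are refused for outbound connections only, might look like the following. The function name, the `"inbound"`/`"outbound"` labels, and the use of the six main subnets as the blocked set are assumptions for illustration; this is not existing monerod behavior.

```python
import ipaddress

# Assumed blocked set: the six main subnets from the banlist discussion.
OUTBOUND_BLOCKED = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "91.198.115.0/24",
        "100.42.27.0/24",
        "162.218.65.0/24",
        "193.142.4.0/24",
        "199.116.84.0/24",
        "209.222.252.0/24",
    )
]


def allow_connection(address: str, direction: str) -> bool:
    """Refuse outbound connections to blocked ranges; always allow inbound ones."""
    if direction == "inbound":
        return True  # those IPs may still connect to us
    if direction != "outbound":
        raise ValueError(f"unknown direction: {direction!r}")
    ip = ipaddress.ip_address(address)
    return not any(ip in subnet for subnet in OUTBOUND_BLOCKED)
```

Since stem-phase transactions are relayed to outbound connections only, filtering on the outbound side alone keeps those addresses off the stem path without excluding them from the network.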

Special thanks to @Boog900 for feedback on this analysis.

Analysis code is forthcoming.

References

Venkatakrishnan, S. B., Fanti, G., & Viswanath, P. (2017). Dandelion: Redesigning the Bitcoin Network for Anonymity, Proc. ACM Meas. Anal. Comput. Syst. 1(1).

Fanti, G., Venkatakrishnan, S. B., Bakshi, S., Denby, B., Bhargava, S., & Miller, A., et al. (2018). "Dandelion++: Lightweight Cryptocurrency Networking with Formal Anonymity Guarantees," Proc. ACM Meas. Anal. Comput. Syst. 2(2).

kkarhan added a commit to greyhat-academy/lists.d that referenced this issue Dec 12, 2024
… by @Boog900 ( as signed off by @jeffro256 and [endorsed](https://gist.github.com/Rucknium/76edd249c363b9ecf2517db4fab42e88) by @Rucknium) to blocklists.list.tsv

Blocklisting such nodes is a security benefit as they threaten the safety and security of Monero users, regardless of whether one endorses Monero or not. [The existing research](monero-project/meta#1119) leads to believe this is a [direct attack](monero-project/research-lab#126) on the whole network and similar to [LinkingLion](https://b10c.me/observations/06-linkinglion/).

Signed-off-by: kkarhan <[email protected]>