Combining gossip with trust #6079

Closed
kozlovsky opened this issue Apr 27, 2021 · 3 comments

@kozlovsky
Contributor

I'm proposing an architecture for a gossip subsystem that can calculate trust. The current goal is to use it to spread torrent health information. In the future, it should be possible to use the same approach to gossip other types of information as well.

Good gossiping architecture should provide answers for many questions, such as:

  • how to detect information that is valuable enough to gossip about it;
  • how to spread information efficiently;
  • how to combine gossip with trust.

I believe that these questions can be answered independently to some degree. But the last question, "how to combine gossip with trust," seems to be harder than the others and should be answered first. First, it is important to implement a way to gossip and store trust-related facts, which provides a platform for spreading trustworthy information in a way that is forward compatible with future Tribler clients. It will then be possible to choose different algorithms to select which torrents to check, which peers to gossip the information to, and how exactly a formula for calculating trust ratings should look. It should be easy to change the formula in the future, because each node will already have in its local database the information needed to re-calculate ratings according to the updated algorithm.

What is possible to say about a good trust-enabled gossip architecture?

The necessary conditions for having a trust-enabled gossiping framework are the following:

  • each gossiped message should carry its author's public key;
  • each gossiped message should be signed with the author's private key to prove the authorship;

If a gossiped message does not contain the author's id, we don't know whom to collect trust information about, and if a message is not signed, we cannot be sure who its actual author was.

There are some additional thoughts:

  • The trust-analyzing system should not be affected by churn. In the Tribler/IPv8 distributed network, churn is high, and nodes appear and disappear all the time. The next time a Tribler node connects to the network, it will probably meet completely different peers. It may be hard to organize peers in a stable structure that is resilient to possible attacks. That means that each node should calculate trust locally and independently from the other nodes, based solely on its local database of previously received messages. The node's local database should have all the necessary information for this.

  • To spread information efficiently, the gossiping framework should allow the epidemic way of sending messages. That means that when a node receives a message, it should be possible to send the same message to a different peer. And that means that the message should include the complete author's public key (and not just an author's random id or a public key fingerprint). A receiver should be able to verify the signature of the message without sending any additional requests for receiving the author's public key.

Maybe what I'm saying looks trivial and self-obvious, but in my opinion, currently, we don't have any of this. Our current Popularity community for gossiping health information does not sign messages, does not store an author of a gossiped message in the database, and sends messages to nearest peers only without any transitivity.

What kind of data should be used for the gossip-related trust rating of nodes?

Currently, Tribler has an accounting system to record information about the traffic exchange. I heard some opinions that we can use this bandwidth balance information to calculate trust ratings for gossiped health information. In my opinion, it is not possible. If some node has an excellent traffic balance, it does not mean that it will never spread false health information about some specific torrents. The node which actively participates in a traffic exchange may have an incentive to maliciously promote some torrents of category X and suppress some torrents of category Y.

What types of messages should a trust-enabled gossip system have?

We need a message to gossip health information itself: node A sends message M1(t1, T1, H1) that at the moment t1, the health of torrent T1 equals H1. Is this single message type enough to calculate trust ratings?

First, consider the simplified scenario in which every node receives every gossiped message.

If node B receives message M1, it can decide to check if this health information is valid. It can check the health of the same torrent at the moment t2. As t1 and t2 differ, the health information H2 will be different from H1. Still, using some heuristics (which are outside of the topic of this issue), it should be possible to estimate if the difference is reasonable or not.

If at the moment t1 node A reported that the torrent T had 200 seeders and 500 leechers, and 15 minutes later node B finds 180 seeders and 400 leechers, the change looks reasonable. But if instead, 15 minutes later, node B finds five seeders and 20 leechers, then node B can assume that the information gossiped by node A was potentially malicious.
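The heuristics themselves are outside the topic of this issue, but a minimal sketch may make the example above concrete. Everything here is an assumption for illustration: the `Health` record, the `looks_reasonable` name, and the rule that the total swarm size may change by at most a factor of 4 per hour (with a 15-minute floor on the tolerance).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Health:
    timestamp: int  # seconds since epoch at which the check was made
    seeders: int
    leechers: int


def looks_reasonable(reported: Health, observed: Health,
                     factor_per_hour: float = 4.0) -> bool:
    """Return True if `observed` is a plausible follow-up to `reported`.

    Hypothetical rule: the total swarm size may grow or shrink by at most
    `factor_per_hour` per elapsed hour. The tolerance never drops below
    the 15-minute level, so near-simultaneous checks may still differ.
    """
    hours = abs(observed.timestamp - reported.timestamp) / 3600
    allowed = factor_per_hour ** max(hours, 0.25)
    # Add-one smoothing avoids division by zero for empty swarms.
    a = reported.seeders + reported.leechers + 1
    b = observed.seeders + observed.leechers + 1
    return max(a, b) / min(a, b) <= allowed


# The two cases from the example, 15 minutes (900 seconds) apart.
m1 = Health(timestamp=0, seeders=200, leechers=500)
plausible = looks_reasonable(m1, Health(900, 180, 400))
suspicious = not looks_reasonable(m1, Health(900, 5, 20))
```

With these assumed thresholds, the first follow-up check is accepted and the second is treated as a contradiction. A real heuristic would likely need to treat very small swarms more leniently.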

After the check, node B can update its local statistics about node A's rating. It would be highly beneficial for other nodes if node B not only updates its local database but also gossips the result of the check to other nodes, so they can avoid performing an expensive re-check of the torrent health (assuming other nodes can trust node B's messages to some degree).

So node B can send the result of the check to its peers, and in the simplest case, it can use the same simple message format that node A used: a message M2(t2, T1, H2) says that at the moment t2 the health of torrent T1 equals H2.

Then some node C which receives both messages M1(t1, T1, H1) from node A and M2(t2, T1, H2) from node B, can compare H1 and H2 and decide whether the difference looks reasonable or not. If the difference is not reasonable, then one of the nodes (node A or node B) may be malicious.

Is it necessary for node B to include in message M2 information about whether node B considers previous message M1 malicious or not? It is not necessary because node C can decide for itself. Node C can have different heuristics for health comparison. It is better to send facts (signed messages with health information) and not opinions (what node B thinks about node A using some set of heuristics).

Back to the real world - each node receives a subset of all messages.

So, if node C has both messages M1 and M2, it can decide whether one of these messages looks suspicious or not. But in a real network with high churn, every message is likely gossiped to only a limited number of nodes. In that case, the chance that node C has both messages M1 and M2 will not be high. That means that the message M2 will not perform its task of informing other nodes about the correctness of M1. To fix this, we need to pass message M1 together with message M2 as a single packet <M1, M2>. This will allow any node to see signed messages from nodes A and B and to draw trust-related conclusions (does node B's health-check result confirm or contradict the result of node A?).

If node C decides to check the message M2, it can perform a new health check, construct a new message M3(t3, T1, H3) and then gossip a new packet <M2, M3> to some peers.

Message format

Based on the above reasoning, the following messages can be used for trust-enabled gossiping:

A base message

This message is used for sending the basic information for gossiping and contains the following fields:

  • Author's public key
  • Payload, which for health information includes:
    • Torrent's infohash;
    • Timestamp of the health check;
    • Number of seeders;
    • Number of leechers.
  • Payload signature generated using the author's private key.
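A minimal sketch of how such a base message could be represented, to make the field list concrete. The byte layout, the class names and the injected `sign`/`verify` callables are all assumptions for illustration; this is not the wire format of the existing PopularityCommunity, and a real implementation would rely on IPv8's own key and serialization facilities.

```python
import struct
from dataclasses import dataclass
from typing import Callable

# Hypothetical layout: 20-byte infohash, 8-byte timestamp,
# 4-byte seeders, 4-byte leechers, all big-endian without padding.
PAYLOAD_FORMAT = ">20sQII"


@dataclass(frozen=True)
class HealthPayload:
    infohash: bytes
    timestamp: int
    seeders: int
    leechers: int

    def to_bytes(self) -> bytes:
        return struct.pack(PAYLOAD_FORMAT, self.infohash, self.timestamp,
                           self.seeders, self.leechers)

    @classmethod
    def from_bytes(cls, data: bytes) -> "HealthPayload":
        return cls(*struct.unpack(PAYLOAD_FORMAT, data))


@dataclass(frozen=True)
class BaseMessage:
    public_key: bytes       # the author's complete public key
    payload: HealthPayload
    signature: bytes        # signature over payload.to_bytes() only

    @classmethod
    def create(cls, public_key: bytes, payload: HealthPayload,
               sign: Callable[[bytes], bytes]) -> "BaseMessage":
        # `sign` is assumed to wrap the author's private key.
        return cls(public_key, payload, sign(payload.to_bytes()))

    def is_valid(self, verify: Callable[[bytes, bytes, bytes], bool]) -> bool:
        # `verify(public_key, data, signature)` is an assumed callable
        # supplied by whatever crypto backend is in use.
        return verify(self.public_key, self.payload.to_bytes(), self.signature)
```

Because the message carries the full public key and the signature covers only the serialized payload, any receiver can validate it without extra requests, and it can be forwarded unchanged.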

A trust-checking message

This message allows for any node to compare health-checking results of two nodes and make conclusions. It consists of:

  • A previous base message from node A that was just checked
  • A new base message from node B. Note that the signature in this base message signs the corresponding payload only and not the payload from the first base message from node A. This is important to be able to "split" node B's message from the packet while keeping its signature valid.
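The pairing and the "split" property can be sketched as follows. The class names, the `chain` helper and the injected `verify` callable are hypothetical; the point is only that each signature covers its own payload, so the second message stays valid when it is detached and reused as the first half of the next packet.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed verifier signature: (public_key, data, signature) -> bool.
Verify = Callable[[bytes, bytes, bytes], bool]


@dataclass(frozen=True)
class SignedMessage:
    public_key: bytes
    payload: bytes    # serialized health information
    signature: bytes  # covers `payload` only


@dataclass(frozen=True)
class TrustCheckPacket:
    checked: SignedMessage  # M1: the message that was re-checked
    check: SignedMessage    # M2: the checker's own measurement

    def is_valid(self, verify: Verify) -> bool:
        # Both halves are verified independently of each other.
        return all(verify(m.public_key, m.payload, m.signature)
                   for m in (self.checked, self.check))

    def chain(self, next_check: SignedMessage) -> "TrustCheckPacket":
        # Build <M2, M3>: M2 is reused as-is, with its signature intact.
        return TrustCheckPacket(checked=self.check, check=next_check)
```

A node that re-checks M2 would call `packet.chain(m3)` and gossip the result, without needing M1 or any cooperation from node B.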

Possible implementation details

  1. New HealthGossipCommunity can be created alongside the current PopularityCommunity.
  2. The data gossiped through the HealthGossipCommunity should be stored in a separate table to avoid interference with PopularityCommunity gossiping. Later it should be possible to compare health data from different communities to evaluate the performance of used algorithms.
  3. New table should store all incoming messages together with signatures. This way, it will be possible to have proof that specific nodes spread malicious messages.
  4. In case of limited disk space, it should be possible for a node to delete all or specific locally stored messages (too old or belonging to specific torrents/authors) without any consequences except having less information to reliably calculate trust ratings.
  5. Some simple algorithm can be used to select torrents for the initial health checks. For example, we can select torrents randomly with a slight preference for recently added torrents. This algorithm can be improved in future Tribler versions.
  6. Some simple algorithm can be used to select peers to gossip base health check messages to. For example, the original author of the message can gossip the message to all peers, and each peer can gossip the message transitively to other peers with some probability. It is not important to have an ideal algorithm for peer selection from the start, as it may be improved later. We should only be careful not to flood the network with a huge number of retransmitted messages.
  7. Some simple algorithm can be used to select received messages and re-check torrent health information. For example, for each checking interval, we can check a random unchecked torrent received no earlier than 1 hour before and no later than 15 minutes before.
  8. Some simple formula can be used to calculate trust. For example, for each node, we can calculate the number of successful checks and the number of unsuccessful checks and declare the node as malicious if the percentage of unsuccessful checks is too high.
  9. Messages from malicious nodes are still stored in the local database, but their content is ignored when calculating the torrent health presented to the user.
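Items 8 and 9 could look roughly like the sketch below. The names `TrustTable` and `displayed_health`, the 50% failure threshold and the minimum number of checks are assumptions chosen for illustration, not values proposed in this issue.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class CheckStats:
    successful: int = 0
    unsuccessful: int = 0

    @property
    def total(self) -> int:
        return self.successful + self.unsuccessful


class TrustTable:
    """Per-author counters of successful and unsuccessful re-checks."""

    def __init__(self, max_failure_ratio: float = 0.5, min_checks: int = 5):
        self.max_failure_ratio = max_failure_ratio
        self.min_checks = min_checks
        self._stats: Dict[bytes, CheckStats] = defaultdict(CheckStats)

    def record_check(self, author: bytes, success: bool) -> None:
        stats = self._stats[author]
        if success:
            stats.successful += 1
        else:
            stats.unsuccessful += 1

    def is_malicious(self, author: bytes) -> bool:
        stats = self._stats.get(author)
        # Too few checks: do not judge the author yet.
        if stats is None or stats.total < self.min_checks:
            return False
        return stats.unsuccessful / stats.total > self.max_failure_ratio


# A report is (author public key, timestamp, seeders, leechers).
Report = Tuple[bytes, int, int, int]


def displayed_health(reports: Iterable[Report],
                     trust: TrustTable) -> Optional[Tuple[int, int]]:
    """Return (seeders, leechers) from the newest report whose author is
    not considered malicious; ignored reports remain stored elsewhere."""
    trusted = [r for r in reports if not trust.is_malicious(r[0])]
    if not trusted:
        return None
    _, _, seeders, leechers = max(trusted, key=lambda r: r[1])
    return seeders, leechers
```

Since the raw signed messages stay in the database, the thresholds or the whole formula can be changed later and the ratings recomputed from the stored checks.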
@qstokkink
Contributor

Given that this is about gossiping and trust, I'll offer my feedback.


The necessary conditions for having a trust-enabled gossiping framework are the following:

  • each gossiped message should carry its author's public key;

  • each gossiped message should be signed with the author's private key to prove the authorship;

Maybe what I'm saying looks trivial and self-obvious, but in my opinion, currently, we don't have any of this.

These are the standard features of communication in IPv8. The PopularityCommunity follows this standard.
Without changing the wire format of the PopularityCommunity, you can use the lazy_wrapper_wd instead of lazy_wrapper on the on_torrents_health method to access the full message, containing the public key and signature. You can store this in your database if you like and forward it as you please. No other changes required.


message should include the complete author's public key (and not just an author's random id or a public key fingerprint). A receiver should be able to verify the signature of the message without sending any additional requests for receiving the author's public key.

This sounds like the protocol that was deprecated in 2013 and completely dropped in 2017. You seem to have outdated information.


If some node has an excellent traffic balance, it does not mean that it will never spread false health information about some specific torrents.

Fully agreed.


It is better to send facts (signed messages with health information) and not opinions (what node B thinks about node A using some set of heuristics)

  1. New table should store all incoming messages together with signatures. This way, it will be possible to have proof that specific nodes spread malicious messages.

Signing a message does not make it a fact. Signing a message that completely contradicts all other known information does not even mean a node was malicious. Measurements can change over time, especially those of torrent health (there is no stable centralized torrent health oracle). Beyond this, there may be other network effects or even (god forbid) bugs in our code.


But in a real network with high churn, every message has likely gossiped to only a limited number of nodes. In that case, the chance that node C has both messages M1 and M2 will not be high. That means that the message M2 will not perform its task to inform other nodes about M1 correctness. To fix this, we need to pass message M1 with the message M2 as a single packet <M1, M2>. This will allow any node to see signed messages from nodes A and B and be able to make trust-related conclusions (do node B health-check result confirms or contradicts with the result of node A).

You can forward the signed message of an honest user and show how it contradicts the measurements of other (fake) identities. Signing a message does not mean that the identity signing is the only identity controlled by a user. In fact, you can generate near-infinite identities to sign messages and overpower all honest nodes (this is the Sybil attack). Even worse, if you start ostracizing nodes for being in the minority, any attacker can trivially get all of the honest nodes in the network to start banning each other using the forwarded messages of its own fake identities.
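A toy illustration of this failure mode, under the assumption of a naive "the minority is malicious" rule. The `flag_minority` function and the identity names are hypothetical and only serve to show the arithmetic of the attack.

```python
from collections import Counter
from typing import Dict, Set


def flag_minority(reports: Dict[str, bool]) -> Set[str]:
    """Naive rule: flag every identity that disagrees with the majority.

    `reports` maps an identity to whether it claims the torrent is healthy.
    """
    majority_value, _ = Counter(reports.values()).most_common(1)[0]
    return {identity for identity, value in reports.items()
            if value != majority_value}


# Ten honest nodes measure the torrent as healthy.
honest = {f"honest-{i}": True for i in range(10)}
# One attacker generates fifty identities that all claim the opposite.
sybils = {f"sybil-{i}": False for i in range(50)}

flagged = flag_minority({**honest, **sybils})
# `flagged` now contains exactly the ten honest identities.
```

Every message involved can carry a valid signature; the rule fails because identities are free to create, not because signatures are forged.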


If node C decides to check the message M2, it can perform a new health check [...]

I fully support verifying information for yourself when you get conflicting information from others. Still, why would second-hand information (forwarded and re-signed) be better than first-hand information?


In summary:

  • It seems to me your base message already exists and is readily usable in the signed form you need (TorrentsHealthPayload of the PopularityCommunity). Storing these signed messages does not require a new Community.
  • Signed messages do not imply any lack of malicious intent, truth or even existence of the author.
  • If anything, basing yourself on forwarded information seems to make the fake information situation worse than just basing yourself on first-hand information from the peers you know personally.
  • Checking conflicting information seems like a good idea.

Given these observations from your description, I don't see a need for a new Community that forwards information. In fact, I believe deriving trust from your proposed forwarding mechanism would make our users more vulnerable to misinformation than they are now.

Please let me know if I missed something.

@synctext
Member

synctext commented Apr 27, 2021

You identified the grand challenge we aim to solve! (see #3571)

Overlap with "thesis project explores making the connection between gossip and trust".
Overlap with old and bit naive idea: "Similarity between contagion and trust"

You have excellent thoughts about trust and gossip. We have a 20-year history with starting communities, gossip, and "Social Inspired Reputations" (2008 prototype). Over 21 years ago I also started building "a generic implementation of a public writeble database" (https://cd.ro.nu/hypermail/1006.html). Now others are also blogging about this (by R3 founder):

Distributed ledgers – or decentralised databases – are systems that enable parties who don’t fully trust
each other to form and maintain consensus about the existence, status and evolution of a set of shared facts

Actual in-depth comment on your ideas... Identity is the only anchor-point you can use for reputation building, reinforcement and trust. SwarmHealth is not a future-proof starting point. Trust needs a root-of-trust, checking swarm health is too cheap to fake.

Trust is hard. With $20 per month in VPS cost your work can be broken down. Simple 100Mbps full-speed spam. Please spend a few months on learning the harsh lessons we went through. Starting point: Sybil attacks and eclipse attacks. Plus this daunting overview and thesis. On top of that our ongoing scientific quest is even protecting from poisoning attacks and byzantine failure.

@qstokkink
Contributor

I believe this issue has sufficiently high overlap with what is currently implemented and we can close it.
