Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The future of our DHTs plan #3183

Closed
synctext opened this issue Nov 3, 2017 · 14 comments
Closed

The future of our DHTs plan #3183

synctext opened this issue Nov 3, 2017 · 14 comments
Assignees

Comments

@synctext
Copy link
Member

synctext commented Nov 3, 2017

What is our goal with hidden seeder discovery?

We now have the Libtorrent DHT and the Python based DHT using different ports.

  • Do we create a Libtorrent patch to expose their DHT?
  • Do we create a DHT community?

Related issues. DHT self-DDos: #3065 Latency influence on Libtorrent: #2620

@qstokkink
Copy link
Contributor

Related to #109

@qstokkink qstokkink added this to the Backlog milestone Nov 8, 2017
@synctext
Copy link
Member Author

synctext commented Nov 15, 2017

In introduction point we might require to do "DHT address rewriting" and keep track of DHT ID of the seeders for parsing incoming DHT responses. (DHT NATting) 👎
Do we want to parse uTP and DHT traffic or can we not avoid that?

@egbertbouman
Copy link
Member

We could solve this in a number of ways, for instance:

  • Run a single (pymdht) DHT instance for each exit node and use this to do all our DHT requests. This would either require detecting and parsing DHT traffic coming from libtorrent, or somehow send the DHT request to the exit node manually (e.g., using a Dispersy message).
  • Create our own DHT community. Since pymdht does not support BEP-44, the main benefit is that we could store additional information in the DHT, avoiding the use of the key-request/response stuff.
  • As a secondary mechanism maybe we could use the top-n most frequently used public UDP trackers (from our database).

@synctext
Copy link
Member Author

synctext commented Mar 5, 2018

Attack nearby keyspace:
BEP44 allows keys to be the hash of an ed25519 public key with arbitrary 1k signed content. However, this protocol extension introduces new problems. An attacker can identify a hash they want to disrupt and generate nearby hashes so that responsibility for a hash moves onto nodes the attacker controls. This attack is addressed in dht_sec, but there is no analogous protection for BEP44 dht_store keys.
https://gist.github.com/substack/eadd13302d785dc13aac#file-results-markdown

@egbertbouman
Copy link
Member

egbertbouman commented Apr 16, 2018

Status:

image

  • next design: find_peers only return peers that the sender has punctured with. For one indicated peer, a puncture has already been sent. Optionally, use the introduce_to for the second attempt.
  • Possible idea to test this in the wild: hard-coded hidden seeding test swarm.

@synctext
Copy link
Member Author

ToDo in Design 3: caching mechanism for popular torrents or Tribler channels. (e.g. no trackers needed)

@qstokkink
Copy link
Contributor

Will this be replacing the current puncturing infrastructure?

If so, we should make sure we have all of the existing communities ported to IPv8 before this goes live.

@egbertbouman
Copy link
Member

egbertbouman commented Apr 17, 2018

At first, we were thinking about using the introduction-request/response to create a walk for every DHT request you make. The random walk would also run, just like any other community, to discover new peers and fill the Kademlia-based routing table.

After thinking a bit more about this, I could also use the existing message types, and just add an IPv8 puncuture-request to it. This would more or less be the same thing as separate walks, but would avoid having to pollute the introdution-request/response messages with additional fields. @syntext Do you think this would be an acceptable alternative?

@qstokkink I think we can keep the puncturing stuff as is.

@qstokkink
Copy link
Contributor

@egbertbouman that would be nice, then we can immediately merge this into the codebase.

@egbertbouman
Copy link
Member

egbertbouman commented Apr 23, 2018

I'm currently working on restricting node IDs in the DHT community. Based on the bittorrent DHT (see the bittorrent DHT security extention, I was thinking of the following:

first 24 bits of crc32c(ip & 0x030f3fff) + first 136 bits of sha1(public_key)

Note that, contrary to bittorrent, I'm not adding a random number in to the mix, Bittorrent does this to support multiple users behind the same IP (due to NATs). However, in the DHT community all users should have their own unique public keys, so we can do without the random number. This makes calculating a node's ID deterministic.

The number of possible node IDs would be the same as in bittorrent:
image

@egbertbouman
Copy link
Member

Some basic information about the current implementation of the DHTCommunity (see #3642).

Routing table

  • Peers are discovered using a random walk.
  • Newly discovered peers are added to the (Kademlia) routing table. Peer IDs are calculated as explained in the previous comment.
  • Nodes in the routing table are pinged periodically. Bad nodes are removed once we find a better node to replace it.

Finding nodes

When finding nodes close to a certain target id, we first find the closest nodes in the local routing table. Next, we will start a walk from each of these nodes in order to find even closer nodes. This is done by sending a find message. Nodes receiving such a message will respond with a list of closest nodes, and will also send an IPv8 puncture-request the first node on this list. This works very similar to the introduction-request/response mechanism. The querying node will keep sending find messages to closer nodes, until the maximum iterations have been reached, or no more close nodes can be found.
The walks are always conducted such that the is no overlap. A walk will stop if the only possible next step has already been reached by another walk.

Storing values

Values are stored by first finding the closest nodes. Once these nodes are found, each node will be sent a store message, stating which key-value pair they should store.

Retrieving values

This works very similarly to the process of finding nodes, with the exception that the find messages can also respond with a value.

Caching

The DHTCommunity uses the same caching mechanism as Kademlia. When a value has successfully been retrieved, the receiving node stores the key-value pair at the most recently visited node that did not have the value. To prevent over-caching, the expiration time of such a cache entry depends on the number of nodes between the caching node and closest node.

Very simple experiments are available for DAS5/localhost using Gumby. See https://jenkins.tribler.org/job/Experiment_dht for the DAS5 Jenkins job.

@egbertbouman
Copy link
Member

I've added some new features to the DHTCommunity (#3693). It's now possible to create both signed and unsigned values.

I've also added a new community for discovering and connecting to peers, called the DHTDiscoveryCommunity. It works as follows. When storing a peer A under key K, A will first lookup the peers closest to K using the DHTCommunity. Next, A will send these peers a store-peer-request indicating that it wants to be stored under key K. To keep the "connection" to the storing peers open, A will periodically send a ping message. If A stops sending ping messages, it will automatically be removed from the DHT.

When a peer B wants to find and connect to peer A using its key K, B will first lookup the peers closest to K. Next, B will request any peers stored under K by sending a connect-peer-request to these peers. Peers receiving the connect-peer-request will send a puncture-request A (since they have an open connection to that peer), and return a connect-peer-response that includes the address of peer A. B has now discovered peer A and can connect to it.

@qstokkink
Copy link
Contributor

@egbertbouman is this issue resolved now?

@egbertbouman
Copy link
Member

@qstokkink Yep, it's resolved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants