send DHT scrape requests to arbitrary info-hashes #3701

Closed
devos50 opened this issue Mar 8, 2019 · 24 comments

@devos50
Contributor

devos50 commented Mar 8, 2019

For Tribler, we are looking for a way to fetch the overall health of a swarm, in order to share this around and essentially gossip torrent healths around. One problem here is that we need to get the number of seeders and leechers in a swarm, preferably without relying on a UDP or HTTP tracker (since we also want to get the health of magnet-only torrents).

The current approach we are using for this is to start a download of the swarm in upload_mode. After we have fetched the metadata of the torrent (which is indicated by an alert), we inspect the peers we are connected to and simply take the length of this list. However, this approach is suboptimal for two reasons. First, we do not get an accurate representation of the peers in the swarm with this method. Second, it seems that we are unable to say whether a peer is a seeder or a leecher (by inspecting the have field), potentially because the peers have not yet exchanged which pieces they have.

For this reason, we are looking for a better approach to get the health of a torrent without using trackers. I played around a bit with the get_dht_peers method and that seems to work reasonably well. Doing several calls to get_dht_peers was suggested in a post on Stack Overflow, but I'm not sure how accurately this would approximate the health of a swarm. Another approach could be to get a few peers, manually send PEX messages, and 'crawl' the swarm. Would this work?

Any suggestions on this are welcome!

@MassaRoddel

In my experience, DHT get_peers is not reliable because you will get a lot of fake peers. You even get peers for nonexistent info-hashes. Unless you contact each peer, you do not know whether it exists.
And now that there are bots that reflect everything you send them, it would seem as if there is at least one peer with the same amount of data for a torrent as you yourself have.

@devos50
Contributor Author

devos50 commented Mar 11, 2019

@MassaRoddel thank you for your response. I did some more testing and I'm able to get > 1000 peers from the DHT for popular swarms. My next preferred step would be to send PEX messages and get the actual peers in the swarm.

While doing so, I ran into the same problem as in this SO post. Only a small percentage of the peers respond to my handshake, which might be caused by firewall/NAT issues. However, a single peer should be enough to bootstrap myself into the swarm and manually send PEX messages to others.
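For reference, a minimal sketch of the handshake step discussed here, assuming the standard 68-byte BitTorrent handshake layout and the BEP 10 extension bit that PEX (ut_pex) depends on. The helper names are hypothetical and not part of libtorrent; sockets, timeouts and the extended handshake itself are left out.

```python
PSTR = b"BitTorrent protocol"


def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """Build the 68-byte handshake: pstrlen, pstr, reserved, info_hash, peer_id."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must both be 20 bytes")
    reserved = bytearray(8)
    reserved[5] |= 0x10  # advertise the BEP 10 extension protocol (needed for ut_pex)
    return bytes([len(PSTR)]) + PSTR + bytes(reserved) + info_hash + peer_id


def parse_handshake(data: bytes, expected_info_hash: bytes):
    """Return the remote peer id if this is a handshake for our torrent, else None."""
    if len(data) < 68 or data[0] != len(PSTR) or data[1:20] != PSTR:
        return None
    if data[28:48] != expected_info_hash:
        return None
    return data[48:68]


def supports_extensions(data: bytes) -> bool:
    """True if the handshake advertises the BEP 10 extension protocol."""
    return bool(data[25] & 0x10)
```

A peer that never answers, or answers with a different info-hash, can be dropped from the count. This does not by itself detect the reflecting bots mentioned earlier in the thread.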

@arvidn
Owner

arvidn commented Mar 11, 2019

There's a way to scrape a swarm over DHT. See BEP 33. I don't think libtorrent supports sending scrapes, but it responds to them.
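As a rough illustration of what a BEP 33 scrape yields: as I read the BEP, each responding node returns two 256-byte bloom filters (BFsd for seeds, BFpe for peers), and the swarm size is estimated from the number of zero bits. The sketch below follows that reading for IPv4 only; the function names are my own, and the constants should be checked against the BEP before relying on them.

```python
import hashlib
import math
import socket

M = 256 * 8  # filter size in bits
K = 2        # hash functions per inserted address


def insert_ip(bloom: bytearray, ip: str) -> None:
    """Set the two bits derived from the SHA-1 of the packed IPv4 address."""
    digest = hashlib.sha1(socket.inet_aton(ip)).digest()
    index1 = (digest[0] | digest[1] << 8) % M
    index2 = (digest[2] | digest[3] << 8) % M
    for index in (index1, index2):
        bloom[index // 8] |= 1 << (index % 8)


def union(a: bytes, b: bytes) -> bytes:
    """Combine filters received from different DHT nodes with a bitwise OR."""
    return bytes(x | y for x, y in zip(a, b))


def estimate_size(bloom: bytes) -> float:
    """Estimate how many distinct addresses were inserted into the filter."""
    zeros = sum(8 - bin(byte).count("1") for byte in bloom)
    if zeros == 0:
        return float("inf")  # saturated filter, no meaningful estimate
    c = min(M - 1, zeros)
    return math.log(c / M) / (K * math.log(1 - 1 / M))
```

Filters from several nodes would be merged with `union` first and estimated once, rather than summing per-node estimates, since the same peer is usually stored on more than one node.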

@devos50
Contributor Author

devos50 commented Mar 11, 2019

@arvidn that's exactly what I need. It also includes information about the number of seeders and thus the health of a torrent. I will try to implement this and get back to you if I have some results. Thanks 👍

@devos50
Contributor Author

devos50 commented Mar 12, 2019

@arvidn I've tried to send a get_peers with scrape=1 in the request. However, the selective exposure of methods in the Python bindings prevents me from sending messages to other DHT nodes.

I noticed in the documentation that there is a dht_direct_request method which should allow me to send arbitrary messages to other DHT nodes, however this method is not accessible from Python. Another idea I have is to determine the DHT nodes in the routing table and send them a get_peers message from another UDP port on my machine. However, I'm not entirely sure whether this would result in a response.

Do you have any other suggestions on how to send a get_peers message with a scrape from Python?
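To make the message itself concrete, here is a sketch of a get_peers query carrying scrape=1, built with a small hand-written bencoder. It assumes the usual KRPC layout from BEP 5 plus the `scrape` argument from BEP 33; `bencode` and `build_scrape_query` are illustrative helpers, not libtorrent functions, and actually sending the bytes is out of scope.

```python
def bencode(value) -> bytes:
    """Encode ints, bytes, str, lists and dicts using bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def build_scrape_query(node_id: bytes, info_hash: bytes, tid: bytes = b"aa") -> bytes:
    """Build a KRPC get_peers query that also asks for BEP 33 scrape data."""
    return bencode({
        "t": tid,
        "y": "q",
        "q": "get_peers",
        "a": {"id": node_id, "info_hash": info_hash, "scrape": 1},
    })
```

Nodes that do not implement BEP 33 should simply ignore the extra argument and answer as for a plain get_peers.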

@arvidn
Owner

arvidn commented Mar 12, 2019

dht_direct_request should probably be added to the python bindings. However, get_peers is a search, it's not sufficient to just send a message to the peers in the routing table, you'd have to traverse the nodes until you find the nodes that host the info-hash.
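The traversal described here can be sketched as an iterative lookup that keeps querying the closest known nodes (by XOR distance to the info-hash) until none of them reveals anything closer. This is a simplified, sequential sketch: `query` is a hypothetical callable standing in for the network round trip, and real implementations add parallelism, timeouts and per-node tokens.

```python
def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance between two equal-length ids."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def iterative_lookup(target: bytes, bootstrap, query, k: int = 8):
    """Return the k closest node ids to target that the traversal discovers.

    query(node_id) must return an iterable of node ids known to that node
    (hypothetical transport; a real one would send get_peers over UDP).
    """
    seen = set(bootstrap)
    queried = set()

    def closest():
        return sorted(seen, key=lambda n: xor_distance(n, target))[:k]

    shortlist = closest()
    while True:
        pending = [node for node in shortlist if node not in queried]
        if not pending:
            return shortlist  # every close node has been asked, nothing closer exists
        for node in pending:
            queried.add(node)
            seen.update(query(node))
        shortlist = closest()
```

Only the nodes in the final shortlist are expected to store peers for the info-hash, which is why querying the local routing table alone is not enough.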

I would expect a separate UDP port would work, but libtorrent opens up its port with UPnP, NAT-PMP and PCP also.

@devos50
Contributor Author

devos50 commented Mar 12, 2019

@arvidn thanks for your response. The connectivity issue is indeed a problem when doing requests from another port. We run into the same problem in our IPv8 networking library and we use our custom decentralised NAT puncturing algorithm for this.

If I understand it correctly, I should first do a get_peers request to get the nodes that are hosting the torrent with the infohash (which should work by invoking the dht_get_peers method from Python). Next, I can send my custom get_peers message to these nodes using dht_direct_request?

@arvidn
Owner

arvidn commented Mar 12, 2019

yeah, if you actually get the nodes back in the response, that should work. I can't recall off-hand if you do though, or if you just get the peers back.

@devos50
Contributor Author

devos50 commented Mar 13, 2019

@arvidn could you please expose the dht_direct_request method in the Python bindings? I tried to do it myself by adding a single line here, but it did not work when I tried to invoke it from Python (it probably has something to do with the types of the passed parameters).

Alternatively, it might be helpful to add a scrape argument to the dht_get_peers method so we can fully utilize BEP 33. What do you think of this?

@devos50
Contributor Author

devos50 commented Mar 14, 2019

I managed to make some progress on this issue by making two changes to libtorrent. First, I always send a scrape=1 entry when sending a get_peers message. Second, I changed the python bindings and defined the dht_pkt_alert bindings + exposed the pkt_buf member:

    class_<dht_pkt_alert, bases<alert>, noncopyable>(
        "dht_pkt_alert", no_init)
        .add_property("pkt_buf", get_pkt_buf)
        ;

and

    bytes get_pkt_buf(dht_pkt_alert const& alert)
    {
        return std::string(alert.pkt_buf().data(), static_cast<std::size_t>(alert.pkt_buf().size()));
    }

It would be nice if someone with more experience can give some pointers on how to write a better converter for libtorrent::span objects :)

At this point, I'm listening to dht_pkt_alert messages and parsing the packets myself.
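A sketch of what that parsing step might look like on the Python side, assuming the raw packet bytes are available (for example from the pkt_buf property above) and that scrape-capable nodes put the BFsd and BFpe filters in the `r` dictionary of the response, as I understand BEP 33. The decoder and `extract_scrape` are illustrative helpers, not part of the libtorrent bindings.

```python
def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():
        colon = data.index(b":", pos)
        end = colon + 1 + int(data[pos:colon])
        if end > len(data):
            raise ValueError("truncated string")
        return data[colon + 1:end], end
    raise ValueError("malformed bencoding at offset %d" % pos)


def extract_scrape(packet: bytes):
    """Return (BFsd, BFpe) from a DHT response, or None if it carries no scrape data."""
    try:
        message, _ = bdecode(packet)
    except ValueError:
        return None
    if not isinstance(message, dict) or message.get(b"y") != b"r":
        return None
    reply = message.get(b"r")
    if not isinstance(reply, dict):
        return None
    seeds, peers = reply.get(b"BFsd"), reply.get(b"BFpe")
    if isinstance(seeds, bytes) and isinstance(peers, bytes) and len(seeds) == len(peers) == 256:
        return seeds, peers
    return None
```

Since these packets come from untrusted nodes, anything that fails to decode or has filters of the wrong length is dropped rather than raised.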

I think sending DHT scrapes should be possible in libtorrent but I am not sure about the best way to do so 👍

@arvidn arvidn added this to the 1.2.1 milestone Mar 30, 2019
@arvidn
Owner

arvidn commented Apr 25, 2019

sorry for the delay. That seems like a reasonable converter. I would expect you to construct a bytes object directly, or maybe even just say:

   return {alert.pkt_buf().data(), alert.pkt_buf().size()};

since the return type is known.

I would be open to patches against RC_1_2 to add support for this

@arvidn
Owner

arvidn commented Apr 27, 2019

#3810

@arvidn
Owner

arvidn commented Jun 29, 2019

I think landing #3810 is not sufficient to address this issue, right? There's still no way to send a DHT scrape request.

@arvidn arvidn modified the milestones: 1.2.1, 1.2.2 Jun 29, 2019
@devos50
Contributor Author

devos50 commented Jul 6, 2019

@arvidn that’s right. The ability to send scrape requests would be very helpful for our ongoing work in Tribler/tribler#4256.

@arvidn arvidn modified the milestones: 1.2.2, 1.2.3 Nov 13, 2019
@arvidn arvidn modified the milestones: 1.2.3, 1.2.4 Jan 9, 2020
@arvidn arvidn modified the milestones: 1.2.4, 1.2.5 Feb 9, 2020
@arvidn arvidn modified the milestones: 1.2.5, 1.2.6 Mar 13, 2020
@arvidn arvidn modified the milestones: 1.2.6, 1.2.7 Apr 17, 2020
@arvidn arvidn removed this from the 1.2.7 milestone May 31, 2020
@arvidn arvidn added this to the 1.2.8 milestone May 31, 2020
@arvidn arvidn modified the milestones: 1.2.8, 2.1 Aug 17, 2020
@arvidn arvidn changed the title from "Getting accurate torrent health information without actively downloading data" to "send DHT scrape requests to arbitrary info-hashes" Aug 17, 2020
@arvidn arvidn modified the milestones: 2.0.2, 2.0.3 Jan 11, 2021
@arvidn arvidn self-assigned this Apr 8, 2021
@arvidn arvidn removed this from the 2.0.3 milestone Apr 8, 2021
@stale

stale bot commented Jul 7, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jul 7, 2021
@Ofenhed

Ofenhed commented Jul 8, 2021

Would this issue include adding support for sending DHT scrapes to the C++ library as well, or would that already be possible?

@stale stale bot removed the stale label Jul 8, 2021
@stale

stale bot commented Oct 7, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 7, 2021
@Ofenhed

Ofenhed commented Oct 7, 2021

Would this issue include adding support for sending DHT scrapes to the C++ library as well, or would that already be possible?

If this is the case, and I could get some guidance, I'd be willing to take a stab at this.

@stale stale bot removed the stale label Oct 7, 2021
@stale

stale bot commented Jan 6, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 6, 2022
@markmdscott

Hey @arvidn, it looks like @Ofenhed is interested in taking this on. If this is alright with you, could you provide some guidance?

@stale stale bot removed the stale label Jan 9, 2022
@arvidn
Owner

arvidn commented Jan 10, 2022

Sure. I think this would probably look similar to the other DHT features exposed via the session object, like dht_announce().

The result from the scrape should be returned as an alert. Callbacks across threads would be error prone and complicated. The dht_live_nodes_alert can be used as an example.
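To illustrate the alert-based design (as opposed to cross-thread callbacks), here is a generic sketch of the pattern in plain Python. Nothing here is libtorrent API: `ScrapeResultAlert`, `AlertQueue` and `scrape_in_background` are hypothetical names, and `do_scrape` stands in for whatever performs the DHT traversal on the network thread.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrapeResultAlert:
    """Hypothetical alert carrying the outcome of one DHT scrape."""
    info_hash: bytes
    seeds: int
    peers: int


class AlertQueue:
    """Thread-safe queue: the worker posts alerts, the client thread drains them."""

    def __init__(self):
        self._queue = queue.Queue()

    def post(self, alert) -> None:
        self._queue.put(alert)

    def pop_alerts(self) -> list:
        alerts = []
        while True:
            try:
                alerts.append(self._queue.get_nowait())
            except queue.Empty:
                return alerts


def scrape_in_background(alerts: AlertQueue, info_hash: bytes, do_scrape) -> threading.Thread:
    """Run do_scrape(info_hash) -> (seeds, peers) on a worker thread and post the result."""
    def run():
        seeds, peers = do_scrape(info_hash)
        alerts.post(ScrapeResultAlert(info_hash, seeds, peers))

    worker = threading.Thread(target=run)
    worker.start()
    return worker
```

The client only ever touches results on its own thread when it drains the queue, so no user code runs on the network thread.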

@markmdscott

@Ofenhed is Arvid's walkthrough on examples for the proposed solution good enough for you to get started?

Or do you happen to be on vacation now?

@Ofenhed

Ofenhed commented Jan 14, 2022

@markmdscott Yeah, sorry for not replying. Short story: my current life situation unfortunately doesn't enable any personal projects at the moment.

@stale

stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 16, 2022
@stale stale bot closed this as completed Jun 18, 2022