Dispersy protocol specifications #422

Open
DavidXanatos opened this issue Apr 6, 2015 · 9 comments
Comments

@DavidXanatos

Hello,

I wanted to ask whether there is a detailed written protocol specification for "dispersy".
I would like to write a C++ implementation (I assume there is none at the moment) in order to add Tribler's new features to another BT client. If a C/C++ implementation already exists, please point me to it.
I looked into the wiki on GitHub, but there were only papers describing the system in vague terms, with no details such as packet structure or identifiers.

Given the size of the project, I would prefer not to port the implementation from source but to work from detailed, structured documentation, or alternatively with some help from one of the devs. I'm sure you can imagine how hard it is to wrap one's head around half a megabyte of code without any guidance.

Cheers
David X.

@synctext
Member

@DavidXanatos
Only limited documentation exists for Dispersy: the wire protocol docs and some tutorials (partly outdated):
https://github.com/Tribler/dispersy/blob/devel/doc/wireprotocol_1.org#dispersy-introduction-request-246
https://github.com/Tribler/dispersy/blob/devel/doc/wireprotocol_2.org#dispersy-introduction-request

It would be great to have a clean and compatible C++ implementation. A lot of BitTorrent clients would love to use a "LibTribler" with anonymous downloading and search. However, Python is always too much for them.

@DavidXanatos
Author

I have already taken a look at the code and have a few questions.
As far as I can see at first glance, the search is a slightly improved query flooding approach,
which is neither a scalable nor an exhaustive search scheme.
The only difference from regular query flooding seems to be that it selects a few "taste buddies" instead of solely random clients.
Is that correct, or have I missed something? And if I have, how do you make it scalable and ensure that if some user somewhere has a particular rare torrent, it can always be found by anybody given the right search keyword?

There is also a compatibility issue with the EC crypto used: http://forum.tribler.org/viewtopic.php?f=4&t=7245 Granted, using OpenSSL would be a solution, but I prefer Crypto++.

@synctext
Member

Hey!
It looks like a simple search mechanism, but it uses a giant caching mechanism. Each client stores roughly 50k swarm names in its local database, so you get a lot of hits. Gnutella required seeding yourself or knowing a seeder directly, which gave poor performance.
Using a similarity function provides a dramatic improvement in search precision, recall and general scalability (i.e. taste buddies help find rare torrents).
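
To make the two ideas above concrete (searching a local cache of swarm names and preferring peers with similar taste), here is a minimal Python 3 sketch. It is not Tribler's code: the Jaccard overlap used as the similarity function and the names `jaccard`, `pick_taste_buddies` and `search_local_cache` are assumptions chosen for illustration, since the thread does not say which similarity function Tribler uses.

```python
def jaccard(a, b):
    """Overlap similarity between two preference sets, from 0.0 to 1.0.

    Assumption: Jaccard overlap stands in for the unspecified similarity
    function mentioned in the thread.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pick_taste_buddies(my_prefs, peer_prefs, k):
    """Return the k peer ids whose preferences overlap most with ours.

    peer_prefs maps a peer id to that peer's set of preferences.
    Ties are broken by peer id so the result is deterministic.
    """
    ranked = sorted(peer_prefs,
                    key=lambda p: (-jaccard(my_prefs, peer_prefs[p]), p))
    return ranked[:k]


def search_local_cache(cache, keyword):
    """Case-insensitive substring match over locally cached swarm names."""
    needle = keyword.lower()
    return [name for name in cache if needle in name.lower()]
```

A real client would keep the roughly 50k cached names in a database with a proper text index; the linear scan here only shows the principle.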

Full details: "4P: Performant Private Peer-to-Peer File Sharing", http://www.p2p-conference.org/p2p14/wp-content/uploads/2014/09/218.P2P2014_22.pdf (we have running code for this private search mode, but it increases search time to 30+ seconds, which is unacceptable)

regarding Crypto++, yes, 0-appending by hand in Tribler is not ideal. If you can clarify the issue and could propose improvements, please open a new ticket.

@DavidXanatos
Author

Hi,

To what extent is the 4P mechanism from the paper implemented in Tribler?
Obviously the file transfer anonymization in Tribler uses a different scheme.
And at first glance I did not notice any code that relays search packets, meaning that the search is only performed on a set of random nodes plus some "taste buddies". Is that correct, or have I missed something?

Regarding the exhaustive search issue: how does it help me find something that most likely almost none of my "taste buddies" would ever be interested in?
More generally, how does it improve a search for something rare that falls completely outside of my "taste", something one needs only once in the lifetime of a Tribler installation?
That is a common case with software: your everyday "taste" may be, for example, anime and series, but if you need a specific obscure "unlicensed" tool, the search for it will probably be as bad as with an old-school query flooding network.
How does Tribler handle such cases?

regarding Crypto++, I will open a new issue for that later...

@synctext
Member

All results in the paper have been obtained by Niels with a full 4P implementation. This is possibly the correct branch, but it's too slow for production usage: Tribler/tribler@devel...NielsZeilemaker:private-search-new

Indeed, if none of your random + taste buddies have it in their 50k item caches, you are out of luck. We ran extensive crawlers for the past 12 years on Gnutella, Kazaa, and other networks. People rarely search for rare content :-) In my experience the problem is mostly the lack of seeders (or a 2 Kbit/sec download speed), not the search part. Would you like to work on improving the query propagation part?

Our credits for seeding swarms would fix a lot if sufficient users start picking it up. Duplicating private communities. We'll have it ready around summertime (see Tribler/tribler#3).

@synctext
Member

Oh yes, our datasets tell us that handling typos would help search tremendously.
Cheap one-character fixes: https://pypi.python.org/pypi/python-Levenshtein/
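
As a rough illustration of such a one-character fix, the sketch below corrects a search keyword against a list of known keywords using edit distance. It is pure Python so that it stays self-contained; the linked package offers a C implementation of the same distance. The helper `correct_keyword` and its `max_distance` parameter are hypothetical names, not part of Tribler.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def correct_keyword(keyword, vocabulary, max_distance=1):
    """Return the known keyword closest to `keyword`.

    Falls back to `keyword` unchanged when nothing in `vocabulary`
    is within `max_distance` edits (hypothetical helper).
    """
    best, best_d = keyword, max_distance + 1
    for word in vocabulary:
        d = edit_distance(keyword, word)
        if d < best_d:
            best, best_d = word, d
    return best
```

For example, `correct_keyword("ubunto", ["ubuntu", "debian"])` picks `"ubuntu"`, while a keyword with no close match is passed through untouched.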

@DavidXanatos
Author

Well, the first step for me would be to make a C++ port (that preferably works with Crypto++), so the relevant question to start with is what is currently implemented in the production Tribler code, not what was tested synthetically or just thought about.

It's certainly true that most people search for popular content most of the time, but in my opinion it is just as important to have a successful search experience when occasionally looking for something odd :D
Especially with non-media content, such as (non-game) software, the only way to go is a system that is capable of an exhaustive search, and I think that can only be implemented scalably with some DHT-like approach.
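
To illustrate what I mean by a DHT-like approach, here is a small Python 3 sketch that maps each keyword to the node responsible for indexing it, so a lookup goes to a known place instead of to random peers. Everything in it is an assumption for illustration only: SHA-1 identifiers, the Kademlia-style XOR distance, and the names `key_id` and `responsible_nodes`. Nothing like this exists in Tribler's search as far as this thread shows.

```python
import hashlib


def key_id(text):
    """Map a string to a 160-bit integer identifier via SHA-1.

    Lower-casing first makes keyword lookups case-insensitive.
    """
    digest = hashlib.sha1(text.lower().encode("utf-8")).hexdigest()
    return int(digest, 16)


def responsible_nodes(keyword, node_ids, replicas=1):
    """Return the `replicas` node ids closest to the keyword's id.

    Closeness is XOR distance (an assumption borrowed from
    Kademlia-style DHTs); node_ids is an iterable of integer ids.
    """
    target = key_id(keyword)
    return sorted(node_ids, key=lambda n: n ^ target)[:replicas]
```

Whoever holds a torrent publishes its keywords to the responsible nodes, and a searcher asks those same nodes, which is what makes the search exhaustive regardless of "taste".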

@NielsZeilemaker
Contributor

Hi @DavidXanatos, there is no flooding in Tribler. We simply send messages to our connected peers, and they reply. No forwarding is implemented whatsoever.
Moreover, instead of random peers, we try to find peers which have similar preferences to ours. These peers should provide us with better search results.

Organising an overlay in this manner substantially reduces the cost of searching, as we're not flooding the system. But as you already mentioned, the long tail of search results (unpopular items) will probably not be found. The caching system helps, though.

Look at my 4P paper to see what effect it has on searching. Off the top of my head, recall went down to 60% vs ~100% for flooding. This doesn't include caching, which will probably improve the recall.

The current implementation has nothing to do with 4P, as 4P actually does forward queries to a limited number of neighbors.
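
The difference between asking only connected peers and forwarding to a limited number of neighbors can be sketched with a toy in-memory overlay. This is a simplified Python 3 illustration, not Tribler's or 4P's actual code: the `Peer` class and the `fanout` and `ttl` parameters are hypothetical, and the privacy mechanisms of 4P are left out entirely.

```python
class Peer:
    """Toy peer holding a local cache and a list of connected peers."""

    def __init__(self, name, cache):
        self.name = name
        self.cache = set(cache)   # locally cached swarm names
        self.neighbors = []       # directly connected peers

    def local_hits(self, keyword):
        needle = keyword.lower()
        return {n for n in self.cache if needle in n.lower()}


def one_hop_search(origin, keyword):
    """Ask only directly connected peers; nobody forwards the query."""
    hits = origin.local_hits(keyword)
    for peer in origin.neighbors:
        hits |= peer.local_hits(keyword)
    return hits


def forwarding_search(origin, keyword, fanout, ttl):
    """Each peer forwards to at most `fanout` neighbors, for `ttl` hops."""
    hits = origin.local_hits(keyword)
    seen = {origin.name}
    frontier = [origin]
    for _ in range(ttl):
        next_frontier = []
        for peer in frontier:
            for nb in peer.neighbors[:fanout]:
                if nb.name not in seen:
                    seen.add(nb.name)
                    hits |= nb.local_hits(keyword)
                    next_frontier.append(nb)
        frontier = next_frontier
    return hits
```

In a chain a-b-c where only c caches a rare item, `one_hop_search` from a finds nothing, while `forwarding_search` with `ttl=2` reaches c at the cost of extra messages.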

3 participants