Tribler code review #3646

ichorid · 2018-05-24T15:04:39Z

This is the big list of things I found out during the review.

TODO:

run_tribler.py: __main__ runs twice, refactor in 2-3 separate files.
Session object should be split up into several smaller objects and remixed with TriblerLaunchMany. There are several completely dead methods that are never used anywhere (except unit tests).
DBUpgrader should be de-Twistified and moved to run before Twisted reactor.
LaunchManyCore should be completely refactored by splitting into the series of smaller objects/methods. Its torrent-handling functionality should be moved to the torrent manager.
APIImplementation dir should be scrapped completely after refactoring LaunchManyCore.
RemoteTorrentHandler.py contains almost completely dead code. EDIT: it used to collect torrents in the background for anonymity and quality of service, but was broken a long time ago. Not gonna fix, should just redo this in AllChannel2.0.
SqliteCacheDBHandler.py is full of awful old code that should basically be completely gone when we move to AllChannel2.0. For example, PeerDB is only written, and is never read. We should keep SQL usage to a minimum to be as stateless as possible.
Notifier.py - subjects should not be pre-set in Notifier. Instead, they should be defined dynamically by users of Notifier object, in their respective modules.
Our SQL access routines are scattered all over the code, implementing the same functionality many times. Maybe we should switch to a unified modern solution, like ORM.
Libtorrent managing code: 💥 💣 💣 💣 💣 💣 💣 💣 💣 😱
2k lines of code in 2(two!) objects! And some more high-level stuff is spread out through the rest of the codebase. There is a LOT of valuable experience in this code, so refactoring should be done carefully.
TorrentDef.py - includes dumb TorrentDefNoMetainfo object that forces everyone to check the object type. Should be refactored to become one with TorrentDef. Which should be moved into TorrentMgr module.
TriblerStatistics object should be used as a parent class for Session, instead of being wrapped around.
I stopped after half an hour of reading Dispersy code. My sanity is still precious to me...
Reviewing every file in SOCKS proxy is not required, everything is fine there.
Various flavors of Torrent object are added again and again by different subsystems...
In general, the most common problem is stuff being spread all over the place. It breaks isolation and greatly impedes understanding the code.

Subsystem	Overall state	Outstanding problems	Solution
Startup	Complete mess	Spread through several big objects and files.	Rearrange objects and methods into a single module.
Torrent handling	19-th century cookbook: lots of cool recipes, funky language, no structure.	There are several types of distinct torrent objects with repeating functionality; all torrent-managing code is in 2 giant objects; no clear sync/async separation; pieces of code are scattered all around the codebase;	Step 1: move everything into a single module; Step 2: unify torrent objects; Step 3: refactor to no end;
RemoteTorrentHandler	Evolutionary atavism	Does not work	Delete and forget.
SQL handling	Ancient magic	Everyone do their own magic.	Gradually shift to ORM, while writing new and refactoring old code.
Core utility code (Tribler.Modules.*)	A collection of random pages torn from cookbooks of different centuries.	Lack of structure.	Gradually move everything in its place.
Dispersy	The Necronomicon	Cthulhu Fhtagn!	Reimplement Search, Channels, AllChannel with IPv8, scalable local gossip algorithm and DHT.
TFTP module	Minion of the Old Gods	Obsolete	Delete and forget.
SOCKS proxy	Neat and tidy	Is in the Tribler, instead of IPv8 TunnelCommunity	Move to IPv8 TunnelCommunity.
REST API	Neat and tidy	Everything is fine, almost no cruft there.	Praise and care.
Torrent Checker	Neat enough	Code is fine, though the algorithm is not effecient.	Enhance the algorithm.
Category filter, Credit mining	OK	Could use some cleaning, but basically OK.	Keep clean
Tribler Config	Telephone book	Ok, but a lot of boilerplate code and copy-paste make the potential programmer prone to error.	Use Python .dict trick to reduce code size 3 times. And make individual subsystems add options on their own to common object.
TriblerChain adapter	A love note to IPv8	No problems. At all.	Praise and care.
TriblerTunnel	Another love note to IPv8, much more passionate and longer.	Some methods are too long and not grouped. Otherwise, everything is OK.	Create member objects for subtasks and split some methods into more readable form.
Market community	Writings of a heretical sect of madmen who split from the Church of the Old Gods to establish a cult of their own.	Total overengineering.	I don't know.

The text was updated successfully, but these errors were encountered:

ichorid · 2018-05-25T12:51:00Z

Monitor-bezel-crushing image of Tribler module imports.

On the left side, you could see the Gratutious Spaghetti Pasta of Dispersy. Basically, everything is calling everything there...
Tribler market and tunnel community are connected to it by a feeble umbilical cord of import is_valid_address and import call_on_reactor_thread respectively.
Tribler.community.market is in the center. It shows some signs of hierarchy.
IPv8-related imports show clear hierarchy, though its a little bit flat.

Zoomed in.
Dispersy:

Market:

IPv8:

(p.s. used pyreverse from pylint to generate these. Had to remove all decorators, since it can't handle different decorators with the same name, or when a decorator has the same name as the function.)

ichorid · 2018-05-25T16:36:01Z

It seems RemoteTorrentHandler contains completely dead code. Like, it's not used anywhere, or never ran. And when you break it, nothing happens, Tribler works OK. It seems to only write torrents DB, and never read it.

Guys, could someone comment on this issue?

qstokkink · 2018-05-26T10:46:07Z

It should be used here in the SearchCommunity:

tribler/Tribler/community/search/community.py

Line 98 in b096b9b

self._rtorrent_handler = tribler_session.lm.rtorrent_handler

You should be able to trigger this through searching for a torrent.

ichorid · 2018-05-26T14:34:57Z

The only place where something _rtorrent_handler is used is:

tribler/Tribler/community/search/community.py

Line 526 in b096b9b

    
           self._rtorrent_handler.download_torrent(candidate, infohash, priority=LOW_PRIO_COLLECTING,

But it is never run!

tribler/Tribler/community/search/community.py

Line 516 in b096b9b

infohashes_to_collect = [infohash for infohash in to_collect_dict

is always empty. You could literally comment out .download_torrent, and Tribler GUI works as usual. Channels, search, anonymous downloads, mining - everything works.

So, what the rtorrent_handler subsystem is intended to do?

ichorid · 2018-05-26T14:56:17Z

Basically, message.candidate is always empty here:

tribler/Tribler/community/search/community.py

Line 509 in b096b9b

to_collect_dict.setdefault(infohash, []).append(message.candidate)

qstokkink · 2018-05-26T15:54:32Z

@egbertbouman can you say anything about this?

synctext · 2018-05-26T16:11:11Z

It should collect 50k of torrents slowly in the background. The TFTP binary transfer of torrents never worked smoothly. You found another skeleton in our collection... We have an open issue for magnet links only Tribler operation. Discussion started 3 years ago #1150

devos50 · 2018-05-26T16:17:34Z

I can probably answer your question about the remote torrent handler module.

Another purpose of the module, besides what @synctext says, is to speed up the process of fetching torrent metadata. In particular, when clicking on a torrent in Tribler (i.e. a search result or a torrent inside a channel), it would try to fetch the torrent from other candidates. The torrent file itself would be sent using the TFTP protocol and the data was sent over the Dispersy endpoint (since this endpoint was already NAT punctured by Dispersy). I even think there is a separate prefix for this.

It used to be this way in the old GUI. When I built the new GUI, I added support for fetching torrent metadata using HTTP/UDP trackers and the DHT. Since (re)adding support for the TFTP transfer of torrent metadata was not very high on my priority list, this feature was never added back to the GUI. The code still exists in the original code base, together with a few tests (which are not really unit tests...). I highly doubt whether we want to re-add this feature again. For now, I would be fine if you remove the TFTP module and code that manages fetching torrents from others. Let's discuss this next week 👍

egbertbouman · 2018-05-27T13:41:16Z

@devos50 The RemoteTorrentHandler used to do a lot more than just TFTP torrent collecting, it's also capable of collecting from magnet/DHT/candidates. All this functionality is still in there, but I guess we stopped using this while creating the new GUI.

As far as TFTP goes, I don't think this has ever worked properly. If you look at the code you'll see the that it will only try to collect torrents that we already have 😢 This means that collect-requests are still coming in, but we never actually discover a new torrent.

ichorid · 2018-05-28T11:36:11Z

I talked to everyone involved, and it seems the consensus is: "we used this stuff in the old days, but now its irrelevant, so it should go". So, its removal is on the list.

synctext · 2018-05-28T13:39:17Z

Well, before it is deleted... We need to know the consequences. No full .torrent collecting anymore? Have measurements proven that magnet-based BEP9 collecting is now fast enough? Is there a design for anonymous downloads and getting the .torrent?

ichorid · 2018-05-28T14:29:30Z

@synctext, I've just ran Tribler with this stuff completely disabled, and it worked fine. Search, channels, downloads - everything ran fine. Torrent collecting is redundant now.

egbertbouman · 2018-05-28T14:36:39Z

@synctext I looked into the TFTP torrent collecting, and at the moment it's not working at all. That means that we are only collecting torrents from the DHT/trackers after some user interaction (e.g., downloading a torrent). For testing purposes, I did a 1-line fix, and Tribler magically started collecting TFTP torrents. So, if we are OK with the amount of torrents we are collecting right now, we could remove it.

Note that because we are not currently collecting torrents in the background, we are essentially gossiping all our downloads within the SearchCommunity...

synctext · 2018-05-28T15:05:04Z

Thank you for finding this out!
Quality of search result has been going down in Tribler, in my direct experience. That explains it, the torrent collected has gone down. How many torrents do people have on average? How many people have the healthy 50k torrents? Every Tribler client used to broadcast these health stats. How many dead torrents in the cache? We need a single dev for the content health I believe.

ichorid · 2018-05-28T16:03:57Z

@synctext, we're gonna scrape all of this in favor of AllChannel2.0, the Heir to Dispersy. We definitely want to have some mechanism for background torrent metadata exchange, at least for anonymity purposes. However, there is no point in reclaiming this old code.
For "Dispersy v1.9 refactored" release we could do away with the simple mechanism of returning more than one torrent on search/update request. That would be just enough to support anonymity and quality of service.

synctext · 2018-05-29T10:08:11Z

51% attacks are now real

We first need a spam solution, protection against an attack in AllChannels mechanisms with merely $1000 worth of cloud servers. With solid foundation we can focus on fancy features. Please focus on linking Trustchain non-cheap identities to content discovery. Perfect if we can all do that in 1 upgrade, but I prefer to first remove the SyncAll broadcast in AllChannel community, upgrade to IPv8, and discard all channels votes with Trustchain > 2 hops (and a future Max-Edge-Flow > 1 GByte).

That is a lot of features to get flawlessly deployed, correct?

ichorid · 2018-05-29T11:30:10Z

Trustchain non-cheap identities to content discovery. Perfect if we can all do that in 1 upgrade, but I prefer to first remove the SyncAll broadcast in AllChannel community, upgrade to IPv8, and discard all channels votes with Trustchain > 2 hops (and a future Max-Edge-Flow > 1 GByte).

- that is exactly what I meant by "Dispersy v1.9 refactored"!
To put to use the years of Tribler experience, we have to keep the minimal core of successful code/algorithms from AllChannel/Dispersy. To remove obstacles to future developments, we have to port it to IPv8. To make it secure enough for MarketCommunity, we have to add Trustchain linking.
No one talks about fancy features ))

ichorid · 2018-05-29T12:55:26Z

Each object in our codebase has its own SQL access methods...
Maybe we should discuss using ORM?

It would greatly simplify and unify all SQL-related stuff.

synctext · 2018-05-30T06:59:52Z

Yes, ORM is also on my 2019 wish list.

Our running old-IPv8 ORM code:
http://svn.tribler.org/abc/branches/giorgos/datasync/sqlalchemy/schema.py

Please take the time to read about our operational 2010 work with this code: https://repository.tudelft.nl/islandora/object/uuid:52b586ea-6144-4b4e-a5a1-b05255ce493a

Plus IPv8 in 2010 with ORM:
http://kayapo.tribler.org/trac/wiki/IPv8

synctext · 2018-05-30T07:14:21Z

There is a LOT of valuable experience in this code, so refactoring should be done carefully.

Impressive how fast you comprehend our code base.

ichorid · 2018-05-30T10:59:36Z

@synctext, I have to, to be able to keep up with the pace of development...

ichorid · 2018-06-07T15:46:01Z

IPv8 issues:
Boilerplate code:

auth = BinMemberAuthenticationPayload(self.my_peer.public_key.key_to_bin()).to_pack_list()
dist = GlobalTimeDistributionPayload(self.claim_global_time()).to_pack_list()
payload = MyMessage(self.BLABLABLA).to_pack_list()

is a classic example of a boilerplate code that is used in every message in every community.
@qstokkink , is there some reason why this is not moved to IPv8 _ez_pack, and the programmer have to do it by hand every time?

qstokkink · 2018-06-11T09:26:10Z

@ichorid the GlobalTimeDistributionPayload is deprecated and I want everyone making a message to be discouraged from putting it in there, by forcing them to add this extra line.

ichorid · 2018-06-11T09:49:23Z

@qstokkink, but it is in your wiki tutorial...
What should we use instead of it?

qstokkink · 2018-06-11T10:24:42Z

@ichorid good point, I'll remove that.

ichorid · 2018-08-17T18:43:46Z

I've done some quick check for duplicates in Triber with Clonedigger.
Market community is the first victim:

Source files: 33
Click here to show/hide file names

Clones detected: 71

753 of 3066 lines are duplicates (24.56%)

Parameters
clustering_threshold = 10
distance_threshold = 5
size_threshold = 5
hashing_depth = 1
clusterize_using_hash = False
clusterize_using_dcup = False

(p.s. the analysis does not include unit tests.)

market_duplicates.pdf

@devos50 , I guess this kind of analysis could be useful for your efforts with Market.

ichorid · 2018-08-17T19:12:01Z

Looks like IPv8 has some duplicates too:

Source files: 8

Clones detected: 138

872 of 6993 lines are duplicates (12.47%)

Parameters
clustering_threshold = 10
distance_threshold = 5
size_threshold = 5
hashing_depth = 1
clusterize_using_hash = False
clusterize_using_dcup = False

ipv8_duplicates.pdf

ichorid · 2018-08-17T19:23:58Z

Bear in mind, though, that Clonedigger's algorithm is language-agnostic and prone to false positives. So, take these stats with a proper grain of salt.

ichorid · 2018-08-18T12:33:13Z

For the whole Tribler/next, without Dispersy and Test directories, and without submodules:

Source files: 272

Clones detected: 2822

4892 of 30018 lines are duplicates (16.30%)

Parameters
clustering_threshold = 10
distance_threshold = 5
size_threshold = 5
hashing_depth = 1
clusterize_using_hash = False
clusterize_using_dcup = False

tribler_duplicates.pdf

ichorid · 2019-02-23T10:52:43Z

Anyone willing to continue this work - welcome!
Closing it for now.

ichorid · 2019-06-17T10:49:29Z

After a year of refactoring, Dispersy is dead and Market is moved to a submodule. Here are the updated diagrams:

ichorid added the long-term label May 24, 2018

ichorid self-assigned this May 24, 2018

qstokkink added this to the Backlog milestone Nov 27, 2018

ichorid closed this as completed Feb 23, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tribler code review #3646

Tribler code review #3646

ichorid commented May 24, 2018 •

edited

Loading

ichorid commented May 25, 2018 •

edited

Loading

ichorid commented May 25, 2018

qstokkink commented May 26, 2018

ichorid commented May 26, 2018

ichorid commented May 26, 2018

qstokkink commented May 26, 2018

synctext commented May 26, 2018

devos50 commented May 26, 2018

egbertbouman commented May 27, 2018

ichorid commented May 28, 2018

synctext commented May 28, 2018

ichorid commented May 28, 2018 •

edited

Loading

egbertbouman commented May 28, 2018 •

edited

Loading

synctext commented May 28, 2018

ichorid commented May 28, 2018

synctext commented May 29, 2018

ichorid commented May 29, 2018 •

edited

Loading

ichorid commented May 29, 2018

synctext commented May 30, 2018

synctext commented May 30, 2018

ichorid commented May 30, 2018

ichorid commented Jun 7, 2018

qstokkink commented Jun 11, 2018

ichorid commented Jun 11, 2018

qstokkink commented Jun 11, 2018

ichorid commented Aug 17, 2018 •

edited

Loading

ichorid commented Aug 17, 2018

ichorid commented Aug 17, 2018

ichorid commented Aug 18, 2018

ichorid commented Feb 23, 2019

ichorid commented Jun 17, 2019

Tribler code review #3646

Tribler code review #3646

Comments

ichorid commented May 24, 2018 • edited Loading

ichorid commented May 25, 2018 • edited Loading

ichorid commented May 25, 2018

qstokkink commented May 26, 2018

ichorid commented May 26, 2018

ichorid commented May 26, 2018

qstokkink commented May 26, 2018

synctext commented May 26, 2018

devos50 commented May 26, 2018

egbertbouman commented May 27, 2018

ichorid commented May 28, 2018

synctext commented May 28, 2018

ichorid commented May 28, 2018 • edited Loading

egbertbouman commented May 28, 2018 • edited Loading

synctext commented May 28, 2018

ichorid commented May 28, 2018

synctext commented May 29, 2018

ichorid commented May 29, 2018 • edited Loading

ichorid commented May 29, 2018

synctext commented May 30, 2018

synctext commented May 30, 2018

ichorid commented May 30, 2018

ichorid commented Jun 7, 2018

qstokkink commented Jun 11, 2018

ichorid commented Jun 11, 2018

qstokkink commented Jun 11, 2018

ichorid commented Aug 17, 2018 • edited Loading

ichorid commented Aug 17, 2018

ichorid commented Aug 17, 2018

ichorid commented Aug 18, 2018

ichorid commented Feb 23, 2019

ichorid commented Jun 17, 2019

ichorid commented May 24, 2018 •

edited

Loading

ichorid commented May 25, 2018 •

edited

Loading

ichorid commented May 28, 2018 •

edited

Loading

egbertbouman commented May 28, 2018 •

edited

Loading

ichorid commented May 29, 2018 •

edited

Loading

ichorid commented Aug 17, 2018 •

edited

Loading