Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring IPv8 class structure #123

Closed
2 tasks
ichorid opened this issue Apr 27, 2018 · 14 comments
Closed
2 tasks

Refactoring IPv8 class structure #123

ichorid opened this issue Apr 27, 2018 · 14 comments
Assignees
Labels
priority: low Should be addressed at some point in the future priority: medium Enhancements, features or exotic bugs

Comments

@ichorid
Copy link
Contributor

ichorid commented Apr 27, 2018

Current IPv8 classes inheritance and composition graph:
(Achtung! The width of the picture could break your monitor's bezel!)
classes_overlay

Some problems are immediately apparent. For example:

  • EccCrypto class is inherited by TunnelCommunity twice, from TunnelCrypto and directly from Overlay, in the member with the same name(!) crypto;
  • Community -> EZPackOverlay -> Overlay -> EndpointListener is essentially a single class spread out through 4 levels of abstraction. Even if, at some point, we would want to create new kinds of overlays, with their own families of communities, we should not create redundant umbrella classes for that purpose.
  • master_peer is used in every community. Maybe push it to Community class?

On high-level design, there are some more notes:

  • Overlay/Community is not a TaskManager. It uses the TaskManager for its functionality. Therefore, it should include a (reference to) TaskManager as a member. It would help the threads mess, BTW.
  • One expects a Community to be either an Overlay, or use the Overlay functionality/services (like crypto and addressing scheme).
  • Again, if we always encrypt all of our communications, communities should not be concerned about that at all. Encryption should be handled by an/the Overlay interface/functionality. BTW, this design would get really useful when we would move to "microservice + endpoint_in_C" architecture.

In general, the programmer does not want to jump several levels up in inheritance hierarchy just to know how the method he looks at works. The program should be broken into small, isolated fragments that are easy to understand and maintain. Superfluous usage of inheritance mechanism does not help that at all.

Resume:
may I chop that stuff into pieces, and assemble something beautiful out of these?

EDIT:

  • Develop UDP packet throughput test for communities.
  • Develop 1000 peers handling test for communities.
@ichorid ichorid self-assigned this Apr 27, 2018
@synctext
Copy link
Member

may I chop that stuff into pieces

Can you please first create a few performance tests? Processing incoming UDPs, generation them, and CPU usage when overlay has 1000 connected peers?

Plus also a 'perfect architecture' design, to discuss, before commensing with coding?

@qstokkink
Copy link
Collaborator

Ah, the class diagram has gotten so beautiful since the Dispersy days: it brings a tear to my eye. Anyway, sentiment aside, let me respond to your points:

  • The TunnelCommunity indeed has its own ECCrypto subclass. I have no emotional attachment to the way this is stored in the class. Change it if you want.

  • I agree that EZPackOverlay could be merged into the Community. Other than that, I think the abstraction is useful for future compatibility.

  • But.. master_peer is part of the Community class?!

  • The TaskManager is a way to tie Twisted calls to a context. This means that all calls made by a Community/Overlay are cancelled when it is unloaded. This is incredibly useful. Please leave this as is.

  • The Community is an Overlay, we just don't have any other implementations yet. Do note that we plan to move on from Dispersy and switch to a protobuf message format. So there will be other implementations of the Overlay interface.

  • Coming from Dispersy, I am really hesitant to do things on autopilot. It very quickly turns into programmers fighting the framework. Don't get me wrong, I think we should expose the functionality. Functionality should always be included by the concious choice of the Overlay programmer though.

Concluding: we have seen beautiful design before and it turned out to be unusable. I would take practical and ugly over beautiful any day of the week

@qstokkink qstokkink added the priority: medium Enhancements, features or exotic bugs label Apr 28, 2018
@qstokkink
Copy link
Collaborator

Actually, I think the class structure/architecture is pretty spot on right now (other than what I discussed above). Most of the Community classes themselves would benefit greatly from a code cleanup and and performance analysis (disk I/O and CPU usage) though.

@ichorid
Copy link
Contributor Author

ichorid commented Apr 28, 2018

@synctext, let's consider the creation of packet throughput / 1000 peers tests the first step of the refactoring process. These result would guide the further development.

@ichorid
Copy link
Contributor Author

ichorid commented Apr 28, 2018

@qstokkink, yeah, master_peer is indeed a member of Community class. But should it really be a static variable that is always overridden by child classes? What is the reason behind this design?

@qstokkink
Copy link
Collaborator

@ichorid the master_peer should be the unique "key" for all instances of a particular Community subclass. In fact, a Community can not run without it and would clash with other communities if it were a duplicate. As such, it makes no sense to allow communities to change this at runtime (although, of course, through the magic of Python you can if you really want to).

@synctext
Copy link
Member

Most of the Community classes themselves would benefit greatly from a code cleanup and and performance analysis (disk I/O and CPU usage) though.

Very true indeed. IPv8 versus
1 ) Dispersy deprication, 2) Download classes refactoring, 3) torrent collecting, 4) Trustchain speed analysis, and 5) AllChannel replacement. Others?

@synctext
Copy link
Member

Half a month of work to refactor video-on-demand?

@ichorid
Copy link
Contributor Author

ichorid commented Apr 28, 2018

@synctext, the problem is, Dispersy deprecation (and other stuff you mentioned) would be much harder to do, if we don't get the base (i.e. IPv8) right first. It would take t+c time to do each one of these, and then take additional c time to refactor them after we finally get to refactor IPv8 to prepare it for microservice/C-library transition. The overhead of supporting current codebase would have to be paid twice if we opt for "features first, refactor later" sequencing:

  • "Refactor IPv8 -> Dispersy deprecation -> AllChannel replacement -> Trustchain speed analysis -> Download Classes Refactoring -> VoD refactoring" would go 50%-100% times faster than:
  • "Dispersy deprecation -> ... -> Refactor IPv8 -> refactor everything for compatibility with refactored IPv8.

We know that we'll have to at least ditch Twisted and make IPv8 a separate service by the end of the year. And when we get new developers for the team we don't want them to have to learn two codebases (For example, I really appreciated what @qstokkink did with IPv8, and that I don't have to learn Dispersy programming).

@ichorid
Copy link
Contributor Author

ichorid commented Apr 28, 2018

BTW, there are almost no calls to Twisted async stuff in IPv8. And those that are still there could be trivially replaced with Python threading module calls etc. That would really help with getting it ready for transition to Python3.

@synctext
Copy link
Member

The overhead of supporting current codebase would have to be paid twice if we opt for "features first, refactor later" sequencing

productive discussion... For a major refactoring project like Tribler it's vital to do a deep dive and understand the illness of the patient. Parts of our code base are said to Frightening Small Children and Disconcerting Grown-ups.

Please dive also into the other major headache parts of the code, now that you have profiled exit nodes and analysed IPv8. @ichorid Do you think it is a good idea to develop the required knowledge for our future micro-architecture and do a health check of all code components?

I'm leaning towards standardizing on IPv8 and re-factor all communities into "event-driven Distributed Apps". The community code is the weak soft underbelly of our code base I believe. Agreed, @qstokkink? Research Question: can we devise a "trustworthy gossip" messaging middleware layer on top of the IPv8 one-to-one communication layer. As a scientist it's not about the twisted refactoring, but about the scientific breakthrough of reasoning with a simplistic trust function (trust==Trustchain hop distance) and evolving them further for coming decade.

Within a 2 week sprint create a table like this? (then generated from source code annotations within a Jenkins job? :-)

Tribler component quality assesment description
Libtorrent wrapping code Current quality level unknown. Towards a lock-free architecture, lock isolation (merely Libtorrent facing modules) and step towards async code architecture
we are hammering the file system and IO 😢 probably the key reason for Tribler resource hunger. Where does it come from?
rewrite the video-on-demand code 😢 Usability of streaming mode could be much improved, we're never an alternative for Youtube without this
Fixing Dispersy 😱 💩 💥 Served a purpose for years, with much pain it is now quite stable, we now know what to improve with database and Bloom complexity
Allchannel 2.0 and radical using Libtorrent for Allchannels 😢 "Full-Sync" needs to be replaced. Essential for scalability and spam prevention
IPv8 😍 Anything which cuts thousands of lines is a blessing
search community ? trivial to spam
Channel communities ? slow to sync, costly walking and database IO

@ichorid
Copy link
Contributor Author

ichorid commented Apr 28, 2018

@synctext, challenge accepted. Two weeks to produce a detailed report on the state of the code.

@qstokkink
Copy link
Collaborator

The community code is the weak soft underbelly of our code base I believe.

It depends. The community interface is like a chef's knife:

  • If you know what you are doing, you can make exactly what you want (a scalable, attack-resilient and performant community).
  • If you don't know what you are doing, you will hurt yourself and possibly even others around you.

And then there's Dispersy, which forces everyone to grab the knife by the blade instead of the handle.

@qstokkink qstokkink added the priority: low Should be addressed at some point in the future label Jun 12, 2018
@qstokkink
Copy link
Collaborator

I think this issue no longer fully aligns with our future vision and is too broad. We can refactor components when we see fit, supported by an issue of properly focused scope.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: low Should be addressed at some point in the future priority: medium Enhancements, features or exotic bugs
Development

No branches or pull requests

3 participants