Scalability of Dispersy channels #2106
A small weekly progress report:
Thanks for the update, curious. If you have time, please insert a picture of VisualDispersy here, for easy smartphone viewing.
In other news: I supercharged Dispersy with some selective multithreading. This brought the 10 nodes, 10 messages-per-node experiment down from 6 minutes to 10 seconds on my localhost, which was so fast that it crashed my VisualDispersy tool. The 3 nodes, one node with 10000 messages experiment was brought down from over 2 hours to 4 minutes. Note that this uses the fastest settings; it can still be slowed down at will to conserve system resources. One kind-of-big problem which remains in the synchronization department is the message delivery to the communities. It seems some kind of buffer, in between the endpoint and the community overlay, is holding the messages and then delivering them all in one shot. For example, for a node the first 9400/10000 messages will suddenly pour in 2 minutes after the first packet was received, and it will then continue to receive messages at a steady pace. This behavior is the real killer for the 1M messages community. Once I have finished polishing up the code (properly logging/handling exceptions, adding comments, hunting down thread id asserts, etc.) I will share it.
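For readers who want a feel for the kind of selective multithreading meant here, below is a minimal sketch of the idea: fan the CPU-heavy per-packet work out over a small thread pool, then hand the results back for order-sensitive bookkeeping. Names such as `decode_and_verify` and the pool size are assumptions for illustration, not the actual Dispersy patch.

```python
# Minimal sketch of offloading per-packet work to a thread pool; this is an
# illustration of the idea, not the actual Dispersy change.
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)  # shrink to conserve system resources

def decode_and_verify(raw_packet):
    # Placeholder for the expensive part: packet decoding + signature checks.
    return raw_packet.decode("utf-8")

def on_incoming_packets(raw_packets, deliver):
    # Fan the expensive work out over the pool, then hand the decoded
    # messages back in order for the order-sensitive bookkeeping.
    futures = [pool.submit(decode_and_verify, p) for p in raw_packets]
    for future in futures:
        deliver(future.result())

# Example: process three fake packets and collect the decoded results.
received = []
on_incoming_packets([b"a", b"b", b"c"], received.append)
print(received)
```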
Message batching is that killer!
Wow, that is a solid performance evaluation + dramatic improvement. Impressive thesis content! Looking forward to seeing a demo on Monday. Message batching was introduced as a possible measure to improve performance: the idea was to process a few messages at once, which would reduce context switches and use IO more efficiently...
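For reference, the batching idea boils down to buffering incoming messages briefly and handing them to the processing code as a group. A minimal sketch of that pattern follows; the batch size and flush interval are arbitrary illustration values, not Dispersy's actual batch configuration.

```python
# Minimal sketch of message batching: buffer messages and process them in
# groups to amortise per-message overhead.
import time

class MessageBatcher(object):
    def __init__(self, process_batch, max_size=64, max_delay=1.0):
        self.process_batch = process_batch  # callback receiving a list of messages
        self.max_size = max_size
        self.max_delay = max_delay
        self.buffer = []
        self.first_arrival = None

    def add(self, message):
        if not self.buffer:
            self.first_arrival = time.time()
        self.buffer.append(message)
        if (len(self.buffer) >= self.max_size or
                time.time() - self.first_arrival >= self.max_delay):
            self.flush()

    def flush(self):
        if self.buffer:
            self.process_batch(self.buffer)
            self.buffer = []

# Example: batches of at most 3 messages are handed to the callback at once.
batcher = MessageBatcher(lambda batch: print(len(batch), "messages"), max_size=3)
for i in range(7):
    batcher.add(i)
batcher.flush()  # drain the remainder
```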
I did disable the batch feature (with the flag), so the buffering is occurring somewhere it shouldn't be. Right now I assume it to be in the horrible spaghetti of community.on_messages.
self._dispersy._delay() interesting... that might be from the time when peers did not always include their full public key in messages, just the hash. Messages then got delayed until the full public key was known. That really can be refactored out.
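For context, the delay mechanism described here amounts to parking messages under the sender's key hash and releasing them once the full public key arrives. A rough sketch of that pattern is shown below; the class and method names are hypothetical and not Dispersy's actual `_delay()` implementation.

```python
# Rough sketch of the "delay until full public key is known" pattern.
from collections import defaultdict

class DelayedMessageStore(object):
    def __init__(self):
        self.waiting = defaultdict(list)   # key hash -> parked messages
        self.known_keys = {}               # key hash -> full public key

    def on_message(self, key_hash, message, process):
        if key_hash in self.known_keys:
            process(message)
        else:
            # Park the message until the full key shows up.
            self.waiting[key_hash].append(message)

    def on_public_key(self, key_hash, public_key, process):
        # Record the key and release everything that was waiting on it.
        self.known_keys[key_hash] = public_key
        for message in self.waiting.pop(key_hash, []):
            process(message)
```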
Continuing with the conversion refactor: I also have a Gist for Protocol Buffers. Some observations:
All in all (also to save @whirm some work managing packages), Protocol Buffers might actually be the best choice. EDIT:
Small milestone: the standalone serialization/conversion has reached 100% unit test coverage and is ready for integration into Tribler.
Interesting, so what is the gain from using (capnp vs) protobuf vs. struct.pack? I was thinking of also including this in the tunnel community on Tribler. If protobuf is in the official Debian and Ubuntu repos then it is indeed the more favorable choice.
The gain of capnp and protobuf over struct is partly the readability of the message structures and partly easy backward compatibility. Note that these approaches may in some (rare) cases waste a few bytes compared to hardcore manual struct definitions (although this is usually covered by smart message packing/compression). As an example of the readability, the votecast message is currently defined as: '!20shl'. In Protocol Buffers you would define the exact same thing as:
The goal is to (eventually) have all communities use this.
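To make the readability comparison concrete, here is a small sketch: the `'!20shl'` format string packs three unnamed fields, whereas a schema-based format names and types each one. The field names and the .proto shape below are assumptions for illustration, not the actual Tribler definitions.

```python
import struct

# Current wire format: one opaque format string; the field names below are
# assumptions for illustration (they are not spelled out by '!20shl' itself).
VOTECAST_FORMAT = "!20shl"  # 20-byte string, 2-byte signed short, 4-byte signed long

def pack_votecast(infohash, vote, timestamp):
    return struct.pack(VOTECAST_FORMAT, infohash, vote, timestamp)

def unpack_votecast(data):
    return struct.unpack(VOTECAST_FORMAT, data)

# A schema-based format names and types every field explicitly.  A
# hypothetical Protocol Buffers definition of the same message could read:
#
#   message VoteCast {
#       bytes infohash = 1;
#       int32 vote = 2;
#       int64 timestamp = 3;
#   }

print(unpack_votecast(pack_votecast(b"\x00" * 20, 1, 1234567890)))
```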
And performance-wise? Your protobuf wrapper really looks useful.
Solid progress.
@lfdversluis Cap'n Proto is faster because it actually stores objects serialized in memory. Protocol Buffers does not, however, which makes it slower (https://capnproto.org/ has a nice bar plot of the two).
Yeah, I am familiar with the infinite-speed thingy. I was wondering if you had done any comparison and stress testing :D
After extensive whiteboard work, I think I have a new community design everyone can be happy with. The design allows both backward compatibility and forward compatibility, without having to change Dispersy. Note that this immediately adopts the single walker for all communities. Here is the class overview, which I will explain below:

Backward compatibility / old communities
The old communities will continue to exist (for the time being), but solely to forward messages to and from the new community. This allows for phasing out the old communities without loss of data. The new CommunityManager will exist alongside these old communities, behaving like any other Dispersy community.

The CommunityManager
This will serve as the mediator between all of the new communities and Dispersy. It will handle sharing/sending all Protocol Buffers message definitions from the new communities. The advantage of this single-community-in-the-middle behavior is that (1) only a single walker is used, (2) all new communities would use a

New communities
New communities will no longer have to deal with Dispersy directly. Instead of

If anyone has any critiques, questions and/or feedback: please share them.
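As a rough Python sketch of the mediator idea described above (class and method names are assumptions, not the actual implementation): a single Dispersy-facing manager owns the one walker and dispatches decoded payloads to the registered new-style communities, which never talk to Dispersy themselves.

```python
# Rough sketch of the CommunityManager mediator idea; names are assumptions
# for illustration, not the actual design's API.

class NewCommunity(object):
    """A new-style community: knows its own messages, never talks to Dispersy."""
    name = "example-community"

    def on_payload(self, message_name, payload):
        print(self.name, "received", message_name, payload)

class CommunityManager(object):
    """The single Dispersy-facing community: owns the one walker and
    multiplexes traffic to the registered new-style communities."""

    def __init__(self):
        self.communities = {}

    def register(self, community):
        self.communities[community.name] = community

    def on_dispersy_message(self, community_name, message_name, payload):
        # Called from the (single) Dispersy community; routes the decoded
        # payload to the right new-style community.
        community = self.communities.get(community_name)
        if community is not None:
            community.on_payload(message_name, payload)

manager = CommunityManager()
manager.register(NewCommunity())
manager.on_dispersy_message("example-community", "vote-cast", {"vote": 1})
```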
I am already quite convinced about your protobuf wrapper, although I am not sure how much time we gain by using it (I know it is faster than struct.pack, which we use in all conversion.py files now). If we decide to take this approach, I would be happy to make the anon-tunnels use this system, but to what extent do you think this will require rewriting? I cannot spend months on the tunnels, so an indication would be nice. If @synctext likes this approach and gives the green light to take this path, then I hope you can explain your plan to us all in more detail :)
@lfdversluis this can work alongside the 'old' channels, so there is no need to switch immediately. To answer your question: assuming the CommunityManager exposes all of the required functionality correctly, you could probably switch the community code itself in a day (+- 6 hours). However, changing all of the unit tests as well, getting the code peer reviewed and running into unexpected issues will probably make the process take 2 weeks.
Epic weekend work! Please note that backward compatibility is not needed. If we can release a new Tribler with a fresh, lighter, and faster AllChannel, that is all OK; we reset all the votes. Another release could break and upgrade search, tunnel, etc. If we can avoid breaking compatibility with little work, that is obviously preferred.
@qstokkink we currently have no dedicated unit tests for each individual community. Since the (big fat) wx tests will be removed soon, the coverage in the

I don't think I really understood the idea of the

Also, are you planning to refactor the (old) communities one by one, or are we going to change all communities to adopt your new design immediately? Other than that, I like the design and I definitely look forward to more stable and easy-to-use communities 👍
@devos50 Not having any unit tests definitely speeds up the adoption process of this new scheme. Furthermore, writing new tests should be a lot easier now. The old communities are for backward compatibility, so that in the transition period between Tribler versions, communities do not get torn in two. This would happen because of the switch in wire protocol, which would make it impossible for new versions to enter an old version's community and vice versa. By keeping support for the old protocol for a bit, you can perform the switch between old communities and new communities more gracefully. The added benefit of being able to cope with the old communities without breaking the new ones is that you can indeed switch over the communities one by one. On the other hand, it might make more sense to handle the port in a single pull request. I am not sure what the best approach would be.
Alright, I finished compiling the list of current Tribler wire-format messages, containing data/member/field types and aliases, just in case anyone wants to know what a particular message looks like on the wire right now. This will be the base for the Protocol Buffers definitions, so that transitioning will be as painless as possible. One particular thing that caught my eye is that some communities are overriding the introduction request and response. This will have to change if (or when) a single walker is used in Tribler. EDIT: I finished porting the AllChannel messages; here is the real-world example of how the Serializer would work. EDIT 2: All of the .proto definitions have been finished (see https://github.com/qstokkink/TriblerProtobufSerialization/tree/triblermessages). Moving on to integration with Tribler and porting communities.
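As a sketch of how such a Serializer might be used: messages are registered by name with pack/unpack callables, so community code never touches format strings directly. The API below is hypothetical (the real one lives in the TriblerProtobufSerialization repository and is backed by generated Protocol Buffers classes); plain struct is used here only to keep the sketch self-contained.

```python
# Hypothetical usage sketch of a name-based message serializer; the real API
# in TriblerProtobufSerialization may differ.
import struct

class Serializer(object):
    def __init__(self):
        self.packers = {}    # message name -> pack callable
        self.unpackers = {}  # message name -> unpack callable

    def register(self, name, pack, unpack):
        self.packers[name] = pack
        self.unpackers[name] = unpack

    def pack(self, name, *fields):
        return self.packers[name](*fields)

    def unpack(self, name, data):
        return self.unpackers[name](data)

# Register the (assumed) vote-cast layout and round-trip a message.
serializer = Serializer()
serializer.register(
    "vote-cast",
    lambda infohash, vote, timestamp: struct.pack("!20shl", infohash, vote, timestamp),
    lambda data: struct.unpack("!20shl", data),
)
wire = serializer.pack("vote-cast", b"\x00" * 20, 1, 1234567890)
print(serializer.unpack("vote-cast", wire))
```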
Something threw a wrench in the works, (very likely) preventing a pull request from being available this Monday: the communities use some of the Dispersy routing information in their logic. Out of the 37 header fields, 8 are currently being used inside the communities, 1 is deprecated due to the switch to Protocol Buffers and 1 is a duplicate field. I expect this to delay the refactoring by 1 or 2 days. EDIT: Actually, here is the new base community class. This is as good as it will get without performing some major refactoring inside the Dispersy project and the Tribler communities. We should discuss details and where the code should live next Monday.
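A rough sketch of what such a base class could look like: the community subclass only ever sees a reduced header plus the decoded payload. The field selection and names below are assumptions for illustration; the actual class is in the linked branch.

```python
# Rough sketch of a new-style base community that exposes only a reduced
# subset of the Dispersy routing header to its subclasses.  The chosen
# fields and names are assumptions, not the real selection of 8 fields.
from collections import namedtuple

# The handful of header fields communities actually need, instead of all 37.
ReducedHeader = namedtuple("ReducedHeader",
                           ["source_address", "global_time", "member_id"])

class BaseCommunity(object):
    def on_dispersy_message(self, full_header, payload):
        # Strip the Dispersy header down to the fields communities rely on
        # and hand the rest of the work to the subclass.
        header = ReducedHeader(
            source_address=full_header.get("source_address"),
            global_time=full_header.get("global_time"),
            member_id=full_header.get("member_id"),
        )
        self.on_message(header, payload)

    def on_message(self, header, payload):
        raise NotImplementedError("subclasses handle their own messages")
```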
Some somewhat exciting news: the first functional community port (allchannel) runs its gumby experiment without producing errors. For convenience, here is the port checklist:
And the backward-compatibility checklist:
@whirm How do you want to review this? One round of comments this week, or one giant PR when done? devel...qstokkink:protobufserialization_rc
Thesis material:
Together: First priority: PooledTunnelCommunity stable 👏
Thesis storyline: Dispersy is just a use case, now fast & usable. 48 cores == scalability? Key target for the final thesis experiment: 1 anonymous file download on a 16- or 48-core machine.
@egbertbouman
Issues moved to: #1150 (comment) |
This performance analysis work seeks to understand the effectiveness of the sync mechanism.
Linked to: #2039.
For the thesis work, first explain that sync to 10 peers failed for the example Flood community :-)
Repeat with a minimal example community. Use DAS4 or DAS5 for 10..1000 peer examples.
Repeat with the actual channel community with torrent collection.
Redo this with the Q-algorithm, sync of magnet links only, scale to 1M (small) items.