Gossiping results #2

Open
jannikluhn opened this issue Sep 11, 2018 · 2 comments

@jannikluhn
Owner

Loose collection of results and their interpretation (in part somewhat speculative), to be continued.

Total throughput is limited by worst nodes in the network

The network breaks down when nodes reach a load of 100%. However, as network throughput increases, not all nodes reach this threshold at the same time (see diagram), and even once the first nodes have reached it, a large fraction of the network still has a lot of spare capacity. Thus, the lowest-capacity nodes, which reach the threshold first, limit the total throughput of the network.

https://github.com/jannikluhn/sharding-netsim/blob/master/runs/21_gossipsub_shard_block_size/occ.pdf
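To make the argument concrete, here is a minimal sketch, not taken from the simulator. It assumes that a node's load grows linearly with network throughput and that every node has to carry the same traffic; the capacities and the `traffic_factor` parameter are hypothetical.

```python
def load_fractions(throughput, capacities, traffic_factor=1.0):
    """Load of each node as a fraction of its capacity (1.0 == 100%).

    Assumption: every node carries `traffic_factor * throughput`,
    regardless of how much bandwidth it has.
    """
    return [traffic_factor * throughput / c for c in capacities]


def max_throughput(capacities, traffic_factor=1.0):
    """Largest throughput at which no node exceeds 100% load."""
    return min(capacities) / traffic_factor


# Hypothetical capacities in arbitrary bandwidth units.
capacities = [1.0, 4.0, 10.0]
limit = max_throughput(capacities)
for capacity, load in zip(capacities, load_fractions(limit, capacities)):
    print(f"capacity {capacity:5.1f}: load {load:.0%}")
```

Under these assumptions the slowest node hits 100% while the others are still far from saturated, which is the pattern the diagram shows.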

Syncing speed is limited by downlink of syncing nodes, not the network

Conversely, even when the network is fully utilized, most nodes still have a lot of bandwidth available (see the same diagram, linked below) to serve blocks to new nodes joining the network (including attesters checking the availability of recent blocks).

https://github.com/jannikluhn/sharding-netsim/blob/master/runs/21_gossipsub_shard_block_size/occ.pdf

Lower block time can be traded (almost) 1:1 with smaller block size

To increase block size while keeping network throughput constant, the block time needs to be increased. In theory, this should not affect the load of any node or the propagation time (normalized to the block time) at all. The simulations mostly confirm that, but show a slight increase in load for larger blocks (see diagram) and a slight decrease in "normalized propagation time" (not shown here). This is probably due to the increased transmission time of single gossip packets, which in some cases makes nodes assume that a packet was lost. This prompts them to request the packet unnecessarily, leading to higher load but faster propagation.

https://github.com/jannikluhn/sharding-netsim/blob/master/runs/20_gossipsub_shard_block_time_and_size/occ.pdf
https://github.com/jannikluhn/sharding-netsim/blob/master/runs/20_gossipsub_shard_block_time_and_size/hists.pdf
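The trade-off is plain arithmetic: throughput is block size divided by block time, so scaling both by the same factor leaves it unchanged. A small sketch, using the 3 MB at 8 s configuration mentioned elsewhere in this issue as the reference point and assuming 1 MB = 10^6 bytes (the function names are made up for illustration):

```python
def throughput(block_size_bytes, block_time_s):
    """Raw payload throughput in bytes per second."""
    return block_size_bytes / block_time_s


def block_time_for(block_size_bytes, target_throughput):
    """Block time needed to keep `target_throughput` at a given block size."""
    return block_size_bytes / target_throughput


reference = throughput(3_000_000, 8.0)  # 3 MB every 8 s
for size in (1_500_000, 3_000_000, 6_000_000):
    print(f"{size / 1e6:.1f} MB -> {block_time_for(size, reference):.1f} s")
```

This only covers the ideal 1:1 relation; the slight load increase for larger blocks described above is a protocol effect that the sketch does not model.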

Crashing nodes

Node crashes lead to failing transmissions and larger propagation times. The discovery protocol should be capable of healing broken connections quickly.

https://github.com/jannikluhn/sharding-netsim/blob/master/runs/17_gossipsub_crash/hists.pdf

Maximum block size is between 3 and 4 MB at 8s block time

For the assumed bandwidth distribution, the network works at 3 MB but fails at 4 MB. There should probably be some safety margin, though.

https://github.com/jannikluhn/sharding-netsim/blob/master/runs/21_gossipsub_shard_block_size/occ.pdf

Number of peers does not matter much

Increasing the number of peers each node has should decrease the propagation time, especially at the beginning of packet transmissions. However, the simulations show almost no difference between (on average) 3.6 and 12.5 peers (for one particular set of parameters). Having more than that leads to network congestion.

The reason for this is that nodes send packets one after another (to utilize bandwidth most effectively). Therefore, adding another peer only makes a difference once all original peers have already been informed. At this point though, most nodes in the network have in most cases been informed already, so the additional peer adds almost no value.

https://github.com/jannikluhn/sharding-netsim/blob/master/runs/14_gossipsub_shard_peers/hists.pdf
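The sequential-sending argument can be explored with a toy model. The sketch below is not the simulator used for these runs: it assumes a ring topology with random extra links, identical bandwidth for all nodes, no congestion, and no knowledge of which peers already have the message. All names and parameters are hypothetical.

```python
import heapq
import random


def propagation_times(num_nodes, extra_peers, send_time=1.0, seed=0):
    """Return {node: time of first receipt} for a message started at node 0.

    Topology (an assumption for illustration): a ring, so the graph is
    connected, plus `extra_peers` random links added per node.
    Each node forwards to its peers one after another, so the k-th peer
    in its send order gets the message k * send_time after the node did.
    """
    rng = random.Random(seed)
    peers = {n: {(n - 1) % num_nodes, (n + 1) % num_nodes}
             for n in range(num_nodes)}
    for n in range(num_nodes):
        for _ in range(extra_peers):
            other = rng.randrange(num_nodes)
            if other != n:
                peers[n].add(other)
                peers[other].add(n)

    received = {0: 0.0}
    queue = [(0.0, 0)]  # (time of receipt, node)
    while queue:
        t, node = heapq.heappop(queue)
        if t > received[node]:
            continue  # stale entry, the node was reached earlier
        order = sorted(peers[node])
        rng.shuffle(order)  # random send order
        for k, peer in enumerate(order, start=1):
            arrival = t + k * send_time
            if arrival < received.get(peer, float("inf")):
                received[peer] = arrival
                heapq.heappush(queue, (arrival, peer))
    return received


def percentile_time(times, fraction):
    """Time by which `fraction` of all nodes have received the message."""
    ordered = sorted(times.values())
    return ordered[min(len(ordered) - 1, int(fraction * len(ordered)))]


for extra in (1, 4, 10):
    times = propagation_times(1000, extra)
    print(extra, percentile_time(times, 0.5), percentile_time(times, 0.95))
```

Comparing the printed percentiles for different values of `extra_peers` gives a rough feel for how much an additional peer helps when sends are serialized. Because the model leaves out bandwidth limits and duplicate suppression, it cannot reproduce the congestion seen at high peer counts.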

PushPull does not improve propagation time considerably compared to GossipSub

As shown in the diagram below, propagation during the "pull" period is rather slow. This might be fixable by choosing better parameters (e.g. shorter request intervals) or by making minor changes to the protocol (e.g. adding a "Don't have" message). However, I doubt the speedup would justify the higher protocol complexity compared to GossipSub.

Figure

Episub: Propagation time too large

(Todo: upload diagram)

@whyrusleeping

These results are really cool, I love the detail.

cc @bigs @vyzo @mgoelzer

@vyzo

vyzo commented Sep 13, 2018

Note that implementing smarter IWANT requests is something that we recently discussed with @whyrusleeping.
Basically keep some memory of requested messages so that we don't request them multiple times.
This should eliminate the observed duplicate transmissions, albeit at the cost of some complexity.
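
A rough sketch of what such a memory could look like, purely for illustration: this is not the libp2p implementation, and the class name, the `timeout` parameter, and the injected clock are all assumptions.

```python
import time


class IWantTracker:
    """Remembers recently requested message IDs to avoid repeat requests.

    Hypothetical helper: a message is requested again only if the
    previous request is older than `timeout` seconds.
    """

    def __init__(self, timeout=3.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self._requested = {}  # message id -> time of the last request

    def should_request(self, msg_id):
        """Return True and record the request unless one is still pending."""
        now = self.clock()
        last = self._requested.get(msg_id)
        if last is not None and now - last < self.timeout:
            return False
        self._requested[msg_id] = now
        return True

    def mark_received(self, msg_id):
        """Forget a message once it has arrived."""
        self._requested.pop(msg_id, None)


tracker = IWantTracker(timeout=3.0)
print(tracker.should_request("msg-1"))  # first request is allowed
print(tracker.should_request("msg-1"))  # repeat within the timeout is suppressed
```

The timeout keeps a lost request from blocking the message forever, which is one form of the extra complexity mentioned above.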
