Gossiping results #2

jannikluhn · 2018-09-11T16:40:20Z

Loose collection of results and their interpretation (in part somewhat speculative), to be continued.

Total throughput is limited by worst nodes in the network

The network breaks down when nodes have a load of 100%. However, when increasing network throughput, not all nodes reach this threshold at the same time (see diagram), and even when the first nodes have reached it, a large fraction of the network still has a lot of spare capacity. Thus, the lowest-capacity nodes, which reach the threshold first, limit the total throughput of the network.

https://github.com/jannikluhn/sharding-netsim/blob/master/runs/21_gossipsub_shard_block_size/occ.pdf
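
To make the bottleneck argument concrete, here is a minimal sketch. It assumes that a node's load grows linearly with network throughput, and it uses made-up capacities and a made-up amplification factor (bytes a node has to move per byte of network throughput). None of these numbers or function names come from the simulation.

```python
def node_loads(throughput, capacities, amplification=1.0):
    """Fractional load of each node, assuming load scales linearly with throughput."""
    return [throughput * amplification / c for c in capacities]


def max_throughput(capacities, amplification=1.0):
    """Throughput at which the weakest node reaches 100% load."""
    return min(capacities) / amplification


# Hypothetical node capacities in MB/s, not the simulated bandwidth distribution.
capacities = [1.0, 2.0, 4.0, 8.0]

limit = max_throughput(capacities, amplification=2.0)
loads = node_loads(limit, capacities, amplification=2.0)

print("throughput limit:", limit)
print("loads at the limit:", loads)
print("mean spare capacity:", 1.0 - sum(loads) / len(loads))
```

In this toy model the weakest node is saturated while the others still have plenty of headroom, which is the situation the diagram shows.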

Syncing speed is limited by downlink of syncing nodes, not the network

Conversely, even when the network is fully utilized, there is still a lot of bandwidth available (see same diagram below) to serve blocks to new nodes joining the network (including attesters checking availability of recent blocks).

https://github.com/jannikluhn/sharding-netsim/blob/master/runs/21_gossipsub_shard_block_size/occ.pdf

Lower block time can be traded (almost) 1:1 with smaller block size

To increase block size while keeping network throughput constant, the block time needs to be increased. In theory, this should affect neither the load of the nodes nor the propagation time (normalized to the block time). The simulations mostly confirm that, but show a slight increase of load for larger blocks (see diagram) and a slight decrease of "normalized propagation time" (not shown here). This is probably due to the increased transmission time of single gossip packets, which in some cases makes nodes assume a lost packet. This prompts them to request the packet unnecessarily, leading to higher load but faster propagation.

https://github.com/jannikluhn/sharding-netsim/blob/master/runs/20_gossipsub_shard_block_time_and_size/occ.pdf
https://github.com/jannikluhn/sharding-netsim/blob/master/runs/20_gossipsub_shard_block_time_and_size/hists.pdf
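
A small worked example of the trade-off, as a sketch only. The throughput, link bandwidth, and the idea of a fixed request timeout are assumptions chosen for illustration and are not taken from the simulation parameters.

```python
THROUGHPUT_MB_S = 0.25    # hypothetical constant network throughput
LINK_MB_S = 1.0           # hypothetical bandwidth of a single link
REQUEST_TIMEOUT_S = 3.0   # hypothetical fixed time after which a packet is assumed lost


def block_time_for(block_size_mb, throughput_mb_s):
    """Block time needed to keep throughput constant for a given block size."""
    return block_size_mb / throughput_mb_s


def transmission_time(block_size_mb, link_mb_s):
    """Time to push one block over a single link."""
    return block_size_mb / link_mb_s


rows = []
for size in (1.0, 2.0, 4.0):
    block_time = block_time_for(size, THROUGHPUT_MB_S)
    tx = transmission_time(size, LINK_MB_S)
    # (size, block time, transmission time normalized to block time, timeout exceeded?)
    rows.append((size, block_time, tx / block_time, tx > REQUEST_TIMEOUT_S))

for row in rows:
    print(row)
```

The normalized transmission time stays the same for every block size, but the absolute transmission time grows. With a timeout that does not scale with block size, the largest block crosses it, which would trigger the kind of unnecessary request described above.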

Crashing nodes

Node crashes lead to failing transmissions and larger propagation times. The discovery protocol should be capable of healing broken connections quickly.

https://github.com/jannikluhn/sharding-netsim/blob/master/runs/17_gossipsub_crash/hists.pdf

Maximum block size is between 3 and 4 MB at 8s block time

For the assumed bandwidth distribution, the network works at 3 MB but fails at 4 MB. There should probably be some safety margin, though.

https://github.com/jannikluhn/sharding-netsim/blob/master/runs/21_gossipsub_shard_block_size/occ.pdf
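
Expressed as throughput, the bracket looks as follows. The 25% safety margin is a purely hypothetical value to show the arithmetic, not a recommendation.

```python
BLOCK_TIME_S = 8.0
WORKING_SIZE_MB = 3.0   # largest block size at which the simulated network still worked
FAILING_SIZE_MB = 4.0   # block size at which it failed


def throughput(block_size_mb, block_time_s):
    """Network throughput in MB/s for a given block size and block time."""
    return block_size_mb / block_time_s


def size_with_margin(working_size_mb, margin):
    """Block size after subtracting a fractional safety margin."""
    return working_size_mb * (1.0 - margin)


lower = throughput(WORKING_SIZE_MB, BLOCK_TIME_S)
upper = throughput(FAILING_SIZE_MB, BLOCK_TIME_S)
print("throughput limit lies between", lower, "and", upper, "MB/s")
print("block size with 25% margin:", size_with_margin(WORKING_SIZE_MB, 0.25), "MB")
```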

Number of peers does not matter much

Increasing the number of peers each node has should decrease the propagation time, especially at the beginning of packet transmissions. However, the simulations show almost no difference between (on average) 3.6 and 12.5 peers (for one particular set of parameters). Having more than that leads to network congestion.

The reason for this is that nodes send packets one after another (to utilize bandwidth most effectively). Therefore, adding another peer only makes a difference once all original peers have already been informed. At this point though, most nodes in the network have in most cases been informed already, so the additional peer adds almost no value.

https://github.com/jannikluhn/sharding-netsim/blob/master/runs/14_gossipsub_shard_peers/hists.pdf
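
The effect of sequential sending can be illustrated with a toy model. It assumes that every informed node forwards the packet to exactly one new peer per round, that each transfer takes one round, and that no node ever receives a duplicate. This is an idealization and not the simulated protocol; the peer counts 3 and 12 are rounded stand-ins for the averages mentioned above.

```python
def rounds_to_inform(n_nodes, n_peers):
    """Rounds until n_nodes are informed when each node forwards the packet
    to one new peer per round, at most n_peers times in total."""
    new_per_round = [1]  # round 0: only the origin has the packet
    informed = 1
    while informed < n_nodes:
        # Nodes informed within the last n_peers rounds still have peers left to serve.
        senders = sum(new_per_round[-n_peers:])
        new_per_round.append(senders)
        informed += senders
    return len(new_per_round) - 1


few = rounds_to_inform(1000, n_peers=3)
many = rounds_to_inform(1000, n_peers=12)
print("rounds with 3 peers:", few)
print("rounds with 12 peers:", many)
```

In this model, quadrupling the number of peers saves a single round for a network of 1000 nodes, because the later peers of each node are only served once most of the network already has the packet.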

PushPull does not improve propagation time considerably compared to GossipSub

As shown in the diagram below, propagation during the "pull" period is rather slow. This might be fixable by choosing better parameters (e.g. shorter request intervals) or by minor changes to the protocol (e.g. adding a "Don't have" message). However, I doubt the speedup would justify the higher protocol complexity compared to GossipSub.

Figure

Episub: Propagation time too large

(Todo: upload diagram)

whyrusleeping · 2018-09-13T19:56:25Z

These results are really cool, I love the detail.

cc @bigs @vyzo @mgoelzer

vyzo · 2018-09-13T20:05:45Z

Note that implementing smarter IWANT requests is something that we recently discussed with @whyrusleeping.
Basically keep some memory of requested messages so that we don't request them multiple times.
This should eliminate the observed duplicate transmissions, albeit at the cost of some complexity.
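
A rough sketch of what such a memory could look like. The class name, the method names, and the retry timeout are all hypothetical; this is not the libp2p API and not necessarily the design that was discussed.

```python
import time


class IWantTracker:
    """Remembers which message ids were already requested (hypothetical helper)."""

    def __init__(self, retry_after=3.0, clock=time.monotonic):
        self._requested = {}             # message id -> time of the last request
        self._retry_after = retry_after  # assumed: allow a new request after this many seconds
        self._clock = clock

    def should_request(self, msg_id):
        """Return True and record the request unless one is still pending."""
        now = self._clock()
        last = self._requested.get(msg_id)
        if last is not None and now - last < self._retry_after:
            return False
        self._requested[msg_id] = now
        return True

    def mark_received(self, msg_id):
        """Forget a message id once the message has arrived."""
        self._requested.pop(msg_id, None)


tracker = IWantTracker()
print(tracker.should_request("msg-1"))  # first request is allowed
print(tracker.should_request("msg-1"))  # duplicate request is suppressed
```

The retry timeout is there so that a request sent to a crashed or slow peer does not block the message forever; whether and how to expire entries is one of the places where the extra complexity shows up.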
