Note that implementing smarter IWANT requests is something that we recently discussed with @whyrusleeping: keep some memory of already-requested messages so that we don't request the same message multiple times. This should eliminate the observed duplicate transmissions, albeit at the cost of some complexity.
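A minimal sketch of such a request memory (the class and parameter names are hypothetical, not from any actual gossipsub implementation): remember which message IDs we have already asked for, and only allow a re-request after a timeout.

```python
import time

class IWantTracker:
    """Remember which message IDs we have already sent an IWANT for,
    so duplicate requests to other peers are suppressed."""

    def __init__(self, retry_after=1.0):
        self.retry_after = retry_after  # seconds until a re-request is allowed
        self._requested = {}            # message id -> time of last request

    def should_request(self, msg_id, now=None):
        now = time.monotonic() if now is None else now
        last = self._requested.get(msg_id)
        if last is not None and now - last < self.retry_after:
            return False  # requested recently; suppress the duplicate IWANT
        self._requested[msg_id] = now
        return True

tracker = IWantTracker(retry_after=1.0)
print(tracker.should_request("msg-1", now=0.0))  # True: first request
print(tracker.should_request("msg-1", now=0.5))  # False: duplicate suppressed
print(tracker.should_request("msg-1", now=2.0))  # True: retry allowed after timeout
```

The timeout keeps the tracker from deadlocking when the first peer never answers; the cost is just the bookkeeping dictionary.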
Loose collection of results and their interpretation (in part somewhat speculative), to be continued.
Total throughput is limited by the weakest nodes in the network
The network breaks down when nodes reach a load of 100%. However, as network throughput increases, not all nodes reach this threshold at the same time (see diagram); even when the first nodes have reached it, a large fraction of the network still has plenty of spare capacity. Thus, the lowest-capacity nodes, which reach the threshold first, limit the total throughput of the network.
https://github.com/jannikluhn/sharding-netsim/blob/master/runs/21_gossipsub_shard_block_size/occ.pdf
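The effect can be illustrated with a toy calculation (the bandwidth distribution and duplication factor below are assumptions for illustration, not taken from the simulations): each node's load grows linearly with network throughput and inversely with its bandwidth, so the first node to hit 100% load caps the whole network.

```python
# Toy model: load_i = throughput * duplication_factor / bandwidth_i.
# The network breaks down when the slowest node reaches load = 1.0.
bandwidths_mbps = [20, 50, 50, 100, 100, 200]  # assumed per-node downlinks
duplication_factor = 4  # assumed gossip overhead: each byte received ~4 times

# Maximum throughput before the first node saturates:
max_throughput = min(bandwidths_mbps) / duplication_factor
print(max_throughput)  # 5.0 Mbit/s, set entirely by the 20 Mbit/s node

# At that point most of the network still has plenty of headroom:
loads = [max_throughput * duplication_factor / b for b in bandwidths_mbps]
print([round(l, 2) for l in loads])  # [1.0, 0.4, 0.4, 0.2, 0.2, 0.1]
```

The spare capacity of the faster nodes (here 60-90%) is exactly the bandwidth that remains available for serving blocks to syncing nodes, as the next section notes.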
Syncing speed is limited by downlink of syncing nodes, not the network
Conversely, even when the network is fully utilized, there is still a lot of bandwidth available (see same diagram below) to serve blocks to new nodes joining the network (including attesters checking availability of recent blocks).
https://github.com/jannikluhn/sharding-netsim/blob/master/runs/21_gossipsub_shard_block_size/occ.pdf
Lower block time can be traded (almost) 1:1 with smaller block size
To increase block size while keeping network throughput constant, the block time needs to be increased. In theory, this should affect neither node load nor propagation time (normalized to the block time). The simulations mostly confirm this, but show a slight increase in load for larger blocks (see diagram) and a slight decrease in "normalized propagation time" (not shown here). This is probably due to the increased transmission time of single gossip packets, which in some cases makes nodes assume a packet was lost. This prompts them to request the packet unnecessarily, leading to higher load but faster propagation times.
https://github.com/jannikluhn/sharding-netsim/blob/master/runs/20_gossipsub_shard_block_time_and_size/occ.pdf
https://github.com/jannikluhn/sharding-netsim/blob/master/runs/20_gossipsub_shard_block_time_and_size/hists.pdf
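The (almost) 1:1 tradeoff can be made concrete with a toy load model (all numbers, including the duplication factor, are assumptions for illustration): doubling block size and block time together leaves throughput, and hence steady-state node load, unchanged. The packet-timeout effect described above is deliberately not captured here.

```python
def node_load(block_size_mb, block_time_s, bandwidth_mb_s, duplication_factor=4):
    """Fraction of a node's bandwidth consumed by gossip traffic (toy model)."""
    throughput = block_size_mb / block_time_s  # MB/s of new block data
    return throughput * duplication_factor / bandwidth_mb_s

# Doubling both block size and block time keeps the load constant:
print(node_load(1.0, 4.0, 2.0))  # 0.5
print(node_load(2.0, 8.0, 2.0))  # 0.5
```

Since load depends only on the ratio block_size / block_time, any deviation seen in the simulations must come from second-order effects such as the loss-timeout behaviour.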
Crashing nodes
Node crashes lead to failed transmissions and longer propagation times. The discovery protocol should be capable of healing broken connections quickly.
https://github.com/jannikluhn/sharding-netsim/blob/master/runs/17_gossipsub_crash/hists.pdf
Maximum block size is between 3 and 4 MB at 8s block time
For the assumed bandwidth distribution, the network works at 3 MB but fails at 4 MB. There should probably be some safety margin, though.
https://github.com/jannikluhn/sharding-netsim/blob/master/runs/21_gossipsub_shard_block_size/occ.pdf
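As a rough sanity check on these numbers (simple arithmetic, with an assumed gossip duplication factor that is not taken from the simulation): 3 MB every 8 s is 3 Mbit/s of raw block data, which a duplication factor of 4 turns into about 12 Mbit/s of sustained per-node downlink.

```python
block_size_mb = 3        # MB, the largest size that still worked
block_time_s = 8         # s
duplication_factor = 4   # assumed gossip overhead

raw_mbit_s = block_size_mb * 8 / block_time_s  # MB -> Mbit conversion (x8)
print(raw_mbit_s)                        # 3.0 Mbit/s of new block data
print(raw_mbit_s * duplication_factor)   # 12.0 Mbit/s sustained per node
```

That order of magnitude is plausible as the saturation point for the slower nodes in a realistic bandwidth distribution, consistent with the "weakest nodes" result above.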
Number of peers does not matter much
Increasing the number of peers each node has should decrease the propagation time, especially at the beginning of packet transmissions. However, the simulations show almost no difference between (on average) 3.6 and 12.5 peers (for one particular set of parameters). Having more than that leads to network congestion.
The reason is that nodes send packets one after another (to utilize bandwidth most effectively). Adding another peer therefore only makes a difference once all original peers have been informed; by that point, however, most of the network has typically been informed already, so the additional peer adds almost no value.
https://github.com/jannikluhn/sharding-netsim/blob/master/runs/14_gossipsub_shard_peers/hists.pdf
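The sequential-send argument can be illustrated with a toy round-based model (idealized assumptions: every transmission takes one time unit, is lossless, and always reaches a node that is still uninformed; this is not the simulated protocol).

```python
def steps_to_inform(n_nodes, peers_per_node):
    """Each informed node transmits to one new, uninformed peer per step
    (sequential sends), for at most peers_per_node transmissions total."""
    informed = 1
    sent = {0: 0}  # node id -> transmissions already made
    step = 0
    while informed < n_nodes:
        step += 1
        new = 0
        for node, count in list(sent.items()):
            if count < peers_per_node and informed + new < n_nodes:
                sent[node] = count + 1  # this node informs one more peer
                new += 1
        for i in range(new):
            sent[informed + i] = 0      # newly informed nodes start sending next step
        informed += new
    return step

print(steps_to_inform(1000, 4))   # 11
print(steps_to_inform(1000, 12))  # 10
```

In this model, 1000 nodes are covered in 11 steps with 4 peers and 10 steps with 12 peers: because each node sends sequentially, the spread roughly doubles per step regardless of peer count, mirroring the simulation result that extra peers barely help.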
PushPull does not improve propagation time considerably compared to GossipSub
As shown in the diagram below, propagation during the "pull" period is rather slow. This might be fixable by choosing better parameters (e.g. shorter request intervals) or through minor changes to the protocol (e.g. adding a "Don't have" message). However, I doubt the speedup would justify the higher protocol complexity compared to GossipSub.
Figure
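Why the pull phase is slow can be sketched with a mean-field model (all numbers are assumed for illustration; this is not the simulated protocol): in a pull round, each uninformed node polls one random node, so the uninformed fraction q squares each round (q' = q^2). The round count is small, but each round costs a full request interval rather than a single packet transmission time.

```python
def pull_rounds(target_fraction, n_nodes=10000):
    """Mean-field pull gossip: with informed fraction p, one round of
    random polling gives p' = p * (2 - p), i.e. 1 - p' = (1 - p)**2."""
    p = 1.0 / n_nodes  # start with a single informed node
    rounds = 0
    while p < target_fraction:
        p = p * (2 - p)
        rounds += 1
    return rounds

# A pull round costs a full request interval; a push hop only costs one
# packet transmission time (both values assumed for illustration).
request_interval_s = 0.5
push_hop_time_s = 0.05

r = pull_rounds(0.99)
print(r)                       # 16 rounds to reach 99% of 10000 nodes
print(r * request_interval_s)  # 8.0 s if every round waits a full interval
print(r * push_hop_time_s)     # 0.8 s if a round only costs a transmission
```

In this model, halving the request interval halves the pull-phase propagation time, which is why shorter request intervals are a plausible (if partial) fix.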
Episub: Propagation time too large
(Todo: upload diagram)