PR #1546
Problem
Any request for updates to a random node will likely result in receiving mostly redundant information. This problem will severely limit how large the network can grow. Even using a version filter results in redundant data being sent. If every node votes, and all the votes are gossiped, each node is expected to receive N updates. A simple version filter will still request all N from each node, even though both nodes already have the full dataset.
Proposed Solution
Use bloom filters to minimize the amount of data transferred during gossip.
Challenges
Bloom filters can only produce false positives (reporting an item as present when it is not), never false negatives. For gossip this means the data returned by a node may be incomplete: a value the requester is actually missing can falsely appear to be in its filter, so the remote never sends it. Using a "version" filter along with a bloom filter could therefore result in older updates that are never shared with the network. The property that gossip depends on is that there is some non-zero probability of a message eventually reaching all the nodes.
The basic idea is to construct a bloom filter of the local node's set of keys and request that the remote node send any data missing from that set. To work around the problem that some false positives will be dropped, the local node uses a unique bloom filter hash function when constructing the filter for each request. Thus after N requests the probability of a given value still being dropped as a false positive should be (false positive rate)**N, e.g. about 10**-6 after 3 requests at a 1% false positive rate.
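For illustration, a minimal, self-contained sketch of this pull flow. The names, sizes, and the placeholder index derivation here are made up for the example, not the actual gossip API; the 32-byte keys stand in for the ValueHash introduced below.

```rust
type Key = [u8; 32];

/// Simplified bloom filter. `seed` is chosen fresh for every request so that
/// each request effectively uses a different family of hash functions.
struct Bloom {
    seed: u64,
    bits: Vec<bool>,
    num_hashes: usize,
}

impl Bloom {
    fn new(num_bits: usize, num_hashes: usize, seed: u64) -> Self {
        Self { seed, bits: vec![false; num_bits], num_hashes }
    }

    // Placeholder bit-index derivation: mix the per-request seed and the hash
    // round into the first 8 bytes of the key. The salting scheme actually
    // proposed here (picking bytes out of the already-random key) is sketched
    // under "Design" below.
    fn index(&self, key: &Key, k: usize) -> usize {
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(&key[..8]);
        let mixed = u64::from_le_bytes(bytes) ^ self.seed.rotate_left((k as u32) * 8);
        mixed as usize % self.bits.len()
    }

    fn add(&mut self, key: &Key) {
        for k in 0..self.num_hashes {
            let i = self.index(key, k);
            self.bits[i] = true;
        }
    }

    fn contains(&self, key: &Key) -> bool {
        (0..self.num_hashes).all(|k| self.bits[self.index(key, k)])
    }
}

// Requester: build a filter over every key the local node already stores and
// pick a fresh seed for this request.
fn build_pull_request(local_keys: &[Key], seed: u64) -> Bloom {
    let mut filter = Bloom::new(64 * 1024, 3, seed);
    for key in local_keys {
        filter.add(key);
    }
    filter
}

// Responder: return only the values whose key is missing from the requester's
// filter; false positives are simply not sent this round.
fn build_pull_response<'a>(table: &'a [(Key, Vec<u8>)], filter: &Bloom) -> Vec<&'a [u8]> {
    table
        .iter()
        .filter(|(key, _)| !filter.contains(key))
        .map(|(_, value)| value.as_slice())
        .collect()
}
```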
Our data is organized as <Pubkey, Value>, and #1546 effectively proposes a <Pubkey, Label, Value> table. It's not obvious how to efficiently diff the two sets. The simplest approach is to compute a collision-resistant ValueHash with SHA-256 over the value, regardless of its Pubkey or Label association. Then each node can diff its current data set against the remote's:

for each ValueHash in the local table, if the value is in the filter, drop it

This may cause some updated or recently discarded values to be re-transmitted. A possible solution is to keep the purged set around and also add it to the bloom filter.
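A sketch of one way to compute the ValueHash, assuming the sha2 crate and that values are hashed in serialized form; the point is that the hash covers the value alone, independent of the Pubkey or Label it is stored under.

```rust
use sha2::{Digest, Sha256};

// Collision-resistant identifier for a value, independent of its Pubkey or
// Label, so two tables can be diffed purely by value identity.
fn value_hash(serialized_value: &[u8]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(serialized_value);
    let digest = hasher.finalize();
    let mut out = [0u8; 32];
    out.copy_from_slice(&digest);
    out
}
```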
Design
Unique salted bloom filter hash functions per request
ValueHashes are already random. To generate a fast and unique bloom filter hash function, pick bytes out of the ValueHash using a randomly generated seed. The random function needs to be the same across the network, and the seed needs to be shared with the request.
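One possible salted hash function under these constraints, as a sketch: the seed (sent with the request) selects which 8-byte window of the already-random ValueHash is read, and every node runs the same function. The exact mixing and the names are illustrative.

```rust
// Map a ValueHash to a bit index for the k-th hash of a filter salted with `seed`.
fn bloom_index(value_hash: &[u8; 32], seed: u64, k: u64, num_bits: usize) -> usize {
    // Seed- and k-dependent 8-byte window; 25 possible offsets keep it in bounds.
    let start = (seed.wrapping_add(k) % 25) as usize;
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&value_hash[start..start + 8]);
    (u64::from_le_bytes(bytes) % num_bits as u64) as usize
}

fn main() {
    let hash: [u8; 32] = std::array::from_fn(|i| i as u8);
    // Different per-request seeds index the same value at different bits, so a
    // false positive under one request is unlikely to repeat under the next.
    println!("{} {}", bloom_index(&hash, 1, 0, 65_536), bloom_index(&hash, 9, 0, 65_536));
}
```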
Purge
This is a problem in the current implementation as well. Any retransmit of a purged entry will re-add it to the network.
To prevent purged values from being retransmitted, their ValueHashes could be kept in a local cache for a longer period of time after they have been purged and used to initialize the bloom filter.
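A sketch of such a purge cache, with hypothetical names: purged hashes are remembered for a retention window and fed into the pull-request filter alongside the live data.

```rust
use std::collections::HashMap;

type ValueHash = [u8; 32];

// ValueHashes of recently purged values, kept past the purge itself so stale
// retransmissions are filtered out instead of being re-added to the table.
struct PurgedCache {
    purged_at_ms: HashMap<ValueHash, u64>,
    retention_ms: u64,
}

impl PurgedCache {
    fn new(retention_ms: u64) -> Self {
        Self { purged_at_ms: HashMap::new(), retention_ms }
    }

    fn record_purge(&mut self, hash: ValueHash, now_ms: u64) {
        self.purged_at_ms.insert(hash, now_ms);
    }

    // Forget entries once the retention window has passed.
    fn expire(&mut self, now_ms: u64) {
        let retention_ms = self.retention_ms;
        self.purged_at_ms
            .retain(|_, purged| now_ms.saturating_sub(*purged) < retention_ms);
    }

    // Hashes to insert into the pull-request bloom filter in addition to the
    // hashes of live values, so the remote node does not send them back.
    fn hashes_for_filter(&self) -> Vec<ValueHash> {
        self.purged_at_ms.keys().copied().collect()
    }
}
```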
Future Work
Long-lived nodes can start filtering entries based on insert time as well. When constructing a filter for another node, the local node can take into account how long both the local and the remote node have been on the network and build its filter from only recently added items. This may only be necessary for very large networks.
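One possible reading of this, as a sketch with illustrative names: only apply the cutoff when both sides have been up longer than the window, so a young node still gets the full history.

```rust
type ValueHash = [u8; 32];

// Choose the insert-time cutoff for one exchange. Anything older than the
// window is assumed to have already propagated, but only if both nodes have
// been on the network at least that long.
fn insert_time_cutoff(now_ms: u64, local_uptime_ms: u64, remote_uptime_ms: u64, window_ms: u64) -> u64 {
    if local_uptime_ms.min(remote_uptime_ms) < window_ms {
        return 0; // one side is young: cover the full history
    }
    now_ms.saturating_sub(window_ms)
}

// Build the filter (and the response) only from entries inserted after the cutoff.
fn recent_hashes(entries: &[(ValueHash, u64)], cutoff_ms: u64) -> Vec<ValueHash> {
    entries
        .iter()
        .filter(|(_, inserted_ms)| *inserted_ms >= cutoff_ms)
        .map(|(hash, _)| *hash)
        .collect()
}
```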