Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gossip shares redundant data between nodes #1619

Closed
aeyakovenko opened this issue Oct 26, 2018 · 1 comment
Closed

Gossip shares redundant data between nodes #1619

aeyakovenko opened this issue Oct 26, 2018 · 1 comment

Comments

@aeyakovenko
Copy link
Member

aeyakovenko commented Oct 26, 2018

PR #1546

Problem

Any request for updates to a random node will likely result in receiving mostly redundant information. This problem is will severely limit how large the network can grow. Even using a version filter results in redundant data sent. If every node votes, and all the votes are gossiped, it is expected that N updates are received by each node. A simple version filter will request all N from each node despite that both nodes have the full dataset.

Proposed Solution

Use bloom filters to minimize the amount data transferred during gossip.

Challenges

Bloom filters can only be constructed to give false positives if the item is in the set, not false negatives. For gossip this means that the data returned by a node may be incomplete, and using a "version" filter along with a bloom filter could result with older updates that are never shared to the network. The property that gossip depends on is that there is some non-zero probability of a message eventually reaching all the nodes.

The basic idea is to construct a bloom filter of the local nodes set of keys, and request the remote node to send any data that is missing from the set. To work around the problem that some false positives will be dropped the local node uses a unique bloom filter hash function when constructing the bloom filter for each request. Thus after N requests the probability of a false positive being dropped should be (probability false positive)**N

Our data is organized as <Pubkey,Value>, and #1546 proposes effectively a <Pubkey, Label, Value> table. It's not obvious how to efficiently diff the two sets. The simplest approach is to compute a collision resistant ValueHash regardless of the Pubkey or Label association with SHA256. Then each node can diff its current data set with the remote. This may cause some updated or recently discarded values to be re-transmitted. A possible solution is to keep the purged set around and also add it to the bloom filter.

Design
  1. compute a ValueHash of each value in the table
  2. select remote gossip node
  3. find all the ValueHash's in the local table
  4. compute optimal bloom filter over the ValueHashes. (https://hur.st/bloomfilter/?n=1000&p=&m=4096&k=4)
  5. send the filter to the remote node
  6. remote scans its table
  7. for each ValueHash in the local table, if the value is in the filter, drop it
Unique salted bloom filter hash functions per request

ValueHashes are already random. To generate a fast and unique bloom filter hash function pick bytes out of the ValueHash with a randomly generated seed. The random function needs to be constant across the network, and the seed needs to be shared with the request.

Purge

This is a problem in the current implementation as well. Any retransmit of a purged entry will re-add it to the network.

To prevent purged values from being retransmitted, their ValueHash's could be kept in a local cache for a longer period of time after they have been purged and used to initialize the bloom filter.

Future Work

Long lived nodes can start filtering entries based on insert time as well. When constructing a filter between nodes, the local node can take into account how long both the local and the remote have been on the network and create it's filter for only recently added items. This may only be necessary for very large networks.

@garious garious added this to the The Future! milestone Oct 26, 2018
@aeyakovenko
Copy link
Member Author

closed in #1546

@garious garious modified the milestones: The Future!, v0.11 Tabletops Dec 9, 2018
Lichtso pushed a commit to Lichtso/solana that referenced this issue Jun 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants