Bloom filter gossip #266
Conversation
High level comments:
- Improve the gossip to not block on each peer's response.
- Remove the wrapper from the bloom filter
- Wrap the mempool type instead of modifying it if possible.
Also left some inline comments.
@@ -56,6 +62,8 @@ func TestMempoolAtmTxsIssueTxAndGossiping(t *testing.T) {
	return nil
}

assert.NoError(vm.SetState(ctx, snow.NormalOp))
nit: prefer using require over assert for new code.
Just using assert here because the rest of this test uses assert. I think we can pass over these tests to switch to require in a future PR.
plugin/evm/mempool.go
Outdated
if _, exists := m.issuedTxs[txID]; exists {
-	return nil
+	return fmt.Errorf("%w: %s", errTxAlreadyIssued, txID)
}
if _, exists := m.currentTxs[txID]; exists {
-	return nil
+	return fmt.Errorf("%w: %s", errTxAlreadyIssued, txID)
}
if _, exists := m.txHeap.Get(txID); exists {
-	return nil
+	return fmt.Errorf("%w: %s", errTxAlreadyIssued, txID)
why are we changing this behavior?
I believe we want something to signal that tx was already issued and we don't need to refresh the bloom filter.
I'd return a boolean + error instead of changing the error semantics.
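A minimal sketch of the suggested boolean + error shape, using a toy mempool (the `mempool`/`addTx` names here are illustrative, not the actual coreth types):

```go
package main

import "fmt"

// mempool is a toy stand-in illustrating the "boolean + error" return:
// the bool reports whether the tx was newly added, so callers can skip
// refreshing the bloom filter without repurposing the error value.
type mempool struct {
	issued map[string]bool
}

// addTx returns (false, nil) when the tx was already issued, and
// (true, nil) when it was newly added.
func (m *mempool) addTx(txID string) (bool, error) {
	if m.issued[txID] {
		return false, nil // already issued: no bloom-filter refresh needed
	}
	m.issued[txID] = true
	return true, nil
}

func main() {
	m := &mempool{issued: map[string]bool{}}
	added, _ := m.addTx("tx1")
	fmt.Println(added) // true: newly added
	added, _ = m.addTx("tx1")
	fmt.Println(added) // false: duplicate, but not an error
}
```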
@joshua-kim I am still reviewing this but I have two general questions:
- Shouldn't the bloom filter mechanism be activated only following a hard fork? What would happen with the current code if it was deployed in a network of nodes not fully supporting bloom gossiping?
- Is this bloom gossip mechanism designed to be portable to the avalanchego mempool? If so, are there PRs on the avalanchego side? Happy to help there if I can
Mostly notes on first pass, really cool work!
Take my comment with a grain of salt, it's honestly more for me than it is for you!
@joshua-kim, feel free to resolve the comment if you don't agree, but I would love it if you could explain why.
And if you do agree... you're welcome 😉 .
We can deploy this without a hard fork, since this introduces all the new gossip over new message types. Nodes without the upgrade will just drop the new messages. Eventually we will want to deprecate the existing push-based gossip, which will need to be done on a hard-fork.
Yeah this should be portable to X/P chains (which is why the code is in its own independent package).
changes overall lgtm, just a few comments. I feel like currently our gossip/network/handler related files are a little too spread across the code base. It's a bit hard to follow and add new handlers. I wonder if we should collect them together in a package and use subpackages to differentiate them from each other.
gossip/bloom.go
Outdated
// ResetBloomFilterIfNeeded resets a bloom filter if it breaches a ratio of
// filled elements. Returns true if the bloom filter was reset.
func ResetBloomFilterIfNeeded(
	bloomFilter **bloomfilter.Filter,
Do we need a pointer to a pointer here?
I wonder if we should just have a NeedReset(...) bool function and recreate the bloom filter anew in the caller?
Yeah so this is a funny situation, we either pass a double pointer here, or we pass a single pointer here but this causes an issue since when we overwrite this pointer's value like so:
fresh, _ := bloomfilter.New((*bloomFilter).M(), (*bloomFilter).K())
*bloomFilter = *fresh // copying a mutex here
it flags the linter as us copying a mutex (I think this is technically safe since it's gonna be unlocked and no one else can modify it). I felt like the double pointer was less evil than this.
"I wonder if we should just have a NeedReset(...) bool function and recreate the bloom filter anew in the caller?"
Yeah an alternative would be just to remove this function entirely, it's only used in two places and it's a small snippet of code so the duplication cost is really not bad.
Ya I'd prefer to either remove this entirely or update this to a function that does not deal with a double pointer. Even though this seems correct, it still seems overly complicated. I think another simpler alternative would be to have a function that returns nil if no reset is needed and returns a non-nil pointer to a new bloom filter if it is needed (or a boolean if that seems cleaner)
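A sketch of the nil-or-new-filter alternative being suggested. The `filter` type here is a stand-in for `bloomfilter.Filter` (and `resetIfNeeded`/`maxRatio` are hypothetical names); the point is that the caller swaps the pointer itself, so no double pointer and no mutex copy are needed:

```go
package main

import "fmt"

// filter is a toy stand-in for bloomfilter.Filter; only the fill ratio
// matters for this sketch.
type filter struct {
	m, n uint64 // capacity and number of inserted elements
}

func (f *filter) fillRatio() float64 { return float64(f.n) / float64(f.m) }

// resetIfNeeded returns a fresh filter when the fill ratio breaches
// maxRatio, and nil otherwise. The caller assigns the returned pointer,
// avoiding **filter entirely.
func resetIfNeeded(f *filter, maxRatio float64) *filter {
	if f.fillRatio() < maxRatio {
		return nil
	}
	return &filter{m: f.m} // same parameters, empty contents
}

func main() {
	f := &filter{m: 100, n: 90}
	if fresh := resetIfNeeded(f, 0.8); fresh != nil {
		f = fresh // caller performs the swap
	}
	fmt.Println(f.n) // 0 after the reset
}
```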
did a pass of the gossip package
gossip/gossip.go
Outdated
// GossipableAny exists to help create non-nil pointers to a concrete Gossipable
type GossipableAny[T any] interface {
	*T
	Gossipable
}
This is such a wild hack... afaict it does work tho... Is there a ref we can link to for this? Essentially it works because:
- The type must be a pointer to something
- The type must implement the interface.
- A pointer to an interface does not implement the interface.
Yeah you summed up how it works. I can add a ref here.
It's basically a decision of whether we prefer this generics black magic (add complexity to the package) vs if we want to add another interface the caller has to return a pointer to their type (add complexity to the caller). I prefer the former personally but can understand if we feel like this is too evil.
Alternative:
type FooFactory struct{}
func (FooFactory) MakePointer() *Foo {
return &Foo{}
}
plugin/evm/gossip_mempool.go
Outdated
type GossipAtomicTx struct {
	Tx *Tx `serialize:"true"`
}

func (tx *GossipAtomicTx) GetHash() gossip.Hash {
	id := tx.Tx.ID()
	return gossip.HashFromBytes(id[:])
}

func (tx *GossipAtomicTx) Marshal() ([]byte, error) {
	return Codec.Marshal(message.Version, tx)
}

func (tx *GossipAtomicTx) Unmarshal(bytes []byte) error {
	_, err := Codec.Unmarshal(bytes, tx)
	return err
}
Would it make sense to implement these functions directly on the *Tx type? I suppose it's pretty nice that we define both messages here.
My thoughts are that this is all gossip-specific code and we should never be using GetHash, Marshal, or Unmarshal outside of the context of our gossip package/implementations, so I don't want people to ever depend on it.
plugin/evm/gossip_mempool.go
Outdated
g.lock.RLock()
defer g.lock.RUnlock()

return g.bloom
This seems racy - this bloom can be reset in the goroutine running Subscribe - so the caller of this function must either have access to the lock that is local to this struct, or be otherwise guaranteed that Subscribe isn't running. Or I'm missing something.
No, this is racy as you correctly mentioned, since we return a reference which can be modified by the Subscribe goroutine once the lock is released. I think this is fixed if we deep-copy here.
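A sketch of the deep-copy fix (the `bloom`/`gossiper` types are stand-ins, not the real coreth structs; the real code would copy the bloom filter's bit array behind the read lock):

```go
package main

import (
	"fmt"
	"sync"
)

// bloom is a stand-in for the real bloom filter; bits is the shared state
// that the Subscribe goroutine may reset concurrently.
type bloom struct {
	bits []uint64
}

type gossiper struct {
	lock  sync.RWMutex
	bloom *bloom
}

// GetBloomFilter returns a deep copy made under the read lock, so the
// caller never observes a concurrent reset from the Subscribe goroutine.
func (g *gossiper) GetBloomFilter() *bloom {
	g.lock.RLock()
	defer g.lock.RUnlock()

	cp := &bloom{bits: make([]uint64, len(g.bloom.bits))}
	copy(cp.bits, g.bloom.bits)
	return cp
}

func main() {
	g := &gossiper{bloom: &bloom{bits: []uint64{1, 2, 3}}}
	snapshot := g.GetBloomFilter()
	g.bloom.bits[0] = 99 // simulated concurrent reset
	fmt.Println(snapshot.bits[0]) // snapshot is unaffected
}
```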
plugin/evm/mempool.go
Outdated
m.lock.RLock()
defer m.lock.RUnlock()

return m.bloom
Similarly to the other comment - this seems racy.
for _, item := range m.txHeap.maxHeap.items {
	gossipTx := &GossipAtomicTx{Tx: item.tx}
	if !filter(gossipTx) {
		continue
	}
	gossipTxs = append(gossipTxs, gossipTx)
}
Also feel like this should have some form of maximum size
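A sketch of what a cap on the loop above could look like (simplified to ints; `collect` and `maxTxs` are hypothetical names, not the actual coreth code):

```go
package main

import "fmt"

// collect gathers items passing the filter, stopping once maxTxs have
// been collected so a gossip response has a bounded size.
func collect(items []int, filter func(int) bool, maxTxs int) []int {
	var out []int
	for _, item := range items {
		if len(out) >= maxTxs {
			break // cap the response size
		}
		if !filter(item) {
			continue
		}
		out = append(out, item)
	}
	return out
}

func main() {
	items := []int{1, 2, 3, 4, 5, 6}
	even := func(n int) bool { return n%2 == 0 }
	fmt.Println(collect(items, even, 2)) // at most 2 matches returned
}
```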
Co-authored-by: Stephen Buttolph <[email protected]> Signed-off-by: Joshua Kim <[email protected]>
small nit
Diff is getting huge, we broke up this PR's diff to make it more reviewable to get this over the finish line: Coreth PR to add router handling + UT: #316
@joshua-kim as there are nodes running older versions, will they get throttled due to the droppedRequests counter or any other mechanisms?
@t-anyu Nodes running older versions will drop incoming new gossip requests because they'll fail to unmarshal against the legacy codec. They will still get transactions because all nodes will support the legacy push-based gossip.
@joshua-kim @StephenButtolph I hope the team isn't planning to remove push gossip entirely.
We will be moving push gossip to the new p2p SDK. We will not remove push gossip entirely, as that would significantly increase the tx propagation times.
Ah, I see. Thanks!
Why this should be merged
Performance optimization to lower the end-to-end issuance-to-acceptance time for a transaction.
How this works
Uses a pull-based approach to gossip instead of a push-based approach. See testing for more details.
How this was tested
Ran some simulations, results are as follows:
With the existing push-based gossip, we see that as more and more nodes learn the new transaction, the rate at which nodes learn the transaction slows down exponentially.
At first, only one node knows about the transaction, so there are few nodes gossiping it. Once it hits the 50% mark, the transaction is gossiped at maximum velocity, since we're at the middle point where half the network is gossiping the transaction and the other half is still willing to forward it.
As a majority of the network learns of the transaction, it starts slowing down again. This is because nodes sample peers to gossip to, but the peers will only forward the gossiped transaction if it hasn't seen it already. Over time, peers effectively "absorb" the gossip and don't forward it, so the last few nodes are very unlikely to ever hear of the transaction until a re-gossip cycle kicks in.
With the pull-based approach, we see that the transaction is gossiped exponentially quickly as more and more nodes learn the transaction. This is because we now poll peers, and as more peers learn the transaction, it's more likely that you're going to poll a set of peers that has it.