feat! content addressable transaction pool #935

Closed
wants to merge 29 commits

Conversation

cmwaters (Contributor):

Closes: #884

cmwaters mentioned this pull request on Jan 12, 2023
Wondertan (Member):

Wondering how much code here was copied from the existing implementations

cmwaters (Contributor, PR author):

> Wondering how much code here was copied from the existing implementations

v2 started as a complete copy of v1, so there's a significant amount, although the protocol does make some substantial changes to information flow. For example, you no longer have a goroutine for every peer you connect with.

rootulp (Collaborator) left a comment:

I've reviewed 14/41 files but am pausing review, so I wanted to submit this first batch of comments.

Since the cat mempool code seems inspired by the priority mempool, I wonder if we need to address any of the performance issues Sergio alluded to in informalsystems/audit-celestia#36.

Comment on lines 28 to 29
// GetList returns the underlying linked-list that backs the LRU cache. Note,
// this should be used for testing purposes only!
Collaborator:

[question] Since cache_test.go lives inside the cat package, what are your thoughts on un-exporting this method?

The question is motivated by an effort to enforce: "Note, this should be used for testing purposes only!"
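
For illustration, a minimal sketch of the un-export (assuming the method simply returns the backing list; the exact body may differ):

```go
// getList returns the underlying linked-list that backs the LRU cache.
// Unexported so it is only reachable from tests inside the cat package.
func (c *LRUTxCache) getList() *list.List {
	return c.list
}
```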

cmwaters (PR author):

Good point.

Resolved (outdated) review threads: mempool/cat/cache.go ×2, mempool/cat/peers.go, mempool/cat/pool.go ×2
// - We send multiple requests and the first peer eventually responds after the second peer has already
if txmp.IsRejectedTx(key) {
// The peer has sent us a transaction that we have previously marked as invalid. Since `CheckTx` can
// be non-deterministic, we don't punish the peer but instead just ignore the msg
Collaborator:

[nit] I think the mempool only deals with transactions and doesn't inspect the contents (i.e. messages)

Suggested change
- // be non-deterministic, we don't punish the peer but instead just ignore the msg
+ // be non-deterministic, we don't punish the peer but instead just ignore the tx

Resolved (outdated) review threads: mempool/cat/pool.go ×3
evan-forbes (Member) left a comment:

I still need to take a second look at the tests, but overall this is really really good. The spec and the ADR are excellently written. Really top shelf stuff.

Resolved (outdated) review thread: mempool/cat/spec.md
Comment on lines +397 to +400
if err != nil {
// TODO: find a more polite way of handling this error
panic(err)
}
Member:

[optional]
perhaps just a simple error log and os.Exit call?
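
A hedged sketch of that suggestion (the logger field name and the message are placeholders, not taken from the PR):

```go
if err != nil {
	// Log the failure and exit instead of panicking.
	// r.Logger is assumed to be the reactor's logger; adjust to whatever is in scope here.
	r.Logger.Error("unrecoverable error in the cat mempool reactor", "err", err)
	os.Exit(1) // requires the "os" import
}
```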

Comment on lines +324 to +326
// Add jitter to when the node broadcasts it's seen txs to stagger when nodes
// in the network broadcast their seenTx messages.
time.Sleep(time.Duration(rand.Intn(10)*10) * time.Millisecond)
Member:

[question]
how critical is this to the protocol working? What about natural network latencies?

cmwaters (PR author):

Non-critical. We could remove it. Staggering when we send these messages may reduce how many we need to send, but in the worst case it might mean a node becomes aware of a transaction's existence later than necessary.

Resolved review threads: mempool/cat/reactor.go ×2
Comment on lines +89 to +94
func (r *requestScheduler) ForTx(key types.TxKey) uint16 {
r.mtx.Lock()
defer r.mtx.Unlock()

return r.requestsByTx[key]
}
Member:

[comment]
fwiw I found this name confusing, not sure I have a better suggestion tho.

cmwaters (PR author):

The name of the method?

Member:

yeah, just the name of the method

Resolved review threads: mempool/cat/pool.go ×3
}

numExpired := txmp.store.purgeExpiredTxs(expirationHeight, expirationAge)
txmp.metrics.EvictedTxs.Add(float64(numExpired))
Member:

[question]
are we adding to the evicted transaction metrics here but not adding them to the evicted cache?

cmwaters (PR author):

Yeah, I remember thinking that there's a difference between eviction due to a TTL and eviction due to overflow but now I'm not sure if I agree with my earlier self.

cmwaters (Contributor, PR author):

> Since the cat mempool code seems inspired by the priority mempool, I wonder if we need to address any of the performance issues Sergio alluded to in informalsystems/audit-celestia#36.

We've been running the priority mempool in 100-node networks without any noticeable performance problems. If there are any known problems, let me know and we can work them out. I'm also indifferent as to whether we base CAT off the FIFO mempool instead; my understanding was that priority-based block production and eviction was an important design requirement.

@@ -184,6 +185,16 @@ func TestReactorWithEvidence(t *testing.T) {
mempoolv1.WithPreCheck(sm.TxPreCheck(state)),
mempoolv1.WithPostCheck(sm.TxPostCheck(state)),
)
case cfg.MempoolV2:
Member:

[non-blocking] this switch statement for defining the mempool can probably be refactored into a helper. I think I've seen it 3x already.

@@ -0,0 +1,97 @@
package cat
Member:

[non-blocking] to properly test the thread-safe nature of the cache we should add a test that spins up a number of goroutines and goes through the various actions.

Something like: goroutine A is adding txs, goroutine B is removing txs, goroutine C is checking and evicting txs. They all have random sleeps in between actions, and then a common done channel closes them and the test down after 10s or something.
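
A rough sketch of such a test, meant to be run with `go test -race`. The constructor name and the Push method are assumptions (Remove and Has appear elsewhere in this review); adjust to the cache's actual API.

```go
package cat

import (
	"crypto/rand"
	mrand "math/rand"
	"sync"
	"testing"
	"time"

	"github.com/tendermint/tendermint/types"
)

// TestLRUTxCacheConcurrentAccess hammers the cache from several goroutines at
// once to surface data races under the race detector. NewLRUTxCache and Push
// are assumed names for the constructor and the add method.
func TestLRUTxCacheConcurrentAccess(t *testing.T) {
	cache := NewLRUTxCache(100)
	done := make(chan struct{})
	var wg sync.WaitGroup

	// randKey produces a random transaction key for each operation.
	randKey := func() types.TxKey {
		var key types.TxKey
		_, _ = rand.Read(key[:])
		return key
	}

	// worker repeats op with a small random sleep until done is closed.
	worker := func(op func()) {
		defer wg.Done()
		for {
			select {
			case <-done:
				return
			default:
				op()
				time.Sleep(time.Duration(mrand.Intn(5)) * time.Millisecond)
			}
		}
	}

	wg.Add(3)
	go worker(func() { cache.Push(randKey()) })    // goroutine A: adding (eviction kicks in once the cache is over capacity)
	go worker(func() { cache.Remove(randKey()) })  // goroutine B: removing
	go worker(func() { _ = cache.Has(randKey()) }) // goroutine C: checking

	time.Sleep(2 * time.Second) // the comment above suggests ~10s; kept short here
	close(done)
	wg.Wait()
}
```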

MSevey (Member) left a comment:

Some initial thoughts; 12/42 files reviewed.

Comment on lines 14 to 17
mtx tmsync.Mutex
size int
cacheMap map[types.TxKey]*list.Element
list *list.List
Member:

If I'm understanding this correctly, the size is static. I'm not sure if we have any standards around prefixes (cc @evan-forbes), but I'd recommend something like a static prefix so that it is clear that this field doesn't require acquiring the mutex in order to access it.

Similarly, the list should also have the static prefix, as it manages its own mutex and doesn't require the LRUTxCache mutex for its operations to be thread-safe.

Suggested change
- mtx tmsync.Mutex
- size int
- cacheMap map[types.TxKey]*list.Element
- list *list.List
+ staticSize int
+ staticList *list.List
+ mtx tmsync.Mutex
+ cacheMap map[types.TxKey]*list.Element

cmwaters (PR author):

Yeah, you're right. I think the convention is to place the mutex just above the fields it protects. This was just copied from the v1 mempool implementation.

cmwaters (PR author):

> Similarly, the list should also have the static prefix, as it manages its own mutex and doesn't require the LRUTxCache mutex for its operations to be thread-safe.

list.List does not have its own mutex and is managed by the LRUTxCache's mutex, so I will leave it as is.
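
For illustration, the layout this thread converges on might look roughly like this (field names taken from the suggestion above; a sketch, not the final code):

```go
type LRUTxCache struct {
	staticSize int // set once at construction; safe to read without the lock

	mtx tmsync.Mutex
	// the fields below are guarded by mtx
	cacheMap map[types.TxKey]*list.Element
	list     *list.List
}
```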

Resolved review threads: mempool/cat/cache.go ×4, mempool/cat/cache_test.go, mempool/cat/peers.go
cmwaters (Contributor, PR author) commented on Jan 16, 2023:

I have split this out into 4 PRs: #941, #942, #943 & #944. All current suggestions have been incorporated here and in those PRs.

I have kept this in draft mode as a point of reference so people can view the complete change.

staheri14 (Contributor) left a comment:

I am currently going through some of the pull requests that are related to my upcoming tasks, to familiarize myself with the codebase. I'm not able to give a detailed review at this time, as I am still gathering more information and background. I'll make some non-critical suggestions and comments as I read through the code. :)

Comment on lines +17 to +18

mtx tmsync.Mutex
Contributor:

Suggested change
-
- mtx tmsync.Mutex
+ mtx tmsync.Mutex

}
}

func (c *LRUTxCache) Reset() {
Contributor:

Brief function descriptions for the LRUTxCache methods would be helpful.
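
For example, a short godoc comment on the method quoted above (the wording is illustrative, not from the PR):

```go
// Reset clears the cache, removing every entry from both the map and the
// backing linked list.
func (c *LRUTxCache) Reset() {
	// ... existing body unchanged ...
}
```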

Comment on lines +73 to +77
delete(c.cacheMap, txKey)

if e != nil {
c.list.Remove(e)
}
Contributor:

Curious to know why the delete operation is not moved into the following if block. The current version does not error out, since delete is a no-op for invalid and non-existent keys, but logically it would make sense for both the Remove and delete operations to be conditioned on e not being nil, wdyt?
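
A sketch of the reordering being suggested (behaviour is unchanged, since delete is a no-op when the key is absent; this just groups both operations under the nil check):

```go
if e != nil {
	// Only touch the map and the list when the entry actually exists.
	delete(c.cacheMap, txKey)
	c.list.Remove(e)
}
```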

}

func (c *LRUTxCache) Remove(txKey types.TxKey) {
c.mtx.Lock()
Contributor:

Following what has been implemented in the Has method (where it checks the cache size), a similar check can be done here too:

Suggested change
- c.mtx.Lock()
+ if c.staticSize == 0 {
+ 	return
+ }
+ c.mtx.Lock()

staheri14 (Contributor):

> I have split this out into 4 PRs: #941, #942, #943 & #944. All current suggestions have been incorporated here and in those PRs.
>
> I have kept this in draft mode as a point of reference so people can view the complete change.

I just saw this comment, will move my suggestions to the relevant PRs.

cmwaters (Contributor, PR author) commented on Mar 9, 2023:

Closing this as all the constituent parts have been completed

cmwaters closed this on Mar 9, 2023
Successfully merging this pull request may close: EPIC: Push/Pull Mempool

6 participants