Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: refactor and fuzz tests for checkpoint segments and pools #93

Merged
merged 10 commits into from
Oct 26, 2022

Conversation

SebastianElvis
Copy link
Member

Fixes BM-153

This PR provides the fuzz tests for the checkpoint segment pool. Along the way, this PR also refactors the checkpoint structures for readability and modularity.

Copy link
Member

@vitsalis vitsalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!! Some recommendations for refactoring and the fuzz tests:

}

// TODO: generalise to NumExpectedProofs > 2
// TODO: optimise the complexity by hashmap
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we implement this TODO in this PR as well. I suppose we could do both TODOs, we havee all the necessary structures and the code will be simplified.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first TODO requires support from Babylon, which at the moment only provides ConnectParts with two parts as input. So probably we cannot add this support at the moment.

The second TODO is in fact implemented, where the pool has a map with two keys (index of checkpoint segment and hash of OP_RETURN data). I feel like finding matches inevitably requires traversal of two lists of checkpoint segments. Could you think of any other optimisation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the second key of the hashmap was something that was shared by both checkpoint segments (instead of the hash of the data contained in them, e.g. epoch number or something even more specific)? Then, we could use that key to do filtering on the amount of stuff we iterate in the second loop. Currently, for every first part we're iterating all second parts which is suboptimal. Won't have much effect in practice since we do not expect the pool to have many elements, but I think we might simplify the code a bit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the second key of the hashmap was something that was shared by both checkpoint segments (instead of the hash of the data contained in them, e.g. epoch number or something even more specific)?

This scenario is already ruled out by the checksum in the checkpoint, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we could use the checksum.

}
}

func (p *CkptSegmentPool) Add(ckptSeg *CkptSegment) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we add docstrings to all of those new functions. Relevant to refactoring.

matchedCkpts = append(matchedCkpts, ckpt)
// remove the two ckptSeg in pool
delete(p.Pool[uint8(0)], hash1)
delete(p.Pool[uint8(1)], hash2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen in the case that BBN is down when we try to submit the checkpoints? I suppose we panic, although this is not a breaking condition for the vigilante. I suggest that this function only concerns itself to extracting relevant checkpoints and leaves the deletion of existing checkpoints into another function that is triggered externally when a checkpoint has been successfully included in BBN. This way, the reporter won't lose information and can be relied upon in the case that BBN goes down for a while.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that this function only concerns itself to extracting relevant checkpoints and leaves the deletion of existing checkpoints into another function that is triggered externally when a checkpoint has been successfully included in BBN.

Good idea, will do. A possible way of doing this is to move these matched checkpoints from checkpoint segment pool to another list of matched checkpoints, then report these matched checkpoints, and finally remove them from the list. In fact, ADR-001 (which I think is a good idea) will require reporter to implement such a list anyway (where each checkpoint has to be k-deep before being reported). How does that sound?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good idea, agree all the way ✊

}

// Sort the matched pairs by epoch, since they have to be submitted in order
sort.Slice(matchedCkpts, func(i, j int) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe do the sorting elsewhere. The checkpoint pool should not be aware of checkpoint submitting invariants.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, we can do the sorting in the list of matched checkpoints described above. wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

}

func FuzzCkptSegmentPool(f *testing.F) {
datagen.AddRandomSeedsToFuzzer(f, 10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use more seeds. Unless they add much overhead.

f.Fuzz(func(t *testing.T, seed int64) {
rand.Seed(seed)

tag := btctxformatter.MainTag(48)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we decide between MainTag and TestTag based on the seed. Also, we decide the tag index based on the seed.

"github.com/stretchr/testify/require"
)

func randNBytes(n int) []byte {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's the datagen.GenRandomByteArray function

Epoch: rand.Uint64(),
LastCommitHash: randNBytes(btctxformatter.LastCommitHashLength),
BitMap: randNBytes(btctxformatter.BitMapLength),
SubmitterAddress: randNBytes(btctxformatter.AddressLength),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should look like a valid BTC address.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test aims at ensuring the correctness of matching checkpoint segments. Would checking the format of addresses be out of the scope?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should use as realistic data as possible. Eitherway, we won't have to check whether the address is valid since the generation mechanism will accomplish that. Might be a good addition to the datagen functionality of bbn.

require.Equal(t, 2, pool.Size())

// find matched pairs of segments in the pool
ckpts := pool.Match()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function only checks that checkpoints get matched only when a pool contains 2 elements that are matching. Other elements that we should test for

  • Variable number of checkpoints inside the pool. We can emulate the number of checkpoints from the seed (e.g. deciding between inserting 0..10 checkpoints.
  • The checkpoints are returned in sorted order in terms of the epoch.
  • There might be checkpoints which do not have a match (e.g. only one part of the checkpoint has been included). We can emulate this by randomly generating a flag that decides if this should happen (i.e. only add one part of a checkpoint). Then, we can randomly decide if this part is the first one or the second one.

@SebastianElvis
Copy link
Member Author

Thanks for the review Vitalis! Now this PR has addressed the comments. Most notably,

  • The fuzz test now takes a random number of pairs of checkpoint segments, where each pair may or may not be a match.
  • CkptSegmentPool is now repurposed to CheckpointCache, which additionally maintains the matched checkpoints. This way we can ensure each checkpoint is reported before removing it from the cache. It will also serve as a basis of the vigilante monitor and ADR-001, which will be implemented in subsequent PRs.

Some future works still remain:

  • the current fuzz test does not check orders of segments in the cache. Instead this should be tested in BTC handler, which orders checkpoints before reporting them.
  • the current fuzz test does not ensure the format of addresses. This requires support from datagen package beforehand. Will try to add in a separate PR from Babylon side.

Copy link
Member

@vitsalis vitsalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Only minor stuff:

log.Debug("Found no matched pair of checkpoint segments in this match attempt")
return nil
}

// Sort the matched pairs by epoch, since they have to be submitted in order
// TODO: find smarter way for sorting
sort.Slice(r.CheckpointCache.Checkpoints, func(i, j int) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could implement a GetSortedCheckpoints method.

// for each matched pair, wrap to MsgInsertBTCSpvProof and send to Babylon
for _, ckpt := range ckpts {
for r.CheckpointCache.NumCheckpoints() > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe update the comment below to demonstrate that this is a while loop that modifies the list in place.

Also, we could implement methods HasCheckpoints and RemoveEarliestCheckpoint to make the operation more clear.


// if we don't want a match, then mess up with one of BabylonData
if !match {
if rand.Intn(2) == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's the datagen.OneInN method


// get a random tag, either for mainnet or testnet
var tag btctxformatter.BabylonTag
if rand.Intn(2) == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also use datagen.OneInN here.

@SebastianElvis SebastianElvis merged commit 17ce33e into main Oct 26, 2022
@SebastianElvis SebastianElvis deleted the ckpt-tests branch October 26, 2022 01:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants