Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reporter: multiple fixes on resolving corner cases #33

Merged
merged 14 commits into from
Sep 13, 2022

Conversation

SebastianElvis
Copy link
Member

@SebastianElvis SebastianElvis commented Sep 12, 2022

In preparation of addressing issues and implementing unit tests, this PR has addressed the following corner cases:

  • the given stopHeight is outside the range of BTC cache in GetLastBlocks
  • the given blockHeight is outside the range of BTC cache FindBlock
  • some txs in a block can be nil. Left a TODO to find out why
  • When BBN header chain or BTC chain contains less than k blocks. In this case we need to
    • skip consistency check
    • start sync from first block in BBN header chain, rather than the k-deep block

After these fixes I can finally reproduce issues #25 and #26, otherwise the program panics in multiple places.

Copy link

@aakoshh aakoshh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry I must be missing something but everything I thought I knew about the cache seems to go against what I'm seeing in the new code. 😕

kDeepBBNBlockHeight := bbnLatestBlockHeight - r.btcConfirmationDepth + 1
kDeepBBNBlock := r.btcCache.FindBlock(kDeepBBNBlockHeight)
if kDeepBBNBlock == nil {
err = fmt.Errorf("cannot find k-deep block (height: %d) in BBN header chain in BTC cache", kDeepBBNBlockHeight)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be the case if the BBN on-chain light client is ahead of the BTC node we're connected to. I see this comment says we ensure this isn't the case, but it looks like we ensure by panicking if it is. Is that okay, can we just try later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another case apart from BBN being longer than BTC: both BTC and BBN have less than k blocks.

I was thinking of another fix: extend this phase to ensure that 1) BTC is no shorter than BBN, 2) BBN has >= k blocks. What do you think?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think those conditions can be satisfied at the same time under all circumstances. Imagine you are the only vigilante and BBN doesn't have anything other than the genesis block. You can wait forever for it to have more than k blocks, while Babylon is waiting for you to start sending them.

As I understand the consistency check is to ensure that Babylon is on the same fork as BTC. Babylon has a base block in genesis. If it has less than k blocks then the oldest block it has must be its base block. That block is supposed to be trusted, so it must be the same in BTC. So, I would just validate max(bbn_tip_height - confirmation_depth, bbn_base_height) is consistent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That block is supposed to be trusted, so it must be the same in BTC. So, I would just validate max(bbn_tip_height - confirmation_depth, bbn_base_height) is consistent.

That's a good point. Yeah the base block in BBN is supposed to be trusted. Will adapt to this rule then. 👍

@@ -49,6 +49,13 @@ func (b *BTCCache) reverse() {

// GetLastBlocks returns list of blocks from cache up to a specified height
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what this method intends to do. It says "up to a specified height" which suggests going from the oldest block in the cache, up to stopHeight, so if we have blocks 100-200 and stopHeight=180 we will get blocks 100-180.

But if I look at what it does, this isn't the case:

for i := len(b.blocks) - 1; i >= 0; i-- {
		if b.blocks[i].Height < int32(stopHeight) {
			j = i
			break
		}
	}

	return b.blocks[j:]

That looks like it will go from the last block 200 and walk backwards until it finds block where the height is less than the stopHeight, which is block 179, then it will return blocks upwards from here, so blocks 179-200.

This would be more in line with what you expect needs to be sent to Babylon to backfill any gaps, so I assume here that the description is the one which is ambiguous about "up to".

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or the cache is stored in reverse order, but unless I completely lost my mind, it shouldn't be.

I thought initially it's filled backwards, like [100, 99, 98, 97] until the desired size is matched and then reversed, to leave blocks in ascending order, where the oldest block can be popped from [0] and the newest appended at the end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache is supposed to store blocks in the same order as the blockchain, where block[0] is the earliest block and block[-1] is the last block in the cache, i.e, [100, 101, 102, ..., 200]. This function aims at returning all blocks higher than a given stopHeight in BTC cache. Perhaps I will need to rephrase the comment...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "between the specified height and the tip of the chain"?
I was mostly asking because I saw the latest changes go against the original for loop logic, and reading the docstring didn't break the tie.

Comment on lines 52 to 54
if b.blocks[0].Height <= int32(stopHeight) {
return b.blocks
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this condition. If we want blocks between stopHeight and the tip, then this will return more, it won't start from stopHeight. Say I have all blocks in the cache from genesis, and the first height is 0. Surely that will be less than the stopHeight so I get all blocks, there is no stop.

I would possibly understand it if it was the other way around, saying I want from stopHeight to the tip, but the first block we have has Height >= stopHeight so we might as well return all (and there might be gap).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. After a second thought, I think a better option is to return an error.

This function aims at returning all blocks higher than the given stopHeight. This function is used for getting all blocks that exist in BTC but do not exist in BBN, such that the reporter will forward them to BBN. When BTC cache is properly initialised, if stopHeight (which is the height of the k-deep block in BBN) is outside the BTC cache, then this is a programming error.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, if you can't find stopHeight in the cache then you won't be able to backfill BBN, and error is justified if blocks[0].Height > stopHeight. blocks[0].Height <= stopHeight is what you want, but then you want to drop some prefix as well from the result.

Comment on lines 55 to 57
if b.blocks[len(b.blocks)-1].Height > int32(stopHeight) {
return []*IndexedBlock{}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't understand this one. It says if the last block is higher than stopHeight then we return an empty array. But that's not what the original loop did, which collected the top blocks until it hit the stopHeight. If I want blocks between stopHeight and the tip, then surely the last BTC block is going to be the tip which is higher than the stopHeight.

I would understand the other way around, if it said if last Height < stopHeight then we have nothing to return.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would understand the other way around, if it said if last Height < stopHeight then we have nothing to return.

Nice catch. I think your version is correct 👍

Comment on lines 72 to 74
if b.blocks[0].Height < int32(blockHeight) || b.blocks[len(b.blocks)-1].Height > int32(blockHeight) {
return nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it. Based on Add the blocks are appended to the end of the cache, the last block is the highest height, blocks[0] is the lowest.

Why is it then that if we are looking for blockHeight then we return nil if the highest height is higher than what we're looking for, or if the lowest height is lower. It seems to me like the && of these two conditions is where we would rightfully expect to find something.

Either the comparisons should be reversed, or maybe the cache is stored in reverse order to what I expected it to be, but then I don't understand what Add is doing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch and sorry for the confusion. Indeed I was meaning if blockHeight is out of range then return nil. Thus the correct version should be

	  if int32(blockHeight) < b.blocks[0].Height || b.blocks[len(b.blocks)-1].Height < int32(blockHeight) {
		  return nil
	  }

for _, tx := range ib.Txs {
if tx == nil { // TODO: find out why tx can be nil
continue
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe issue a warning?

@SebastianElvis
Copy link
Member Author

Thanks for the comments Akosh! Now this PR has resolved the comments and now can reproduce the bugs in https://github.com/babylonchain/vigilante/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+label%3Abug. These bugs will be fixed in subsequent PRs. Feel free to have a look again.

Comment on lines +16 to +22
var ModuleBasics = append(
client.ModuleBasics,
epoching.AppModuleBasic{},
checkpointing.AppModuleBasic{},
btclightclient.AppModuleBasic{},
btccheckpoint.AppModuleBasic{},
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment about what this is and why it's needed?

Comment on lines 78 to 90
// Find the base height
_, bbnBaseHeight, err = r.babylonClient.QueryBaseHeader()
if err != nil {
panic(err)
}

// Find the block for consistency check
// i.e., the block at height `max(bbn_tip_height - confirmation_depth, bbn_base_height)`
if bbnLatestBlockHeight-bbnBaseHeight >= r.btcConfirmationDepth {
consistencyCheckHeight = bbnLatestBlockHeight - r.btcConfirmationDepth + 1 // height of the k-deep block in BBN header chain
} else {
consistencyCheckHeight = bbnBaseHeight // height of the base header in BBN header chain
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this looks like it should work. I see @vitsalis had to add a new query to support this.

To be honest what I head in mind wasn't to explicitly ask for what the base header height was, but rather to do something like ask Babylon what the lasted btcConfirmationDepth blocks are on the main chain in the light client. If the chain is shorter than btcConfirmationDepth that must be because we hit the base header.

This query could have done that: https://github.com/babylonchain/babylon/blob/d206af7ed6a58e5b62b4b5981b02fd066f4776d9/x/btclightclient/keeper/grpc_query.go#L64-L77

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but rather to do something like ask Babylon what the lasted btcConfirmationDepth blocks are on the main chain in the light client.

This requires the vigilant reporter to retrieve the last btcConfirmationDepth blocks for a consistency check. This is less efficient than getting the base header directly, right?

Copy link

@aakoshh aakoshh Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because of the way getting the base header is implemented https://github.com/babylonchain/babylon/blob/d206af7ed6a58e5b62b4b5981b02fd066f4776d9/x/btclightclient/keeper/state.go#L127-L136

It collects the mainchain as far back as it can go (not necessarily all the way to the base in genesis, if we pruned away some of the old blocks already). So it would actually be more efficient to only ask for the last 6 blocks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I thought the base header is stored somewhere in the parameter or the genesis state... In fact, I think it should be a part of the system parameter. Many use cases (e.g., explorer) will need to query the base header, which plays as the beginning time of BBN.

// If BBN has less than k blocks, sync from the 1st block in BBN,
// since in this case the base header has passed the consistency check
if bbnLatestBlockHeight >= r.btcConfirmationDepth {
startSyncHeight = bbnLatestBlockHeight - r.btcConfirmationDepth + 1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
startSyncHeight = bbnLatestBlockHeight - r.btcConfirmationDepth + 1
startSyncHeight = max(bbnLatestBlockHeight - r.btcConfirmationDepth, bbnBaseHeight) + 1,

Make sure you don't start syncing from before the bbnBaseHeight because the light client will not admit those blocks!

Copy link
Member Author

@SebastianElvis SebastianElvis Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the version in the last commit, i.e., change the condition to if bbnLatestBlockHeight >= bbnBaseHeight+r.btcConfirmationDepth?

if bbnLatestBlockHeight >= r.btcConfirmationDepth {
startSyncHeight = bbnLatestBlockHeight - r.btcConfirmationDepth + 1
} else {
startSyncHeight = 1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
startSyncHeight = 1
startSyncHeight = bbnBaseHeight + 1

It could be the case that someone configures the base height to be 2 even if the total chain is just 4 blocks long or something, leading to rejected transactions.

stopHeight := int32(totalBlockCount) - int32(r.btcConfirmationDepth) - int32(r.checkpointFinalizationTimeout)
if stopHeight < 0 { // this happens when Bitcoin contains less than `k+w` blocks
stopHeight = 0
}

ibs, err = r.btcClient.GetLastBlocks(stopHeight)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the btcCacheMaxEntries the way it currently is no good, as demonstrated by the + 1000 and the TODO here:

k := btccParams.BtcConfirmationDepth
w := btccParams.CheckpointFinalizationTimeout
btcCacheMaxEntries := uint(k+w) + 1000
// TODO: btcCacheMaxEntries can still be larger than k+w. We need to give a larger btc cache size here

For example BTC could be at 1000 and BBN base at 500 by the time you start BBN. It's all well and good to want max 100 entries, but if you need to backfill 500 and then it doesn't really help for the cache to panic, like it did in the demo.

What I reckon you should do is have a target size. Allow the initialisation to fill up more than that, but trim it back down after BBN has been backfilled. Or just don't use the cache for backfilling BBN.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add to the cache not just an Add but a method you can use to notify it once a header has been sent to BBN.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I reckon you should do is have a target size. Allow the initialisation to fill up more than that, but trim it back down after BBN has been backfilled. Or just don't use the cache for backfilling BBN.

This is a good idea! Yeah we can dequeue the oldest block and enqueue the new block safely. Will do!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if you dequeue and enqueue, the size will not shrink, so maybe you want to separate the two completely. Add just adds, and a Trim(safeHeight) can remove up to the point you are happy you sent to BBN, minus the target size of the cache.

Copy link
Member Author

@SebastianElvis SebastianElvis Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to contradict to the design of having a fixed-size cache for consistency/stalling check. In fact if we want to keep everything in cache then we inevitably need to make its size much larger than k+w when BBN falls behind BTC.

Then, perhaps another way is to:

  1. During bootstrapping, fetch blocks since height T-k-w
  2. After bootstrapping, put the most recent k+w blocks in these fetched blocks into cache
  3. Maintain the cache (of the latest k+w blocks) and use it to do consistency/stalling checks periodically

So that we have a fixed-size cache that is always sliding forward. What do you think?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what we both described are compatible. What I was suggesting is

  1. Init the cache from T-k-w, which will be an unknown number of blocks depending on how much BBN knows about BTC.
  2. As you feed blocks to BBN, call Trim on the cache to pop the oldest blocks. But don't trim the last k+w blocks, which could be your btcCacheMinEntries setting.
  3. When there's a new BTC block, push it to the end.

So your cache size would not be fixed, but rather fluctuate between btcCacheMinEntries and btcCacheMaxEntries+1. Effectively fixed, but elastic and perhaps easier to manage.

You do what you feel most comfortable with!

Comment on lines +77 to 80
for i := len(b.blocks) - 1; i >= 0; i-- {
if b.blocks[i].Height == int32(blockHeight) {
return b.blocks[i]
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW if you know the blocks are in order, you should be able to calculate the offset directly. Don't even need bisecting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah will do this when the time is allowed, possibly by a map or something.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need a map, if you know your first entry is the lowest height, your last entry is the highest height, and in between there are no gaps, then the index you're looking for is blocks[targetHeight - blocks[0].Height] or something like that.

@SebastianElvis
Copy link
Member Author

Thanks for the second-round review Akosh! Now this PR has resolved the follow-up comments, including

  • trimming the cache and changing the cache size
  • typos on the base header height
  • more comments

Feel free to have a look again.

Copy link

@aakoshh aakoshh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but to be honest the cache is a bit convoluted, so 🤞
If it was me I would try the simpler model of having no hard limit, but rather the target/minimum required for consistency checks, and use it as a buffer that can grow until you are able to send the values to BBN. But if this works for you, great!

/* Initial consistency check: whether the `max(bbn_tip_height - confirmation_depth, bbn_base_height)`-th block is same */

// Find the base height
_, bbnBaseHeight, err = r.babylonClient.QueryBaseHeader()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SebastianElvis what does baseHeight mean? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BBN header chain starts from a certain height of BTC, rather than BTC's genesis block. Base height means the height of BTC block that BBN header chain starts with.

@SebastianElvis SebastianElvis merged commit 79c8927 into main Sep 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants