-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reporter: multiple fixes on resolving corner cases #33
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry I must be missing something but everything I thought I knew about the cache seems to go against what I'm seeing in the new code. 😕
reporter/bootstrapping.go
Outdated
kDeepBBNBlockHeight := bbnLatestBlockHeight - r.btcConfirmationDepth + 1 | ||
kDeepBBNBlock := r.btcCache.FindBlock(kDeepBBNBlockHeight) | ||
if kDeepBBNBlock == nil { | ||
err = fmt.Errorf("cannot find k-deep block (height: %d) in BBN header chain in BTC cache", kDeepBBNBlockHeight) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be the case if the BBN on-chain light client is ahead of the BTC node we're connected to. I see this comment says we ensure this isn't the case, but it looks like we ensure by panicking if it is. Is that okay, can we just try later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is another case apart from BBN being longer than BTC: both BTC and BBN have less than k blocks.
I was thinking of another fix: extend this phase to ensure that 1) BTC is no shorter than BBN, 2) BBN has >= k blocks. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think those conditions can be satisfied at the same time under all circumstances. Imagine you are the only vigilante and BBN doesn't have anything other than the genesis block. You can wait forever for it to have more than k blocks, while Babylon is waiting for you to start sending them.
As I understand the consistency check is to ensure that Babylon is on the same fork as BTC. Babylon has a base block in genesis. If it has less than k blocks then the oldest block it has must be its base block. That block is supposed to be trusted, so it must be the same in BTC. So, I would just validate max(bbn_tip_height - confirmation_depth, bbn_base_height)
is consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That block is supposed to be trusted, so it must be the same in BTC. So, I would just validate max(bbn_tip_height - confirmation_depth, bbn_base_height) is consistent.
That's a good point. Yeah the base block in BBN is supposed to be trusted. Will adapt to this rule then. 👍
types/btccache.go
Outdated
@@ -49,6 +49,13 @@ func (b *BTCCache) reverse() { | |||
|
|||
// GetLastBlocks returns list of blocks from cache up to a specified height |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand what this method intends to do. It says "up to a specified height" which suggests going from the oldest block in the cache, up to stopHeight
, so if we have blocks 100-200 and stopHeight=180 we will get blocks 100-180.
But if I look at what it does, this isn't the case:
for i := len(b.blocks) - 1; i >= 0; i-- {
if b.blocks[i].Height < int32(stopHeight) {
j = i
break
}
}
return b.blocks[j:]
That looks like it will go from the last block 200 and walk backwards until it finds block where the height is less than the stopHeight, which is block 179, then it will return blocks upwards from here, so blocks 179-200.
This would be more in line with what you expect needs to be sent to Babylon to backfill any gaps, so I assume here that the description is the one which is ambiguous about "up to".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or the cache is stored in reverse order, but unless I completely lost my mind, it shouldn't be.
I thought initially it's filled backwards, like [100, 99, 98, 97]
until the desired size is matched and then reversed, to leave blocks in ascending order, where the oldest block can be popped from [0]
and the newest appended at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache is supposed to store blocks in the same order as the blockchain, where block[0] is the earliest block and block[-1] is the last block in the cache, i.e, [100, 101, 102, ..., 200]. This function aims at returning all blocks higher than a given stopHeight
in BTC cache. Perhaps I will need to rephrase the comment...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "between the specified height and the tip of the chain"?
I was mostly asking because I saw the latest changes go against the original for loop logic, and reading the docstring didn't break the tie.
types/btccache.go
Outdated
if b.blocks[0].Height <= int32(stopHeight) { | ||
return b.blocks | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this condition. If we want blocks between stopHeight
and the tip, then this will return more, it won't start from stopHeight
. Say I have all blocks in the cache from genesis, and the first height is 0. Surely that will be less than the stopHeight
so I get all blocks, there is no stop.
I would possibly understand it if it was the other way around, saying I want from stopHeight
to the tip, but the first block we have has Height >= stopHeight
so we might as well return all (and there might be gap).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. After a second thought, I think a better option is to return an error.
This function aims at returning all blocks higher than the given stopHeight
. This function is used for getting all blocks that exist in BTC but do not exist in BBN, such that the reporter will forward them to BBN. When BTC cache is properly initialised, if stopHeight
(which is the height of the k-deep block in BBN) is outside the BTC cache, then this is a programming error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, if you can't find stopHeight
in the cache then you won't be able to backfill BBN, and error is justified if blocks[0].Height > stopHeight
. blocks[0].Height <= stopHeight
is what you want, but then you want to drop some prefix as well from the result.
types/btccache.go
Outdated
if b.blocks[len(b.blocks)-1].Height > int32(stopHeight) { | ||
return []*IndexedBlock{} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't understand this one. It says if the last block is higher than stopHeight
then we return an empty array. But that's not what the original loop did, which collected the top blocks until it hit the stopHeight
. If I want blocks between stopHeight
and the tip, then surely the last BTC block is going to be the tip which is higher than the stopHeight
.
I would understand the other way around, if it said if last Height < stopHeight
then we have nothing to return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would understand the other way around, if it said if last Height < stopHeight then we have nothing to return.
Nice catch. I think your version is correct 👍
types/btccache.go
Outdated
if b.blocks[0].Height < int32(blockHeight) || b.blocks[len(b.blocks)-1].Height > int32(blockHeight) { | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it. Based on Add the blocks are appended to the end of the cache, the last block is the highest height, blocks[0]
is the lowest.
Why is it then that if we are looking for blockHeight
then we return nil
if the highest height is higher than what we're looking for, or if the lowest height is lower. It seems to me like the &&
of these two conditions is where we would rightfully expect to find something.
Either the comparisons should be reversed, or maybe the cache is stored in reverse order to what I expected it to be, but then I don't understand what Add
is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch and sorry for the confusion. Indeed I was meaning if blockHeight
is out of range then return nil
. Thus the correct version should be
if int32(blockHeight) < b.blocks[0].Height || b.blocks[len(b.blocks)-1].Height < int32(blockHeight) {
return nil
}
for _, tx := range ib.Txs { | ||
if tx == nil { // TODO: find out why tx can be nil | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe issue a warning?
Thanks for the comments Akosh! Now this PR has resolved the comments and now can reproduce the bugs in https://github.com/babylonchain/vigilante/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+label%3Abug. These bugs will be fixed in subsequent PRs. Feel free to have a look again. |
var ModuleBasics = append( | ||
client.ModuleBasics, | ||
epoching.AppModuleBasic{}, | ||
checkpointing.AppModuleBasic{}, | ||
btclightclient.AppModuleBasic{}, | ||
btccheckpoint.AppModuleBasic{}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment about what this is and why it's needed?
reporter/bootstrapping.go
Outdated
// Find the base height | ||
_, bbnBaseHeight, err = r.babylonClient.QueryBaseHeader() | ||
if err != nil { | ||
panic(err) | ||
} | ||
|
||
// Find the block for consistency check | ||
// i.e., the block at height `max(bbn_tip_height - confirmation_depth, bbn_base_height)` | ||
if bbnLatestBlockHeight-bbnBaseHeight >= r.btcConfirmationDepth { | ||
consistencyCheckHeight = bbnLatestBlockHeight - r.btcConfirmationDepth + 1 // height of the k-deep block in BBN header chain | ||
} else { | ||
consistencyCheckHeight = bbnBaseHeight // height of the base header in BBN header chain | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this looks like it should work. I see @vitsalis had to add a new query to support this.
To be honest what I head in mind wasn't to explicitly ask for what the base header height was, but rather to do something like ask Babylon what the lasted btcConfirmationDepth
blocks are on the main chain in the light client. If the chain is shorter than btcConfirmationDepth
that must be because we hit the base header.
This query could have done that: https://github.com/babylonchain/babylon/blob/d206af7ed6a58e5b62b4b5981b02fd066f4776d9/x/btclightclient/keeper/grpc_query.go#L64-L77
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but rather to do something like ask Babylon what the lasted btcConfirmationDepth blocks are on the main chain in the light client.
This requires the vigilant reporter to retrieve the last btcConfirmationDepth
blocks for a consistency check. This is less efficient than getting the base header directly, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because of the way getting the base header is implemented https://github.com/babylonchain/babylon/blob/d206af7ed6a58e5b62b4b5981b02fd066f4776d9/x/btclightclient/keeper/state.go#L127-L136
It collects the mainchain as far back as it can go (not necessarily all the way to the base in genesis, if we pruned away some of the old blocks already). So it would actually be more efficient to only ask for the last 6 blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I thought the base header is stored somewhere in the parameter or the genesis state... In fact, I think it should be a part of the system parameter. Many use cases (e.g., explorer) will need to query the base header, which plays as the beginning time of BBN.
// If BBN has less than k blocks, sync from the 1st block in BBN, | ||
// since in this case the base header has passed the consistency check | ||
if bbnLatestBlockHeight >= r.btcConfirmationDepth { | ||
startSyncHeight = bbnLatestBlockHeight - r.btcConfirmationDepth + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
startSyncHeight = bbnLatestBlockHeight - r.btcConfirmationDepth + 1 | |
startSyncHeight = max(bbnLatestBlockHeight - r.btcConfirmationDepth, bbnBaseHeight) + 1, |
Make sure you don't start syncing from before the bbnBaseHeight
because the light client will not admit those blocks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about the version in the last commit, i.e., change the condition to if bbnLatestBlockHeight >= bbnBaseHeight+r.btcConfirmationDepth
?
reporter/bootstrapping.go
Outdated
if bbnLatestBlockHeight >= r.btcConfirmationDepth { | ||
startSyncHeight = bbnLatestBlockHeight - r.btcConfirmationDepth + 1 | ||
} else { | ||
startSyncHeight = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
startSyncHeight = 1 | |
startSyncHeight = bbnBaseHeight + 1 |
It could be the case that someone configures the base height to be 2 even if the total chain is just 4 blocks long or something, leading to rejected transactions.
stopHeight := int32(totalBlockCount) - int32(r.btcConfirmationDepth) - int32(r.checkpointFinalizationTimeout) | ||
if stopHeight < 0 { // this happens when Bitcoin contains less than `k+w` blocks | ||
stopHeight = 0 | ||
} | ||
|
||
ibs, err = r.btcClient.GetLastBlocks(stopHeight) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the btcCacheMaxEntries
the way it currently is no good, as demonstrated by the + 1000
and the TODO here:
vigilante/reporter/reporter.go
Lines 46 to 49 in 3e80e75
k := btccParams.BtcConfirmationDepth | |
w := btccParams.CheckpointFinalizationTimeout | |
btcCacheMaxEntries := uint(k+w) + 1000 | |
// TODO: btcCacheMaxEntries can still be larger than k+w. We need to give a larger btc cache size here |
For example BTC could be at 1000 and BBN base at 500 by the time you start BBN. It's all well and good to want max 100 entries, but if you need to backfill 500 and then it doesn't really help for the cache to panic, like it did in the demo.
What I reckon you should do is have a target size. Allow the initialisation to fill up more than that, but trim it back down after BBN has been backfilled. Or just don't use the cache for backfilling BBN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add to the cache not just an Add
but a method you can use to notify it once a header has been sent to BBN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I reckon you should do is have a target size. Allow the initialisation to fill up more than that, but trim it back down after BBN has been backfilled. Or just don't use the cache for backfilling BBN.
This is a good idea! Yeah we can dequeue the oldest block and enqueue the new block safely. Will do!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if you dequeue and enqueue, the size will not shrink, so maybe you want to separate the two completely. Add
just adds, and a Trim(safeHeight)
can remove up to the point you are happy you sent to BBN, minus the target size of the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to contradict to the design of having a fixed-size cache for consistency/stalling check. In fact if we want to keep everything in cache then we inevitably need to make its size much larger than k+w
when BBN falls behind BTC.
Then, perhaps another way is to:
- During bootstrapping, fetch blocks since height
T-k-w
- After bootstrapping, put the most recent
k+w
blocks in these fetched blocks into cache - Maintain the cache (of the latest
k+w
blocks) and use it to do consistency/stalling checks periodically
So that we have a fixed-size cache that is always sliding forward. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what we both described are compatible. What I was suggesting is
- Init the cache from
T-k-w
, which will be an unknown number of blocks depending on how much BBN knows about BTC. - As you feed blocks to BBN, call
Trim
on the cache to pop the oldest blocks. But don't trim the lastk+w
blocks, which could be yourbtcCacheMinEntries
setting. - When there's a new BTC block, push it to the end.
So your cache size would not be fixed, but rather fluctuate between btcCacheMinEntries
and btcCacheMaxEntries+1
. Effectively fixed, but elastic and perhaps easier to manage.
You do what you feel most comfortable with!
for i := len(b.blocks) - 1; i >= 0; i-- { | ||
if b.blocks[i].Height == int32(blockHeight) { | ||
return b.blocks[i] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW if you know the blocks are in order, you should be able to calculate the offset directly. Don't even need bisecting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah will do this when the time is allowed, possibly by a map or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need a map, if you know your first entry is the lowest height, your last entry is the highest height, and in between there are no gaps, then the index you're looking for is blocks[targetHeight - blocks[0].Height]
or something like that.
Thanks for the second-round review Akosh! Now this PR has resolved the follow-up comments, including
Feel free to have a look again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good but to be honest the cache is a bit convoluted, so 🤞
If it was me I would try the simpler model of having no hard limit, but rather the target/minimum required for consistency checks, and use it as a buffer that can grow until you are able to send the values to BBN. But if this works for you, great!
/* Initial consistency check: whether the `max(bbn_tip_height - confirmation_depth, bbn_base_height)`-th block is same */ | ||
|
||
// Find the base height | ||
_, bbnBaseHeight, err = r.babylonClient.QueryBaseHeader() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SebastianElvis what does baseHeight
mean? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BBN header chain starts from a certain height of BTC, rather than BTC's genesis block. Base height means the height of BTC block that BBN header chain starts with.
In preparation of addressing issues and implementing unit tests, this PR has addressed the following corner cases:
stopHeight
is outside the range of BTC cache inGetLastBlocks
blockHeight
is outside the range of BTC cacheFindBlock
After these fixes I can finally reproduce issues #25 and #26, otherwise the program panics in multiple places.