Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Monitor/Fix deadlock bugs in bootstrapping #136

Merged
merged 3 commits into from
Feb 1, 2023
Merged

Conversation

gitferry
Copy link
Collaborator

This PR is to fix bugs in the monitor found during the integration test with other components. The cause of the bugs is mostly related to deadlocks across goroutines. With this change, the monitor worked well in verifying historical BTC checkpoints. A subsequent PR would fix bugs in handling incoming BTC blocks.

Copy link
Member

@SebastianElvis SebastianElvis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Nevertheless, I have some questions on the design of making bootstrap and block handler concurrent, as per below comments. Not sure if I miss something or not.

@@ -24,6 +25,8 @@ type MonitorConfig struct {
LivenessCheckIntervalSeconds uint64 `mapstructure:"liveness-check-interval-seconds"`
// Max lasting BTC heights that a checkpoint is not reported before an alarm is sent
MaxLiveBtcHeights uint64 `mapstructure:"max-live-btc-heights"`
// the confirmation depth to consider a BTC block as confirmed
BtcConfirmationDepth uint64 `mapstructure:"btc-confirmation-depth"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a part of monitor's config? Looks like its value is retrieved from Babylon and we could not specify its value in a config file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Before this PR, the value is retrieved from Babylon, but after a second thought, I think it's better the make it also configurable on the Monitor side. One reason is that we should not trust any queries from Babylon. All the information obtained from Babylon should be verifiable (we only trust the genesis file). Another reason is that this makes testing easier.

prevHash := &tipBlock.Header.PrevBlock
for i := int(tipHeight-baseHeight) - 1; i >= 0; i-- {
for i := len(chainBlocks) - 2; i >= 0; i-- {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a comment describing why -2 here

cb.Lock()
defer cb.Unlock()

func (cb *CheckpointsBookkeeper) exists(id string) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

Suggested change
func (cb *CheckpointsBookkeeper) exists(id string) bool {
func (cb *CheckpointsBookkeeper) Has(id string) bool {

is better?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, has is better but I'm not sure we should make it exposed as it is used internally and not thread-safe.

"need to restart the bootstrapping process", event.Height, err.Error())
bs.Synced.Store(false)
bs.Bootstrap()
if bs.Synced.Swap(false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean only when the scanner is synced, an error here triggers re-bootstrapping? Is it possible that the scanner fails to handle a new block while syncing, thus missing this new block?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good question. Yep, it is possible when the bootstrapping is happening but failed to include the new block. In this case, the bootstrapping will happen again when the next block comes. I don't think it would be a problem since this happens rarely and it will always be synced.

@gitferry gitferry merged commit fc7a384 into monitor Feb 1, 2023
@gitferry gitferry deleted the monitor-fix-bugs branch February 1, 2023 02:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants