Validate peers on startup #231

Merged: 23 commits into main from validate-peers on Mar 19, 2024
Conversation

cam-schultz (Collaborator)

Why this should be merged

Fixes #193

How this works

Manually connects to subnet validators on startup, rather than at message relay time. Each subnet must be connected to enough stake that any of the configured destinations would be able to validate Warp messages sent by the subnet; otherwise, the relayer exits. We still need to attempt to connect to peers at message relay time, since the validator set may have changed. This change is intended more to surface issues earlier than to optimize performance.

In addition to moving this check to startup, this change also refactors some of the peer connection and signature management logic into common helpers.
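For illustration, a minimal sketch of the startup stake check described above. ConnectedCanonicalValidators and its weight fields appear in this PR; the error name and quorum wiring are assumptions for the sketch:

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-in for the type introduced in this PR.
type ConnectedCanonicalValidators struct {
	ConnectedWeight      uint64
	TotalValidatorWeight uint64
}

var errNotEnoughConnectedStake = errors.New("insufficient connected stake") // assumed name

// checkSufficientStake applies the Warp quorum rule at startup: connected
// stake must be at least quorumNum/quorumDen of the subnet's total validator
// stake. Cross-multiplying avoids floating point (and assumes weights are
// small enough not to overflow uint64).
func checkSufficientStake(v ConnectedCanonicalValidators, quorumNum, quorumDen uint64) error {
	if v.ConnectedWeight*quorumDen < v.TotalValidatorWeight*quorumNum {
		return fmt.Errorf("connected %d of %d total weight: %w",
			v.ConnectedWeight, v.TotalValidatorWeight, errNotEnoughConnectedStake)
	}
	return nil
}

func main() {
	v := ConnectedCanonicalValidators{ConnectedWeight: 70, TotalValidatorWeight: 100}
	fmt.Println(checkSufficientStake(v, 67, 100)) // <nil>: 70% connected meets a 67% quorum
}
```

On failure, the relayer would exit, surfacing misconfiguration before any message is relayed.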

How this was tested

CI

How is this documented

N/A

@cam-schultz changed the title from "Validate peers" to "Validate peers on startup" on Mar 13, 2024
@@ -483,55 +465,6 @@ func (r *messageRelayer) createSignedMessageAppRequest(requestID uint32) (*avala
return nil, errNotEnoughSignatures
}

func (r *messageRelayer) getCurrentCanonicalValidatorSet() ([]*avalancheWarp.Validator, uint64, error) {
cam-schultz (Collaborator, Author):

Signing subnet check moved to messageRelayer constructor; the rest of this function moved to CanonicalValidatorClient.GetCurrentCanonicalValidatorSet

zap.Error(err),
)
return nil, err
}

// We make queries to node IDs, not unique validators as represented by a BLS pubkey, so we need this map to track
cam-schultz (Collaborator, Author):

This logic was moved to AppRequestNetwork.ConnectToCanonicalValidators
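For illustration, a minimal sketch of that node-ID-to-validator-index mapping; the types are simplified stand-ins (the real code uses avalanchego node IDs and BLS public keys):

```go
package main

import "fmt"

// Simplified stand-ins: each canonical Warp validator is keyed by a BLS
// public key and may be backed by several nodes.
type Validator struct {
	PublicKey string   // stand-in for a BLS public key
	NodeIDs   []string // all nodes registered under that key
	Weight    uint64
}

// buildNodeIndex maps each node ID back to its validator's index in the
// canonical set, so a signature returned by any one node is credited to the
// right validator and its weight is counted once per BLS key, not per node.
func buildNodeIndex(validators []Validator) map[string]int {
	index := make(map[string]int)
	for i, v := range validators {
		for _, nodeID := range v.NodeIDs {
			index[nodeID] = i
		}
	}
	return index
}

func main() {
	vals := []Validator{
		{PublicKey: "bls-A", NodeIDs: []string{"node-1", "node-2"}, Weight: 50},
		{PublicKey: "bls-B", NodeIDs: []string{"node-3"}, Weight: 50},
	}
	idx := buildNodeIndex(vals)
	fmt.Println(idx["node-1"] == idx["node-2"]) // true: two nodes, one validator
}
```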

@@ -156,18 +175,6 @@ func (r *messageRelayer) createSignedMessage() (*avalancheWarp.Message, error) {
)
return nil, err
}
signingSubnetID := r.relayer.sourceSubnetID
cam-schultz (Collaborator, Author):

This was moved to the messageRelayer constructor

Comment on lines -101 to -103
// convenience fields to access the source subnet and chain IDs after initialization
sourceSubnetIDs []ids.ID
sourceBlockchainIDs []ids.ID
cam-schultz (Collaborator, Author):

These were only used to pass to the app request network constructor. With this change, it makes more sense to pass the configuration and iterate over the SourceBlockchain and DestinationBlockchain slices directly, so the parsed IDs were moved there.

@@ -103,83 +102,99 @@ func NewNetwork(
return nil, nil, err
}

// We need to initially connect to some nodes in the network before peer
cam-schultz (Collaborator, Author):

There's no need to connect to these beacon nodes, since we explicitly connect to every node that we send AppRequest messages to now.

@@ -417,6 +401,18 @@ func (s *SourceBlockchain) Validate(destinationBlockchainIDs *set.Set[string]) e
}
}

// Validate and store the subnet and blockchain IDs for future use
Collaborator:

Not for this PR, but we should consider splitting config.go up into multiple files at some point soon. I think SourceBlockchain and DestinationBlockchain could each be put in their own file still in the config package and could help improve navigation/readability.


Agreed, and some of the similarities between the source/destination can be abstracted.

cam-schultz (Collaborator, Author):

> some of the similarities between the source/destination can be abstracted

Viper doesn't play nicely with this, unfortunately. There may be a workaround, but I haven't looked too deeply into it.

Contributor:

It doesn't look too bad, but we can definitely defer to a separate ticket: https://stackoverflow.com/questions/47185318/multiple-config-files-with-go-viper

cam-schultz (Collaborator, Author):

To clarify, I think we should keep a single user-facing configuration file, but split it into multiple Go files for easier code readability.

I experimented with Matt's suggestion of composing SourceBlockchain and DestinationBlockchain using a common type, but Viper seemed to have trouble unmarshalling JSON into them. Definitely worth revisiting if we decide to go for those config improvements.
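For context, a minimal sketch of the embedding approach under discussion. The field names are hypothetical, and it assumes mapstructure's `,squash` tag, which embedded structs generally need for Viper to flatten them; whether that resolves the issue hit here is untested:

```go
package main

import (
	"fmt"

	"github.com/spf13/viper"
)

// Hypothetical shared base; real field names in the relayer config may differ.
type BlockchainBase struct {
	SubnetID     string `mapstructure:"subnet-id"`
	BlockchainID string `mapstructure:"blockchain-id"`
}

// Without the ",squash" tag, mapstructure expects a nested object for the
// embedded struct, and flat JSON keys silently leave it empty.
type SourceBlockchain struct {
	BlockchainBase `mapstructure:",squash"`
	WSEndpoint     string `mapstructure:"ws-endpoint"`
}

func main() {
	v := viper.New()
	v.Set("subnet-id", "abc")
	v.Set("ws-endpoint", "ws://localhost:9650")

	var s SourceBlockchain
	if err := v.Unmarshal(&s); err != nil {
		panic(err)
	}
	fmt.Println(s.SubnetID, s.WSEndpoint) // abc ws://localhost:9650
}
```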

break
// Manually connect to the validators of each of the source subnets.
// We return an error if we are unable to connect to sufficient stake on any of the subnets.
// Sufficient stake is determined by the Warp quora of the configured supported destinations,
Collaborator:

TIL that "quora" is a word. 😅

// We return an error if we are unable to connect to sufficient stake on any of the subnets.
// Sufficient stake is determined by the Warp quora of the configured supported destinations,
// or if the subnet supports all destinations, by the quora of all configured destinations.
for _, sourceBlockchain := range cfg.SourceBlockchains {
Collaborator:

This loop body is pretty large. May want to consider moving parts out to helper functions.

Collaborator:

Would also be great to add unit tests for this since the logic is relatively complex. I'm not sure how difficult it is to mock the various clients and arNetwork though.

cam-schultz (Collaborator, Author):

To add unit tests, I believe we'd have to add mocks for the info client, the P-Chain client, and the underlying Network. All doable, but it might be a large enough change to warrant its own ticket.
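For reference, a hand-rolled-fake sketch of the kind of mocking this would involve. The PChainClient interface and its method are invented for the example, not taken from the relayer:

```go
package main

import (
	"context"
	"fmt"
)

// Invented narrow interface: cover only the call(s) the startup check needs,
// so tests can hand-roll a fake instead of mocking a full client.
type PChainClient interface {
	GetCurrentHeight(ctx context.Context) (uint64, error)
}

type fakePChainClient struct{ height uint64 }

func (f *fakePChainClient) GetCurrentHeight(_ context.Context) (uint64, error) {
	return f.height, nil
}

// heightAtLeast stands in for whatever logic would be under test.
func heightAtLeast(ctx context.Context, c PChainClient, min uint64) (bool, error) {
	h, err := c.GetCurrentHeight(ctx)
	if err != nil {
		return false, err
	}
	return h >= min, nil
}

func main() {
	ok, _ := heightAtLeast(context.Background(), &fakePChainClient{height: 42}, 10)
	fmt.Println(ok) // true
}
```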

Comment on lines 132 to 136
logger.Error(
"Failed to get subnet ID",
zap.Error(err),
)
return nil, nil, err
Contributor:

In an effort to reduce code complexity a little bit, I think that if we don't require any additional info in the log (e.g. the blockchain ID), it would be best to skip logging here and wrap the error instead:
return nil, nil, fmt.Errorf("failed to get subnet ID: %w", err)
This way, all the important info will get logged together at some point up the stack. As is, we'll get a bunch of disjointed errors that are harder to correlate.

Doesn't have to be something we do in this PR, but I think it would be good practice to go through and implement at some point.
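As an aside, a minimal sketch of the wrap-with-%w pattern this comment describes; errSubnetLookup and getSubnetID are hypothetical names for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

// errSubnetLookup is a hypothetical sentinel for the failure discussed above.
var errSubnetLookup = errors.New("failed to get subnet ID")

// getSubnetID wraps with %w instead of logging locally, so the caller at the
// top of the stack can log the whole chain in one line.
func getSubnetID(blockchainID string) error {
	// ... lookup elided for the sketch ...
	return fmt.Errorf("%w for blockchain %s", errSubnetLookup, blockchainID)
}

func main() {
	err := getSubnetID("abc123")
	fmt.Println(err)                             // failed to get subnet ID for blockchain abc123
	fmt.Println(errors.Is(err, errSubnetLookup)) // true: callers can still branch on the cause
}
```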

cam-schultz (Collaborator, Author):

Agreed that we can have a more unified logging / error handling strategy. I also agree that we should go through and unify the repo once we decide on that strategy. In the meantime, however, I'd like to err on the side of over-verbosity and log where we can, even if some logs are redundant.

Comment on lines +278 to +279
ConnectedWeight uint64
TotalValidatorWeight uint64
Contributor:

I think it would be better to have at least TotalValidatorWeight not be a field on the struct here, but instead a receiver function on ConnectedCanonicalValidators that iterates through and adds up all the weights. This would avoid bugs down the road if anyone mutates the Validator set.

If we did the same for ConnectedWeight, it would be easier to get the updated weight of all our connections. If we need the connected weight many times and don't want to try connecting to all the nodes every time, maybe make the field MostRecentConnectedWeight and add a receiver method UpdateConnectedWeight.
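For illustration, a minimal sketch of the receiver-method idea; the Validator shape is simplified and only TotalValidatorWeight is shown:

```go
package main

import "fmt"

// Simplified Validator; the real type carries a BLS key and node IDs.
type Validator struct{ Weight uint64 }

type ConnectedCanonicalValidators struct {
	Validators []*Validator
}

// TotalValidatorWeight sums the set on demand, so the value cannot drift
// from the slice if the set is mutated after construction.
func (c *ConnectedCanonicalValidators) TotalValidatorWeight() uint64 {
	var total uint64
	for _, v := range c.Validators {
		total += v.Weight
	}
	return total
}

func main() {
	c := &ConnectedCanonicalValidators{Validators: []*Validator{{Weight: 30}, {Weight: 70}}}
	fmt.Println(c.TotalValidatorWeight()) // 100
}
```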

cam-schultz (Collaborator, Author):

I think those ideas make sense, but I'm going to defer that for later. I think we'll likely want to add an async routine to update the validator set, rather than fetching it in every message relay, at which point we'll want to carefully consider how the global state is managed. For now though, this type is only used in the message relayer, so the risk of it being misused is fairly low.

// Helper struct to hold connected validator information
// Warp Validators sharing the same BLS key may consist of multiple nodes,
// so we need to track the node ID to validator index mapping
type ConnectedCanonicalValidators struct {


I think we should move this struct to a separate file, and ideally have mocking tests.

cam-schultz (Collaborator, Author):

Eh, I don't think it's worth all that special treatment since this type is POD. It's really just there to simplify calls to ConnectToCanonicalValidators and keep associated data packaged together.

@cam-schultz merged commit 54888f9 into main on Mar 19, 2024
6 of 7 checks passed
@cam-schultz deleted the validate-peers branch on March 19, 2024 at 14:40
Labels: None yet
Projects: Archived in project
Development: Successfully merging this pull request may close these issues: Validate peer version
4 participants