Update HCP bootstrapping to support existing clusters #16916

Merged: 4 commits, Apr 27, 2023

@freddygv freddygv commented Apr 7, 2023

Replaces #16788

Description

We want to allow users to link Consul clusters that already exist to
HCP. Existing clusters need care when bootstrapped by HCP, since we do
not want to do things like change ACL/TLS settings for a running
cluster.

The plan is to only send down a management token to existing clusters
and nothing else (at least initially).

This PR adds a few commits to that end:

  • Add an HCP management token to server config so that it can be persisted
    through Raft as we do for the initial management token.
  • Update the HCP SDK to be able to receive this new management token.
  • Update the bootstrapping logic to account for the different types of
    clusters that are going to be supported.
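
As a rough illustration of the last bullet, the distinction between new and existing clusters could look something like the sketch below. The `bootstrapResponse` and `appliedSettings` types and the `selectSettings` function are hypothetical names made up for this example; they are not the types used in this PR or in the HCP SDK.

```go
package main

import (
	"errors"
	"fmt"
)

// bootstrapResponse is a hypothetical stand-in for what HCP sends down.
type bootstrapResponse struct {
	ManagementToken string
	TLSCert         string // assumed to matter only for new clusters
	ACLConfigJSON   string // assumed to matter only for new clusters
}

// appliedSettings records which parts of the response a cluster accepts.
type appliedSettings struct {
	ManagementToken string
	TLSCert         string
	ACLConfigJSON   string
}

// selectSettings keeps only the management token for an existing cluster,
// so ACL/TLS settings of a running cluster are left untouched.
func selectSettings(resp bootstrapResponse, existing bool) (appliedSettings, error) {
	if resp.ManagementToken == "" {
		return appliedSettings{}, errors.New("bootstrap response has no management token")
	}
	out := appliedSettings{ManagementToken: resp.ManagementToken}
	if existing {
		return out, nil
	}
	out.TLSCert = resp.TLSCert
	out.ACLConfigJSON = resp.ACLConfigJSON
	return out, nil
}

func main() {
	resp := bootstrapResponse{ManagementToken: "tok", TLSCert: "cert", ACLConfigJSON: "{}"}
	existing, _ := selectSettings(resp, true)
	fresh, _ := selectSettings(resp, false)
	fmt.Printf("existing: %+v\n", existing)
	fmt.Printf("new:      %+v\n", fresh)
}
```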

Best reviewed by commit

Testing & Reproduction steps

  • Unit tests

PR Checklist

  • updated test coverage
  • external facing docs updated
  • not a security concern

TODO

  • Manual testing

@github-actions github-actions bot added the pr/dependencies, theme/cli, and theme/config labels Apr 7, 2023
@freddygv freddygv mentioned this pull request Apr 7, 2023
@freddygv freddygv added the backport/1.14 and backport/1.15 labels Apr 7, 2023
@jjacobson93 jjacobson93 left a comment

Just one non-blocking comment/question. Looks good

@@ -1739,6 +1739,9 @@ func (c *RuntimeConfig) Sanitized() map[string]interface{} {

// IsCloudEnabled returns true if a cloud.resource_id is set and the server mode is enabled
func (c *RuntimeConfig) IsCloudEnabled() bool {
if c == nil {
return false
Contributor

What's the case where RuntimeConfig is nil here?
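
For context on what the guard does (not on which caller might pass nil, which the thread does not say): in Go a method with a pointer receiver can be called on a nil pointer, and it only panics once a field is dereferenced. The sketch below uses simplified stand-in types, `runtimeConfig` and `cloudConfig`, not Consul's real `RuntimeConfig`.

```go
package main

import "fmt"

// cloudConfig and runtimeConfig are simplified stand-ins for illustration.
type cloudConfig struct {
	ResourceID string
}

type runtimeConfig struct {
	ServerMode bool
	Cloud      cloudConfig
}

// isCloudEnabled follows the shape of the guarded method in the diff:
// a nil receiver is treated as "cloud not enabled" instead of panicking.
func (c *runtimeConfig) isCloudEnabled() bool {
	if c == nil {
		return false
	}
	return c.ServerMode && c.Cloud.ResourceID != ""
}

func main() {
	var unset *runtimeConfig
	fmt.Println(unset.isCloudEnabled()) // prints false, no panic

	set := &runtimeConfig{ServerMode: true, Cloud: cloudConfig{ResourceID: "example-id"}}
	fmt.Println(set.isCloudEnabled()) // prints true
}
```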

@freddygv freddygv force-pushed the cc-4716-link-existing-clusters branch from a8e7574 to 7956dbc April 10, 2023 23:06
@freddygv freddygv requested review from a team and JadhavPoonam and removed request for a team April 10, 2023 23:38
@jmurret jmurret left a comment

👏 nice work. just had two small things, but one could possibly cause a subtle bug, so asking for changes/discussion.

@@ -2528,17 +2528,20 @@ func validateAutoConfigAuthorizer(rt RuntimeConfig) error {
}

func (b *builder) cloudConfigVal(v *CloudConfigRaw) (val hcpconfig.CloudConfig) {
val.ResourceID = os.Getenv("HCP_RESOURCE_ID")
Member

I think you will need to use LookupEnv here similar to what the SDK uses or else this will set an empty string if it is null. Previous iterations of the SDK also checked for null rather than empty string and would fail. So, probably best to only set this when the env var is actually configured.

Contributor Author

The val variable here is already a zero-value of hcp.CloudConfig because of the named return, so if HCP_RESOURCE_ID is not set then it stays empty. The SDK is checking that it's set first because that function is mutating potentially non-empty config.

I updated this to make the val variable declaration clearer. This isn't really a case to use named returns: style guide.

agent/consul/leader.go (resolved)
@freddygv freddygv requested a review from jmurret April 21, 2023 19:51
@hanshasselberg hanshasselberg left a comment

Thanks, this looks great! I like the combined flow and the tests!

I left a couple of nits and I have one question: when a cluster running Consul 1.14 with CCM integration is upgraded to Consul 1.15, will it rebootstrap because of the missing success marker?

agent/hcp/bootstrap/bootstrap.go (outdated, resolved)
agent/hcp/bootstrap/bootstrap.go (resolved)
agent/hcp/bootstrap/bootstrap.go (outdated, resolved)
return os.WriteFile(name, []byte(token), 0600)
}

-func persistBootstrapConfig(dataDir, cfgJSON string) error {
+func persistBootstrapConfig(dir, cfgJSON string) error {
if cfgJSON == "" {
Member

nit: to further remove differences between new and existing cluster flow, an empty config {} could be stored here.

Contributor Author

What if we send that empty JSON string down from HCP? It would simplify some of the conditional logic in persistAndProcessConfig

Contributor Author

Take a look at the last commit, I updated that bit to expect {} from CCM

Member

I like the idea to always send valid json 👍

@jmurret jmurret left a comment

Nice work! 👍

@freddygv freddygv force-pushed the cc-4716-link-existing-clusters branch from 9b3b0ed to 97b080b April 24, 2023 23:04
@jmurret jmurret force-pushed the cc-4716-link-existing-clusters branch from 22c7a30 to 9a28f9e April 25, 2023 19:09
We want to move away from injecting an initial management token into
Consul clusters linked to HCP. The reasoning is that by using a separate
class of token we can have more flexibility in terms of allowing HCP's
token to co-exist with the user's management token.

Down the line we can also more easily adjust the permissions attached to
HCP's token to limit its scope.

With these changes, the cloud management token is like the initial
management token in that it has the same global management policy and,
if it is created, it effectively bootstraps the ACL system.

The HCP management token will now be sent in a special field rather than
as Consul's "initial management" token configuration.

This commit also updates the mock HCP server to more accurately reflect
the behavior of the CCM backend.

We want to allow users to link Consul clusters that already exist to
HCP. Existing clusters need care when bootstrapped by HCP, since we do
not want to do things like change ACL/TLS settings for a running
cluster.

Additional changes:

* Deconstruct MaybeBootstrap so that it can be tested. The HCP Go SDK
  requires HTTPS to fetch a token from the Auth URL, even if the backend
  server is mocked. By pulling the hcp.Client creation out we can modify
  its TLS configuration in tests while keeping the secure behavior in
  production code.

* Add light validation for data received/loaded.

* Sanitize initial_management token from received config, since HCP will
  only ever use the CloudConfig.ManagementToken.
Labels
backport/1.15, pr/dependencies, theme/cli, theme/config

4 participants