-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update HCP bootstrapping to support existing clusters #16916
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one non-blocking comment/question. Looks good
@@ -1739,6 +1739,9 @@ func (c *RuntimeConfig) Sanitized() map[string]interface{} { | |||
|
|||
// IsCloudEnabled returns true if a cloud.resource_id is set and the server mode is enabled | |||
func (c *RuntimeConfig) IsCloudEnabled() bool { | |||
if c == nil { | |||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the case where RuntimeConfig
is nil
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some tests in command/agent
that hit that and panic'd: https://app.circleci.com/pipelines/github/hashicorp/consul/56048/workflows/a6e357e8-4c0f-49a7-a22b-6872e2f391f8/jobs/1349145/parallel-runs/3
a8e7574
to
7956dbc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏 nice work. just had two small things, but one could possibly cause a subtle bug, so asking for changes/discussion.
agent/config/builder.go
Outdated
@@ -2528,17 +2528,20 @@ func validateAutoConfigAuthorizer(rt RuntimeConfig) error { | |||
} | |||
|
|||
func (b *builder) cloudConfigVal(v *CloudConfigRaw) (val hcpconfig.CloudConfig) { | |||
val.ResourceID = os.Getenv("HCP_RESOURCE_ID") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you will need to use LookupEnv
here similar to what the SDK uses or else this will set an empty string if it is null. Previous iterations of the SDK also checked for null rather than empty string and would fail. So, probably best to only set this when the env var is actually configured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The val
variable here is already a zero-value of hcp.CloudConfig
because of the named return, so if HCP_RESOURCE_ID
is not set then it stays empty. The SDK is checking that it's set first because that function is mutating potentially non-empty config.
I updated this to make the val
variable declaration clearer. This isn't really a case to use named returns: style guide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks great! I like the combined flow and the tests!
I left a couple nits and I have one question: When a cluster with consul 1.14 with CCM integration is upgraded to Consul 1.15, will it rebootstrap because of the missing sucess marker?
agent/hcp/bootstrap/bootstrap.go
Outdated
return os.WriteFile(name, []byte(token), 0600) | ||
} | ||
|
||
func persistBootstrapConfig(dataDir, cfgJSON string) error { | ||
func persistBootstrapConfig(dir, cfgJSON string) error { | ||
if cfgJSON == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: to further remove differences between new and existing cluster flow, an empty config {}
could be stored here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we send that empty JSON string down from HCP? It would simplify some of the conditional logic in persistAndProcessConfig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at the last commit, I updated that bit to expect {}
from CCM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea to always send valid json 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! 👍
9b3b0ed
to
97b080b
Compare
22c7a30
to
9a28f9e
Compare
We want to move away from injecting an initial management token into Consul clusters linked to HCP. The reasoning is that by using a separate class of token we can have more flexibility in terms of allowing HCP's token to co-exist with the user's management token. Down the line we can also more easily adjust the permissions attached to HCP's token to limit it's scope. With these changes, the cloud management token is like the initial management token in that iit has the same global management policy and if it is created it effectively bootstraps the ACL system.
The HCP management token will now be sent in a special field rather than as Consul's "initial management" token configuration. This commit also updates the mock HCP server to more accurately reflect the behavior of the CCM backend.
We want to allow users to link Consul clusters that already exist to HCP. Existing clusters need care when bootstrapped by HCP, since we do not want to do things like change ACL/TLS settings for a running cluster. Additional changes: * Deconstruct MaybeBootstrap so that it can be tested. The HCP Go SDK requires HTTPS to fetch a token from the Auth URL, even if the backend server is mocked. By pulling the hcp.Client creation out we can modify its TLS configuration in tests while keeping the secure behavior in production code. * Add light validation for data received/loaded. * Sanitize initial_management token from received config, since HCP will only ever use the CloudConfig.MangementToken.
9a28f9e
to
f798be1
Compare
Replaces #16788
Description
We want to allow users to link Consul clusters that already exist to
HCP. Existing clusters need care when bootstrapped by HCP, since we do
not want to do things like change ACL/TLS settings for a running
cluster.
The plan is to only send down a management token to existing clusters
and nothing else (at least initially).
This PR adds a few commits to that end:
through Raft as we do for the initial management token.
clusters that are going to be supported.
Best reviewed by commit
Testing & Reproduction steps
PR Checklist
TODO