Bootstrap worker nodes using kops-controller #9653
Conversation
/approve cancel
Force-pushed from c278b40 to d37e4ae
cmd/kops-controller/main.go
Outdated
var err error
switch opt.Server.Provider {
case kops.CloudProviderAWS:
    verifier, err = awsup.NewAWSVerifier()
We do have a node identifier in kops-controller already. We also have the node verifier in node-authorizer.
Not a blocker, but we can probably look at that code.
The verifier needs to authenticate the request data as having come from an instance that was created using the spec of one of the cluster's instancegroups. Bonus points for authenticating the particular instance group or even the particular instance.
node-authorizer makes two checks, neither of which is sufficient, even in combination. First, it verifies an instance identity document provided by the instance. This document is long-lived and replayable: it is not tied to any data in the rest of the request. The only thing tying the data in the request to the instance is a check that the source IP address of the incoming request matches that of the instance, which is not a particularly strong mechanism.
srv, err := server.NewServer(&opt, verifier)
if err != nil {
    setupLog.Error(err, "unable to create server")
    os.Exit(1)
I've recently found myself rewriting the kubebuilder main to call into a func run() error function immediately, just to avoid having to log & exit every time!
The structured logging, such as in the handling of addNodeController, makes it tricky to use that pattern.
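For reference, a minimal sketch of that pattern (newServer and start are hypothetical stand-ins for the real kops-controller setup, not code from this PR):

package main

import (
    "fmt"
    "os"
)

func main() {
    // Delegate to run() so errors are returned up the stack and
    // logged exactly once, instead of log-and-exit at every call site.
    if err := run(); err != nil {
        fmt.Fprintf(os.Stderr, "error: %v\n", err)
        os.Exit(1)
    }
}

func run() error {
    srv, err := newServer()
    if err != nil {
        return fmt.Errorf("unable to create server: %w", err)
    }
    return srv.start()
}

// Hypothetical stand-ins so the sketch is self-contained.
type server struct{}

func newServer() (*server, error) { return &server{}, nil }
func (s *server) start() error    { return nil }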
}
token = strings.TrimPrefix(token, AWSAuthenticationTokenPrefix)

// We rely on the client and server using the same version of the same STS library.
Should we instead send the body of the STS request?
Vault does that, but they have to support arbitrary clients. I took advantage of the ability to require the client to use the same version of the AWS library, to reduce the risk of attackers using kops-controller as a pass-through proxy to STS.
Ah - interesting point. Does this introduce a challenge for upgrading the AWS SDK though? If there's master vs node skew then I am worried that either old-version nodes or new nodes won't be able to join?
I didn't think supporting version skew was a requirement. Old nodes would only be an issue if they were created before the ASG's spec was updated yet didn't call kops-controller until after the first master was updated. The old node would be cleaned out by rolling update (though the cluster validation failure would impede rolling update from starting). New nodes would fail only until the first master is updated.
I do have an idea of making rolling update ignore non-ready nodes that are in nodes-role instance groups other than the one being updated.
So I think the scenario to be most mindful of is the master being ahead of the nodes, i.e. we update the AWS SDK and somehow the body changes, we roll that out for the control plane, but the nodes aren't yet updated. As you point out, old node configurations would fail to join in this scenario. Kubernetes itself tries very hard to ensure that the masters can be a minor version ahead of the nodes (indefinitely); we don't have to support that, but we should be aware of it.
Do we have an example of what's in the body? How likely is it to change?
At the time we run kops update cluster --yes (or terraform apply) the node ASGs will be updated. All new instances created after that point will be on the new version of the library. Similarly, all nodes that have joined by the time the first master is updated will continue to be in the cluster. The only issue would be a worker instance that is created before the kops update cluster --yes but takes longer to join the cluster than the time it takes for the first control plane node to get updated and apply the new kops-controller manifest.
The body usually has QWN0aW9uPUdldENhbGxlcklkZW50aXR5JlZlcnNpb249MjAxMS0wNi0xNQ==, which is the base64 encoding of Action=GetCallerIdentity&Version=2011-06-15. It doesn't seem likely to change often.
The other thing we're relying on being the same from client to server is the request URL.
OK, that sounds reasonable. We could create a unit test to check that the body is this value; if the unit test ever failed we could deal with it (e.g. constructing all allowed forms and seeing if any of them match the sha), but as you point out it seems unlikely.
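A sketch of such a unit test, assuming aws-sdk-go v1 and dummy static credentials; the package and test names are illustrative, and the read of req.HTTPRequest.Body assumes the query-protocol build behavior of that SDK version:

package awsup

import (
    "encoding/base64"
    "io/ioutil"
    "testing"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/credentials"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/sts"
)

// TestGetCallerIdentityBody fails if an AWS SDK upgrade ever changes the
// GetCallerIdentity request body that client and server both hash.
func TestGetCallerIdentityBody(t *testing.T) {
    sess := session.Must(session.NewSession(aws.NewConfig().
        WithRegion("us-east-1").
        WithCredentials(credentials.NewStaticCredentials("AKID", "SECRET", ""))))
    req, _ := sts.New(sess).GetCallerIdentityRequest(nil)
    if err := req.Sign(); err != nil {
        t.Fatal(err)
    }
    body, err := ioutil.ReadAll(req.HTTPRequest.Body)
    if err != nil {
        t.Fatal(err)
    }
    want, _ := base64.StdEncoding.DecodeString("QWN0aW9uPUdldENhbGxlcklkZW50aXR5JlZlcnNpb249MjAxMS0wNi0xNQ==")
    if string(body) != string(want) {
        t.Errorf("STS request body changed: got %q, want %q", body, want)
    }
}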
return "", fmt.Errorf("incorrect content-length") | ||
} | ||
|
||
// TODO - implement retry? |
I suspect our client needs to retry anyway, so there might not be much reason for us to retry here.
Nodeup will retry, it's just a question of how long transient errors delay node startup.
if resp.StatusCode != http.StatusOK {
    detail := ""
    if resp.Body != nil {
        scanner := bufio.NewScanner(resp.Body)
Any reason not to just ioutil.ReadAll?
We only want the first line.
OK; not a problem. Unless there's something that we actively don't want to log, I'd personally just log the whole thing - little cost, potential benefit for debugging. But your call.
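For reference, the first-line pattern under discussion as a standalone helper (firstLine is a hypothetical name, not code from this PR):

package client

import (
    "bufio"
    "io"
)

// firstLine returns only the first line of an error response body,
// keeping the logged detail to a single line.
func firstLine(r io.Reader) string {
    scanner := bufio.NewScanner(r)
    if scanner.Scan() {
        return scanner.Text()
    }
    return ""
}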
Really great stuff. Mostly minor feedback. Topics that may be more interesting:
Force-pushed from d37e4ae to 5bd91e4
I wonder if there's now a circular dependency for most CNIs, with CNIs depending on a worker node joining and worker node joining depending on the CNI.
I don't think so. If you install a CNI manually, you wait for the masters to be available, but NOT READY, and then apply the CNI spec. The worker nodes join once the masters are ready. Looking quickly at this, it looks like dns-controller is not updating the DNS, and dns-controller uses the host network.
/retest
1 similar comment
/retest
// APIVersion defines the versioned schema of this representation of a request.
APIVersion string `json:"apiVersion"`
// Certs are the requested certificates and their respective public keys.
Certs map[string]string `json:"certs"`
You might want to make this Certs map[string]*CertRequest or Certs []*CertRequest, just to give you more extensibility.
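For illustration, the suggested shape might look like this (CertRequest and its fields are hypothetical):

// CertRequest can carry the public key plus future per-cert options
// without changing the wire format.
type CertRequest struct {
    // PublicKey is the PEM-encoded public key to be signed.
    PublicKey string `json:"publicKey"`
}

type BootstrapRequest struct {
    // APIVersion defines the versioned schema of this representation of a request.
    APIVersion string `json:"apiVersion"`
    // Certs are the requested certificates, keyed by name.
    Certs map[string]*CertRequest `json:"certs"`
}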
It depends on whether we need to support bootstrapping nodes created with a different version, which I discuss below.
Did you post this discussion? I'm not seeing it.
It's in the block of comments on aws_verifier.go line 129.
// BootstrapResponse is a response to a BootstrapRequest.
type BootstrapResponse struct {
    // Certs are the issued certificates.
    Certs map[string]string
Ditto here - map[string]*Cert doesn't cost much now, might avoid version contortions later.
}
httpReq.Header.Set("Content-Type", "application/json")

token, err := b.Authenticator.CreateToken(reqBytes)
Thanks for explaining the signature challenge. I do understand, but ... can we implement this on GCE / OpenStack / Bare-Metal? I know it's not as simple, but could we do something like this:
type SignedBootstrapRequest struct {
    Request *BootstrapRequest `json:"request"`
    AWSSecureTokenService *AWSSecureTokenService `json:"awsSecureTokenService"`
    // GCESignature *GCESignature
}

type AWSSecureTokenService struct {
    Headers map[string]string `json:"headers"`
}
It feels more extensible (for other clouds, possible for AWS also)
I believe this interface is implementable by other clouds, assuming those clouds have some way to authenticate an instance. The interface I defined seems to me to have been reduced to the essentials: the provider-specific client implementation gets the serialized body and produces an authenticator. The common code transfers the authenticator in the Authorization: header. The provider-specific server implementation gets the serialized body and the authenticator and gets to say what the node name is or reject the request.
I'm presuming that multi-provider clusters are well out of scope.
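A sketch of the interface shape being described (only CreateToken appears in the diff above; the Verifier method signature here is an assumption):

// Authenticator is implemented per cloud provider on the client (nodeup) side.
type Authenticator interface {
    // CreateToken signs the serialized request body and returns an opaque
    // token that the common code transfers in the Authorization: header.
    CreateToken(body []byte) (string, error)
}

// Verifier is implemented per cloud provider on the kops-controller side.
type Verifier interface {
    // VerifyToken checks the token against the serialized body and returns
    // the node name, or an error to reject the request.
    VerifyToken(token string, body []byte) (string, error)
}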
Agree that multi-provider clusters should be out of scope. Even if we have a central management cluster, I think we would want node bring-up to be done relatively locally. I could imagine forwarding the request to another service, but I imagine that would essentially be wrapping the current request anyway.
I can see the authorization header as isomorphic to what I described - we're encoding the signature data (with the body) into json, base64 encoding it, and then putting that in the authorization header. I'm really not sure if that is simpler, but it's likely tied up in the skew problem ... I'll ponder!
Putting the token in the header simplifies the task of getting the signature to include the request, since you can then just take a secure hash of the entire body. Putting it into the Authorization: header means anything in the middle is likely to treat it as confidential.
I'm OK to proceed on this basis; if we do find that on another cloud it's much trickier to put it into the header (e.g. GCE's TPM flow) then we might have to put it into the body for that cloud - is that reasonable?
I think that's reasonable. If that were to happen, we'd probably do something like change the interfaces to return a modified body.
So I like this PR a lot! My only reservation is around making the requests more structured, to allow for evolution and support for other clouds. Re: deprecating node-authorizer, let's get this going first, then check with users!
Force-pushed from aef5c86 to 744edaf
Force-pushed from 744edaf to 1a253dc
/retest
One prow networking failure before we even touch our code; the other test looks like it actually passed.
This LGTM ... I think we agreed to merge and aim for 1.19 (instead of 1.20), so...
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: justinsb
/test pull-kops-e2e-cni-cilium
/retest
1 similar comment
/retest
The idea is for nodeup to generate private keys, send the public keys to kops-controller, and have kops-controller issue the necessary certificates.
This is far enough along to prove the concept. Posting to get early review (especially from @justinsb) while I plumb in the remaining pieces.

On AWS for Kubernetes 1.19 and later, worker nodes bootstrap by generating a private key and authenticating to kops-controller with a cloud-provider-specific mechanism. Kops-controller then verifies the authentication and issues a node-specific kubelet certificate.
This new bootstrap mechanism replaces both of the previous mechanisms: reading a cluster-wide kubelet cert from the state store, and node-authorizer. There is no option to use either of these on Kubernetes 1.19+ in AWS.
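A rough sketch of that client-side flow, using the request/response types shown earlier (the endpoint path, API version string, and createToken wiring are illustrative assumptions, not the PR's exact code):

package nodebootstrap

import (
    "bytes"
    "crypto/rand"
    "crypto/rsa"
    "crypto/x509"
    "encoding/json"
    "encoding/pem"
    "fmt"
    "net/http"
)

// BootstrapRequest and BootstrapResponse mirror the types shown earlier.
type BootstrapRequest struct {
    APIVersion string            `json:"apiVersion"`
    Certs      map[string]string `json:"certs"`
}

type BootstrapResponse struct {
    Certs map[string]string
}

// bootstrapNode generates a private key locally (only the public key leaves
// the node), signs the serialized body with the cloud-specific authenticator,
// and asks kops-controller to issue the kubelet certificate.
func bootstrapNode(serverURL string, createToken func(body []byte) (string, error)) (*rsa.PrivateKey, *BootstrapResponse, error) {
    key, err := rsa.GenerateKey(rand.Reader, 2048)
    if err != nil {
        return nil, nil, err
    }
    der, err := x509.MarshalPKIXPublicKey(&key.PublicKey)
    if err != nil {
        return nil, nil, err
    }
    pubPEM := pem.EncodeToMemory(&pem.Block{Type: "PUBLIC KEY", Bytes: der})

    body, err := json.Marshal(&BootstrapRequest{
        APIVersion: "bootstrap.kops.k8s.io/v1alpha1", // hypothetical version string
        Certs:      map[string]string{"kubelet": string(pubPEM)},
    })
    if err != nil {
        return nil, nil, err
    }
    token, err := createToken(body) // e.g. the AWS STS-based authenticator
    if err != nil {
        return nil, nil, err
    }

    req, err := http.NewRequest("POST", serverURL+"/bootstrap", bytes.NewReader(body))
    if err != nil {
        return nil, nil, err
    }
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("Authorization", token)

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return nil, nil, err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return nil, nil, fmt.Errorf("bootstrap failed: %s", resp.Status)
    }
    var out BootstrapResponse
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        return nil, nil, err
    }
    return key, &out, nil
}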
Additionally, this enables Kubernetes Node authorization for Kubernetes 1.19 and later, on all cloud providers.

Cribs from #8580. The authentication mechanism for AWS is based on that used by Vault. The code is structured to allow other cloud providers to provide their own authentication mechanisms.
Followup PRs will switch other worker node certs to use this mechanism.
Closes #1231