Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bootstrap worker nodes using kops-controller #9653

Merged
merged 13 commits into from
Aug 17, 2020

Conversation

johngmyers
Copy link
Member

@johngmyers johngmyers commented Jul 31, 2020

The idea is for nodeup to generate private keys, send the public keys to kops-controller, and have kops-controller issue the necessary certificates.

This is far enough along to prove the concept. Posting to get early review (especially from @justinsb) while I plumb in the remaining pieces.

On AWS for Kubernetes 1.19 and later, worker nodes bootstrap by generating a private key and authenticating with a cloud-provider-specific mechanism to kops-controller. Kops-controller then verifies the authentication and issues a node-specific kubelet certificate.

This new bootstrap mechanism replaces both the mechanisms of reading a cluster-wide kubelet cert from the state store and node-authorizer. There is no option to use either of these previous mechanisms on Kubernetes 1.19+ in AWS.

Additionally, enables Kubernetes Node authorization for Kubernetes 1.19 and later, on all cloud providers.

Cribs from #8580. The authentication mechanism for AWS is based on that used by Vault. The code is structured to allow other cloud providers to provide their own authentication mechanisms.

Followup PRs will switch other worker node certs to use this mechanism.

Closes #1231

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 31, 2020
@k8s-ci-robot k8s-ci-robot added area/addons area/api approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kops-controller area/nodeup area/provider/aws Issues or PRs related to aws provider labels Jul 31, 2020
@johngmyers
Copy link
Member Author

/approve cancel

@k8s-ci-robot k8s-ci-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 31, 2020
@johngmyers johngmyers force-pushed the kops-controller-server branch 2 times, most recently from c278b40 to d37e4ae Compare July 31, 2020 04:32
var err error
switch opt.Server.Provider {
case kops.CloudProviderAWS:
verifier, err = awsup.NewAWSVerifier()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have a node identifier in kops-controller already. We do also have the node verifier in node-authorizer.

Not a blocker, just that we can probably look at that code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The verifier needs to authenticate the request data as having come from an instance that was created using the spec of one of the cluster's instancegroups. Bonus points for authenticating the particular instance group or even the particular instance.

node-authorizer makes two checks, neither of which are sufficient, even in combination. First, it verifies an instance identity document provided by the instance. This document is long-lived and replayable: it is not tied to any data in the rest of the request. The only thing tying the data in the request to the instance is a check that the source IP address of the incoming request matches that of the instance, which is not a particularly strong mechanism.

srv, err := server.NewServer(&opt, verifier)
if err != nil {
setupLog.Error(err, "unable to create server")
os.Exit(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been finding recently that I've been rewriting the kubebuilder main to call into a func run() error function immediately, just to avoid having to log & exit every time!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structured logging, such as in the handling of addNodeController, makes it tricky to use that pattern.

cmd/kops-controller/pkg/server/server.go Show resolved Hide resolved
nodeup/pkg/model/bootstrap_client.go Outdated Show resolved Hide resolved
nodeup/pkg/model/bootstrap_client.go Show resolved Hide resolved
}
token = strings.TrimPrefix(token, AWSAuthenticationTokenPrefix)

// We rely on the client and server using the same version of the same STS library.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we instead send the body of the STS request?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vault does that, but they have to support arbitrary clients. I took the ability to require the client to use the same version of the AWS library to reduce the risk of allowing attackers to use kops-controller as a pass-through proxy to STS.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - interesting point. Does this introduce a challenge for upgrading the AWS SDK though? If there's master vs node skew then I am worried that either old-version nodes or new nodes won't be able to join?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think supporting version skew was a requirement. Old nodes would only be an issue if they were created before the ASG's spec was updated yet didn't call kops-controller until after the first master was updated. The old node would be cleaned out by rolling update (though the cluster validation failure would impede rolling update from starting). New nodes would fail only until the first master is updated.

I do have an idea of making rolling update ignore non-ready nodes that are in nodes-role instance groups other than the one being updated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think the scenario we be most mindful of is the master being ahead of the nodes. i.e. we update the AWS SDK and somehow the body changes, we roll that out for the control-plane, but the nodes aren't yet updated. As you point out, old node configurations would fail to join in this scenario. Kubernetes itself tries very hard to ensure that the masters can be a minor version ahead of the nodes (indefinitely); we don't have to support that but we should be aware of this.

Do we have an example of what's in the body? How likely is it to change?

Copy link
Member Author

@johngmyers johngmyers Aug 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time we run kops update cluster --yes (or terraform apply) the node ASGs will be updated. All new instances created after that point will be on the new version of the library. Similarly, all nodes that have joined by the time the first master is updated will continue to be in the cluster. The only issue would be a worker instance that is created before the kops update cluster --yes but takes longer to join the cluster than the time it takes for the first control plane node to get updated and apply the new kops-controller manifest.

Copy link
Member Author

@johngmyers johngmyers Aug 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The body usually has QWN0aW9uPUdldENhbGxlcklkZW50aXR5JlZlcnNpb249MjAxMS0wNi0xNQ== which is the base64 encoding of Action=GetCallerIdentity&Version=2011-06-15. It doesn't seem likely to change often.

The other thing we're relying on being the same from client to server is the request URL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that sounds reasonable. We could create a unit test to check that the body is this value; if the unit test ever failed we could deal with it (e.g constructing all allowed forms and seeing if any of them match the sha), but as you point out it seems unlikely.

return "", fmt.Errorf("incorrect content-length")
}

// TODO - implement retry?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect our client needs to retry anyway, so there might not be much reason for us to retry here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nodeup will retry, it's just a question of how long transient errors delay node startup.

if resp.StatusCode != http.StatusOK {
detail := ""
if resp.Body != nil {
scanner := bufio.NewScanner(resp.Body)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to just ioutil.ReadAll ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only want the first line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK; not a problem. Unless there's something that we actively don't want to log, I'd personally just log the whole thing - little cost, potential benefit for debugging. But your call.

@justinsb
Copy link
Member

Really great stuff. Mostly minor feedback.

Topics that may be more interesting:

  • Can we / should we reuse any of the existing validation code
  • GRPC vs REST
  • Should we make the request more extensible, with a flexible "evidence" section that we basically throw lots of fields into. This is the approach taken by node-authorizer, which supports some interesting verification policies. But generally it might just be easier anyway!

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 31, 2020
@johngmyers johngmyers force-pushed the kops-controller-server branch from d37e4ae to 5bd91e4 Compare August 1, 2020 05:24
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 1, 2020
@johngmyers
Copy link
Member Author

  • aws_verifier might want to verify the received arn's partition matches that of its own identity.
  • aws_verifier assumes the arn it receives is for an assumed-role. It does in my (limited) testing, but in case there's some region or situation where it isn't we might want to have an escape hatch to the ten-year-kubelet-cert mechanism.
  • I haven't tested non-nil values of Spec.IAM.Profile. I doubt we have coverage of that feature. What is the use case for that feature? Where should we document that anything sharing such such a profile would be able to register as a node?
  • I think we should remove support for node-authorizer in k8s 1.19+ for the non-AWS cloud providers as well.

@johngmyers
Copy link
Member Author

I wonder if now there's a circular dependency for most CNIs now, with CNIs depending on a worker node joining and worker node joining depending on the CNI.
/retest

@olemarkus
Copy link
Member

I don't think so. If you install a CNI manually, you wait for the masters to be available, but NOT READY, and then apply the CNI spec. The worker nodes join once the masters are ready. Just looking quickly at this, it looks like dns-controller is not updating the DNS and dns-controller uses host network.

@johngmyers
Copy link
Member Author

/retest

1 similar comment
@johngmyers
Copy link
Member Author

/retest

cmd/kops-controller/pkg/server/server.go Outdated Show resolved Hide resolved
nodeup/pkg/model/kops_controller.go Show resolved Hide resolved
// APIVersion defines the versioned schema of this representation of a request.
APIVersion string `json:"apiVersion"`
// Certs are the requested certificates and their respective public keys.
Certs map[string]string `json:"certs"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to make this Certs map[string]*CertRequest or Certs []*CertRequest just to give you more extensibility

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on whether we need to support bootstrapping nodes created with a different version, which I discuss below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you post this discussion? I'm not seeing it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the block of comments on aws_verifier.go line 129.

// BootstrapRespose is a response to a BootstrapRequest.
type BootstrapResponse struct {
// Certs are the issued certificates.
Certs map[string]string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here - map[string]*Cert doesn't cost much now, might avoid version contortions later

if resp.StatusCode != http.StatusOK {
detail := ""
if resp.Body != nil {
scanner := bufio.NewScanner(resp.Body)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK; not a problem. Unless there's something that we actively don't want to log, I'd personally just log the whole thing - little cost, potential benefit for debugging. But your call.

}
token = strings.TrimPrefix(token, AWSAuthenticationTokenPrefix)

// We rely on the client and server using the same version of the same STS library.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - interesting point. Does this introduce a challenge for upgrading the AWS SDK though? If there's master vs node skew then I am worried that either old-version nodes or new nodes won't be able to join?

upup/pkg/fi/authenticate.go Outdated Show resolved Hide resolved
}
httpReq.Header.Set("Content-Type", "application/json")

token, err := b.Authenticator.CreateToken(reqBytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining the signature challenge. I do understand, but ... can we implement this on GCE / OpenStack / Bare-Metal? I know it's not as simple, but could we do something like this:

type SignedBootstrapRequest struct {
  Request *BootstrapRequest `json:"request"` 
  AWSSecureTokenService *AWSSecureTokenService `json:"awsSecureTokenService"`
  // GCESignature *GCESignature
}

type AWSSecureTokenService struct {
  Headers map[string]string `json:"headers"`
}

It feels more extensible (for other clouds, possible for AWS also)

Copy link
Member Author

@johngmyers johngmyers Aug 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this interface is implementable by other clouds, assuming those clouds have some way to authenticate an instance. The interface I defined seems to me to have been reduced to the essentials: the provider-specific client implementation gets the serialized body and produces an authenticator. The common code transfers the authenticator in the Authorization: header. The provider-specific server implementation gets the serialized body and the authenticator and gets to say what the node name is or reject the request.

I'm presuming that multi-provider clusters are well out of scope.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that multi-provider clusters should be out of scope. Even if we have a central management cluster, I think we would want node bring-up to be done relatively locally. I could imagine forwarding the request to another service, but I imagine that would essentially be wrapping the current request anyway.

I can see the authorization header as isomorphic to what I described - we're encoding the signature data (with the body) into json, base64 encoding it, and then putting that in the authorization header. I'm really not sure if that is simpler, but it's likely tied up in the skew problem ... I'll ponder!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting the token in the header simplifies the task of getting the signature to include the request, since you can then just take a secure hash of the entire body. Putting it into the Authorization: header means anything in the middle is likely to treat it as confidential.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK to proceed on this basis; if we do find that on another cloud it's much trickier to put it into the header (e.g. GCE's TPM flow) then we might have to put it into the body for that cloud - is that reasonable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's reasonable. If that were to happen, we'd probably do something like change the interfaces to return a modified body.

@justinsb
Copy link
Member

justinsb commented Aug 8, 2020

So I like this PR a lot! My only reservation is around making the requests more structured, to allow for evolution and support for other clouds.

re deprecating node-authorizer, let's get this going first, then check with users!

@johngmyers johngmyers force-pushed the kops-controller-server branch from aef5c86 to 744edaf Compare August 15, 2020 16:50
@johngmyers johngmyers force-pushed the kops-controller-server branch from 744edaf to 1a253dc Compare August 15, 2020 17:30
@justinsb
Copy link
Member

/retest

One prow networking failure before we even touch our code, the other test looks like it actually passed

@justinsb
Copy link
Member

This LGTM ... I think we agreed to merge and aim for 1.19 (instead of 1.20), so...

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 16, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 16, 2020
@justinsb
Copy link
Member

/test pull-kops-e2e-cni-cilium

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

1 similar comment
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/addons area/api area/kops-controller area/nodeup area/provider/aws Issues or PRs related to aws provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update our kubelet (and other) certs for RBAC
6 participants