
Write Kubelet flags as an option on openshift start node to support moving directly to kubelet #18322

Merged
merged 2 commits into openshift:master on Feb 2, 2018

Conversation

smarterclayton
Contributor

To prepare for invoking the kubelet directly instead of having openshift start node bootstrap it, add a flag to openshift start node called --write-flags that generates the arguments needed to invoke the kubelet for a given --config. Instead of calling openshift start node to do bootstrapping, we'd invoke the --write-flags path and call the kubelet directly. The generated node-config on the system would have the correct bootstrap settings.

Would require us to move to dynamic config in the kubelet or to add a secondary loop to pull down the latest kube-config. That's probably acceptable.

Also contains a set of changes that allow certificate rotation to happen completely in the background, rather than blocking kubelet startup. This allows us to keep bootstrapping the node from the master while still launching static pods in the background (right now we can't launch static pods while bootstrapping, because bootstrapping happens before the kubelet pod sync loop runs). In this model, running master containers as static pods will not require any node changes (so master nodes wouldn't be different from other nodes). I'm going to clean this up and propose it upstream.

Note that this path would not require --runonce mode, which is very good because it's effectively unsupported.

@deads2k we're blocked on static pods for the kubelet until we sort out the path forward. I don't want to have two separate types of node config, and I think this is probably the best position in the long run (all nodes bootstrap and have static pod config; nodes loop in the background waiting for bootstrapping and reject requests that require client/server connections until bootstrapping completes).
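
In code terms, the new path branches before the node is started. A minimal sketch assembled from the diffs quoted below (the exact wiring is illustrative; WriteFlagsOnly, WriteKubeletFlags, and StartNode all appear later in this PR):

```go
// Sketch: in RunNode, --write-flags short-circuits startup and prints the
// kubelet invocation computed from the resolved --config instead of
// launching anything.
if o.NodeArgs.WriteFlagsOnly {
	return WriteKubeletFlags(*nodeConfig)
}
return StartNode(*nodeConfig, o.NodeArgs.Components)
```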

@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 28, 2018
@openshift-merge-robot openshift-merge-robot added the vendor-update Touching vendor dir or related files label Jan 28, 2018
// special case: if `--cgroups-per-qos=false` is set and `--enforce-node-allocatable` is
// an empty string, `--enforce-node-allocatable=""` needs to be explicitly set
// cgroups-per-qos defaults to true
if cgroupArg, enforceAllocatable := args["cgroups-per-qos"], args["enforce-node-allocatable"]; len(cgroupArg) == 1 && cgroupArg[0] == "false" && len(enforceAllocatable) == 0 {
	args["enforce-node-allocatable"] = []string{""}
}
Contributor

This isn't the same thing. Empty string fails validation. Empty value works. I tried this before and it failed the networking test.

Contributor Author

Wait, --enforce-node-allocatable= fails but --enforce-node-allocatable="" succeeds?

Contributor

As I recall it, --enforce-node-allocatable= worked and --enforce-node-allocatable="" failed.

Contributor Author

Hrm, both seem to work for me right now. Testing more.

Contributor Author

This is generating --enforce-node-allocatable= when I test.

Contributor

> This is generating --enforce-node-allocatable= when I test.

Oh, that's a happy accident.
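
The "happy accident" makes sense at the argv level: under a shell, --flag= and --flag="" deliver identical arguments, because the shell strips the quotes before exec; a literal "" only survives where no shell is involved (for example, an argv written verbatim into a unit file). A small self-contained check (my illustration, not from the PR):

```go
package main

import (
	"fmt"
	"os"
)

// Run twice under a shell:
//   go run main.go --enforce-node-allocatable=
//   go run main.go --enforce-node-allocatable=""
// Both print the same argument, since the shell removes the quotes
// before the process ever sees argv.
func main() {
	for i, a := range os.Args[1:] {
		fmt.Printf("arg %d: %q\n", i, a)
	}
}
```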

@@ -52,6 +52,10 @@ type NodeArgs struct {
// Components is the set of enabled components.
Components *utilflags.ComponentFlag

// WriteFlagsOnly will print flags to run the Kubelet from the provided arguments rather than launching
// the Kubelet itself.
WriteFlagsOnly bool
Contributor

@sdodson I knew it was only a matter of time....

@@ -256,7 +267,7 @@ func (o NodeOptions) RunNode() error {
return nil
}

if err := StartNode(*nodeConfig, o.NodeArgs.Components); err != nil {
if err := StartNode(*nodeConfig, o.NodeArgs.Components, o.NodeArgs.WriteFlagsOnly); err != nil {
Contributor

Rather than passing a "behave differently" flag to StartNode, you can detect and print out the options from a different method right above this.

Contributor Author

ok

}

// convert current settings to flags
args := append([]string{kubeletPath}, kubeletArgs...)
for i := glog.Level(10); i > 0; i-- {
Contributor

What is this actually doing? Why is it doing it? I couldn't sort it out, and a few lines down the --vmodule Sprintf looks broken anyway.

Contributor Author

glog doesn't let you read its verbosity level directly; you have to test each level on your own.
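
In other words, the loop probes glog from the highest level down and treats the first enabled level as the effective --v value. A self-contained sketch of the idiom (assuming the standard github.com/golang/glog API; the --vmodule handling the review mentions is omitted):

```go
package main

import (
	"flag"
	"fmt"

	"github.com/golang/glog"
)

// effectiveVerbosity probes glog from the highest level down: glog exposes
// no getter for its -v value, so the first level where V(i) reports true
// is taken as the effective verbosity.
func effectiveVerbosity() glog.Level {
	for i := glog.Level(10); i > 0; i-- {
		if glog.V(i) {
			return i
		}
	}
	return 0
}

func main() {
	flag.Parse() // glog registers -v and -vmodule on the standard flag set
	fmt.Printf("--v=%d\n", effectiveVerbosity())
}
```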

@deads2k
Contributor

deads2k commented Jan 29, 2018

If I'm reading the intent correctly, this change allows the kubelet to start up with failing transports to the kube-apiserver, which will later start working. That update flow has to work anyway in order to keep the certificates up to date, BUT it's less obvious to me how the kubelet gets its configuration before the sync loop starts.

Without a valid connection to the master, the kubelet won't have a complete configuration unless the configuration exists on disk before starting the process. Are you counting on the dynamic configuration working correctly in order for this to function?

@enj
Contributor

enj commented Jan 29, 2018

@openshift/sig-security

@smarterclayton
Contributor Author

> That update flow has to work anyway in order to keep the certificates up to date, BUT it's less obvious to me how the kubelet gets its configuration before the sync loop starts.

The installer needs to write out a minimal config. That was already a requirement in practice for flags like root-dir and the container runtime settings, as well as cgroup management. Those settings are coupled to the node install and are not dynamic. Discussed in person more; capturing here.

@smarterclayton
Contributor Author

Updated with all feedback

@@ -42,7 +43,9 @@ func ValidateInClusterNodeConfig(config *api.NodeConfig, fldPath *field.Path) Va
validationResults.AddErrors(ValidateHostPort(config.DNSBindAddress, fldPath.Child("dnsBindAddress"))...)
}
if len(config.DNSIP) > 0 {
validationResults.AddErrors(ValidateSpecifiedIP(config.DNSIP, fldPath.Child("dnsIP"))...)
if !hasBootstrapConfig || config.DNSIP != "0.0.0.0" {
Contributor

De Morgan's, please.

Contributor Author

I don't think that's any easier to read. Parentheses are more confusing.

@@ -224,6 +235,17 @@ func (o NodeOptions) RunNode() error {
if addr := o.NodeArgs.ListenArg.ListenAddr; addr.Provided {
nodeConfig.ServingInfo.BindAddress = addr.HostPort(o.NodeArgs.ListenArg.ListenAddr.DefaultPort)
}
// do a local resolution of node config DNS IP, supports bootstrapping cases
if nodeConfig.DNSIP == "0.0.0.0" {
Contributor

So this is not a listen address, it is a connection address? Using 0.0.0.0 to do defaulting feels wrong. "autodetect" is an illegal value for a hostname or IP that will never be confused as something else.

Contributor Author

0.0.0.0 was never allowed before. We changed it in 3.7 to mean "the local address". It has to be an IP because of DNS. We can't change the value that means autodetect, because 3.7 clusters are already using 0.0.0.0.
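
For illustration, one common way to resolve "the local address" in Go; the PR's actual resolution helper is not shown in this excerpt, so the names here are assumptions:

```go
package main

import (
	"fmt"
	"net"
)

// defaultLocalIP picks the node's primary outbound IP. Dialing UDP sends no
// packets; it only asks the kernel to select a local source address.
func defaultLocalIP() (string, error) {
	conn, err := net.Dial("udp", "8.8.8.8:53")
	if err != nil {
		return "", err
	}
	defer conn.Close()
	return conn.LocalAddr().(*net.UDPAddr).IP.String(), nil
}

func main() {
	ip, err := defaultLocalIP()
	if err != nil {
		fmt.Println("resolve failed:", err)
		return
	}
	fmt.Println(ip) // would stand in for a configured DNSIP of "0.0.0.0"
}
```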

}
args["cluster-dns"] = getClusterDNS(externalKubeClient, args["cluster-dns"])

return args, nil
}

func KubeletArgsMapToArgs(argsMap map[string][]string) []string {
Contributor

Any need for this to still be a separate method, or can we tidy it up?

Contributor Author

No, can collapse it.

Contributor Author

Actually, it's used in two places, but I'm going to move it to the call site.
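
For reference, the helper is essentially a map-to-flags rendering. A minimal sketch (the sorting and exact formatting are assumptions):

```go
package main

import (
	"fmt"
	"sort"
)

// KubeletArgsMapToArgs renders an argument map as sorted --key=value flags.
// An empty value renders as `--key=`, which is exactly the
// `--enforce-node-allocatable=` spelling discussed earlier.
func KubeletArgsMapToArgs(argsMap map[string][]string) []string {
	args := []string{}
	for key, values := range argsMap {
		for _, v := range values {
			args = append(args, fmt.Sprintf("--%s=%s", key, v))
		}
	}
	sort.Strings(args)
	return args
}

func main() {
	fmt.Println(KubeletArgsMapToArgs(map[string][]string{
		"cgroups-per-qos":          {"false"},
		"enforce-node-allocatable": {""},
	}))
	// [--cgroups-per-qos=false --enforce-node-allocatable=]
}
```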

@deads2k
Contributor

deads2k commented Feb 1, 2018

0.0.0.0 meaning: "choose a default for me" rubs me the wrong way.

@smarterclayton
Contributor Author

Ship sailed in 3.7 on that. Also, dnsmasq uses roughly that convention, so it's a DNS-ism.

We weren't actually failing kubelet start, so waiting made no sense.
Instead, we always start. Also, if the last rotated cert has expired,
fall back to the bootstrap certificate info.
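
A minimal sketch of the fallback this commit describes (names and structure are my assumptions, not the PR's actual code):

```go
package certfallback

import (
	"crypto/tls"
	"time"
)

// clientCertificate returns the rotated certificate while it is still
// valid; otherwise it falls back to the bootstrap credentials so the
// kubelet can keep reaching the API server and request a fresh
// certificate in the background instead of blocking startup.
func clientCertificate(rotated, bootstrap *tls.Certificate, now time.Time) *tls.Certificate {
	if rotated == nil || rotated.Leaf == nil || now.After(rotated.Leaf.NotAfter) {
		return bootstrap
	}
	return rotated
}
```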
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smarterclayton

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@smarterclayton smarterclayton changed the title Speculative set of changes around next iteration of bootstrapping Write Kubelet flags as an option on openshift start node to support moving directly to kubelet Feb 1, 2018
// WriteKubeletFlags writes the correct set of flags to start a Kubelet from the provided node config to
// stdout, instead of launching anything.
func WriteKubeletFlags(nodeConfig configapi.NodeConfig) error {
kubeletFlagsAsMap, err := nodeoptions.ComputeKubeletFlagsAsMap(nodeConfig.KubeletArguments, nodeConfig)
Contributor

I meant: couldn't this just return the string slice of args? It's the first thing you do at both call sites

Contributor Author

Oh.

@deads2k
Contributor

deads2k commented Feb 1, 2018

one nit.

lgtm

This acts like --write-config, but instead outputs the flags that the
specified --config file would pass to the Kubelet. This prepares us to
stop calling openshift start node and instead directly invoke the Kubelet.

Remove the unsupported kubelet binary code because we are no longer
calling ourselves. Also move a small chunk of flag specialization code
into a more appropriate place.
@smarterclayton
Contributor Author

Fixed. Labelling

@smarterclayton smarterclayton added the lgtm Indicates that a PR is ready to be merged. label Feb 1, 2018
@smarterclayton
Contributor Author

smarterclayton commented Feb 2, 2018 via email

2 similar comments
@smarterclayton
Contributor Author

/retest

@smarterclayton
Contributor Author

/retest

@smarterclayton
Contributor Author

shakes fist at networking flake

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@smarterclayton
Contributor Author

/retest

1 similar comment
@smarterclayton
Contributor Author

/retest

@openshift-merge-robot
Contributor

Automatic merge from submit-queue.

@openshift-merge-robot openshift-merge-robot merged commit a5d2ee0 into openshift:master Feb 2, 2018
@@ -141,7 +141,6 @@ type manager struct {
certStore Store
certAccessLock sync.RWMutex
cert *tls.Certificate
rotationDeadline time.Time
Contributor

@smarterclayton The changes in certificate_manager.go/certificate_manager_test.go look unrelated to introducing a new option. Do they really have to be part of this PR, or was that unintentional?

@deads2k
Contributor

deads2k commented Feb 5, 2018

> @smarterclayton The changes in certificate_manager.go/certificate_manager_test.go look unrelated to introducing a new option. Do they really have to be part of this PR, or was that unintentional?

Related. The rotation change was needed to allow starting a kubelet from flags without blocking in some conditions. You only need part of a config.

@php-coder
Contributor

@deads2k Oh, I see now that it was mentioned in the PR description. My bad, sorry.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. sig/security size/L Denotes a PR that changes 100-499 lines, ignoring generated files. vendor-update Touching vendor dir or related files
7 participants