Replace tectonic-network-operator with cluster-network-operator #600

squeed · 2018-11-02T18:40:17Z

This removes tectonic-network-operator and replaces it with the cluster-network-operator.

This won't work quite yet, because there is a bug in origin. But how we generate the configuration is up for review.

Network configuration is a funny beast. We basically can't change it once the cluster is up. However, there is a distinct "happy path" along with a large number of possible settings tweaks. So, right now this generates a sane network configuration from the installer network configuration.
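A minimal sketch of that generation step, assuming simplified stand-in types. `InstallNetworking`, `ClusterNetwork`, `NetworkSpec`, and `GenerateNetworkSpec` are hypothetical names for illustration, not the installer's or the operator's actual API, and the default values are placeholders.

```go
package main

import (
	"fmt"
	"net"
)

// InstallNetworking is a hypothetical, simplified stand-in for the
// networking section of the install-config.
type InstallNetworking struct {
	ServiceCIDR string
	PodCIDR     string
}

// ClusterNetwork and NetworkSpec are hypothetical stand-ins for the
// operator configuration; the real API has more fields.
type ClusterNetwork struct {
	CIDR             string
	HostSubnetLength uint32
}

type NetworkSpec struct {
	ClusterNetworks []ClusterNetwork
	ServiceNetwork  string
	NetworkType     string
}

// GenerateNetworkSpec validates both CIDRs and fills in happy-path
// defaults for everything the installer configuration does not expose.
func GenerateNetworkSpec(in InstallNetworking) (NetworkSpec, error) {
	for _, c := range []string{in.ServiceCIDR, in.PodCIDR} {
		if _, _, err := net.ParseCIDR(c); err != nil {
			return NetworkSpec{}, fmt.Errorf("invalid CIDR %q: %v", c, err)
		}
	}
	return NetworkSpec{
		// Placeholder subnet length, chosen only for illustration.
		ClusterNetworks: []ClusterNetwork{{CIDR: in.PodCIDR, HostSubnetLength: 9}},
		ServiceNetwork:  in.ServiceCIDR,
		// Assumed default plugin name, for illustration only.
		NetworkType: "OpenShiftSDN",
	}, nil
}

func main() {
	spec, err := GenerateNetworkSpec(InstallNetworking{
		ServiceCIDR: "10.3.0.0/16",
		PodCIDR:     "10.2.0.0/16",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", spec)
}
```

The point of the sketch is the shape: a small set of user-facing inputs, with every other setting defaulted in one place.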

We probably want to figure out a better way to do that at some point; the installer config doesn't capture even the full set of IP address options.

squeed · 2018-11-05T17:35:03Z

Okay, refactored a bit. This change is now somewhat broader in scope.

I deprecated the badly-named podCIDR variable in the install-config, and added a correct multiple-cluster-cidr variable. We do need to think if we want to expose the complete network-operator API directly in the install-config, or if we want to just support the happy-path and leave the rest to customization.

Another open question is what the user should do when they change their cidrs. We actually will support expanding the service and cluster ip ranges. These changes will have to be propagated to their various consumers.

abhinavdahiya · 2018-11-05T17:45:54Z

> I deprecated the badly-named podCIDR variable in the install-config, and added a correct multiple-cluster-cidr variable. We do need to think if we want to expose the complete network-operator API directly in the install-config, or if we want to just support the happy-path and leave the rest to customization.

InstallConfig is supposed to provide very simple options. Why expose []ClusterCIDR, i.e. a list of these? If the NetworkOperator supports expanding these in a running cluster, allow users to give only one and customize later on.

squeed · 2018-11-05T18:15:32Z

> InstallConfig is supposed to provide very simple options. Why expose []ClusterCIDR, i.e. a list of these? If the NetworkOperator supports expanding these in a running cluster, allow users to give only one and customize later on.

Oh, it's so much muddier than that.

First of all, the network-operator currently doesn't support rolling out any changes, and it probably won't for a while. So we can't support that. Also, we won't ever support changes for non-OpenShift-maintained network plugins. I'd also just like to avoid network changes as much as possible. They're always risky, even when done correctly.

Right now, the address configurations are read into the installer, then split out into the install-config, NetworkConfig.networkoperator.openshift.io, and Cluster.cluster.k8s.io. Too many things currently read from install-config, including the kube-apiserver-operator. For now that is fine, because IPs are immutable.

In the future, downstream consumers should stop reading install-config and just use Cluster.cluster.k8s.io for addressing information, and we can just manage that object in the network-operator. The Cluster object isn't exclusively the domain of the network-operator, though, so we need to be somewhat subtle.

Even when we reach that ideal state, it's not clear how we should plumb through IP space configuration in the installer. Since IP blocks are immutable in the general case, users do need to choose them correctly (or else!), so we need it to be part of even the happy-path configuration. However, any subsequent changes will presumably be made to the NetworkConfig object directly, leaving the install-config object in the cluster out of date. Is that OK? Is there a way to give the installer IP configuration without writing it to the cluster's install-config?
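Since the address blocks have to be right up front, one check worth sketching is that the service CIDR and the cluster CIDRs are well-formed and do not overlap. This is an illustrative sketch only; `ValidateCIDRs` and `overlaps` are hypothetical helpers, not code from the installer or the operator.

```go
package main

import (
	"fmt"
	"net"
)

// overlaps reports whether two networks share any address. Two CIDR
// blocks overlap exactly when one contains the other's base address.
func overlaps(a, b *net.IPNet) bool {
	return a.Contains(b.IP) || b.Contains(a.IP)
}

// ValidateCIDRs parses the service CIDR and every cluster CIDR and
// returns an error if any is malformed or if any two of them overlap.
func ValidateCIDRs(serviceCIDR string, clusterCIDRs []string) error {
	all := append([]string{serviceCIDR}, clusterCIDRs...)
	nets := make([]*net.IPNet, 0, len(all))
	for _, s := range all {
		_, n, err := net.ParseCIDR(s)
		if err != nil {
			return fmt.Errorf("invalid CIDR %q: %v", s, err)
		}
		// Compare against every block accepted so far.
		for i, prev := range nets {
			if overlaps(prev, n) {
				return fmt.Errorf("CIDR %q overlaps %q", s, all[i])
			}
		}
		nets = append(nets, n)
	}
	return nil
}

func main() {
	// Disjoint blocks: prints <nil>.
	fmt.Println(ValidateCIDRs("10.3.0.0/16", []string{"10.2.0.0/16", "10.4.0.0/16"}))
	// 10.0.0.0/8 contains the service block: prints an overlap error.
	fmt.Println(ValidateCIDRs("10.3.0.0/16", []string{"10.0.0.0/8"}))
}
```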

squeed · 2018-11-06T13:53:18Z

Filed openshift/cluster-kube-apiserver-operator#106 to stop reading cidrs from the install-config. This is not a blocker.

squeed · 2018-11-06T13:55:38Z

I had to refactor this slightly one more time: the network-operator was setting KUBERNETES_SERVICE_HOST to 127.0.0.1, but that doesn't work because the bootstrap apiserver is still active. So I now have the installer creating an additional config map.

The alternative would be special-casing the network-operator and running it on the bootstrap node. That seems uglier.
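As a rough sketch of the config-map approach, assuming the config map carries the apiserver URL under a single key: the key name `apiserver-url` and the helper `APIServerEnv` are hypothetical, chosen only for illustration. The consumer derives the host and port from that URL instead of hard-coding 127.0.0.1.

```go
package main

import (
	"fmt"
	"net/url"
)

// APIServerEnv derives the apiserver host and port from config map data.
// The "apiserver-url" key is a hypothetical name for this sketch.
func APIServerEnv(data map[string]string) (host, port string, err error) {
	raw, ok := data["apiserver-url"]
	if !ok {
		return "", "", fmt.Errorf("config map has no apiserver-url key")
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", fmt.Errorf("invalid apiserver URL %q: %v", raw, err)
	}
	host, port = u.Hostname(), u.Port()
	if host == "" {
		return "", "", fmt.Errorf("apiserver URL %q has no host", raw)
	}
	if port == "" {
		port = "443" // default HTTPS port when the URL omits one
	}
	return host, port, nil
}

func main() {
	host, port, err := APIServerEnv(map[string]string{
		"apiserver-url": "https://api.example.test:6443", // example value
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(host, port)
}
```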

squeed · 2018-11-06T17:51:24Z

A few more things to work out:

- Can the network-operator run in runlevel 1? I don't think so, but would love to be wrong.
- If not, is there a better way to pass the apiserver URL?

enxebre · 2018-11-06T18:12:43Z

pkg/asset/manifests/cluster_k8s_io.go

+		return errors.Wrapf(err, "Could not generate ClusterNetworkingConfig")
+	}
+
+	cluster := clusterv1a1.Cluster{


To add some context: currently we need to create the cluster object because the actuator interface that we use for creating machines expects it to exist. However, we don't rely on it at all, and we aim to convince upstream to have a clear separation between "machine api" and "cluster api" and to decouple it from the actuator interface.
Eventually we might want to consider an approach that progressively moves the installer definition/infra into a cluster actuator driven by that object, so this might be a kind of starting point. But that's far out of scope, and we need to be aware that the "cluster api" is effectively a "machine api" for us at the moment.

enxebre · 2018-11-07

Also, I'm thinking this might be relevant for a 3.11 to 4.0 upgrade.

squeed · 2018-11-07

Gotcha. This is a bit of a bigger topic, then. Other operators need a way to discover the servicecidrs and clustercidrs for their own configuration. It's ugly to have every operator parsing out the install configuration. Let's meet up and talk about this. Note that pulling clustercidr / servicecidr from installconfig is a big problem for the apiserver-operator; see openshift/cluster-kube-apiserver-operator#105.

knobunc · 2018-11-09T18:32:11Z

@lucab do you know why there is any relabeling going on? It sounds wrong for the pod to be relabeling the host filesystem. Or am I misunderstanding what the problem is. Thanks!

knobunc · 2018-11-12T12:03:15Z

@squeed what if we had an init container run the loop to test?

squeed · 2018-11-12T13:20:46Z

Getting closer. In the bad runs, the container processes are started with context container_t. In the good runs, they start with spc_t. Not sure why this would change between executions. Thoughts, @mrunalp?
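For comparing runs, a small helper that pulls the type field out of a process label can make the `container_t` versus `spc_t` difference easy to spot. This is only a sketch: `SELinuxType` is a hypothetical helper, and the example labels are illustrative values in the usual `user:role:type:level` form, such as those shown by `ps -eZ` or found in `/proc/<pid>/attr/current`.

```go
package main

import (
	"fmt"
	"strings"
)

// SELinuxType extracts the type field from a label of the form
// user:role:type:level, e.g. "system_u:system_r:spc_t:s0".
func SELinuxType(label string) (string, error) {
	// Labels read from /proc may carry a trailing NUL or newline.
	label = strings.TrimRight(label, "\x00\n")
	// The level field can itself contain colons, so split at most 4 ways.
	parts := strings.SplitN(label, ":", 4)
	if len(parts) < 3 || parts[2] == "" {
		return "", fmt.Errorf("malformed SELinux label %q", label)
	}
	return parts[2], nil
}

func main() {
	// Illustrative labels, not captured from the failing runs.
	labels := []string{
		"system_u:system_r:spc_t:s0",
		"system_u:system_r:container_t:s0:c12,c34",
	}
	for _, l := range labels {
		t, err := SELinuxType(l)
		if err != nil {
			panic(err)
		}
		fmt.Println(l, "->", t)
	}
}
```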

knobunc · 2018-11-12T13:28:42Z

@rhatdan Do you have any idea? Thanks.

squeed · 2018-11-12T13:38:00Z

@lucab suggests removing runAsUser: 0, which I'm trying now.

squeed · 2018-11-12T16:51:13Z

Just waiting for the network operator changes to be picked up by the release, then we can retest.

abhinavdahiya · 2018-11-12T17:53:37Z

/retest

squeed · 2018-11-12T18:25:55Z

That seems to have been it.

/hold cancel

squeed · 2018-11-12T18:34:36Z

All green. Just needs a lgtm.

abhinavdahiya · 2018-11-12T18:44:53Z

The PR is green and changes look fine. We can iterate on this. :yay:

/lgtm

openshift-ci-robot · 2018-11-12T18:45:02Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, squeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [abhinavdahiya]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

rhatdan · 2018-11-12T22:01:43Z

Might have discovered a bug, though: was RunAsUser causing privileged to be ignored?

openshift-ci-robot added the do-not-merge/work-in-progress and size/XXL labels Nov 2, 2018

openshift-ci-robot requested review from smarterclayton and wking November 2, 2018 18:40

abhinavdahiya reviewed Nov 2, 2018

pkg/asset/manifests/network-operator.go (outdated)

crawford reviewed Nov 2, 2018

Gopkg.toml

wking reviewed Nov 2, 2018

pkg/asset/manifests/network-operator.go (outdated)

wking reviewed Nov 2, 2018

pkg/asset/manifests/network-operator.go (outdated)

squeed force-pushed the network-operator branch from 54e15b8 to be09f80 Compare November 5, 2018 17:31

abhinavdahiya reviewed Nov 5, 2018

pkg/asset/manifests/cluster_k8s_io.go

squeed force-pushed the network-operator branch from be09f80 to 44c8d2f Compare November 5, 2018 17:40

abhinavdahiya reviewed Nov 5, 2018

squeed force-pushed the network-operator branch 2 times, most recently from ee2ed91 to c0d51ec Compare November 5, 2018 20:20

openshift-bot added the needs-rebase label Nov 6, 2018

squeed force-pushed the network-operator branch from c0d51ec to e9280b0 Compare November 6, 2018 13:44

openshift-bot removed the needs-rebase label Nov 6, 2018

squeed changed the title ~~[wip] Replace tectonic-network-operator with cluster-network-operator~~ Replace tectonic-network-operator with cluster-network-operator Nov 6, 2018

openshift-ci-robot removed the do-not-merge/work-in-progress label Nov 6, 2018

squeed force-pushed the network-operator branch from e9280b0 to 37b8aca Compare November 6, 2018 14:01

squeed changed the title ~~Replace tectonic-network-operator with cluster-network-operator~~ [wip] Replace tectonic-network-operator with cluster-network-operator Nov 6, 2018

openshift-ci-robot added the do-not-merge/work-in-progress label Nov 6, 2018

enxebre reviewed Nov 6, 2018

abhinavdahiya reviewed Nov 6, 2018

pkg/asset/manifests/network.go (outdated)

openshift-bot added the needs-rebase label Nov 10, 2018

wking mentioned this pull request Nov 10, 2018

libvirt: apiserver, controller-manager, package server pods shows CrashLoopBackOff #484

Closed

squeed force-pushed the network-operator branch from 85e39ad to 99f3229 Compare November 12, 2018 09:51

openshift-bot removed the needs-rebase label Nov 12, 2018

squeed added 3 commits November 12, 2018 10:56

replace tectonic-network-operator with cluster-network-operator

31b78ea

* Generate cluster-network-operator config from install-config
* Refactor install-config to better reflect network config
* remove tectonic-network-operator
* Remove temporary kube-proxy and cvo override

pkg/manifests: move cluster_k8s_io to manifests from machines.

687ff31

This is to break an import loop between pkg/asset/machines and pkg/asset/manifests.

dep ensure

94ca64a

squeed force-pushed the network-operator branch from 99f3229 to 94ca64a Compare November 12, 2018 18:01

openshift-ci-robot removed the do-not-merge/hold label Nov 12, 2018

openshift-ci-robot assigned abhinavdahiya Nov 12, 2018

openshift-ci-robot added the lgtm label Nov 12, 2018

openshift-ci-robot added the approved label Nov 12, 2018

openshift-merge-robot merged commit 54f341c into openshift:master Nov 12, 2018

This was referenced Nov 12, 2018

pkg/types: Push platform-specific types (AWS, etc.) into subdirs #657

Merged

scripts for nested-libvirt ci #625

Merged

This was referenced Nov 13, 2018

Node never becomes Ready openshift/cluster-network-operator#32

Closed

pkg/destroy/libvirt: Use prefix-based deletion #660

Merged

Inconsistent SELinux context for privileged pods cri-o/cri-o#1904

Closed
