Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert cluster.openshift.io/Network in to the NetworkConfig CRD; update Network status #47

Merged
merged 3 commits into from
Jan 15, 2019

Conversation

squeed
Copy link
Contributor

@squeed squeed commented Dec 4, 2018

This adds a second controller that watches the cluster-level network configuration. If changes are made, it merges those in to (or creates) the CRD configuration.

When the changes are applied, the "real" controller then updates the Network status object.

@squeed squeed added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 4, 2018
@squeed squeed requested a review from danwinship December 4, 2018 11:28
@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 4, 2018
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 4, 2018
@squeed
Copy link
Contributor Author

squeed commented Dec 4, 2018

Pending openshift/api#141 merge

@danwinship
Copy link
Contributor

But an admin can directly modify the CRD as well, right? (In fact, they have to, to configure some options). What happens if they modify the CRD to contradict the cluster config?

@squeed
Copy link
Contributor Author

squeed commented Dec 4, 2018

@danwinship yeah, I was thinking about that. I can see two possibilities. Either we say "hey, don't do that!", or we remove those fields from the CRD.

Removing overlapping fields from the CRD seems like the right way to me. I don't expect us to be adding much more in the way of fields to the Cluster object (except maybe Isolation).

I'll update the PR (and simplify most of the controller loop) to do this. We'll have to keep the duplicate fields for now, and roll it out in stages.

FWIW, part of this PR makes it possible to get networking up with just the cluster Network object (by defaulting to NetworkPolicy mode for openshift-sdn.

@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 4, 2019
@squeed
Copy link
Contributor Author

squeed commented Jan 4, 2019

Okay, updated this. Now, if there are any fields that were improperly updated in the "downstream" object, revert those changes when we reconcile.

@squeed squeed removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jan 4, 2019
@squeed squeed changed the title [wip] Convert cluster.openshift.io/Network in to the Network CRD; update Network status Convert cluster.openshift.io/Network in to the NetworkConfig CRD; update Network status Jan 4, 2019
@squeed
Copy link
Contributor Author

squeed commented Jan 7, 2019

Odd, operator logs seem fine.
/retest

@squeed
Copy link
Contributor Author

squeed commented Jan 7, 2019

Hit another AWS quota issue: Jan 07 21:30:41.565 W persistentvolume=pvc-11be8786-12c3-11e9-ab14-0a9830816660 error deleting EBS volume "vol-03b08c117479245b2": "RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: b4ada13a-d246-40a0-9f2e-c95ab2e81a1b"

/retest

@squeed
Copy link
Contributor Author

squeed commented Jan 8, 2019

Flake city.
/retest

@squeed
Copy link
Contributor Author

squeed commented Jan 8, 2019

/retest

@squeed
Copy link
Contributor Author

squeed commented Jan 8, 2019

All green. @danwinship, would you mind reviewing?

pkg/controller/clusterconfig/clusterconfig_controller.go Outdated Show resolved Hide resolved
pkg/controller/clusterconfig/clusterconfig_controller.go Outdated Show resolved Hide resolved
pkg/names/names.go Outdated Show resolved Hide resolved
size, _ := cidr.Mask.Size()
// The comparison is inverted; smaller number is larger block
if cnet.HostPrefix < uint32(size) {
return errors.Errorf("hostPrefix %d is larger than its cidr %s",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To match origin's validation, this should be <= not <. Also it checks that HostSubnetLength >= 2 aka HostPrefix <= 30

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By my reading, it seems to allow a HostPrefix that is the same size as the CIDR? I'm looking here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh... somehow i looked at that same code and got the wrong answer.

pkg/controller/networkconfig/cluster.go Outdated Show resolved Hide resolved
// If there are changes to the "downstream" networkconfig, commit it back
// to the apiserver
log.Println("WARNING: NetworkConfig.networkoperator.openshift.io has fields being overwritten by Network.cluster.openshift.io configuration")
cfg.TypeMeta = metav1.TypeMeta{APIVersion: "networkoperator.openshift.io/v1", Kind: "NetworkConfig"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. There's some kind of bug with the shared cache that is unsetting the Kind, but only on the second run of the reconcile loop.

Sometimes the straightest path is through the mud :-/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this is one of those things where you're mutating an object you got from a cache rather than DeepCopying it and mutating the copy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not that, but something similar. The client always copies in to a receiving object, and all the types should be registered with the schema. Just confusing.

pkg/controller/networkconfig/cluster.go Outdated Show resolved Hide resolved
pkg/controller/networkconfig/networkconfig_controller.go Outdated Show resolved Hide resolved
pkg/util/ip/addr.go Outdated Show resolved Hide resolved
@squeed
Copy link
Contributor Author

squeed commented Jan 10, 2019

@danwinship Thanks for the excellent review. suggestions incorporated, PTAL.

@squeed
Copy link
Contributor Author

squeed commented Jan 10, 2019

There are a few README changes needed, but I'd rather make them as part of a follow-up PR, since CI is all green and PRs exclusively with doc changes skip CI.

README.md Outdated
@@ -30,9 +59,14 @@ spec:
## Configuring IP address pools
Users must supply at least two address pools - one for pods, and one for services. These are the ClusterNetworks and ServiceNetwork parameter. Some network plugins, such as OpenShiftSDN, support multiple ClusterNetworks. All address blocks must be non-overlapping. You should select address pools large enough to fit your anticipated workload.

For future expansion, multiple `serviceNetwork` entries are allowed by the configuration but not actually supported by any network plugins. Supplying multiple addresses is invalid.

Each `clusterNetwork` entry has an additional required parameter, `hostPrefix`, that specifies the address size to assign to assign to each individual node. For example
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the example is missing, bah.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so either add the example or remove "For example"

README.md Outdated

Each `clusterNetwork` entry has an additional required parameter, `hostPrefix`, that specifies the address size to assign to assign to each individual node. For example

IP address pulls are always read from the Cluster configuration and propagated "downwards" in to the Operator configuration. Any changes to the Operator configuration will be ignored.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pulls -> pools.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix that too

@squeed
Copy link
Contributor Author

squeed commented Jan 14, 2019

@danwinship can I get a LGTM?

Copy link
Contributor

@danwinship danwinship left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just doc nits mostly

#### Configuration objects
*Cluster config*
- *Type Name*: `Network.config.openshift.io`
- *Instance Name*: `cluster`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I assume this name matches what other people are doing?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. Once the installer switches from NetworkConfig to Network, we can align the names if we like.

README.md Outdated
*Cluster config*
- *Type Name*: `Network.config.openshift.io`
- *Instance Name*: `cluster`
- *View Command*: `oc get Network.openshift.io cluster -oyaml`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Network.config.openshift.io

README.md Outdated
networkType: OpenShiftSDN
```

*Operator Config*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Clarify that the operator config in this example would have been auto-generated. Or alternatively, add a non-auto-generatable line to it and then explain that everything except that one line was auto-generated.)

README.md Outdated
The network operator has a complex configuration, but most parameters have a sensible default.
The network operator gets its configuration from two objects: the Cluster and the Operator configuration. Most users only need to create the Cluster configuration - the operator will generate its configuration automatically. If you need finer-grained configuration of your network, you will need to create both configurations.

Any changes to the Cluster configuration are propagated down in to the Operator configuration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

State explicitly that in case of conflicts, the operator config will be updated to match the cluster config

README.md Outdated
@@ -30,9 +59,14 @@ spec:
## Configuring IP address pools
Users must supply at least two address pools - one for pods, and one for services. These are the ClusterNetworks and ServiceNetwork parameter. Some network plugins, such as OpenShiftSDN, support multiple ClusterNetworks. All address blocks must be non-overlapping. You should select address pools large enough to fit your anticipated workload.

For future expansion, multiple `serviceNetwork` entries are allowed by the configuration but not actually supported by any network plugins. Supplying multiple addresses is invalid.

Each `clusterNetwork` entry has an additional required parameter, `hostPrefix`, that specifies the address size to assign to assign to each individual node. For example
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so either add the example or remove "For example"

README.md Outdated

Each `clusterNetwork` entry has an additional required parameter, `hostPrefix`, that specifies the address size to assign to assign to each individual node. For example

IP address pulls are always read from the Cluster configuration and propagated "downwards" in to the Operator configuration. Any changes to the Operator configuration will be ignored.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix that too

README.md Outdated
metadata:
name: cluster
spec:
serviceNetwork: ["172.30.0.0/16"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use the expanded syntax for parallel-ness with clusterNetwork? (And to match how oc get will display it.)

serviceNetwork:
  - "172.30.0.0/16"
clusterNetwork:
  - cidr: "10.128.0.0/14"
    hostPrefix: 23

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There, updated both to match oc get exactly (sans metadata).

// In other words, it watches Network.config.openshift.io/v1/cluster and updates
// NetworkConfig.networkoperator.openshift.io/v1/default.
func (r *ReconcileClusterConfig) Reconcile(request reconcile.Request) (reconcile.Result, error) {
log.Printf("Reconciling Network %s/%s\n", request.Namespace, request.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Network" here is ambiguous. Either Network.config.openshift.io or cluster config.

Also, it's non-namespaced so don't print the namespace. (That may apply to ReconcileNetworkConfig too?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point; fixed (for both)

…twork status

This adds a second controller that watches the cluster-level network
configuration. If changes are made, it merges those in to (or creates)
the CRD configuration.

When the changes are applied, the "real" controller then updates the
Network status object.

In the event that the downstream NetworkConfig object has changes that
would be overwritten by the upstream Network object, do the overwrite
and update the NetworkConfig object back.
This can be removed if / when the installer does it.
@squeed
Copy link
Contributor Author

squeed commented Jan 14, 2019

@danwinship doc nits and log lines fixed, thanks.

@squeed
Copy link
Contributor Author

squeed commented Jan 14, 2019

/retest

@danwinship
Copy link
Contributor

/lgtm
/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 14, 2019
@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 14, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, squeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@squeed
Copy link
Contributor Author

squeed commented Jan 15, 2019

/retest

@squeed
Copy link
Contributor Author

squeed commented Jan 15, 2019

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 15, 2019
@openshift-merge-robot openshift-merge-robot merged commit eb4e3a7 into openshift:master Jan 15, 2019
@squeed squeed deleted the cluster-config branch March 6, 2019 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants