
stub in a best guess at cluster level configuration #124

Merged

Conversation

@deads2k (Contributor) commented Nov 8, 2018:

Cluster-level configuration is a stable, discoverable API in the config.openshift.io group that a cluster-admin will expect to use to interact with and configure the cluster. Placing it in one location allows multiple operators and binaries to depend on a single source of truth for information. It also enables doc-less discovery by a cluster-admin, and the divisions here allow individual teams to own their configuration inside of a cluster. Coupling across multiple processes, and the future potential for subdivision, will become clear over time.

A cluster-admin will be able to go through a flow like:

  1. I want to configure a thing. What options are available?
  2. `oc api-resources --api-group=config.openshift.io` - produces a list of high-level features like Images, Builds, Networking, IdentityProvider, etc.
  3. `oc explain networking.config.openshift.io` - produces a list of API fields and their documentation (pull request open to kube)
  4. `oc edit networking.config.openshift.io` - to make a change

There is another set of actors as well. Many settings are actually observed from the cluster and cannot reasonably be provided or set by a cluster-admin. For instance, the internalRegistryHostname is known by the image-registry-operator, not the cluster-admin. To represent this, configuration objects have a spec/status split. Controller/operator-maintained information lives in status; cluster-admin-maintained information lives in spec. No field should have multiple writers: if several writers, especially one machine and one human, try to coordinate writes to a single field, someone will get confused.
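
To make the split concrete, here is a minimal sketch of the shape these objects take. The `Image` type and its spec field are illustrative; only internalRegistryHostname comes from the discussion above.

```go
package config

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Image is an illustrative cluster-scoped config type, not the actual API.
type Image struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ImageSpec   `json:"spec"`   // cluster-admin maintained
	Status ImageStatus `json:"status"` // controller/operator maintained
}

type ImageSpec struct {
	// ExternalRegistryHostnames is supplied by the cluster-admin
	// (a hypothetical field for illustration).
	ExternalRegistryHostnames []string `json:"externalRegistryHostnames,omitempty"`
}

type ImageStatus struct {
	// InternalRegistryHostname is observed and written by the
	// image-registry-operator, never by a human.
	InternalRegistryHostname string `json:"internalRegistryHostname,omitempty"`
}
```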

The divisions will be along feature lines, not teams or binaries. An operator can observe changes to these types to drive behavior. That observation and wiring is expected to be performed by the feature owner in all the binaries that need to react to changes. For instance, if a change to the network configuration needs to be observed and handled by the kube-apiserver, openshift-apiserver, and openshift-controller-manager, the networking team will make the configuration available and manage the wiring in the individually affected processes.

The expected flow goes something like this (a sketch of the level check follows the list):

  1. The cluster-admin updates config object foo.
  2. Operator bar observes the parts it cares about into parameters on its operator configuration resource. This provides a configuration level (generation) that can later be checked against status.
  3. The operator resource change is observed by the same operator bar, and operator bar writes a new configuration for the operand (the binary being managed).
  4. The operand consumes the configuration and behaves as requested by the admin via the cluster config.
  5. Operator bar indicates in its status (status.generation) that the operand is using the requested level.
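
A minimal sketch of the level tracking in steps 2 and 5, assuming a hypothetical operator status shape:

```go
package operator

// OperatorStatus is a hypothetical shape for tracking configuration levels.
type OperatorStatus struct {
	// ObservedGeneration records the last metadata.generation of the
	// operator resource that operator bar has rolled out to its operand.
	ObservedGeneration int64 `json:"observedGeneration"`
}

// atRequestedLevel reports whether the operand reflects the admin's latest
// change: the generation bumped in step 2 has shown up in status (step 5).
func atRequestedLevel(specGeneration int64, status OperatorStatus) bool {
	return status.ObservedGeneration >= specGeneration
}
```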

The API for these types will be the main entrypoint of choice for a cluster-admin and must remain stable across releases. This is in contrast to the on-disk formats for particular binaries, which will no longer need stability guarantees since they are operator-managed.

This pull provides a first cut at the different buckets of configuration that are known today.

/assign @smarterclayton @jwforres @derekwaynecarr

@openshift-ci-robot added the size/XXL (denotes a PR that changes 1000+ lines, ignoring generated files) and approved (indicates a PR has been approved by an approver from all required OWNERS files) labels on Nov 8, 2018.
```go
	Status CloudProviderStatus `json:"status"`
}

type CloudProviderSpec struct {
```
Contributor:

The lowest common denominator I know of here would be:

```go
Name string `json:"name"`
```

Where Name maps to the kube-controller-manager --cloud-provider argument.

Contributor:

Or Provider.

```go
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// DNS holds cluster-wide information about DNS. The canonical name is `cluster`
type DNS struct {
```
Contributor:

Does this represent cluster DNS config or external DNS config?

For cluster DNS, the minimal info would be:

```go
ClusterDomain *string `json:"clusterDomain"`
```

Where ClusterDomain maps to the kubelet --cluster-domain argument. This is almost certainly immutable for the foreseeable future.

Contributor:

I think that if enough of this overlaps with Networking, we should consider putting it there. Service CIDR and internal network domain are fundamental network things that everyone must respect.

```go
	Status NetworkStatus `json:"status"`
}

type NetworkSpec struct {
```
Contributor:

@squeed was just talking about this...

```go
}

type NetworkSpec struct {
	// serviceCIDR
```
Contributor:

Use case: cluster-dns-operator picks a static cluster IP from this range.
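
As a sketch of that use case, deriving a fixed address from the range (the offset of 10 mirrors the common kube-dns convention, e.g. 172.30.0.0/16 -> 172.30.0.10; the operator's actual choice may differ):

```go
package network

import (
	"fmt"
	"net"
)

// dnsClusterIP picks a static cluster IP from the service CIDR. This is an
// illustration of the use case, not the cluster-dns-operator's actual code.
func dnsClusterIP(serviceCIDR string) (net.IP, error) {
	_, cidr, err := net.ParseCIDR(serviceCIDR)
	if err != nil {
		return nil, err
	}
	ip := cidr.IP.To4()
	if ip == nil {
		return nil, fmt.Errorf("not an IPv4 CIDR: %s", serviceCIDR)
	}
	ip = append(net.IP(nil), ip...) // copy before mutating
	ip[3] += 10                     // fixed offset into the range
	if !cidr.Contains(ip) {
		return nil, fmt.Errorf("service CIDR %s too small for offset 10", serviceCIDR)
	}
	return ip, nil
}
```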

```go
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// Routing holds cluster-wide information about Routing. The canonical name is `cluster`
type Routing struct {
```
Contributor:

My guess is this should be Ingress instead.

```go
	Status RoutingStatus `json:"status"`
}

type RoutingSpec struct {
```
@ironcladlou (Contributor) commented Nov 8, 2018:

@squeed
@squeed (Contributor) commented Nov 8, 2018:

Hm. So should we work towards deprecating the cluster-network-operator CRD and configure it exclusively via this API? Otherwise we have a nasty duplicated data problem.

@deads2k (Author) commented Nov 8, 2018:

> Hm. So should we work towards deprecating the cluster-network-operator CRD and configure it exclusively via this API? Otherwise we have a nasty duplicated data problem.

You can observe the value from here and bump spec in your operator resource, giving a single configuration level to determine whether the operator has observed it. See https://github.com/openshift/cluster-kube-apiserver-operator/blob/master/pkg/operator/observe_config.go#L197-L258 for an example.
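
A minimal sketch of that observation pattern, with hypothetical lister and field names (the linked file is the real implementation):

```go
package operator

import "fmt"

// NetworkConfig stands in for the cluster-scoped config object; the field
// names here are hypothetical.
type NetworkConfig struct {
	Spec struct {
		ServiceCIDR string
	}
}

// NetworkGetter is a hypothetical stand-in for a lister or client.
type NetworkGetter interface {
	Get(name string) (*NetworkConfig, error)
}

// observeServiceCIDR copies the observed value into the operator's config.
// Writing the result into the operator resource's spec bumps its generation,
// giving a single level to check for "has the operator observed this yet?".
func observeServiceCIDR(networks NetworkGetter, observedConfig map[string]interface{}) (map[string]interface{}, error) {
	network, err := networks.Get("cluster")
	if err != nil {
		return observedConfig, fmt.Errorf("get network config: %v", err)
	}
	if network.Spec.ServiceCIDR != "" {
		observedConfig["serviceCIDR"] = network.Spec.ServiceCIDR
	}
	return observedConfig, nil
}
```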

```go
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// CloudProvider holds cluster-wide information about CloudProvider. The canonical name is `cluster`
type CloudProvider struct {
```
Contributor:

We might want to call this InfrastructureProvider.

Contributor:

Or just Infrastructure.

@deads2k force-pushed the config-10-best-guess branch from 765113f to dbf2acc on November 8, 2018 at 20:24.
@deads2k (Author) commented Nov 8, 2018:

Names updated. @smarterclayton @ironcladlou I'm looking to get the categories merged and then looking to have individual feature owners start filling them in.


```go
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// Ingress holds cluster-wide information about Ingress. The canonical name is `cluster`
type Ingress struct {
```
Contributor:

We should definitely reserve this name.

```go
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// OAuth holds cluster-wide information about OAuth. The canonical name is `cluster`
type OAuth struct {
```
Contributor:

Possible this should be part of an "Authentication" object instead of separate. I don't want too deep/complex objects, but 3-7 feels like a nice total number to avoid user fatigue.

@deads2k (Author):

> Possible this should be part of an "Authentication" object instead of separate. I don't want too deep/complex objects, but 3-7 feels like a nice total number to avoid user fatigue.

Authentication configuration is distinct from the configuration of the oauth server. OAuth and IDP may be worth collapsing, but not with Authentication too.

```go
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// Project holds cluster-wide information about Project. The canonical name is `cluster`
type Project struct {
```
Contributor:

Would probably say this becomes SelfService but we don't have to create it yet. Need to think about it.

```go
}

type SchedulingSpec struct {
	// default node selector (I would be happy to see this die....)
```
Contributor:

Node selector is the most useful security thing we have done so far, so while I know you hate it... :)

@deads2k (Author):

> Node selector is the most useful security thing we have done so far, so while I know you hate it... :)

I hate that we have two incompatible implementations that have existed for years and never been cleaned up.

Contributor:

Yeah, but both of those are global security config. Either way we agree.

@smarterclayton (Contributor) commented:

Probably should clearly mark "maybe objects" with a comment indicating that they are subject to change (anything outside of the ones we've decided on like Build and Image).
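
A comment marker of the sort being suggested might look like this (the type and wording are hypothetical, not what was actually merged):

```go
// Console holds cluster-wide information about the web console.
// NOTE: this is a "maybe object": its shape is a best guess and is subject
// to change, unlike settled types such as Build and Image.
type Console struct {
	// ...
}
```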

@deads2k force-pushed the config-10-best-guess branch from dbf2acc to a82fcd5 on November 8, 2018 at 20:48.
@deads2k (Author) commented Nov 8, 2018:

> Probably should clearly mark "maybe objects" with a comment indicating that they are subject to change (anything outside of the ones we've decided on like Build and Image).

done

@smarterclayton (Contributor) commented:

/lgtm

We can iterate in practice

@openshift-ci-robot added the lgtm (indicates that a PR is ready to be merged) label on Nov 8, 2018.
@openshift-ci-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:
  • OWNERS [deads2k,smarterclayton]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot merged commit c3b47bf into openshift:master on Nov 8, 2018.
@derekwaynecarr (Member) commented:

What happened to Cloud? It seemed useful, as long as we handled the future case where “external” could be respected to only have meaning for kubelets. Are we just deferring that to a future PR?

@smarterclayton (Contributor) commented Nov 9, 2018 via email.

@Miciah mentioned this pull request on Nov 26, 2018.