
RFC: v1alpha3 config proposal #340

Closed
BenTheElder opened this issue Feb 23, 2019 · 17 comments · Fixed by #390
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

@BenTheElder
Member

BenTheElder commented Feb 23, 2019

I've been thinking for a while now about how to improve the ergonomics of our config and extend it for more use cases. I intend to bring this up at the next meeting. Proposal is as follows:


Proposed changes

Rename the Config type to Cluster

Rationale

This is more precise: it specifically configures a kind cluster, whereas some of the sub-objects configure nodes, etc. Additionally, config.Cluster is a better library type name than config.Config.

Cons

This is a bikeshed. They both work. Users will need to update (but the same is pretty much true for changing the API version).
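
For illustration, the user-visible change in config files is just the top-level type (a quick sketch, assuming the apiVersion bumps at the same time):

# v1alpha2 (current)
kind: Config
apiVersion: kind.sigs.k8s.io/v1alpha2

# v1alpha3 (proposed)
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3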

Allow specifying extra ports on nodes

Specifically allow config of the form:

apiVersion: kind.sigs.k8s.io/v1alpha3
kind: Cluster
nodes:
- role: worker
  nodePorts: [8080]
- role: worker
  nodePorts: [9090, 7070]

Rationale

In single-node mode this will be especially easy to use and will allow serving traffic from workloads with Kubernetes nodePorts.

This should solve #99, which is one of the earliest standing requests.

Cons

For multi-node we will likely need to label the nodes with the ports or make the node names predictable or configurable. Suggestions are welcome.

This may also not be the most favorable solution long term versus, say, implementing a form of LoadBalancer / Ingress, but it should be much easier to solve and should unblock usage.
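
For example, with a port in the default Kubernetes NodePort range (30000-32767) exposed on a single node, a plain NodePort Service could serve traffic straight to the host. A rough sketch (names are placeholders, not final syntax):

apiVersion: v1
kind: Service
metadata:
  # placeholder name
  name: my-app
spec:
  type: NodePort
  selector:
    # must match the workload's pod labels
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
    # must be one of the ports exposed on the node,
    # e.g. nodePorts: [30080] in the kind config
    nodePort: 30080

Ports like 8080 above fall outside the default NodePort range, so they would mostly be useful for hostPort-style pods unless the API server's service-node-port-range is widened via the kubeadm config.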

Add a networking struct to Cluster

Specifically:

apiVersion: kind.sigs.k8s.io/v1alpha3
kind: Cluster
networking:
  # 127.0.0.1 if unset, affects all nodes
  listenAddress: '0.0.0.0'
  apiServer:
    # random if unset, applies to the control plane node or load balancer
    listenPort: 6443
nodes:
 ....

Rationale

There's increasing demand for configuring this. Additionally, we need to change the default listen address to 127.0.0.1 for security reasons ... users will need a way to configure this.

These fields are similar to kubeadm's config.

Cons

I'm not aware of any. Minor complexity increase?

Drop the load-balancer node role

... but keep the node type internally, and mark it as "control-plane-load-balancer" (or something similarly precise)

Rationale

We only actually support the control-plane load balancer, which is implicitly necessary for clusters with multiple control-plane nodes. Nothing about the load balancer is currently configurable, and if it were, those would actually be cluster-wide details (like the API server listen address).

We already intend to implicitly add these to v1alpha2 #274

Cons

Users need to stop explicitly specifying load-balancer nodes when they upgrade to v1alpha3.

Drop the replicas field from nodes

Rationale

Having nodes internally copied is increasingly confusing as we add host mounts and other configuration knobs to nodes.

Writing one entry per node is more explicit and reduces some internal complexity.

Cons

The config will be slightly more verbose; users need to copy and paste identical nodes.
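
For reference, a v1alpha2 config along these lines (sketching the current fields from memory):

kind: Config
apiVersion: kind.sigs.k8s.io/v1alpha2
nodes:
- role: worker
  replicas: 3

would instead be written out explicitly:

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: worker
- role: worker
- role: worker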

Make kubeadm config patches explicitly cluster-wide

Specifically move the kubeadmConfigPatches and kubeadmConfigPatchesJson6902 fields off of the nodes in Cluster to the top level.

Rationale

We already only generate one cluster-wide config, and it's unclear that it would ever make sense to have multiple configs in one cluster. If it does wind up making sense, we can add fields back to nodes later with higher precedence than the cluster-wide fields.

This should clarify the behavior.

Cons

None? It will be straightforward to add per-node patches or other improvements if and when the use cases become apparent.

Example Config:

This is an example of a config with several nodes and every knob set after the above changes.

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
networking:
  # 127.0.0.1 if unset, affects all nodes
  listenAddress: '0.0.0.0'
  apiServer:
    # random if unset, applies to the control plane node or load balancer
    listenPort: 6443
# patch the generated kubeadm config with some extra settings
kubeadmConfigPatches:
- |
  apiVersion: kubeadm.k8s.io/v1beta1
  kind: ClusterConfiguration
  metadata:
    name: config
  networking:
    serviceSubnet: 10.0.0.0/16
# patch it further using a JSON 6902 patch
kubeadmConfigPatchesJson6902:
- group: kubeadm.k8s.io
  version: v1beta1
  kind: ClusterConfiguration
  patch: |
    - op: add
      path: /apiServer/certSANs/-
      value: my-hostname
# 3 control plane nodes and 3 workers
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
  # expose extra ports on this node
  nodePorts: [9090, 7070]
  # add extra bind mounts to this node
  extraMounts:
    # readonly bind mount the kind source code at /kind-source
    - containerPath: /kind-source
      hostPath: /Users/bentheelder/go/src/sigs.k8s.io/kind
      readOnly: true
      # explicitly set some advanced values to their defaults
      selinuxRelabel: false
      propagation: None
- role: worker
  # add extra bind mounts to this node
  extraMounts:
    # readonly bind mount the kind source code at /kind-source
    - containerPath: /kind-source
      hostPath: /Users/bentheelder/go/src/sigs.k8s.io/kind
      readOnly: true
- role: worker
  # add extra bind mounts to this node
  extraMounts:
    # readonly bind mount the kind source code at /kind-source
    - containerPath: /kind-source
      hostPath: /Users/bentheelder/go/src/sigs.k8s.io/kind
      readOnly: true
@BenTheElder
Member Author

/priority important-soon
/assign
cc @neolit123 @fabriziopandini @munnerz @pablochacin

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Feb 23, 2019
@BenTheElder BenTheElder added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Feb 23, 2019
@BenTheElder BenTheElder added this to the 0.2 milestone Feb 23, 2019
@neolit123
Member

neolit123 commented Feb 23, 2019

This is more precise: it specifically configures a kind cluster, whereas some of the sub-objects configure nodes, etc. Additionally, config.Cluster is a better library type name than config.Config.

support for multi-cluster topology, e.g. clusters sitting behind a load balancer, would have been fun, but it's out of scope, so the change SGTM.

Drop the load-balancer node role

SGTM, but this drops eventual load-balancer configuration if we ever want to support that.
probably not anytime soon.

Drop the replicas field from nodes

doesn't matter that much. slightly less friendly UX config wise.

Make kubeadm config patches explicitly cluster-wide

someone might complain about this eventually and file the request to have them per node.

Example Config:

SGTM.

BenTheElder added this to the 0.2 milestone 19 minutes ago

has the timeline of 0.2 shifted towards the 1.14 release and the 1.0 shifted for the future?
seems optimistic, but doable.

@BenTheElder
Member Author

Thanks for the comment @neolit123

support for multi-cluster topology, e.g. clusters sitting behind a load balancer, would have been fun, but it's out of scope, so the change SGTM.

We could probably still do this by having cluster objects set the name metadata and having another object reference them by name?

This does seem out of scope for now though. Kubernetes has no concepts for this AFAIK.

SGTM, but this drops eventual load-balancer configuration if we ever want to support that.
probably not anytime soon.

Those settings can go under networking at the top level? (and we will be allowing setting eg the port)

doesn't matter that much. slightly less friendly UX config wise.

very slightly I think. it's less confusing in the config format and the code but more verbose. It shouldn't be too unbearable to repeat node specs in your config, most multi-node clusters will just look like:

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: worker
- role: worker

someone might complain about this eventually and file the request to have them per node.

If / when they do it will be useful to understand the use cases and why, and we can make per-node config override the cluster-global config at the node level.

This should be less confusing and avoid people repeating it for every node when really they just wanted to change something cluster wide anyhow (like say the API server cert SANs).

has the timeline of 0.2 shifted towards the 1.14 release and the 1.0 shifted for the future?
seems optimistic, but doable.

I shifted 0.2 to the end of next week, optimistically; most of these changes are quite trivial, and many of them, like the ports, can be added in a later revision of v1alpha3 without a breaking change.

1.0 is definitely shifted to the future, this was always an aggressive and overly optimistic timeline to push for core feature design & iteration.

We should discuss these dates in another venue though (perhaps at the meeting)?

I'm also OK to not make this in 0.2, but I think we can at least get the breaking changes done quickly once we've reviewed the proposal.

@neolit123
Member

we might also want to move retain and wait into the config as part of this plan.

We could probably still do this by having cluster objects set the name metadata and having another object reference them by name?
This does seem out of scope for now though. Kubernetes has no concepts for this AFAIK.

there are definitely multi-cluster topologies in products out there, yet this seems out of scope for the project. external pivoting / tooling can be done to support such a scenario (if someone needs this).

Those settings can go under networking at the top level? (and we will be allowing setting eg the port)

yes, they can.

very slightly I think. it's less confusing in the config format and the code but more verbose. It shouldn't be too unbearable to repeat node specs in your config, most multi-node clusters will just look like:

true.

If / when they do it will be useful to understand the use cases and why, and we can make per-node config override the cluster-global config at the node level.
This should be less confusing and avoid people repeating it for every node when really they just wanted to change something cluster wide anyhow (like say the API server cert SANs).

yes, this can work if there is demand.

1.0 is definitely shifted to the future, this was always an aggressive and overly optimistic timeline to push for core feature design & iteration.
We should discuss these dates in another venue though (perhaps at the meeting)?

sounds good.

I'm also OK to not make this in 0.2, but I think we can at least get the breaking changes done quickly once we've reviewed the proposal.

i don't have a strong preference on the versioning.

@fabriziopandini
Member

@BenTheElder thanks for keeping me in this loop!
The list of proposed changes SGTM; some comments / additional questions:

Rename the Config type to Cluster
Great to see cluster settings moving into a separate set of fields; this is consistent with the kubeadm config / Cluster API, and it was also in the original issue for multi-node config in kind (but then punted in the initial prototypes for the sake of simplicity). Now that the number of cluster settings is growing/consolidating, it definitely makes sense to rethink this.

Allow specifying extra ports on nodes

For multi-node we will likely need to label the nodes with the ports or make the node names predictable or configurable.

I think that making node names configurable can open up a new set of problems. -1
Instead, given that we are dropping the replicas field, it is now possible to name nodes according to their order in the config, and this makes the process more predictable / more evident in the logs

Add a networking struct to Cluster
What is the rationale behind the decision to assign random ports as a default?
Using well-known ports for the control-plane node or load balancer in the docker internal network makes things more easily predictable; nevertheless, there will always be a port on the host machine...

Drop the load-balancer node role

support for multi-cluster topology e.g. clusters sitting behind a load balancer

is not fully clear to me. Is there an issue to look at?

what about external-etcd?
In the current config we are ready to manage an external-etcd node, but this is not implemented yet.
This is something that we want to test in CI as part of the kubeadm test matrix, so I would like to see this option preserved, but I think this should also move to the cluster level, similarly to the external LB. WDYT?

Drop the replicas field from nodes
Cluster API (last time I checked) was preserving both machines and machine sets, but I think it is fine for kind to drop the concept of replicas ATM.

what about image settings?
Now image settings are in the CLI (with cluster scope) and in config (with node scope); does it make sense to have an image field in the new Cluster object, to be eventually overridden at node level?

@BenTheElder
Member Author

I think that making node names configurable can open up a new set of problems. -1
Instead, given that we are dropping the replicas field, it is now possible to name nodes according to their order in the config, and this makes the process more predictable / more evident in the logs

As a user needing to schedule a workload to a machine with that nodeport available, I'm supposed to predict the node name...?

There needs to be a mechanism to pin workloads to the right node with the exposed port since they can't all bind to the same port.

What is the rationale behind the decision to assign random ports as a default?

Multiple clusters on the host? To be clear, this is the port on the host, not the port inside the container. This is the existing behavior + being able to select a specific one if you want (which will fail if you create multiple clusters with the same port, but it's been requested and many will only create one cluster).

Using well-known ports for the control-plane node or load balancer in the docker internal network makes things more easily predictable; nevertheless, there will always be a port on the host machine...

Yeah this is the host port. Not the internal port. Perhaps we should rename this field.

support for multi-cluster topology e.g. clusters sitting behind a load balancer

I don't think this is a problem for kind to solve. This can be done externally.

what about external-etcd?
In the current config we are ready to manage an external-etcd node, but this is not implemented yet.
This is something that we want to test in CI as part of the kubeadm test matrix, so I would like to see this option preserved, but I think this should also move to the cluster level, similarly to the external LB. WDYT?

SGTM, let's make this a cluster-level control too and keep nodes for Kubernetes nodes.

what about image settings?
Now image settings are in the CLI (with cluster scope) and in config (with node scope); does it make sense to have an image field in the new Cluster object, to be eventually overridden at node level?

I think that makes sense, and we can make the flag simply override this field 👍
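
Roughly something like this, with the exact field name and placement still TBD (image tags here are just placeholders):

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
# hypothetical cluster-wide default image
image: kindest/node:v1.13.3
nodes:
- role: control-plane
- role: worker
  # node-level override takes precedence over the cluster-wide default
  image: kindest/node:v1.12.5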

@neolit123
Member

I think that making node names configurable can open up a new set of problems. -1
Instead, given that we are dropping the replicas field, it is now possible to name nodes according to their order in the config, and this makes the process more predictable / more evident in the logs

As a user needing to schedule a workload to a machine with that nodeport available

how about requiring a node name only if nodePorts: .... is set?

@BenTheElder
Member Author

I'm not sure we should ever require one (we can always autogenerate) or that the name should be the mechanism to pin. Extra user-supplied node labels could maybe work (applied at the end of cluster up).
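
For example, if cluster up applied a user-supplied label to the node that got the extra ports, a workload could pin itself with a plain nodeSelector; a sketch with a purely hypothetical label key:

apiVersion: v1
kind: Pod
metadata:
  # placeholder name
  name: my-server
spec:
  nodeSelector:
    # hypothetical label, applied at the end of cluster up
    example.com/exposed-ports: "true"
  containers:
  - name: app
    image: nginx
    ports:
    - containerPort: 8080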

@pablochacin
Contributor

Dropping replicas

I'm not comfortable with the idea of dropping the replica number. For me, it makes sense to have a role with some common attributes, such as the image, and then easily change the number. I find having to add/remove entries in the node list much more inconvenient.

I've been digging into the code this weekend to understand what it would take to create a new node role and I haven't identified any issue regarding using replicas. Could you please elaborate?

Drop the load balancer role

I think it makes sense. Would it be possible to allow specifying an alternative image for this node at the cluster level, in case someone wants to experiment with another implementation?

Port naming

I'm a little bit confused here. The approach we had been discussing so far, and that I expected to be discussed at the next meeting, is different: instead of exposing the NodePort on a particular node, create a kind of ingress controller which exposes the ports. In that case, we don't need to decide on which node to expose the ports. Exposing ports on particular nodes seems neither natural nor convenient to me. Maybe I'm missing something in the conversation.

@BenTheElder
Member Author

Dropping replicas

I'm not comfortable with the idea of dropping the replica number. For me, it makes sense to have a role with some common attributes, such as the image, and then easily change the number. I find having to add/remove entries in the node list much more inconvenient.

How many entries are we talking about? kind doesn't really "scale" beyond a handful of nodes? (at least not without a pretty beefy machine, since they're all on the same host).

If you're only setting the image and the role then copying / removing is only 2 lines per node, that seems like a pretty small burden. The other fields don't work well with repeated nodes and the repeated node logic complicates our code.

If we keep replicas but continue to add useful per-node configuration, we're going to wind up needing some form of user-facing templating for the node configs (eg to make mounts have per-node host paths) and I think that logic should live outside of kind.

Similarly, some other YAML munging tool can generate / manipulate kind configurations and implement repetition, templating over the paths, etc.
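
(As an aside, plain YAML anchors / aliases already cover simple repetition without extra tooling, assuming the config parser resolves them, which I haven't verified:)

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- &kind-worker
  role: worker
  extraMounts:
  - containerPath: /kind-source
    hostPath: /Users/bentheelder/go/src/sigs.k8s.io/kind
    readOnly: true
- *kind-worker
- *kind-worker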

I've been digging into the code this weekend to understand what it would take to create a new node role and I haven't identified any issue regarding using replicas. Could you please elaborate?

"roles" are unrelated, this has to do with all other node config, like mounts, ports etc.

I think we may have a terminology mismatch here, "roles" are just the node type, not any of the other fields? Replicas were essentially copying + pasting the rest of the fields, of which only image/role make any sense to copy.

Related: "Roles" are also not a super amazing config UX, we're dropping down to just control-planes and workers. Details like external-etcd and control-plane load-balancer are cluster wide and are not Kubernetes nodes.

Drop the load balancer role

I think it makes sense. Would it be possible to allow specifying an alternative image for this node at the cluster level, in case someone wants to experiment with another implementation?

We definitely could add a cluster-wide field for the image used for this, but I think I'd rather this experimentation be done by modifying kind's source code. In particular I don't think it's reasonable for us to support multiple kinds of load balancers currently, and we need to know what implementation is used to generate the configuration, copy the certs, etc.

Given this, if someone experiments with different load balancers, they are going to need to touch the internals of kind anyhow.

Port naming

I'm a little bit confused here. The approach we had been discussing so far, and that I expected to be discussed at the next meeting, is different: instead of exposing the NodePort on a particular node, create a kind of ingress controller which exposes the ports. In that case, we don't need to decide on which node to expose the ports. Exposing ports on particular nodes seems neither natural nor convenient to me. Maybe I'm missing something in the conversation.

Right. I did some more experimentation, and I think we can unblock simple use cases in a fairly simple way by providing something like this, but it's not ideal.

Let's punt this one until further discussion.

@pablochacin
Contributor

How many entries are we talking about? kind doesn't really "scale" beyond a handful of nodes? (at least not without a pretty beefy machine, since they're all on the same host).
If you're only setting the image and the role then copying / removing is only 2 lines per node, that seems like a pretty small burden.

The point is not the number of nodes, but having to edit the files. There was a proposal for adding command line flags for setting the number of replicas, which would not be possible under this format.

The other fields don't work well with repeated nodes and the repeated node logic complicates our code.
If we keep replicas but continue to add useful per-node configuration, we're going to wind up needing some form of user-facing templating for the node configs (eg to make mounts have per-node host paths) and I think that logic should live outside of kind.

Maybe we should separate the role (template) from the replica. This would still allow generic settings and per-replica settings.

I think that making this decision considering the current situation is possibly not the best perspective. The framework is growing quickly, new features are being added, and more settings are to be expected.

@BenTheElder
Member Author

The point is not the number of nodes, but having to edit the files. There was a proposal for adding command line flags for setting the number of replicas, which would not be possible under this format.

That's not what the PR does; it creates flags per role that set a number of all-default-value nodes. Doing that is totally possible under this: it would just set the node list to contain N nodes of each role, as opposed to setting it to one entry per role with replicas set. It won't really be affected by this.

I would expect that changing the number of configured nodes involves editing the file...?

Maybe we should separate the role (template) from the replica. This would still allow generic settings and per-replica settings.

There are plenty of tools that can do this sort of thing to config files. I don't think kind needs to own a templating mechanism for its own nodes.

Template is a better term when discussing this, fwiw; "role" already has a specific meaning in kind and kubeadm which is not this.

I don't think we should own a templating DSL.

I think that making this decision considering the current situation is possibly not the best perspective. The framework is growing quickly, new features are being added, and more settings are to be expected.

Except that none of these new settings make much sense in the face of replicas, and I expect more of that. Replicas really only make sense for default nodes, which are only one line each (- role: worker). The other fields don't really replicate, or are cluster-wide.

If this turns out to be a real problem for some reason it will be easier to add replicas or something similar back later than it will be to remove it, especially once we try to go beta.

@DazWorrall

Add a networking struct to Cluster

To clarify, this is about configuring the container runtime (e.g. #277)?

@BenTheElder
Member Author

@DazWorrall the options in networking outlined currently are cluster-wide settings for the API server, but we can add more fields for other networking features.

@BenTheElder BenTheElder added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Feb 28, 2019
@BenTheElder
Member Author

initial version done in #373

will follow up to add the networking struct

@fabriziopandini
Member

@BenTheElder with the latest changes we are losing the ability to create a cluster with an external load balancer and a single control-plane node.
This is useful for the use case where you want to test adding control-plane nodes at a later stage.
(There are workarounds, but I wanted to let you know because it is probably quite simple to restore this capability.)

@BenTheElder
Member Author

@fabriziopandini that is a great point, thanks! We can definitely make this possible again 👍
