
proposal for handling complicated node actions and scenarios like upgrades #252

Closed
neolit123 opened this issue Jan 28, 2019 · 11 comments

@neolit123
Member

neolit123 commented Jan 28, 2019

decided to drop some comments after the meeting today.
i think we can land on a compromise for upgrades and similar complicated scenarios by exposing the following commands on the CLI (a rough sketch follows the list).

  • a command to execute a command on clusterX, nodeY.
    this was already proposed some time ago?
    (optionally, a command to execute a whole bash script).
  • a command to copy files from the host to clusterX, nodeY.
  • a command to copy files from clusterX, nodeY to clusterM, nodeN.
  • a command to copy files from clusterX, nodeY to the host.
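
to make this concrete, here is a rough sketch of what the UX could look like (the command names and flags below are hypothetical, not an existing kind API):

```sh
# hypothetical: run a command on a specific node of a specific cluster
kind exec --name cluster1 --node node2 -- kubeadm version

# hypothetical: copy a file from the host to a node
kind cp ./kubeadm cluster1/node2:/usr/bin/kubeadm

# hypothetical: copy a file between nodes, possibly across clusters
kind cp cluster1/node1:/etc/kubernetes/admin.conf cluster2/node1:/tmp/admin.conf

# hypothetical: copy files from a node back to the host
kind cp cluster1/node2:/var/log/ ./logs/
```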

the backend for these commands is already mostly in place, and this is already doable using raw docker commands, but it's a bit verbose. this is pretty much adding scp- and ssh-like support to kind.
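
for comparison, the raw docker equivalent today (the node container name kind-control-plane below is an assumption; actual names depend on the cluster):

```sh
# list running containers; kind node names == container names
docker ps --format '{{.Names}}'

# run a command on a node
docker exec kind-control-plane kubeadm version

# copy a file from the host to a node, and back
docker cp ./kubeadm kind-control-plane:/usr/bin/kubeadm
docker cp kind-control-plane:/etc/kubernetes/admin.conf ./admin.conf
```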

my argument here is that this allows all sorts of different, advanced scenarios:

  • copying specific log files from a cluster (outside of the default kind command for logs).
  • testing a different version of kubelet or kubeadm on a certain node.
  • applying manifest changes and restarting pods.
  • stopping a node and seeing if it safely joins back up.
  • multi-cluster, HA upgrades.
  • etc.

please note that some of these are testing scenarios.

so on the kubeadm side, what we can do is write a tool in Go (so that we don't write in bash) that doesn't even have to vendor the kind backend, but can simply execute different scenarios using the kind binary.
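
as an illustration, the kind of upgrade scenario such a tool could drive (kind exec / kind cp below are the hypothetical commands from the list above; the rest are existing kind commands):

```sh
# create a cluster, swap in a kubeadm build, run the upgrade, verify, tear down
kind create cluster --name upgrade-test
kind cp ./kubeadm upgrade-test/control-plane:/usr/bin/kubeadm
kind exec --name upgrade-test --node control-plane -- kubeadm upgrade plan
kind exec --name upgrade-test --node control-plane -- kubectl get nodes
kind delete cluster --name upgrade-test
```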

something else i wish we'd add (this one is a stretch for the future) is using kind as a machine provisioner - i.e. don't create a cluster by default, but spawn all the nodes, install kubelet, kubeadm, and the kubeadm config on them, and allow the user to start the cluster on demand.

thoughts?

/kind design
/priority important-longterm
/assign @munnerz @BenTheElder @fabriziopandini

@k8s-ci-robot k8s-ci-robot added kind/design Categorizes issue or PR as related to design. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Jan 28, 2019
@fabriziopandini
Member

If I can give my two cents on this, it is important to break down and prioritize the use cases we want to address. As far as I'm concerned:

  • P0 is "use kind for upgrade tests"
  • P1 is "use kind for kubeadm development"

As agreed on the call, I'm going to open another issue to break down "use kind for upgrade tests" into actionable items. Meanwhile, this is an excellent first write-up (thanks @neolit123!) of what will be helpful for "use kind for kubeadm development", where maximum flexibility is required.

@neolit123
Member Author

"use kind for kubeadm development" where maximum flexibility is required

i would say that instead of "kubeadm development", this is more about enabling a couple of useful tools in kind that anybody can use, and that we, as kubeadm developers, can also use for tests.

@pablochacin
Contributor

@neolit123

something else i wish we'd add (this one is a stretch for the future) is using kind as a machine provisioner - i.e. don't create a cluster by default, but spawn all the nodes, install kubelet, kubeadm, and the kubeadm config on them, and allow the user to start the cluster on demand.

I also have this use case.

@inercia
Contributor

inercia commented Jan 30, 2019

Besides the already-mentioned use cases of testing upgrades and running e2e tests, I would also like to describe a couple of use cases we are interested in for this project:

  1. testing a custom k8s distribution based on kubeadm. For example, we will run some tests to check that the control plane is started automatically by a systemd unit that runs kubeadm. This will require some level of customization in the way kind starts things...

  2. using kind as a library for testing topology changes. For example, we could have an e2e_tests.go that creates a basic master-worker cluster, then adds an extra node, runs some tests, removes the first master, runs some other tests, etc...

  3. testing operators that are installed after the cluster setup. This would require the ability to copy some custom docker images to the node image.

In general, we are interested in using kind from a distribution point of view (more than a developer pov).

@BenTheElder
Member

In general, we are interested in using kind from a distribution point of view (more than a developer pov).

er, do you mean that you are users of kind in the context of running your own distribution of kubernetes, or that you want to distribute via kind?

  2.) is definitely supported.
  3.) yeah, see e.g. First class image side-loading support #28. plan to add something soon, but this is also pretty trivial to do today.

1.) is still the trickiest from our POV, I think we'll be looking more into that for #255.

I don't really like this one, because what kind does is create clusters; there's not an easy + clean way to relinquish control of the provisioning steps without exposing exactly which steps occur and in what order, and those are subject to change and definitely will need to, e.g. to fix #200.

Can you outline more of what you do custom in 1)? We need to supply info to kubeadm in order to properly configure it to work in our container env, and to inform it somewhat of the cluster topology / endpoints / credentials.

@BenTheElder
Member

something else i wish we'd add (this one is a stretch for the future) is using kind as a machine provisioner - i.e. don't create a cluster by default, but spawn all the nodes, install kubelet, kubeadm, and the kubeadm config on them, and allow the user to start the cluster on demand.

so stop short of running init, installing the CNI etc.? create the container, do fixup, start systemd / docker, leave kubelet crashlooping? but still generate kubeadm config?


Also want to point out: most of the things mentioned in the OP are just docker cp / docker exec etc. The node names == host names == container names, so discovering which containers to copy to/from should be trivial. I'm not sure if we should have kind ... CLI commands for these built in, but the library does have wrappers for these.

Getting images into the cluster definitely needs some built in tooling. Copying files I'm not so sure 🤔

@neolit123
Member Author

so stop short of running init, installing the CNI etc.? create the container, do fixup, start systemd / docker, leave kubelet crashlooping? but still generate kubeadm config?

something along those lines, yes.

leave kubelet crashlooping

or leave it stopped; kubeadm will restart the service.

Also want to point out: most of the things mentioned in the OP are just docker cp / docker exec etc

that is true.

I'm not sure if we should have kind ... CLI commands

it's just that with docker cp/exec:

  • it binds us to docker.
    crictl would have been great here, except it lacks copy TMK.
  • docker is not kind cluster/node aware, only container name aware.
    mostly a UX inconvenience.

but docker cp/exec is definitely the backup plan here.

Getting images into the cluster definitely needs some built in tooling

i think it's currently possible without extra tooling: docker cp, docker load, restarting pods, etc.
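
for example, a minimal sketch, assuming the node image runs an inner docker daemon (as it does today) and a node container named kind-control-plane (the container and image names are assumptions):

```sh
# stream the image from the host daemon into the node's inner daemon
docker save my-operator:dev | docker exec -i kind-control-plane docker load

# alternatively: copy a tarball in, then load it from inside the node
docker save -o image.tar my-operator:dev
docker cp image.tar kind-control-plane:/image.tar
docker exec kind-control-plane docker load -i /image.tar
```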

@inercia
Contributor

inercia commented Jan 31, 2019

er, do you mean that you are users of kind in the context of running your own distribution of kubernetes, or that you want to distribute via kind?

Can you outline more of what you do custom in 1)?

The workflow we are currently developing for our next release is:

  1. cloud-init or something else writes a kubic-init config file (not a kubeadm config) with the kubeadm token, whether this is a seeder or a regular worker, the certificate hashes, and any other relevant information.
  2. some systemd unit starts a kubic-init service (it could be in a container) that loads that config, checks whether we really need to run kubeadm or the cluster is already set up, checks that the environment is fine, checks for updates, etc... if kubeadm must be run, it generates a kubeadm config file and does a kubeadm init/join.
  3. (only on the seeder) if the cluster is healthy, it loads a CNI, some operators, and so on...

We are interested in creating a base image based on openSUSE, copying in the unit file and the kubic-init binary or container, and then replacing the kubeadm init phase in kind with a cp of kubic-init.cfg to <the node> and a docker start of kubic-init. This would allow us to create a testing environment that is very similar to a real cluster.
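
roughly, the replacement step could look like the following (the container name, unit name, and config path are assumptions for illustration):

```sh
# copy our config onto the node instead of letting kind run kubeadm init
docker cp kubic-init.cfg kind-control-plane:/etc/kubic/kubic-init.cfg

# start our unit; it decides whether to run kubeadm init/join itself
docker exec kind-control-plane systemctl start kubic-init
```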

As you can see, we want kind to be a general provisioning system for dind containers and a customizable runner for things in those containers. But it currently has a very specific way of provisioning and starting things that does not match our current workflow. For us it would be enough to have a kind library that we could use for creating our own e2e tests, but that is currently not possible.

@pablochacin
Contributor

pablochacin commented Jan 31, 2019

I'm copying here a comment I made in the slack channel, with some minor edits.

I'm mostly interested in the second scenario. Provisioning clusters/nodes for testing cluster api controller development.

I think provisioning a new cluster can re-use the existing kind command, building a config file from the cluster api definition. I haven't dug too much into this yet, but I don't expect many problems here, given that I can configure things like the base image to use and tweak kubernetes' configuration.

To explore how to add nodes to an existing cluster, I started looking at the existing code to see how to re-use it. My first idea is to create a PoC adding a command like kind add nodes --cluster=1 --role=worker --replicas=1

So far, the main challenge I have found is how to provide the necessary cluster configuration, such as the base image to use for the node and the kubeadm token for joining the node. The image could be passed as a parameter, or even in a config file. But the token is obtained from the in-memory configuration and is not persisted.
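
As a possible workaround until that configuration is persisted, the join credentials can be recovered from the running control plane itself, e.g. (the container name is an assumption):

```sh
# mint a fresh bootstrap token and print the full join command,
# including the CA cert hash
docker exec kind-control-plane kubeadm token create --print-join-command
```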

In the cluster-api scenario, this information could be stored in the Cluster CRD object and retrieved by the controller, but in the general case, I don't know how to obtain this information once the cluster is created. I was considering creating a directory per cluster in ~/.kind/<cluster>/ with the cluster definition.

Maybe this case only makes sense when using kind as a library, from a program that manages the cluster metadata and can provide the necessary information.

@pablochacin
Contributor

something else i wish we'd add (this one is a stretch for the future) is using kind as a machine provisioner - i.e. don't create a cluster by default, but spawn all the nodes, install kubelet, kubeadm, and the kubeadm config on them, and allow the user to start the cluster on demand.

I also have this use case.

On a second reading, my use case is similar, but maybe not identical. However, having the freedom to initialize the nodes separately from the actual provisioning could also be beneficial.

@neolit123
Member Author

i'm going to close this ticket.
we should use google docs and separate tickets for tracking use cases.
