
KIND infra, SRIOV provider, Run in-memory etcd #478

Merged
merged 3 commits into kubevirt:master from kind_infra_in_memory_etcd on Nov 23, 2020

Conversation

ormergi
Contributor

@ormergi ormergi commented Nov 15, 2020

Currently we encounter bad performance of the KIND cluster on the DinD setup:
we get 'etcdserver: timeout' errors on SRIOV lane jobs, causing jobs to fail often.

In such cases it is recommended to run in-memory etcd:
kubernetes-sigs/kind#1922
kubernetes-sigs/kind#845
Running etcd in memory should improve performance and make the SRIOV lane more stable.

This PR patches the kind-config yaml used for creating a cluster so that etcd runs from a specified path
which will be backed by an in-memory volume: kubevirt/project-infra#709
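As a rough sketch of what that patch amounts to (the file name and exact fields below are assumptions, not this PR's actual diff), the idea is to map a RAM-backed host path over the control-plane node's etcd data directory via extraMounts:

# sketch only: map a memory-backed host path over the node's etcd data directory
cat <<'EOF' > kind-etcd-in-memory-example.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /mnt/etcd          # backed by an in-memory volume on the prow job pod
    containerPath: /var/lib/etcd # etcd's default data directory inside the kind node
EOF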

@ormergi
Contributor Author

ormergi commented Nov 15, 2020

@ormergi ormergi force-pushed the kind_infra_in_memory_etcd branch 2 times, most recently from 428988c to fe6fbe0 on November 17, 2020 08:28
Contributor

@qinqon qinqon left a comment


Looks great, have some nits and a big question.

Comment on lines 159 to 173
mkdir -p ${KUBEVIRTCI_CONFIG_PATH}/$KUBEVIRT_PROVIDER/cni-plugins
curl -L https://github.com/containernetworking/plugins/releases/download/v0.8.5/cni-plugins-linux-amd64-v0.8.5.tgz | tar xz -C ${KUBEVIRTCI_CONFIG_PATH}/$KUBEVIRT_PROVIDER/cni-plugins
for node in $(_get_nodes | awk '{print $1}'); do
    docker exec $node /bin/sh -c "curl -L https://github.com/containernetworking/plugins/releases/download/v0.8.5/cni-plugins-linux-amd64-v0.8.5.tgz | tar xz -C /opt/cni/bin"
    docker cp ${KUBEVIRTCI_CONFIG_PATH}/$KUBEVIRT_PROVIDER/cni-plugins/. $node:/opt/cni/bin
done
Contributor


This is not related, and it leaves a tarball and a directory behind. Also, the number of nodes we usually have is quite small, so it is roughly 2 curls + unpacks vs 1 curl + unpack.

Contributor Author

@ormergi ormergi Nov 17, 2020


My intent was to reduce the cluster-up time as much as possible, so I prefer to curl and unpack once and only then copy the content.
I can remove the tarball and the directory right after the for loop, is that OK with you?
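For example (a sketch only; this commit was later dropped from the PR), the cleanup could directly follow the copy loop:

for node in $(_get_nodes | awk '{print $1}'); do
    docker cp ${KUBEVIRTCI_CONFIG_PATH}/$KUBEVIRT_PROVIDER/cni-plugins/. $node:/opt/cni/bin
done
# remove the locally unpacked plugins once every node has a copy
rm -rf ${KUBEVIRTCI_CONFIG_PATH}/$KUBEVIRT_PROVIDER/cni-plugins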

Contributor


Well, but this is really not related to running etcd in memory; we can do that in a different PR.

Contributor Author


No problem, I will remove this commit.

@@ -210,9 +196,91 @@ function setup_kind() {
prepare_config
}

function ensure_memory_dir_for_etcd_data() {
if [ "$ETCD_DATA_DIR" == none ];then
echo "ETCD_DATA_DIR is not defined" && exit 1
Contributor


Better as two lines:

echo "ETCD_DATA_DIR is not defined"
exit 1

Contributor Author


Done

@@ -15,6 +15,8 @@ REGISTRY_NAME=${CLUSTER_NAME}-registry
MASTER_NODES_PATTERN="control-plane"
WORKER_NODES_PATTERN="worker"

ETCD_DATA_DIR=${ETCD_DATA_DIR:-none}
Contributor


You don't need this; just check whether the variable is set before using it:

[  -z "$ETCD_DATA_DIR" ]

Contributor Author

@ormergi ormergi Nov 17, 2020


I added this to kind/common.sh for better visibility of that variable, so if one reviews, for example, the ensure_memory_dir_for_etcd_data function, it will be easier to understand where this variable comes from.
What do you think?

Contributor


If you want to do that, then just throw an error here if the variable is not set.

@@ -210,9 +196,91 @@ function setup_kind() {
prepare_config
}

function ensure_memory_dir_for_etcd_data() {
if [ "$ETCD_DATA_DIR" == none ];then
Contributor


ditto: [ -z "$ETCD_DATA_DIR" ]

ls -lsah $ETCD_DATA_DIR

echo "mount etcd data directory as tempfs"
mount -t tmpfs tmpfs $ETCD_DATA_DIR
Contributor


Maybe I am wrong, but when we run this on prow the tmpfs is set up by k8s when we create the volume of type Memory, right? So we don't need to do this ourselves.
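For reference, a minimal sketch of that k8s mechanism (the pod name, image and mount path are illustrative, not the actual prow job config): an emptyDir volume with medium Memory is mounted by k8s as tmpfs, so no manual mount is needed inside the pod.

# sketch only: a pod requesting a RAM-backed directory via a Memory-medium emptyDir
cat <<'EOF' > memory-backed-dir-example.yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-sriov-lane-pod
spec:
  containers:
  - name: runner
    image: quay.io/example/runner:latest
    volumeMounts:
    - name: etcd-data
      mountPath: /mnt/etcd       # RAM-backed path later handed to the kind cluster
  volumes:
  - name: etcd-data
    emptyDir:
      medium: Memory             # k8s mounts this volume as tmpfs
EOF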

Contributor Author


Great question :)
I added this logic to support the development environment flow, where you don't have k8s to
create the mount for you.
On a local dev env this function will create the directory and mount it;
if the mount already exists it will delete the mount's content to prevent conflicts with new data.
On CI, since we ask k8s to create the mount for us,
this function will check that the mount exists, delete the mount's content to prevent conflicts with new data,
and print its details (see the sketch below).
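A minimal sketch of that behavior, keeping the PR's function and variable names (the mountpoint check and exact commands are assumptions, not the final code):

function ensure_memory_dir_for_etcd_data() {
    if [ -z "$ETCD_DATA_DIR" ]; then
        echo "ETCD_DATA_DIR is not defined"
        exit 1
    fi
    mkdir -p "$ETCD_DATA_DIR"
    # on local dev environments there is no k8s-created Memory volume, so mount tmpfs ourselves
    if ! mountpoint -q "$ETCD_DATA_DIR"; then
        echo "mount etcd data directory as tmpfs"
        mount -t tmpfs tmpfs "$ETCD_DATA_DIR"
    fi
    # in both cases, remove leftovers from previous runs so they do not conflict with new data
    rm -rf "${ETCD_DATA_DIR:?}"/*
    ls -lsah "$ETCD_DATA_DIR"
}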

Contributor


On a local env we also have to remove it after we do cluster-down, right?

Contributor Author


You're right, I will add a cleanup for this on cluster-down.


setup_kind

export ETCD_DATA_DIR="/mnt/etcd"
Contributor


What will happen if we open more than one kind cluster? They cannot share the etcd directory. Let's add something that is unique per kind cluster.

Contributor Author

@ormergi ormergi Nov 17, 2020


Currently we deploy one job at a time on a CI node, which means there can be only one kind cluster at a given time.

I think we should tackle this when we need to support (if ever) multiple sriov lane jobs on a single CI node.
It should also be changed together with other changes we might want to apply, such as handling the cluster name (should be unique),
attaching PFs to each cluster (check which is available),
and any other change we would need to do.
What do you think?

Member


But you also have assumptions here which could be avoided:

  1. That the previous deployment was properly cleaned.
  2. That someone experiments on the node and runs once manually and once through the CI.
  3. The above happening unintentionally, i.e. bugs.

As the chance of this occurring does exist, it makes sense to do it right from the beginning. I would argue that not accounting for this in the original deployment scripts in the first place was the wrong approach, so we should try not to repeat it now.

Of course, you can reason not to do it now because it requires too much effort.
I would also not attempt to do/fix this until everything works.

Contributor Author


I think we should first see if running etcd in memory works for us and actually solves the problem.
If it does and the lane is green most of the time, we can start to think about more optimizations (running more than one sriov lane job on a single CI node).

Note that the number of sriov lane jobs that can run on a single CI node is bound to the number of PFs the sriov card provides, which is two at the current state.

Contributor

@oshoval oshoval Nov 17, 2020


I am already working on running multiple instances on one node (it already works locally), POC soon,
as was raised in the CI meeting.

Contributor


Now that I think of it, if two sriov lanes run on the same node they will have different pods, so they are not going to share /mnt/etcd, but we would still benefit from using different etcd directories, if it's not too much effort.

Also, let's add more info to the name and use something like /mnt/kubevirt-kind...-[unique name]-etcd. One thing you can use is the container name of the first master or the like, or the one from the registry container.

Contributor Author

@ormergi ormergi Nov 18, 2020


I think we should address this when we work on supporting multiple jobs running on the same CI node,
especially since we don't know yet in what way we are going to do it.
The way I see it, we need to manipulate sriov-cni on the CI cluster so it attaches a single PF to each prow job pod;
then we don't need to change the current code and the whole process will be transparent.
And like you said, since each job runs in a different pod there shouldn't be a problem with the directories.
What do you think?

I can add another directory for visibility so the path will be like /mnt/kubevirt-kind/etcd,
is that ok?

Member


I think @qinqon mentioned that it is not about running multiple lanes in parallel, so I do not understand why you refer to that in your answer.

Please reason why there is no need for a unique, per-run folder name in this case, regardless of usage by other features we may encounter in the future.

Contributor Author


On CI unique names are not a must, since we create the in-memory directory using the emptyDir volume API (on the CI cluster) on the prow job pod: kubevirt/project-infra#709

After discussing it with Quique I realized we do need a unique name for local environments, in case cluster-up is stopped in the middle for any reason.
I pushed a new commit to resolve this.

@ormergi ormergi force-pushed the kind_infra_in_memory_etcd branch from fe6fbe0 to 1195f36 on November 17, 2020 11:05
@ormergi
Contributor Author

ormergi commented Nov 17, 2020

Changes:

  • Removed the first commit, which addressed the cni-plugins download
  • Added teardown logic for the etcd data mount, invoked during make cluster-down

@ormergi ormergi force-pushed the kind_infra_in_memory_etcd branch from 1195f36 to cc2cf46 on November 17, 2020 12:37
@ormergi ormergi requested a review from qinqon November 17, 2020 14:10
@ormergi ormergi force-pushed the kind_infra_in_memory_etcd branch from cc2cf46 to e6e37c6 on November 17, 2020 14:47


@kubevirt-bot kubevirt-bot added dco-signoff: no Indicates the PR's author has not DCO signed all their commits. and removed dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Nov 18, 2020
@ormergi ormergi force-pushed the kind_infra_in_memory_etcd branch 3 times, most recently from 6c40193 to 549396c on November 18, 2020 14:18
@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. and removed dco-signoff: no Indicates the PR's author has not DCO signed all their commits. labels Nov 18, 2020
@ormergi
Contributor Author

ormergi commented Nov 18, 2020

Changes:

  • Use unique names when creating the etcd data directory for the running kind cluster;
    the etcd data dir path is now constructed as $BASE_ETCD_DATA_DIR/$ETCD_DATA_DIR (sketch below)
  • Add a trap to tear down the etcd in-memory dir mount

Please let me know if I should squash the commits.
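An illustrative sketch of that path construction (the base directory and unique suffix here are assumptions; this mount-based approach was later dropped from the PR):

# sketch only: build a per-cluster etcd data path
BASE_ETCD_DATA_DIR="/mnt/kubevirt-kind"
ETCD_DATA_DIR="${CLUSTER_NAME}-etcd"                # unique per kind cluster
mkdir -p "${BASE_ETCD_DATA_DIR}/${ETCD_DATA_DIR}"   # full path: $BASE_ETCD_DATA_DIR/$ETCD_DATA_DIR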

@qinqon
Contributor

qinqon commented Nov 20, 2020

@ormergi, what do you mean? That we cannot mount Memory at prow into /var/lib/etcd, or that we cannot use containerPath: /var/lib/etcd since the kind node container already has the directory in place?

@ormergi
Contributor Author

ormergi commented Nov 22, 2020

@ormergi, what do you mean? That we cannot mount Memory at prow into /var/lib/etcd, or that we cannot use containerPath: /var/lib/etcd since the kind node container already has the directory in place?

We can mount /var/lib/etcd from the prow job pod into the kind containers only through extraMount;
on the kind containers /var/lib/etcd is a directory inside the container and is not shared with or mounted from the host.

But it doesn't matter anymore, because I found a way to run etcd in memory without the need to create a mount! :)

@ormergi ormergi force-pushed the kind_infra_in_memory_etcd branch from b869c81 to 68bbc66 on November 22, 2020 09:10
@ormergi
Contributor Author

ormergi commented Nov 22, 2020

Changes:

  • Removed the need to create a tmpfs mount to back the etcd data directory, along with the teardown and related commits
  • Added a kubeadm ClusterConfiguration to the kind config yaml that points the etcd data dir to /tmp/... inside the kind
    cluster nodes, which is already mounted as tmpfs :) (see the sketch below).
    To enable it on any kind provider simply add RUN_ETCD_IN_MEMORY=true to provider.sh
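A minimal sketch of such a kubeadmConfigPatches entry (the file name is illustrative; the dataDir value matches the ETCD_DATA_DIR default that appears later in the review):

# sketch only: kind config with a kubeadm patch pointing etcd at a tmpfs-backed path
cat <<'EOF' > kind-etcd-in-memory-example.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
- |
  kind: ClusterConfiguration
  metadata:
    name: config
  etcd:
    local:
      dataDir: /tmp/kind-cluster-etcd  # /tmp inside the kind node is tmpfs (RAM-backed)
EOF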

@ormergi
Contributor Author

ormergi commented Nov 22, 2020

/test check-up-kind-1.17-sriov

@ormergi ormergi requested a review from qinqon November 22, 2020 09:18
@ormergi
Contributor Author

ormergi commented Nov 22, 2020

@qinqon
Contributor

qinqon commented Nov 23, 2020

@ormergi this means that we don't need to mount a volume at the prow job?

I understand that what /tmp means inside the kind node depends on what the kind project wants to do with it; are we sure they are not going to change it? In the future they could mount a container volume that writes to disk, and then we are back at square one.

Although the solution is better than what we have now, and it is self-contained.


setup_kind

RUN_ETCD_IN_MEMORY="true"
Contributor


Remove this and leave true as default

Contributor Author


Done

@@ -15,6 +15,8 @@ REGISTRY_NAME=${CLUSTER_NAME}-registry
MASTER_NODES_PATTERN="control-plane"
WORKER_NODES_PATTERN="worker"

RUN_ETCD_IN_MEMORY=${RUN_ETCD_IN_MEMORY:-"false"}
Contributor


All the variables that can be set outside of providers are prefixed with KUBEVIRT, so let's rename this to KUBEVIRT_WITH_KIND_ETCD_IN_MEMORY. Also set the default to true, since that is what we want in the end; if there is a problem we can go back to false by changing just the prow job.

Contributor Author


Done

@ormergi
Contributor Author

ormergi commented Nov 23, 2020

@ormergi this means that we don't need to mount a volume at the prow job?

I understand that what /tmp means inside the kind node depends on what the kind project wants to do with it; are we sure they are not going to change it? In the future they could mount a container volume that writes to disk, and then we are back at square one.

Although the solution is better than what we have now, and it is self-contained.

Sorry, I didn't mean to state the obvious about /tmp, I am sure you are familiar with how it works.
This was suggested by the kind project itself, kubernetes-sigs/kind#845 (comment); as we already know, etcd writes to disk by default.
As far as I know there is no intent to change this API, and if they do change it, then like any other API change we will adapt.
I also understand that kind is also proposed for production and should survive a restart (at least).
Even if they come up with a container volume, /tmp should be mounted to RAM in any case,
so I think we are good.

@ormergi ormergi force-pushed the kind_infra_in_memory_etcd branch from 68bbc66 to 64da9ce on November 23, 2020 09:34
@ormergi
Contributor Author

ormergi commented Nov 23, 2020

Changes:

  • Squashed the last two commits and updated the commit message accordingly
  • Addressed comments:
    • Introduced the KUBEVIRT_WITH_KIND_ETCD_IN_MEMORY env var to control etcd in memory from kubevirt
    • Added a check at cluster-up (in kind_up) that verifies the mount actually exists

@ormergi ormergi requested a review from qinqon November 23, 2020 09:40
@ormergi
Contributor Author

ormergi commented Nov 23, 2020

/test check-up-kind-1.17-sriov

@@ -15,6 +15,9 @@ REGISTRY_NAME=${CLUSTER_NAME}-registry
MASTER_NODES_PATTERN="control-plane"
WORKER_NODES_PATTERN="worker"

KUBEVIRT_WITH_KIND_ETCD_IN_MEMORY=${KUBEVIRT_WITH_KIND_ETCD_IN_MEMORY:-"true"}
ETCD_DATA_DIR="/tmp/kind-cluster-etcd"
Contributor


Rename this to ETCD_IN_MEMORY_DATA_DIR.

Contributor Author


Done

echo "Checking KIND cluster etcd data is mounted to RAM: $ETCD_DATA_DIR"
docker exec "$CLUSTER_NAME-control-plane" df -h $(dirname $ETCD_DATA_DIR)
docker exec "$CLUSTER_NAME-control-plane" du -h $ETCD_DATA_DIR
[ $(echo $?) != 0 ] && echo "falid to check etcd data directory" && return 1
Contributor


typo: failed

Contributor Author


fixed

Comment on lines 162 to 163
docker exec "$CLUSTER_NAME-control-plane" df -h $(dirname $ETCD_DATA_DIR)
docker exec "$CLUSTER_NAME-control-plane" du -h $ETCD_DATA_DIR
Contributor


This will check that it's mounted, but not that it's mounted in memory; can you check that it's using tmpfs?

Contributor Author


No problem
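For example, a check along these lines (an assumed sketch reusing the script's existing helpers, not necessarily the exact change) would assert the directory sits on tmpfs on every control-plane node, from inside kind_up:

for node in $(_get_nodes | awk '{print $1}' | grep "$MASTER_NODES_PATTERN"); do
    # df -T reports the filesystem type of the directory holding the etcd data dir
    docker exec "$node" df -T "$(dirname $ETCD_DATA_DIR)" | grep -q tmpfs || \
        { echo "etcd data directory is not mounted as tmpfs on $node"; return 1; }
done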

Currently we encounter bad performance of etcd on the
sriov provider cluster on a DinD setup.
We get 'etcdserver: timeout' errors that cause jobs to
fail often.
In cases where etcd has bad performance and the data shouldn't
be persistent (e.g. on CI and dev environments) it is recommended [1]
to use in-memory etcd.

To do that this commit:
- Adds a kubeadm ClusterConfiguration to the kind config, setting
  the etcd data directory to '/tmp/...' inside kind cluster nodes.
  The '/tmp/' directory is already mounted to RAM as tmpfs.

- 'KUBEVIRT_WITH_KIND_ETCD_IN_MEMORY', expected values: "true", "false";
  controls running etcd in memory on kind providers.

- 'ETCD_DATA_DIR', expects a directory path that is mounted to RAM;
  controls the path of the etcd data directory inside kind cluster nodes.

Running etcd in memory should improve performance and
will stabilize the sriov provider and lanes.

[1] kubernetes-sigs/kind#1922

Signed-off-by: Or Mergi <[email protected]>
@ormergi ormergi force-pushed the kind_infra_in_memory_etcd branch from 64da9ce to 393d0cd on November 23, 2020 10:36
@ormergi
Contributor Author

ormergi commented Nov 23, 2020

Changes:

  • Addressed comments:
    • Fixed typos
    • Renamed ETCD_DATA_DIR
    • Added a check that the etcd data directory is mounted to RAM
  • Perform the etcd data directory checks on all control-plane nodes

@ormergi
Contributor Author

ormergi commented Nov 23, 2020

/test check-up-kind-1.17-sriov

@qinqon
Contributor

qinqon commented Nov 23, 2020

/lgtm
/approve

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 23, 2020
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qinqon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 23, 2020
@qinqon
Contributor

qinqon commented Nov 23, 2020

/hold cancel

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 23, 2020
@kubevirt-bot kubevirt-bot merged commit 2a57af4 into kubevirt:master Nov 23, 2020
Labels
approved · dco-signoff: yes · lgtm · size/L
5 participants