
KIND infra, SRIOV provider, Run in-memory etcd #478

Merged
merged 3 commits into kubevirt:master from kind_infra_in_memory_etcd on Nov 23, 2020

Conversation

ormergi
Contributor

@ormergi ormergi commented Nov 15, 2020

Currently we encounter bad performance of the KIND cluster on the DinD setup:
we get 'etcdserver: timeout' errors on SRIOV lane jobs, causing jobs to fail often.

In such cases it is recommended to run in-memory etcd:
kubernetes-sigs/kind#1922
kubernetes-sigs/kind#845
Running etcd in memory should improve performance and make the SRIOV lane more stable.

This PR patches the kind-config yaml used for creating a cluster so that etcd runs from a specified path
which will be backed by an in-memory volume: kubevirt/project-infra#709
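As a rough sketch of what that patch amounts to (the file name and exact fields below are assumptions, not this PR's actual diff), the idea is to map a RAM-backed host path over the control-plane node's etcd data directory via extraMounts:

# sketch only: map a memory-backed host path over the node's etcd data directory
cat <<'EOF' > kind-etcd-in-memory-example.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /mnt/etcd          # backed by an in-memory volume on the prow job pod
    containerPath: /var/lib/etcd # etcd's default data directory inside the kind node
EOF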

@ormergi
Contributor Author

ormergi commented Nov 15, 2020

@ormergi ormergi force-pushed the kind_infra_in_memory_etcd branch 2 times, most recently from 428988c to fe6fbe0 on November 17, 2020 08:28
Contributor

@qinqon qinqon left a comment


Looks great, have some nits and a big question.

Comment on lines 159 to 173
mkdir -p ${KUBEVIRTCI_CONFIG_PATH}/$KUBEVIRT_PROVIDER/cni-plugins
curl -L https://github.com/containernetworking/plugins/releases/download/v0.8.5/cni-plugins-linux-amd64-v0.8.5.tgz | tar xz -C ${KUBEVIRTCI_CONFIG_PATH}/$KUBEVIRT_PROVIDER/cni-plugins
for node in $(_get_nodes | awk '{print $1}'); do
    docker exec $node /bin/sh -c "curl -L https://github.com/containernetworking/plugins/releases/download/v0.8.5/cni-plugins-linux-amd64-v0.8.5.tgz | tar xz -C /opt/cni/bin"
    docker cp ${KUBEVIRTCI_CONFIG_PATH}/$KUBEVIRT_PROVIDER/cni-plugins/. $node:/opt/cni/bin
done
Contributor


This is not related, and it leaves a tarball and a directory behind. Also, the number of nodes we usually have is quite small, so it is roughly 2 curls + unpacks vs 1 curl + unpack.

Contributor Author

@ormergi ormergi Nov 17, 2020


My intent was to reduce the cluster-up time as much as possible, so I prefer to curl and unpack once and only then copy the content.
I can remove the tarball and the directory right after the for loop, is that OK with you?
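For example (a sketch only; this commit was later dropped from the PR), the cleanup could directly follow the copy loop:

for node in $(_get_nodes | awk '{print $1}'); do
    docker cp ${KUBEVIRTCI_CONFIG_PATH}/$KUBEVIRT_PROVIDER/cni-plugins/. $node:/opt/cni/bin
done
# remove the locally unpacked plugins once every node has a copy
rm -rf ${KUBEVIRTCI_CONFIG_PATH}/$KUBEVIRT_PROVIDER/cni-plugins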

Contributor


Well, but this is really not related to running etcd in memory; we can do that in a different PR.

Contributor Author


No problem, I will remove this commit.

@@ -210,9 +196,91 @@ function setup_kind() {
prepare_config
}

function ensure_memory_dir_for_etcd_data() {
if [ "$ETCD_DATA_DIR" == none ];then
echo "ETCD_DATA_DIR is not defined" && exit 1
Contributor


Better as two lines:

echo "ETCD_DATA_DIR is not defined"
exit 1

Contributor Author


Done

@@ -15,6 +15,8 @@ REGISTRY_NAME=${CLUSTER_NAME}-registry
MASTER_NODES_PATTERN="control-plane"
WORKER_NODES_PATTERN="worker"

ETCD_DATA_DIR=${ETCD_DATA_DIR:-none}
Contributor


You don't need this; just check whether the variable is set before using it:

[  -z "$ETCD_DATA_DIR" ]

Contributor Author

@ormergi ormergi Nov 17, 2020


I added this to kind/common.sh for better visibility of that variable, so if one reviews, for example, the ensure_memory_dir_for_etcd_data function, it will be easier to understand where this variable comes from.
What do you think?

Contributor


If you want to do that, then just throw an error here if the variable is not set.

@@ -210,9 +196,91 @@ function setup_kind() {
prepare_config
}

function ensure_memory_dir_for_etcd_data() {
if [ "$ETCD_DATA_DIR" == none ];then
Contributor


ditto: [ -z "$ETCD_DATA_DIR" ]

ls -lsah $ETCD_DATA_DIR

echo "mount etcd data directory as tempfs"
mount -t tmpfs tmpfs $ETCD_DATA_DIR
Contributor


Maybe I am wrong, but when we run this on prow the tmpfs is set up by k8s when we create the volume of type Memory, right? So we don't need to do this ourselves.
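For reference, a minimal sketch of that k8s mechanism (the pod name, image and mount path are illustrative, not the actual prow job config): an emptyDir volume with medium Memory is mounted by k8s as tmpfs, so no manual mount is needed inside the pod.

# sketch only: a pod requesting a RAM-backed directory via a Memory-medium emptyDir
cat <<'EOF' > memory-backed-dir-example.yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-sriov-lane-pod
spec:
  containers:
  - name: runner
    image: quay.io/example/runner:latest
    volumeMounts:
    - name: etcd-data
      mountPath: /mnt/etcd       # RAM-backed path later handed to the kind cluster
  volumes:
  - name: etcd-data
    emptyDir:
      medium: Memory             # k8s mounts this volume as tmpfs
EOF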

Contributor Author


Great question :)
I added this logic to support the development environment flow, where you don't have k8s to
create the mount for you.
On a local dev env this function will create the directory and mount it;
if the mount already exists it will delete the mount's content to prevent conflicts with new data.
On CI, since we ask k8s to create the mount for us,
this function will check that the mount exists, delete the mount's content to prevent conflicts with new data,
and print its details (see the sketch below).
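A minimal sketch of that behavior, keeping the PR's function and variable names (the mountpoint check and exact commands are assumptions, not the final code):

function ensure_memory_dir_for_etcd_data() {
    if [ -z "$ETCD_DATA_DIR" ]; then
        echo "ETCD_DATA_DIR is not defined"
        exit 1
    fi
    mkdir -p "$ETCD_DATA_DIR"
    # on local dev environments there is no k8s-created Memory volume, so mount tmpfs ourselves
    if ! mountpoint -q "$ETCD_DATA_DIR"; then
        echo "mount etcd data directory as tmpfs"
        mount -t tmpfs tmpfs "$ETCD_DATA_DIR"
    fi
    # in both cases, remove leftovers from previous runs so they do not conflict with new data
    rm -rf "${ETCD_DATA_DIR:?}"/*
    ls -lsah "$ETCD_DATA_DIR"
}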

Contributor


On a local env we also have to remove it after we do cluster-down, right?

Contributor Author


You're right, I will add a cleanup for this on cluster-down.


setup_kind

export ETCD_DATA_DIR="/mnt/etcd"
Contributor


What will happen if we open more than one kind cluster? They cannot share the etcd directory. Let's add something that is unique per kind cluster.

Contributor Author

@ormergi ormergi Nov 17, 2020


Currently we deploy one job at a time on a CI node, which means there can be only one kind cluster at a given time.

I think we should tackle this when we need to support (if ever) multiple sriov lane jobs on a single CI node.
It should also be changed together with other changes we might want to apply, such as handling the cluster name (should be unique),
attaching PFs to each cluster (check which is available),
and any other change we would need to do.
What do you think?

Member


But you also have assumptions here which could be avoided:

  1. That the previous deployment was properly cleaned.
  2. That someone experiments on the node and runs once manually and once through the CI.
  3. The above happening unintentionally, i.e. bugs.

As the chance of this occurring does exist, it makes sense to do it right from the beginning. I would argue that not accounting for this in the original deployment scripts in the first place was the wrong approach, so we should try not to repeat it now.

Of course, you can reason not to do it now because it requires too much effort.
I would also not attempt to do/fix this until everything works.

Contributor Author


I think we should first see if running etcd in memory works for us and actually solves the problem.
If it does and the lane is green most of the time, we can start to think about more optimizations (running more than one sriov lane job on a single CI node).

Note that the number of sriov lane jobs that can run on a single CI node is bound to the number of PFs the sriov card provides, which is two at the current state.

Contributor

@oshoval oshoval Nov 17, 2020


I am already working on running multiple instances on one node (it already works locally), POC soon,
as was raised in the CI meeting.

Contributor


Now that I think of it, if two sriov lanes run on the same node they will have different pods, so they are not going to share /mnt/etcd, but we would still benefit from using different etcd directories, if it's not too much effort.

Also, let's add more info to the name and use something like /mnt/kubevirt-kind...-[unique name]-etcd. One thing you can use is the container name of the first master or the like, or the one from the registry container.

Contributor Author

@ormergi ormergi Nov 18, 2020


I think we should address this when we work on supporting multiple jobs running on the same CI node,
especially since we don't know yet in what way we are going to do it.
The way I see it, we need to manipulate sriov-cni on the CI cluster so it attaches a single PF to each prow job pod;
then we don't need to change the current code and the whole process will be transparent.
And like you said, since each job runs in a different pod there shouldn't be a problem with the directories.
What do you think?

I can add another directory for visibility so the path will be like /mnt/kubevirt-kind/etcd,
is that ok?

Member


I think @qinqon mentioned that it is not about running multiple lanes in parallel, so I do not understand why you refer to that in your answer.

Please reason why there is no need for a unique, per-run folder name in this case, regardless of usage by other features we may encounter in the future.

Contributor Author


On CI unique names are not a must, since we create the in-memory directory using the emptyDir volume API (on the CI cluster) on the prow job pod: kubevirt/project-infra#709

After discussing it with Quique I realized we do need a unique name for local environments, in case cluster-up is stopped in the middle for any reason.
I pushed a new commit to resolve this.

@ormergi ormergi force-pushed the kind_infra_in_memory_etcd branch from fe6fbe0 to 1195f36 on November 17, 2020 11:05
@ormergi
Contributor Author

ormergi commented Nov 17, 2020

Changes:

  • Removed the first commit, which addressed the cni-plugins download
  • Added teardown logic for the etcd data mount, invoked during make cluster-down

@ormergi ormergi force-pushed the kind_infra_in_memory_etcd branch from 1195f36 to cc2cf46 on November 17, 2020 12:37
@ormergi ormergi requested a review from qinqon November 17, 2020 14:10
@ormergi ormergi force-pushed the kind_infra_in_memory_etcd branch from cc2cf46 to e6e37c6 on November 17, 2020 14:47


@kubevirt-bot kubevirt-bot added dco-signoff: no Indicates the PR's author has not DCO signed all their commits. and removed dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Nov 18, 2020
@ormergi ormergi force-pushed the kind_infra_in_memory_etcd branch 3 times, most recently from 6c40193 to 549396c on November 18, 2020 14:18
@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. and removed dco-signoff: no Indicates the PR's author has not DCO signed all their commits. labels Nov 18, 2020
@ormergi
Contributor Author

ormergi commented Nov 18, 2020

Changes:

  • Use unique names when creating the etcd data directory for the running kind cluster;
    the etcd data dir path is now constructed as $BASE_ETCD_DATA_DIR/$ETCD_DATA_DIR (sketch below)
  • Add a trap to tear down the etcd in-memory dir mount

Please let me know if I should squash the commits.
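An illustrative sketch of that path construction (the base directory and unique suffix here are assumptions; this mount-based approach was later dropped from the PR):

# sketch only: build a per-cluster etcd data path
BASE_ETCD_DATA_DIR="/mnt/kubevirt-kind"
ETCD_DATA_DIR="${CLUSTER_NAME}-etcd"                # unique per kind cluster
mkdir -p "${BASE_ETCD_DATA_DIR}/${ETCD_DATA_DIR}"   # full path: $BASE_ETCD_DATA_DIR/$ETCD_DATA_DIR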

@qinqon
Contributor

qinqon commented Nov 20, 2020

@ormergi, what do you mean? That we cannot mount Memory at prow into /var/lib/etcd, or that we cannot use containerPath: /var/lib/etcd since the kind node container already has the directory in place?

@ormergi
Contributor Author

ormergi commented Nov 22, 2020

@ormergi, what do you mean? That we cannot mount Memory at prow into /var/lib/etcd, or that we cannot use containerPath: /var/lib/etcd since the kind node container already has the directory in place?

We can mount /var/lib/etcd from the prow job pod into the kind containers only through extraMount;
on the kind containers /var/lib/etcd is a directory inside the container and is not shared with or mounted from the host.

But it doesn't matter anymore, because I found a way to run etcd in memory without the need to create a mount! :)

@ormergi ormergi force-pushed the kind_infra_in_memory_etcd branch from b869c81 to 68bbc66 on November 22, 2020 09:10
@ormergi
Contributor Author

ormergi commented Nov 22, 2020

Changes:

  • Removed the need to create a tmpfs mount to back the etcd data directory, along with the teardown and related commits
  • Added a kubeadm ClusterConfiguration to the kind config yaml that points the etcd data dir to /tmp/... inside the kind
    cluster nodes, which is already mounted as tmpfs :) (see the sketch below).
    To enable it on any kind provider simply add RUN_ETCD_IN_MEMORY=true to provider.sh
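A minimal sketch of such a kubeadmConfigPatches entry (the file name is illustrative; the dataDir value matches the ETCD_DATA_DIR default that appears later in the review):

# sketch only: kind config with a kubeadm patch pointing etcd at a tmpfs-backed path
cat <<'EOF' > kind-etcd-in-memory-example.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
- |
  kind: ClusterConfiguration
  metadata:
    name: config
  etcd:
    local:
      dataDir: /tmp/kind-cluster-etcd  # /tmp inside the kind node is tmpfs (RAM-backed)
EOF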

@ormergi
Contributor Author

ormergi commented Nov 22, 2020

/test check-up-kind-1.17-sriov

@ormergi ormergi requested a review from qinqon November 22, 2020 09:18
@ormergi
Contributor Author

ormergi commented Nov 22, 2020

@qinqon
Contributor

qinqon commented Nov 23, 2020

@ormergi this means that we don't need to mount a volume at the prow job?

I understand that what /tmp means inside the kind node depends on what the kind project wants to do with it; are we sure they are not going to change it? In the future they could mount a container volume that writes to disk, and then we are back at square one.

Although the solution is better than what we have now, and it is self-contained.


setup_kind

RUN_ETCD_IN_MEMORY="true"
Contributor


Remove this and leave true as default

Contributor Author


Done

@@ -15,6 +15,8 @@ REGISTRY_NAME=${CLUSTER_NAME}-registry
MASTER_NODES_PATTERN="control-plane"
WORKER_NODES_PATTERN="worker"

RUN_ETCD_IN_MEMORY=${RUN_ETCD_IN_MEMORY:-"false"}
Contributor


All the variables that can be set outside of providers are prefixed with KUBEVIRT, so let's rename this to KUBEVIRT_WITH_KIND_ETCD_IN_MEMORY. Also set the default to true, since that is what we want in the end; if there is a problem we can go back to false by changing just the prow job.

Contributor Author


Done

@ormergi
Contributor Author

ormergi commented Nov 23, 2020

@ormergi this means that we don't need to mount a volume at the prow job?

I understand that what /tmp means inside the kind node depends on what the kind project wants to do with it; are we sure they are not going to change it? In the future they could mount a container volume that writes to disk, and then we are back at square one.

Although the solution is better than what we have now, and it is self-contained.

Sorry, I didn't mean to state the obvious about /tmp, I am sure you are familiar with how it works.
This was suggested by the kind project itself, kubernetes-sigs/kind#845 (comment); as we already know, etcd writes to disk by default.
As far as I know there is no intent to change this API, and if they do change it, then like any other API change we will adapt.
I also understand that kind is also proposed for production and should survive a restart (at least).
Even if they come up with a container volume, /tmp should be mounted to RAM in any case,
so I think we are good.

@ormergi ormergi force-pushed the kind_infra_in_memory_etcd branch from 68bbc66 to 64da9ce on November 23, 2020 09:34
@ormergi
Contributor Author

ormergi commented Nov 23, 2020

Changes:

  • Squashed the last two commits and updated the commit message accordingly
  • Addressed comments:
    • Introduced the KUBEVIRT_WITH_KIND_ETCD_IN_MEMORY env var to control etcd in memory from kubevirt
    • Added a check at cluster-up (in kind_up) that verifies the mount actually exists

@ormergi ormergi requested a review from qinqon November 23, 2020 09:40
@ormergi
Contributor Author

ormergi commented Nov 23, 2020

/test check-up-kind-1.17-sriov

@@ -15,6 +15,9 @@ REGISTRY_NAME=${CLUSTER_NAME}-registry
MASTER_NODES_PATTERN="control-plane"
WORKER_NODES_PATTERN="worker"

KUBEVIRT_WITH_KIND_ETCD_IN_MEMORY=${KUBEVIRT_WITH_KIND_ETCD_IN_MEMORY:-"true"}
ETCD_DATA_DIR="/tmp/kind-cluster-etcd"
Contributor


Rename this to ETCD_IN_MEMORY_DATA_DIR.

Contributor Author


Done

echo "Checking KIND cluster etcd data is mounted to RAM: $ETCD_DATA_DIR"
docker exec "$CLUSTER_NAME-control-plane" df -h $(dirname $ETCD_DATA_DIR)
docker exec "$CLUSTER_NAME-control-plane" du -h $ETCD_DATA_DIR
[ $(echo $?) != 0 ] && echo "falid to check etcd data directory" && return 1
Contributor


typo: failed

Contributor Author


fixed

Comment on lines 162 to 163
docker exec "$CLUSTER_NAME-control-plane" df -h $(dirname $ETCD_DATA_DIR)
docker exec "$CLUSTER_NAME-control-plane" du -h $ETCD_DATA_DIR
Contributor


This will check that it's mounted, but not that it's mounted in memory; can you check that it's using tmpfs?

Contributor Author


No problem
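For example, a check along these lines (an assumed sketch reusing the script's existing helpers, not necessarily the exact change) would assert the directory sits on tmpfs on every control-plane node, from inside kind_up:

for node in $(_get_nodes | awk '{print $1}' | grep "$MASTER_NODES_PATTERN"); do
    # df -T reports the filesystem type of the directory holding the etcd data dir
    docker exec "$node" df -T "$(dirname $ETCD_DATA_DIR)" | grep -q tmpfs || \
        { echo "etcd data directory is not mounted as tmpfs on $node"; return 1; }
done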

Currently we encounter bad performance of etcd on the
sriov provider cluster on a DinD setup.
We get 'etcdserver: timeout' errors that cause jobs to
fail often.
In cases where etcd has bad performance and the data shouldn't
be persistent (e.g. on CI and dev environments) it is recommended [1]
to use in-memory etcd.

To do that this commit:
- Adds a kubeadm ClusterConfiguration to the kind config, setting
  the etcd data directory to '/tmp/...' inside kind cluster nodes.
  The '/tmp/' directory is already mounted to RAM as tmpfs.

- 'KUBEVIRT_WITH_KIND_ETCD_IN_MEMORY', expected values: "true", "false";
  controls running etcd in memory on kind providers.

- 'ETCD_DATA_DIR', expects a directory path that is mounted to RAM;
  controls the path of the etcd data directory inside kind cluster nodes.

Running etcd in memory should improve performance and
will stabilize the sriov provider and lanes.

[1] kubernetes-sigs/kind#1922

Signed-off-by: Or Mergi <[email protected]>
@ormergi ormergi force-pushed the kind_infra_in_memory_etcd branch from 64da9ce to 393d0cd on November 23, 2020 10:36
@ormergi
Contributor Author

ormergi commented Nov 23, 2020

Changes:

  • Addressed comments:
    • Fixed typos
    • Renamed ETCD_DATA_DIR
    • Added a check that the etcd data directory is mounted to RAM
  • Perform the etcd data directory checks on all control-plane nodes

@ormergi
Contributor Author

ormergi commented Nov 23, 2020

/test check-up-kind-1.17-sriov

@qinqon
Contributor

qinqon commented Nov 23, 2020

/lgtm
/approve

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 23, 2020
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qinqon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 23, 2020
@qinqon
Contributor

qinqon commented Nov 23, 2020

/hold cancel

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 23, 2020
@kubevirt-bot kubevirt-bot merged commit 2a57af4 into kubevirt:master Nov 23, 2020
Labels
approved · dco-signoff: yes · lgtm · size/L
5 participants