Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add functionality to deploy cloud-sa secret and driver #705

Merged
merged 1 commit into from
Aug 2, 2019

Conversation

hantaowang
Copy link

@hantaowang hantaowang commented Jul 29, 2019

@davidz627 @wojtek-t @jingxu97 @verult

This PR adds the ability to test PVs provisioned and managed by the CSI GCP PD Driver. Included is a change to cl2's run-e2e.sh that deploys a secret (if it exists) and a yaml that deploys the driver and all its associates rbacs and SAs (one of which uses this secret).

I will rebase this after #657 is merged so that the ability to deploy the driver yaml (if necessary) is build into the storage test config. It will be turned off by default and an additional override file in volume-types/persistentvolume will turn it on.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 29, 2019
@k8s-ci-robot
Copy link
Contributor

Hi @hantaowang. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 29, 2019
@k8s-ci-robot k8s-ci-robot requested review from krzysied and oxddr July 29, 2019 22:25
@hantaowang
Copy link
Author

/hold
for rebase

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 29, 2019
@hantaowang hantaowang force-pushed the deploy-csi-driver-cl2 branch from b422498 to 22457a4 Compare July 30, 2019 08:10
@hantaowang
Copy link
Author

Ive rebased it, but as it stands this PR 99% will not work. A couple to consider:

  1. what will happen if i hard code default into the driver yaml and the CL2 tries to deploy to a test namespace? The driver and its secrets should be in namespace default.
  2. if an error is raised or the test namespace overrides the hard coded one, then should i add functionality to manually specify a namespace in phase? This breaks the abstraction of phase a little because phasr allows multiple possible namespaces, hence a range.
  3. if not, then how can i deploy the driver? Possibly via a script to be run pre test, but then this script will be an exact copy of the deploy script in the pd driver repo.

@@ -22,5 +22,10 @@ CLUSTERLOADER_ROOT=$(dirname "${BASH_SOURCE}")
export KUBECONFIG="${KUBECONFIG:-${HOME}/.kube/config}"
export KUBEMARK_ROOT_KUBECONFIG="${KUBEMARK_ROOT_KUBECONFIG:-${HOME}/.kube/config}"

# Deploy a secret that contains the e2e Google credentials
if [[ -z "${E2E_GOOGLE_APPLICATION_CREDENTIALS}" ]]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Do we have access to this env-var here? [I hope we do, just asking if you verified it.]

  2. Please add :- to avoid using unset var (both here and below)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The env var is set by prow afaik and is inherited for all prow jobs.

- Identifier: WaitForRunningDriverPods
Method: WaitForRunningPods
Params:
desiredPodCount: {{SumInt .Nodes 1}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why N+1 ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One driver per node plus a cluster wide driver.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth commenting in the code here

@@ -0,0 +1,2 @@
DEPLOY_CSI_DRIVER: true
CSI_DRIVER_PATH: "volume-types/persistentvolume/gcp-csi-driver-stable.yaml"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please add end-line here

@@ -0,0 +1,343 @@
apiVersion: v1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msau42 @davidz627 - can you please review this file - I don't know enough about what exactly is needed and how to configure it to make it work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my latest comment. Tldr: it wont but raises an interesting problem.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a copy paste of the stable driver :) Please comment at top of file a xref to the original files.

This is the correct set of yamls to deploy a driver into a cluster (along with the secret defined in that other file)

@wojtek-t
Copy link
Member

Ive rebased it, but as it stands this PR 99% will not work. A couple to consider:

  1. what will happen if i hard code default into the driver yaml and the CL2 tries to deploy to a test namespace? The driver and its secrets should be in namespace default.
  2. if an error is raised or the test namespace overrides the hard coded one, then should i add functionality to manually specify a namespace in phase? This breaks the abstraction of phase a little because phasr allows multiple possible namespaces, hence a range.
  3. if not, then how can i deploy the driver? Possibly via a script to be run pre test, but then this script will be an exact copy of the deploy script in the pd driver repo.

I missed this initially - those are good question.
I don't have very strong opinion here - I believe we should allow users to overwrite namespaces if they want (so not specifying namespace range would mean that we don't try to use automanaged ones).
OTOH, I would also be fine with running something as part run-e2e.sh before we trigger CL, but then we need some way to wait for waiting for them, so we may need to reimplement some stuff too...

@mm4tt - for thoughts too

@hantaowang
Copy link
Author

OTOH, I would also be fine with running something as part run-e2e.sh before we trigger CL, but then we need some way to wait for waiting for them, so we may need to reimplement some stuff too...

I should point out in our tests we do this by just waiting 60 seconds after deploying the driver. But i dont know if that will work for larger scale clusters.

@msau42
Copy link
Member

msau42 commented Jul 30, 2019

Keep in mind that eventually the driver will be bundled in gke. Which can be optin during gcloud container create

@msau42
Copy link
Member

msau42 commented Jul 30, 2019

Regarding waiting for the driver to be up, we can watch csinode and wait for the driver to be registered

@@ -22,5 +22,10 @@ CLUSTERLOADER_ROOT=$(dirname "${BASH_SOURCE}")
export KUBECONFIG="${KUBECONFIG:-${HOME}/.kube/config}"
export KUBEMARK_ROOT_KUBECONFIG="${KUBEMARK_ROOT_KUBECONFIG:-${HOME}/.kube/config}"

# Deploy a secret that contains the e2e Google credentials
if [[ -z "${E2E_GOOGLE_APPLICATION_CREDENTIALS}" ]]; then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log/echo a warning if this couldn't be found. Looks like if the file doesn't exist we just give up

- Identifier: WaitForRunningDriverPods
Method: WaitForRunningPods
Params:
desiredPodCount: {{SumInt .Nodes 1}}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth commenting in the code here

@@ -0,0 +1,343 @@
apiVersion: v1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a copy paste of the stable driver :) Please comment at top of file a xref to the original files.

This is the correct set of yamls to deploy a driver into a cluster (along with the secret defined in that other file)

@hantaowang
Copy link
Author

Regarding waiting for the driver to be up, we can watch csinode and wait for the driver to be registered

If we just deploy with an addition script in run-e2e.sh, we can actually just use kubectl wait I believe. (https://www.jeffgeerling.com/blog/2018/updating-kubernetes-deployment-and-waiting-it-roll-out-shell-script) This also makes the config a lot simplier because CSI and non CSI configs will now look the same where as previously I would need an {{ if $CSI }} condition.

@wojtek-t
Copy link
Member

wojtek-t commented Aug 1, 2019

@hantaowang - did you verify that this actually works? I think that @mm4tt yesterday mentioned that having multiple objects in one file will not work with CL...

If we just deploy with an addition script in run-e2e.sh, we can actually just use kubectl wait I believe. (https://www.jeffgeerling.com/blog/2018/updating-kubernetes-deployment-and-waiting-it-roll-out-shell-script) This also makes the config a lot simplier because CSI and non CSI configs will now look the same where as previously I would need an {{ if $CSI }} condition.

I'm fine with that too. Technically, installing driver is more cluster-setup than test itself, so I would be fine with running something in run-e2e (if we can wait for it)

@hantaowang
Copy link
Author

@hantaowang - did you verify that this actually works? I think that @mm4tt yesterday mentioned that having multiple objects in one file will not work with CL...

No I believe with how object bundles work right now, it won't work. This was just to mock what deploying with CL2 would look like. The default namespace issue also presents a problem, so thats another thing that won't work.

I'll try the run-e2e.sh method and make sure that works, and then ill push here and remove the hold.

@hantaowang hantaowang closed this Aug 1, 2019
@hantaowang hantaowang force-pushed the deploy-csi-driver-cl2 branch from 22457a4 to 1fb9a5c Compare August 1, 2019 21:39
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 1, 2019
@hantaowang hantaowang reopened this Aug 1, 2019
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 1, 2019
@hantaowang
Copy link
Author

I tested this method and it does work. The deployment is built into run-e2e.sh. There is no tear down, but that should be fine - we just need to not run csi and non csi tests on the same cluster.

To use it, add --deploy-csi-driver as the first flag of run-e2e.sh. Then override the USE_CSI config variable.

/assign @wojtek-t
/cc @davidz627 @jingxu97
cc @msau42 @verult
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 1, 2019
@davidz627
Copy link

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 1, 2019
@@ -22,5 +22,19 @@ CLUSTERLOADER_ROOT=$(dirname "${BASH_SOURCE}")
export KUBECONFIG="${KUBECONFIG:-${HOME}/.kube/config}"
export KUBEMARK_ROOT_KUBECONFIG="${KUBEMARK_ROOT_KUBECONFIG:-${HOME}/.kube/config}"

# Deploy the GCP PD CSI Driver is required
if [ $1 = "--deploy-csi-driver" ]; then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how back params work but I feel like theres a better way to do this, doesn't this just check the first arg for an exact string match? What if I use the "single-dash" syntax instead.

Worth investigating other methods, I've seen other things before that seem to work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I not sure what other methods can work. i didnt want to check every argument although that is a possibility. Its a bit harder since that arg cant be passed to CL2, so it needs to be spliced out. Open to suggestions.

@@ -22,5 +22,19 @@ CLUSTERLOADER_ROOT=$(dirname "${BASH_SOURCE}")
export KUBECONFIG="${KUBECONFIG:-${HOME}/.kube/config}"
export KUBEMARK_ROOT_KUBECONFIG="${KUBEMARK_ROOT_KUBECONFIG:-${HOME}/.kube/config}"

# Deploy the GCP PD CSI Driver is required
if [ $1 = "--deploy-csi-driver" ]; then
echo "${E2E_GOOGLE_APPLICATION_CREDENTIALS}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to print the contents?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh sorry, that was for testing

clusterloader2/run-e2e.sh Show resolved Hide resolved
@@ -13,6 +13,7 @@
{{$VOLUMES_PER_POD := .VOLUMES_PER_POD}}
{{$VOLUME_TEMPLATE_PATH := .VOLUME_TEMPLATE_PATH}}
{{$PROVISION_VOLUME := DefaultParam .PROVISION_VOLUME false}}
{{$USE_CSI := DefaultParam .USE_CSI false}}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this param be more specific like DEPLOY_PD_CSI_DRIVER or something?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the future, we may want to support more drivers to test, so its vague here in the config on purpose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

USE_CSI sounds fine - (in test we're not deploying anything, just potentially using the functionality)

@hantaowang hantaowang force-pushed the deploy-csi-driver-cl2 branch from 5032354 to 3a89b69 Compare August 2, 2019 00:49
Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks fine - I added a couple comments

@@ -0,0 +1,354 @@
# This config generated from the GCP PD CSI Driver
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of putting this file in the clusterloader2/ main dir.
Can you move it to "clusterloader2/drivers" or "cl2/csi/" or sth like that.

[e.g. cl2/testing/prometheus contains manifests

@@ -22,5 +22,18 @@ CLUSTERLOADER_ROOT=$(dirname "${BASH_SOURCE}")
export KUBECONFIG="${KUBECONFIG:-${HOME}/.kube/config}"
export KUBEMARK_ROOT_KUBECONFIG="${KUBEMARK_ROOT_KUBECONFIG:-${HOME}/.kube/config}"

# Deploy the GCP PD CSI Driver is required
if [ $1 = "--deploy-csi-driver" ]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this approach - other than supporting different flag, I prefer to be much more explicit.
Can we instead simply introduce a dedicated env var and use it.

e.g. "DEPLOY_GCI_DRIVER" (and just check if it is true).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it ok to configure the test with env vars? I defaulted to flags because everything else was configured by flags but this seems like a more flexible solution.

@@ -22,5 +22,18 @@ CLUSTERLOADER_ROOT=$(dirname "${BASH_SOURCE}")
export KUBECONFIG="${KUBECONFIG:-${HOME}/.kube/config}"
export KUBEMARK_ROOT_KUBECONFIG="${KUBEMARK_ROOT_KUBECONFIG:-${HOME}/.kube/config}"

# Deploy the GCP PD CSI Driver is required
if [ $1 = "--deploy-csi-driver" ]; then
if [[ -z "${E2E_GOOGLE_APPLICATION_CREDENTIALS}" ]]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:-

otherwise you're risking that this will fail with unbounded variable

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

echo "Env var E2E_GOOGLE_APPLICATION_CREDENTIALS must be set to deploy driver"
exit 1
else
kubectl create secret generic cloud-sa --from-file="${E2E_GOOGLE_APPLICATION_CREDENTIALS}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

clusterloader2/run-e2e.sh Show resolved Hide resolved
fi
kubectl apply -f ${CLUSTERLOADER_ROOT}/gcp-csi-driver-stable.yaml
kubectl wait pods -l app=gcp-compute-persistent-disk-csi-driver --for condition=Ready --timeout=300s
shift
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does it do?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shift shifts the arguments by 1, essentially popping off the first arg. I will remove this for using env vars.

@@ -13,6 +13,7 @@
{{$VOLUMES_PER_POD := .VOLUMES_PER_POD}}
{{$VOLUME_TEMPLATE_PATH := .VOLUME_TEMPLATE_PATH}}
{{$PROVISION_VOLUME := DefaultParam .PROVISION_VOLUME false}}
{{$USE_CSI := DefaultParam .USE_CSI false}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

USE_CSI sounds fine - (in test we're not deploying anything, just potentially using the functionality)

@@ -0,0 +1 @@
USE_CSI: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add end of line

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@hantaowang hantaowang force-pushed the deploy-csi-driver-cl2 branch 2 times, most recently from 454480e to 50d4d95 Compare August 2, 2019 08:30
Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two additional minor comments

@@ -22,5 +22,17 @@ CLUSTERLOADER_ROOT=$(dirname "${BASH_SOURCE}")
export KUBECONFIG="${KUBECONFIG:-${HOME}/.kube/config}"
export KUBEMARK_ROOT_KUBECONFIG="${KUBEMARK_ROOT_KUBECONFIG:-${HOME}/.kube/config}"

# Deploy the GCP PD CSI Driver is required
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/is/if/

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

if [[ -z "${E2E_GOOGLE_APPLICATION_CREDENTIALS:-}" ]]; then
echo "Env var E2E_GOOGLE_APPLICATION_CREDENTIALS must be set to deploy driver"
exit 1
else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove else - there is exit 1 above anyway

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@hantaowang hantaowang force-pushed the deploy-csi-driver-cl2 branch from 50d4d95 to 771484f Compare August 2, 2019 08:50
@wojtek-t
Copy link
Member

wojtek-t commented Aug 2, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 2, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hantaowang, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 2, 2019
@k8s-ci-robot k8s-ci-robot merged commit 572490b into kubernetes:master Aug 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants