
Reenable kubeadm presubmit test. #2976

Closed
wants to merge 1 commit

Conversation

pipejakob
Contributor

This had previously been disabled by #2568. This adds the job back to the bazel pipeline and reenables it.

Merging this is blocked by kubernetes/kubernetes#46864, which will fix kubeadm join, but I wanted to get the PR out early to get feedback and make sure I clear all presubmit checks.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 6, 2017
@pipejakob pipejakob force-pushed the reenable-kubeadm-pull branch from 9dd42a6 to 77e23e4 on June 6, 2017 09:21
@spiffxp
Member

spiffxp commented Jun 7, 2017

/lgtm
/cc @fejta @krzyzacy

@k8s-ci-robot k8s-ci-robot requested review from fejta and krzyzacy June 7, 2017 22:43
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 7, 2017
@luxas
Member

luxas commented Jun 8, 2017

@pipejakob when the kubeadm job is green again (that is, after kubernetes/kubernetes#46879), we can merge this.

@luxas
Member

luxas commented Jun 8, 2017

/assign @mikedanese @roberthbailey

@roberthbailey
Contributor

lgtm once we are sure the test is working.

The latest run I see on testgrid is failing due to not having enough quota to even stand up VMs on which to run kubeadm:

W0609 02:00:23.099] Error applying plan:
W0609 02:00:23.099] 
W0609 02:00:23.099] 1 error(s) occurred:
W0609 02:00:23.099] 
W0609 02:00:23.100] * google_compute_instance.e2e-3791-master: Error creating instance: googleapi: Error 403: Quota 'CPUS' exceeded. Limit: 24.0, quotaExceeded
W0609 02:00:23.100] 
W0609 02:00:23.100] Terraform does not automatically rollback in the face of errors.
W0609 02:00:23.101] Instead, your Terraform state file has been partially updated with
W0609 02:00:23.101] any resources that successfully completed. Please address the error
W0609 02:00:23.101] above and apply again to incrementally change your infrastructure.
W0609 02:00:23.101] make[1]: *** [do] Error 1
W0609 02:00:23.102] make: *** [deploy-cluster] Error 2

This had been previously disabled by
kubernetes#2568. Adding the job back
to the bazel pipeline and reenabling.

Also, remove sporadic trailing whitespace.
@pipejakob pipejakob force-pushed the reenable-kubeadm-pull branch from 77e23e4 to 5e0a9cf on June 17, 2017 02:55
@pipejakob
Contributor Author

Rebased, and increased the quota for this project to 60 cores. I'm not sure how bursty presubmit runs will be, but prow makes no attempt to serialize runs of the same job; it just spawns jobs as quickly as PRs need them.
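For reference, a minimal way to verify the project's regional CPU quota and current usage (a sketch only; the project ID and region below are placeholders, not values taken from this PR's config):

# Sketch: inspect the regional quota for the project the e2e clusters are created in.
# "my-prow-build-project" and "us-central1" are placeholder values.
gcloud compute regions describe us-central1 --project=my-prow-build-project
# The output includes a quotas list with the CPUS metric, its limit, and current usage.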

@luxas
Member

luxas commented Jun 17, 2017

/assign @fejta @krzyzacy
PTAL

@krzyzacy
Member

@pipejakob ready to merge?

@roberthbailey
Contributor

The tests against master are now running green.

Member

@luxas left a comment


Can we make it run automatically on cmd/kubeadm changes?

Or will it already since bazel is always running and this job runs after the bazel one?

@luxas
Member

luxas commented Jun 26, 2017

@pipejakob This job currently fails with:

W0626 06:58:51.035] 2017/06/26 06:58:51 util.go:129: Running: gcloud auth activate-service-account --key-file=/etc/service-account/service-account.json
W0626 06:58:51.670] Activated service account credentials for: [[email protected]]
W0626 06:58:51.696] 2017/06/26 06:58:51 util.go:131: Step 'gcloud auth activate-service-account --key-file=/etc/service-account/service-account.json' finished in 660.196024ms
W0626 06:58:51.745] 2017/06/26 06:58:51 main.go:161: Saved XML output to /workspace/k8s.io/kubernetes/_artifacts/junit_runner.xml.
W0626 06:58:51.746] 2017/06/26 06:58:51 util.go:198: Running: bash -c . hack/lib/version.sh && KUBE_ROOT=. kube::version::get_version_vars && echo "${KUBE_GIT_VERSION-}"
W0626 06:58:52.251] 2017/06/26 06:58:52 util.go:200: Step 'bash -c . hack/lib/version.sh && KUBE_ROOT=. kube::version::get_version_vars && echo "${KUBE_GIT_VERSION-}"' finished in 504.717402ms
W0626 06:58:52.251] 2017/06/26 06:58:52 main.go:195: Something went wrong: failed to acquire k8s binaries: open /go/src/k8s.io/kubernetes/_output/gcs-stage: no such file or directory
W0626 06:58:52.252] +(/workspace/e2e-runner.sh:1): main(): chmod -R o+r /workspace/k8s.io/kubernetes/_artifacts
W0626 06:58:52.255] Traceback (most recent call last):
W0626 06:58:52.256]   File "/workspace/./test-infra/jenkins/../scenarios/kubernetes_e2e.py", line 535, in <module>
W0626 06:58:52.256]     main(parse_args())
W0626 06:58:52.257]   File "/workspace/./test-infra/jenkins/../scenarios/kubernetes_e2e.py", line 449, in main
W0626 06:58:52.257]     mode.start(runner_args)
W0626 06:58:52.257]   File "/workspace/./test-infra/jenkins/../scenarios/kubernetes_e2e.py", line 219, in start
W0626 06:58:52.258]     check_env(env, self.runner, *args)
W0626 06:58:52.258]   File "/workspace/./test-infra/jenkins/../scenarios/kubernetes_e2e.py", line 56, in check_env
W0626 06:58:52.258]     subprocess.check_call(cmd, env=env)
W0626 06:58:52.258]   File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
W0626 06:58:52.277]     raise CalledProcessError(retcode, cmd)
W0626 06:58:52.278] subprocess.CalledProcessError: Command '('/workspace/e2e-runner.sh', '--up', '--down', '--extract=local', '--kubernetes-anywhere-kubernetes-version=latest', '--deployment=kubernetes-anywhere', '--timeout=55m', '--check-leaked-resources=false', '--kubernetes-anywhere-path=/workspace/kubernetes-anywhere', '--kubernetes-anywhere-phase2-provider=kubeadm', '--kubernetes-anywhere-cluster=e2e-5296', '--kubernetes-anywhere-kubeadm-version=gs://kubernetes-release-dev/bazel/48042/master:53a66020e4bf54d66aab5b9f625af7d10ed4c3f5,48042:c257eb358f6cebe35aa871a817c04c17147d98bc/bin/linux/amd64/')' returned non-zero exit status 1
E0626 06:58:52.282] Build failed
I0626 06:58:52.282] process 471 exited with code 1 after 0.0m
E0626 06:58:52.283] FAIL: pull-kubernetes-e2e-kubeadm-gce
I0626 06:58:52.283] Upload result and artifacts...
I0626 06:58:52.284] Gubernator results at https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/48042/pull-kubernetes-e2e-kubeadm-gce/5296
I0626 06:58:52.285] Call:  gsutil -m -q -o GSUtil:use_magicfile=True cp -r -c -z log,txt,xml _artifacts gs://kubernetes-jenkins/pr-logs/pull/48042/pull-kubernetes-e2e-kubeadm-gce/5296/artifacts
I0626 06:58:53.877] process 544 exited with code 0 after 0.0m
I0626 06:58:53.878] Call:  git rev-parse HEAD
I0626 06:58:53.885] process 888 exited with code 0 after 0.0m
... skipping 2 lines ...
I0626 06:58:54.785] Call:  gsutil -q cat 'gs://kubernetes-jenkins/pr-logs/directory/pull-kubernetes-e2e-kubeadm-gce/jobResultsCache.json#1497892018921423'
I0626 06:58:55.972] process 1027 exited with code 0 after 0.0m
I0626 06:58:55.980] Call:  gsutil -q -h Content-Type:application/json -h x-goog-if-generation-match:1497892018921423 cp - gs://kubernetes-jenkins/pr-logs/directory/pull-kubernetes-e2e-kubeadm-gce/jobResultsCache.json
I0626 06:58:57.736] process 1167 exited with code 0 after 0.0m
I0626 06:58:57.738] Call:  gsutil stat gs://kubernetes-jenkins/pr-logs/pull/48042/pull-kubernetes-e2e-kubeadm-gce/jobResultsCache.json
W0626 06:58:58.679] No URLs matched: gs://kubernetes-jenkins/pr-logs/pull/48042/pull-kubernetes-e2e-kubeadm-gce/jobResultsCache.json
E0626 06:58:58.679] Build failed
I0626 06:58:58.679] process 1337 exited with code 1 after 0.0m
I0626 06:58:58.680] Call:  gsutil -q -h Content-Type:application/json -h x-goog-if-generation-match:0 cp - gs://kubernetes-jenkins/pr-logs/pull/48042/pull-kubernetes-e2e-kubeadm-gce/jobResultsCache.json
I0626 06:59:00.138] process 1475 exited with code 0 after 0.0m
I0626 06:59:00.139] Call:  gsutil -q -h Content-Type:application/json cp - gs://kubernetes-jenkins/pr-logs/pull/48042/pull-kubernetes-e2e-kubeadm-gce/5296/finished.json
I0626 06:59:01.516] process 1645 exited with code 0 after 0.0m
I0626 06:59:01.517] Call:  gsutil -q -h Content-Type:text/plain -h 'Cache-Control:private, max-age=0, no-transform' cp - gs://kubernetes-jenkins/pr-logs/directory/pull-kubernetes-e2e-kubeadm-gce/latest-build.txt

ref: https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/48042/pull-kubernetes-e2e-kubeadm-gce/5296/?log#log

@timothysc
Member

I'll bring this up in sig-testing today. /cc @fejta

@pipejakob
Contributor Author

As part of the (completely understandable) security lockdown of our prow cluster, I've lost my direct access recently. I just pinged @fejta + @spxtr out of band to request temporary access to debug and fix up these recent failures.

@ixdy
Member

ixdy commented Jun 27, 2017

when last I looked at this job, I think I concluded that --extract=local is being misused here (with kubetest), but I don't know what would be more appropriate.

@timothysc
Member

What is the state of this? I think this should be an imperative for 1.8.

@luxas
Member

luxas commented Jul 10, 2017 via email

@pipejakob
Contributor Author

I've been diving into debugging this and found a few issues that I need to fix (beyond just rebasing). It might take a little longer than expected.

@spiffxp
Member

spiffxp commented Jul 26, 2017

/unassign

@luxas
Member

luxas commented Jul 26, 2017

ping @pipejakob Are you able to look at this anytime soon?
Otherwise we should assign someone else to take a shot at it; it will be crucial to have this soon.

cc @timothysc @roberthbailey

@pipejakob
Contributor Author

A quick update on this: the presubmit job was using --extract local, which means it should use the local artifacts from the current build, which we don't produce. I'm not sure how a build was being triggered before (since I wasn't passing --build to kubetest), but the e2e tests were passing then, and now the binaries can't be found, which makes sense.

One easy option is to use --extract ci/latest to just grab the latest e2e.test from another CI build, but that has a lot of downsides. It wouldn't actually exercise any e2e test changes in the current PR, which means someone could still very easily merge a breaking change, and the PR to fix it would still be failing the e2e tests.

This job is chained off of the existing bazel presubmit test, so another option (my preference) is to add a new kubetest extractStrategy to be able to pull the bazel build that was already run for the candidate PR and reuse those binaries.

We could also repeat the build in the kubeadm e2e, but that's also problematic: our make release and make quick-release builds use the Docker build image, but this job already runs within a container, and EngProd generally advises against using Docker-in-Docker. The bazel build doesn't require launching a new container, but then we have to make sure that the bazel version and environment stay well in sync with the existing bazel build image so that the build matches.

I'm open to other ideas, but I think the best option is to add support for the new kubetest extraction strategy to reuse the binaries built during the bazel presubmit job. I'll start coding that up.
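As a rough illustration of that preference (not the eventual kubetest implementation), the bazel presubmit already pushes its binaries to a PR-specific GCS path (visible in the job log above), so the kubeadm job could in principle copy those instead of rebuilding. The exact path layout and environment variables below are assumptions for the sketch:

# Sketch only: reuse the binaries the bazel presubmit already pushed for this PR.
# PULL_NUMBER and PULL_REFS are the variables prow provides; the path layout is
# inferred from the job log above and may not match exactly.
BAZEL_BUILD="gs://kubernetes-release-dev/bazel/${PULL_NUMBER}/${PULL_REFS}"
mkdir -p _output/bin
gsutil -m cp "${BAZEL_BUILD}/bin/linux/amd64/*" _output/bin/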

@luxas
Member

luxas commented Jul 31, 2017

I'm open to other ideas, but I think the best option is to add support for the new kubetest extraction strategy to reuse the binaries built during the bazel presubmit job. I'll start coding that up.

@pipejakob SGTM

@fejta
Contributor

fejta commented Jul 31, 2017

I would like to see the build job do kubetest --stage=gs://something-specific-to-the-pr and then all the chained e2e jobs do kubetest --extract=gs://something-specific-to-the-pr
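A hedged sketch of that flow (--build, --stage, and --extract are existing kubetest flags; the bucket path is a placeholder for whatever PR-specific location the build job would choose):

# In the bazel build presubmit: build and stage binaries to a PR-specific path.
kubetest --build=bazel --stage=gs://some-pr-staging-bucket/${PULL_NUMBER}

# In each chained e2e presubmit (including the kubeadm one): extract the same
# staged binaries instead of using --extract=local.
kubetest --extract=gs://some-pr-staging-bucket/${PULL_NUMBER} \
  --deployment=kubernetes-anywhere --up --down --timeout=55m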

@pipejakob
Contributor Author

@fejta Thanks for the suggestion. I think that'll turn out simpler than what I was doing. I'll give it a shot.

@fejta
Contributor

fejta commented Aug 7, 2017

FYI, please sync up with @BenTheElder, who is also interested in refactoring the e2e jobs to use this --extract approach.

@fejta
Contributor

fejta commented Aug 7, 2017

/cc @BenTheElder

@spxtr
Contributor

spxtr commented Aug 23, 2017

/test all

@k8s-ci-robot
Contributor

@pipejakob: The following tests failed, say /retest to rerun them all:

Test name	Commit	Rerun command
pull-test-infra-verify-bazel	5e0a9cf	/test pull-test-infra-verify-bazel
pull-test-infra-bazel	5e0a9cf	/test pull-test-infra-bazel
pull-test-infra-verify-gofmt	5e0a9cf	/test pull-test-infra-verify-gofmt
pull-test-infra-verify-govet	5e0a9cf	/test pull-test-infra-verify-govet

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@luxas
Member

luxas commented Aug 23, 2017

@pipejakob have you had a chance to look at this yet?

@pipejakob
Contributor Author

My related PRs to support using the correct e2e.test binary have now been merged, but this is going to be blocked on kubernetes/kubernetes#50760, since we currently can't get a green run of the kubeadm jobs (despite all e2e tests passing), so I don't want to add an already-failing blocking presubmit.

Since this PR is particularly painful to rebase and keeps hitting conflicts with changes to config.yaml, I'd like to wait until we get kubernetes/kubernetes#50760 sorted out before moving forward. I can close this for now to get it out of people's review queues and reopen it when it's actually ready to be reviewed again.

@pipejakob pipejakob closed this Aug 23, 2017
@luxas
Member

luxas commented Aug 23, 2017

Okay, thanks @pipejakob
