
pull-kubernetes-e2e-kops-aws failed for 1.19 #9526

Closed
jqmichael opened this issue Jul 8, 2020 · 11 comments · Fixed by kubernetes/test-infra#18222 or #9535

Comments

@jqmichael

Hey kops experts,

Could anyone help me understand why the presubmit job below consistently failed for 1.19?

https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/91513/pull-kubernetes-e2e-kops-aws/1280849041231450112/

The build job output seemed to suggest kube-apiserver isn't up and running for some reason? If so, any hints on where I could go find the kube-apiserver/kubelet logs?

W0708 13:12:49.909] cannot determine hash for "https://storage.googleapis.com/kubernetes-release-pull/ci/pull-kubernetes-e2e-kops-aws/v1.19.0-beta.2.793+01d6753b5c7271/bin/linux/arm64/kubelet" (have you specified a valid file location?)

W0708 13:12:49.939] 2020/07/08 13:12:49 process.go:155: Step '/workspace/kops create cluster --name e2e-bd9ab6df6f-dba53.test-cncf-aws.k8s.io --ssh-public-key /workspace/.ssh/kube_aws_rsa.pub --node-count 4 --node-volume-size 48 --master-volume-size 48 --master-count 1 --zones sa-east-1c --master-size c5.large --kubernetes-version https://storage.googleapis.com/kubernetes-release-pull/ci/pull-kubernetes-e2e-kops-aws/v1.19.0-beta.2.793+01d6753b5c7271 --admin-access 35.226.123.140/32 --cloud aws --override cluster.spec.nodePortAccess=0.0.0.0/0 --yes' finished in 9.795259329s

W0708 13:12:49.945] 2020/07/08 13:12:49 process.go:153: Running: kubectl -n kube-system get pods -ojson -l k8s-app=kops-controller

W0708 13:12:50.739] The connection to the server localhost:8080 was refused - did you specify the right host or port?

W0708 13:12:50.759] 2020/07/08 13:12:50 kubernetes.go:117: kubectl get pods failed: error during kubectl -n kube-system get pods -ojson -l k8s-app=kops-controller: exit status 1

W0708 13:12:50.802] W0708 13:12:50.802076    7513 vfs_castore.go:604] CA private key was not found
W0708 13:12:50.803] 
W0708 13:12:50.803] cannot find CA certificate
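The "cannot determine hash" line points at a per-architecture binary URL that was never published by the presubmit build. A minimal sketch of how such a path is assembled from the `--kubernetes-version` base URL (the helper name `buildAssetURL` is my own, not the actual kops function):

```go
package main

import "fmt"

// buildAssetURL roughly mirrors the per-architecture layout seen in the log:
// <base>/bin/linux/<arch>/<binary>. Illustrative only, not the real kops code.
func buildAssetURL(base, arch, binary string) string {
	return fmt.Sprintf("%s/bin/linux/%s/%s", base, arch, binary)
}

func main() {
	base := "https://storage.googleapis.com/kubernetes-release-pull/ci/pull-kubernetes-e2e-kops-aws/v1.19.0-beta.2.793+01d6753b5c7271"
	for _, arch := range []string{"amd64", "arm64"} {
		// The amd64 binary exists in this presubmit build; the arm64 one
		// does not, which is what produces the "cannot determine hash" error.
		fmt.Println(buildAssetURL(base, arch, "kubelet"))
	}
}
```

The arm64 URL printed here matches the one in the warning above, which is why the failure shows up before the cluster ever comes up: kops aborts during asset validation, so kube-apiserver is never started and `localhost:8080` refuses connections.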

Thanks

1. What kops version are you running? The command kops version will display
this information.

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

3. What cloud provider are you using?

4. What commands did you run? What is the simplest way to reproduce this issue?

5. What happened after the commands executed?

6. What did you expect to happen?

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

8. Please run the commands with the most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else do we need to know?

@hakman
Member

hakman commented Jul 8, 2020

Hey @jqmichael. The problem is that your setup doesn't build the ARM64 binaries. Not quite sure how the build part works in that job or if it is possible to build ARM64 or cross-build.

@jqmichael
Author

jqmichael commented Jul 8, 2020

Hey @jqmichael. The problem is that your setup doesn't build the ARM64 binaries. Not quite sure how the build part works in that job or if it is possible to build ARM64 or cross-build.

Ah, thanks @hakman. Wasn't sure if that message was a red herring. Let me look into this.

@rifelpet

@hakman
Member

hakman commented Jul 8, 2020

Btw, any special reason you need to run this test? Just curious. :)
This may also be of interest: #9510

@jqmichael
Author

@hakman

That was run as part of a similar PR in k8s/k8s. kubernetes/kubernetes#91513

@BenTheElder
Member

/reopen

kops needs to not fail when there are only binaries for the current platform.

it's very expensive (think 8 cores x 50GB of RAM for over an hour) to do a full Kubernetes cross build / release build, and it takes a ton of disk space on the node.

the kubernetes presubmit does a bazel-cached amd64-only build. even if we started running arm in presubmit for some reason, we would not want to build every platform.

@k8s-ci-robot k8s-ci-robot reopened this Jul 9, 2020
@k8s-ci-robot
Contributor

@BenTheElder: Reopened this issue.

In response to this:

/reopen

kops needs to not fail when there are only binaries for the current platform.

it's very expensive (think 8 cores x 50GB of RAM for over an hour) to do a full Kubernetes cross build / release build, and it takes a ton of disk space on the node.

the kubernetes presubmit does a bazel-cached amd64-only build. even if we started running arm in presubmit for some reason, we would not want to build every platform.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@BenTheElder
Member

alternatively we can remove this presubmit and use the shared CI cross builds. but we strongly avoid doing make release in presubmit for a reason.

@hakman
Member

hakman commented Jul 9, 2020

@BenTheElder Thanks for the feedback. Since this is a manually triggered job, I thought it might be OK. Anyway, it doesn't work, so I will revert the changes.
I'm not sure what the purpose of this presubmit is in k/k, or whether it can be removed and replaced with a periodic job instead (I think that is what you were suggesting).

/cc @justinsb @rifelpet

@BenTheElder
Member

BenTheElder commented Jul 9, 2020 via email

@jqmichael
Author

jqmichael commented Jul 9, 2020

that's the simplest option, I don't know who may or may not be trying to use this though and what the use case is. one option I kicked around with justin (we happened to be talking already) is having an option / env to allow missing architectures. KOPS_YOLO_MISSING_BINARIES if you will

Wondering if it is feasible for kops to take a whitelist instead of a blacklist for architectures. I'm interested in taking that task if someone can point me to where that change should be.
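An architecture allowlist of that kind could look roughly like this. A hedged sketch: the function name `findMissing` and the `published` availability map are my own inventions standing in for whatever check kops actually performs (e.g. fetching a `.sha256` file for each binary):

```go
package main

import "fmt"

// findMissing returns the architectures from the allowlist for which no
// binary was published. Hypothetical sketch, not the real kops code.
func findMissing(allowed []string, published map[string]bool) []string {
	var missing []string
	for _, arch := range allowed {
		if !published[arch] {
			missing = append(missing, arch)
		}
	}
	return missing
}

func main() {
	// In this presubmit only the amd64 binaries exist.
	published := map[string]bool{"amd64": true}

	// With an allowlist of only amd64, the absent arm64 binary is never
	// checked, so the job would not fail.
	fmt.Println(findMissing([]string{"amd64"}, published)) // []

	// With the default allowlist of both architectures, arm64 is reported
	// missing, reproducing the failure in this issue.
	fmt.Println(findMissing([]string{"amd64", "arm64"}, published)) // [arm64]
}
```

The design question in the thread is simply where the allowlist comes from: hardcoded defaults, an environment variable, or a cluster spec field.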

(Apologies again for not doing the code digging myself. A busy week.)

@hakman
Member

hakman commented Jul 9, 2020

Thanks for volunteering @jqmichael :).
Please check my PR that fixes this issue to get an idea of what and where to change.
Would probably need an extra cluster spec setting to specify the supported architectures, but at the moment kops only supports AMD64 and ARM64 (there aren't any use cases for other architectures). Maybe open an issue so others can provide feedback.
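Such a cluster spec setting might look something like the fragment below. This is purely hypothetical: the `architectures` field does not exist in the kops API; it only illustrates the shape of the proposal.

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
spec:
  # Hypothetical field: restrict asset download and hash checks to these
  # architectures, so a missing arm64 binary would not fail the build.
  architectures:
    - amd64
```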
