e2e-cos-device-plugin-gpu job fails #31849

Closed
bart0sh opened this issue Feb 6, 2024 · 5 comments
Labels
kind/bug, sig/node, sig/testing

Comments

Contributor

bart0sh commented Feb 6, 2024

What happened:

The e2e-cos-device-plugin-gpu job (https://testgrid.k8s.io/sig-node-containerd#e2e-cos-device-plugin-gpu) has been failing continuously since at least Jan 22, 2024.

What you expected to happen:

The device plugin test cases run fine locally, so the job should pass too.

Anything else we need to know?:
It looks like the job fails to start the API server. Here is what I see in buildlog.txt:

Last output from querying API server follows:
-----------------------------------------------------
*   Trying 34.168.10.172:443...
* connect to 34.168.10.172 port 443 failed: Connection refused
* Failed to connect to 34.168.10.172 port 443 after 39 ms: Couldn't connect to server
* Closing connection 0
curl: (7) Failed to connect to 34.168.10.172 port 443 after 39 ms: Couldn't connect to server
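
The curl output above reports a refused connection, not a timeout. That usually means the host answered but nothing was listening on port 443, which fits an API server that never started. A minimal Python sketch of telling those cases apart, assuming a plain TCP probe is enough; `probe_tcp` is a hypothetical helper, not part of any Kubernetes tooling:

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connection attempt.

    Returns 'open', 'refused', 'timeout', or 'error'.
    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        # The host replied with a reset: nothing is listening on this port.
        return "refused"
    except socket.timeout:
        # No reply at all: packets are likely dropped or the host is down.
        return "timeout"
    except OSError:
        # Anything else (unreachable network, DNS failure, ...).
        return "error"
    finally:
        sock.close()


# Example call, using the address from the build log above:
#   probe_tcp("34.168.10.172", 443)
```

A 'refused' result points at the process on the master node (here, the API server container), while a 'timeout' would point more toward firewall rules or networking.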
@bart0sh bart0sh added the kind/bug Categorizes issue or PR as related to a bug. label Feb 6, 2024
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Feb 6, 2024
Contributor Author

bart0sh commented Feb 6, 2024

/cc @swatisehgal @ffromani @SergeyKanzhelev
/sig node
/sig testing

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 6, 2024
@bart0sh bart0sh changed the title from `e2e-cos-device-plugin-gpu job fails with ""` to `e2e-cos-device-plugin-gpu job fails` Feb 6, 2024
Contributor

kannon92 commented Feb 7, 2024

@dims could this be related to the cloud provider deprecation? I remember seeing this GPU list in that job, but I didn't check whether the job was passing before.

Member

dims commented Feb 7, 2024

@kannon92 please see below:

Snippet from:

Feb 07 13:36:48.024915 bootstrap-e2e-master containerd[1191]: time="2024-02-07T13:36:48.024858788Z" level=error msg="PullImage \"registry.k8s.io/kube-apiserver-amd64:v1.30.0-alpha.1.107_052bce26f4f48b\" failed" error="rpc error: code = NotFound desc = failed to pull and unpack image \"registry.k8s.io/kube-apiserver-amd64:v1.30.0-alpha.1.107_052bce26f4f48b\": failed to resolve reference \"registry.k8s.io/kube-apiserver-amd64:v1.30.0-alpha.1.107_052bce26f4f48b\": registry.k8s.io/kube-apiserver-amd64:v1.30.0-alpha.1.107_052bce26f4f48b: not found"
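
The log line above carries the two facts needed for triage: the image reference that failed to pull and the gRPC status code. A rough Python sketch of extracting them, assuming the journal line keeps the `PullImage \"...\" failed` and `code = ... desc` shape shown above; `PullFailure` and `parse_pull_failure` are hypothetical names:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the image reference inside msg="PullImage \"<ref>\" failed".
# The backslash before each quote is optional so unescaped logs also match.
_IMAGE_RE = re.compile(r'PullImage \\?"(?P<image>[^"\\]+)\\?" failed')
# Matches the gRPC status code in "rpc error: code = <Code> desc = ...".
_CODE_RE = re.compile(r"code = (?P<code>\w+) desc")


@dataclass
class PullFailure:
    image: str  # full reference, e.g. registry/name:tag
    tag: str    # text after the last ':' (naive split, assumes a tag is present)
    code: str   # gRPC status code name, e.g. NotFound


def parse_pull_failure(line: str) -> Optional[PullFailure]:
    """Return the failed image and status code, or None if the line does not match."""
    image_match = _IMAGE_RE.search(line)
    code_match = _CODE_RE.search(line)
    if image_match is None or code_match is None:
        return None
    image = image_match.group("image")
    tag = image.rpartition(":")[2]
    return PullFailure(image=image, tag=tag, code=code_match.group("code"))
```

For the line above this yields the `kube-apiserver-amd64` reference with code `NotFound`, meaning the registry had no image under that tag. That matches the refused connection seen earlier: without the image, the API server could not start.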

Discussion thread:
https://cloud-native.slack.com/archives/GGEQHJ0AE/p1707284414739809?thread_ts=1707230141.026879&cid=GGEQHJ0AE

containerd PR that just merged:
containerd/containerd#9779

artifacts got built:
https://prow.k8s.io/?repo=containerd%2Fcontainerd&type=periodic

Now we have to wait for the CI jobs to recover; feel free to kick them off now. Hopefully we get past this point and no longer see this specific problem. Hoping that is enough.

(NOTE: not related to cloud provider deprecation, just a hiccup in upstream containerd)

Contributor Author

bart0sh commented Feb 7, 2024

The job started to run. Thank you for fixing it! Closing the issue.

@bart0sh bart0sh closed this as completed Feb 7, 2024
Member

aojea commented Feb 7, 2024

love it, thanks all

5 participants