minikube flakes #21931

Closed
edsantiago opened this issue Mar 4, 2024 · 6 comments · Fixed by #23237
Labels: flakes, locked - please file new issue/PR

Comments

@edsantiago
Member

No useful diagnostics:

# X Exiting due to RUNTIME_ENABLE: Failed to enable container runtime: sudo systemctl restart cri-docker.socket: Process exited with status 1

Smells like the quay flakiness, but there's nothing to go on. Tests should probably be instrumented to run journalctl, minikube logs, and anything else that could give a user some hints.

Flake hits: minikube(3), podman(3), fedora-39(3), rootless(3), host(3), sqlite(3)
edsantiago added the flakes label Mar 4, 2024
@afbjorklund
Contributor

afbjorklund commented Mar 4, 2024

It's weird that cri-docker.socket already fails; you would think it would wait for cri-docker.service.

https://github.com/Mirantis/cri-dockerd/tree/master/packaging/systemd

edsantiago added a commit to edsantiago/libpod that referenced this issue Mar 19, 2024
New run_minikube() helper, modeled after run_podman(). Echoes
each command being run and its output. On failure, runs minikube logs.

Addresses (does not close) containers#21931 which is hitting us hard in CI.
Probably quay flakes, but it's impossible to tell without logs.

Also: bug fix: one "run podman" fixed to run_podman

Signed-off-by: Ed Santiago <[email protected]>
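The helper described in the commit message above could look roughly like the following minimal bash sketch. It is hypothetical and not the code from the commit. It assumes only that `minikube` is on `PATH` and that `minikube logs` is available for dumping cluster logs.

```shell
# Hypothetical sketch of a run_minikube() wrapper; not the actual helper from the commit.
run_minikube() {
    # Echo the command being run, in the style of run_podman
    echo "\$ minikube $*"
    local output
    local status=0
    # Capture stdout and stderr together; record the exit status instead of aborting
    output=$(minikube "$@" 2>&1) || status=$?
    echo "$output"
    if [ "$status" -ne 0 ]; then
        # On failure, dump cluster logs so CI leaves something to go on
        echo "# minikube exited with status $status; running 'minikube logs'"
        minikube logs
    fi
    return "$status"
}
```

Declaring `output` separately from the assignment matters: `local output=$(...)` would report the exit status of `local`, hiding minikube's failure.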
@edsantiago
Member Author

Caught one:

<+010ms> # $ minikube kubectl -- apply -f /tmp/minikube_deploy_SEeITt.yaml
<+593ms> # pod/test-ctr-pod created
         #
<+023ms> # $ minikube kubectl get pods
<+266ms> # NAME           READY   STATUS              RESTARTS   AGE
         # test-ctr-pod   0/1     ContainerCreating   0          0s
....
<+1.03s> # $ minikube kubectl get pods
<+232ms> # NAME           READY   STATUS         RESTARTS   AGE
         # test-ctr-pod   0/1     ErrImagePull   0          18s        <<<<<<<<<<<<<--------------------------------
....
<+1.03s> # $ minikube kubectl get pods
<+265ms> # NAME           READY   STATUS             RESTARTS   AGE
         # test-ctr-pod   0/1     ImagePullBackOff   0          30s       <<<<<<<<<<<<------------------
....
         # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
         # #| FAIL: Timed out waiting for pod to move to 'Running' state
         # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

"ErrImagePull" smells to me like quay flake. Anyone know for sure?

@cevich
Member

cevich commented May 6, 2024

I believe I hit this using a "new" F40 image (still under test). Or is this possibly a new flake?

https://api.cirrus-ci.com/v1/artifact/task/5131719219609600/html/minikube-podman-fedora-40-rootless-host-sqlite.log.html

The output seems similar to a previous (above) hit on:

You are using the QEMU driver without a dedicated network, which doesn't support minikube service & minikube tunnel commands.

I don't think that test should be trying to use QEMU, but maybe that's a red herring? In any case, I re-ran the task and it passed.

@edsantiago
Member Author

This is starting to compete with #22551 for the Most Annoying Flake award.

Flake hits: minikube(20), podman(20), fedora-39(12), fedora-40(8), rootless(20), host(20), sqlite(20)

@cevich
Member

cevich commented May 21, 2024

Maybe worth asking Urvashi to take a look? IIRC she wrote these tests, and might have a quick/easy answer.

@cevich
Member

cevich commented Jul 9, 2024

FWIW, I attempted to reproduce this in a hack/get_ci_vm.sh environment, painstakingly copy-pasting the commands in the code path one by one. Both "minikube - check cluster is up" and "minikube - deploy generated container yaml to minikube" worked perfectly fine. Given how often this seemingly breaks, I was hoping to get lucky and reproduce it 😢 So I'm giving up.

Ref: #23237

stale-locking-app bot added the locked - please file new issue/PR label Oct 9, 2024
stale-locking-app bot locked as resolved and limited conversation to collaborators Oct 9, 2024

3 participants