Podman site crash leaves setup in confusing state #1298
Comments
When the CPU architecture requirement is not met, the container restarts in a loop until the Podman network runs out of IP addresses (a Podman fix for this is included in the 4.8 release). Once that happens, rootlessport processes are left behind that keep the host ports used by Skupper podman bound, which prevents new attempts to create sites from working. I am going to push a preventive fix that will make sure the containers can run.
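To see whether leftover rootlessport processes are present on a host, something like the following could be used. It is only a sketch: the helper name `find_rootlessport_pids` is made up for illustration, and it assumes a Linux `/proc` layout and that the process name starts with `rootlessport`.

```python
import os


def find_rootlessport_pids(proc_root="/proc"):
    """Return sorted PIDs whose process name starts with 'rootlessport'.

    Hypothetical helper: scans <proc_root>/<pid>/comm, which on Linux
    holds the short process name.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'cpuinfo'
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            continue  # process exited in the meantime or is not readable
        if name.startswith("rootlessport"):
            pids.append(int(entry))
    return sorted(pids)
```

Calling `find_rootlessport_pids()` with no arguments scans the real `/proc`. Whether it is safe to stop the processes it reports depends on what other rootless containers are running on the host.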
Description
There are possibly a few bugs and enhancements in this issue...
There is a way to cause a router crash when initialising a Podman site that leaves the environment in a confusing state: it is difficult for a user to understand what is going on, or how to get back to a clean state in which another skupper init can be performed.
In this case the crash is caused by a new RHEL9 requirement for the CPU architecture. When running in a virtual machine, the hypervisor defaults may not pass the actual host CPU architecture through to the guest, causing the router to crash after the skupper CLI thinks it has started successfully. The error the router crashes with is: Fatal glibc error: CPU does not support x86-64-v2. Configuring the hypervisor to pass the actual CPU architecture through stops the crash.
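A guest can be checked for this before running skupper init. The sketch below is not part of Skupper; `missing_v2_flags` is a hypothetical helper. It compares the `flags` line of `/proc/cpuinfo` against the features that x86-64-v2 adds over the x86-64 baseline, using the flag names Linux reports.

```python
# Features x86-64-v2 adds over the x86-64 baseline, as named in the
# 'flags' line of /proc/cpuinfo on Linux ('pni' is SSE3).
X86_64_V2_FLAGS = frozenset(
    {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}
)


def missing_v2_flags(cpuinfo_text):
    """Return the sorted x86-64-v2 flags absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return sorted(X86_64_V2_FLAGS - set(value.split()))
    # No 'flags' line at all: report every required flag as missing.
    return sorted(X86_64_V2_FLAGS)


def check_host(path="/proc/cpuinfo"):
    """Read the given cpuinfo file and return the list of missing flags."""
    with open(path) as f:
        return missing_v2_flags(f.read())
```

An empty list from `check_host()` suggests the guest CPU model exposes the x86-64-v2 features. A non-empty list names the flags the hypervisor is not passing through.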
However, this crash has uncovered an issue related to the state that the host is left in when the router crashes. Other crash types will likely have the same issue.
At the heart of this issue is a lack of guidance to help a user get out of the mess.
What did I do?
What happened
The router started and then crashed. No feedback was provided to the user.
The logs from journalctl (attached) had two significant entries, shown below.
The router port remained locked until I rebooted the machine.
Errors in system logs
1. CPU Architecture
Nov 29 07:27:28 rhsi-2 skupper-controller-podman[1956]: Fatal glibc error: CPU does not support x86-64-v2
2. Core Dump
What did I expect to happen
For the crash, I would have expected the skupper CLI not to return until the router had properly stabilised.
On the second skupper init, which failed, I would have expected some guidance on how to clean up the environment.
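The first expectation, not returning until the router has stabilised, could look roughly like the sketch below. It is an illustration only, not how the skupper CLI is implemented. `is_running` is a caller-supplied, hypothetical probe (for example, one that inspects the container state), and the thresholds are arbitrary.

```python
import time


def wait_until_stable(is_running, required_successes=5, interval=1.0,
                      timeout=30.0, sleep=time.sleep, clock=time.monotonic):
    """Return True once is_running() succeeds required_successes times in a row.

    Any failed probe resets the streak, so a container that is restarting
    in a loop does not count as stable. Returns False if the timeout
    elapses first.
    """
    deadline = clock() + timeout
    streak = 0
    while True:
        streak = streak + 1 if is_running() else 0
        if streak >= required_successes:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Requiring several consecutive successes, rather than a single one, is what would catch a router that starts and then crashes shortly afterwards.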
Environment Details
kvm64
Attachments
Journalctl output
joutnalctl.log