broken plan status symmetry being reported by autopilot #4718
Comments
Maybe the solution would be to just check whether a status for a given command ID already exists before adding a new one here. Inspecting the logs, I can see that all three nodes added statuses at some point:
I am not sure all nodes should be processing this very same object at around the same time. Given that we have only 3 statuses in the Plan but many more "Adding new status" log lines, I suspected that some of these additions were dropped due to conflicts, and indeed I found these lines in the log:
This clearly smells like a concurrency issue. Was the operation of adding the status supposed to happen on only one node? I guess that is what we expect.
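The "check before adding" idea above could be sketched roughly as follows. This is only an illustration: `PlanCommandStatus`, `addStatusIfMissing`, and the `"Pending"` state are hypothetical names made up for this example, not the actual k0s autopilot types or states.

```go
package main

import "fmt"

// PlanCommandStatus is a hypothetical, simplified stand-in for a per-command
// status entry in a Plan. It is NOT the real k0s autopilot type.
type PlanCommandStatus struct {
	ID    int
	State string
}

// addStatusIfMissing appends a status for commandID only when no entry with
// that ID exists yet. It returns the (possibly extended) slice and a flag
// telling whether a new entry was added.
func addStatusIfMissing(statuses []PlanCommandStatus, commandID int, state string) ([]PlanCommandStatus, bool) {
	for _, s := range statuses {
		if s.ID == commandID {
			// A status for this command already exists: do nothing.
			return statuses, false
		}
	}
	return append(statuses, PlanCommandStatus{ID: commandID, State: state}), true
}

func main() {
	var statuses []PlanCommandStatus
	// Simulate three controllers each trying to add a status for command 0.
	for i := 0; i < 3; i++ {
		var added bool
		statuses, added = addStatusIfMissing(statuses, 0, "Pending")
		fmt.Printf("attempt %d: added=%v, total=%d\n", i+1, added, len(statuses))
	}
}
```

A guard like this would only make the append idempotent; it would not by itself stop several controllers from reconciling the same Plan at the same time.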
Oh, I think it has to do with the

    apiVersion: coordination.k8s.io/v1
    kind: Lease
    metadata:
      creationTimestamp: "2024-07-04T15:01:27Z"
      name: k0s-autopilot-controller
      namespace: k0s-autopilot
      resourceVersion: "446357"
      uid: 9fa9df27-c8ed-48c3-b776-865ad93ebc25
    spec:
      acquireTime: "2024-07-05T12:33:21.082083Z"
      holderIdentity: 00e28c1ef61aa37947ea76c0529928521f9feebf778ff180313c99c09fa18a92
      leaseDurationSeconds: 60
      leaseTransitions: 11
      renewTime: "2024-07-05T12:36:39.599608Z"

I see this has already been addressed in #4230, for v1.30 IIUC.
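As a reading aid for the Lease above, here is a small sketch that derives how long the current holder had held the lease at its last renewal, and when the lease would lapse if it were never renewed again. The helper `leaseTimes` is a hypothetical name, and the "renewTime plus leaseDurationSeconds" rule is a simplifying assumption; real leader-election clients may track expiry differently, for example from their own locally observed timestamps.

```go
package main

import (
	"fmt"
	"time"
)

// leaseTimes is a hypothetical helper. Given the acquireTime, renewTime and
// leaseDurationSeconds fields of a Lease spec, it returns:
//   - how long the holder had held the lease as of the last renewal
//   - the instant the lease would lapse, assuming no further renewals
func leaseTimes(acquire, renew string, durationSeconds int) (time.Duration, time.Time, error) {
	a, err := time.Parse(time.RFC3339Nano, acquire)
	if err != nil {
		return 0, time.Time{}, fmt.Errorf("parsing acquireTime: %w", err)
	}
	r, err := time.Parse(time.RFC3339Nano, renew)
	if err != nil {
		return 0, time.Time{}, fmt.Errorf("parsing renewTime: %w", err)
	}
	lapse := r.Add(time.Duration(durationSeconds) * time.Second)
	return r.Sub(a), lapse, nil
}

func main() {
	// Values copied from the Lease shown in the comment above.
	held, lapse, err := leaseTimes(
		"2024-07-05T12:33:21.082083Z",
		"2024-07-05T12:36:39.599608Z",
		60,
	)
	if err != nil {
		panic(err)
	}
	fmt.Println("held for:", held)
	fmt.Println("lapses at (if not renewed):", lapse.Format(time.RFC3339Nano))
}
```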
Platform
Version
v1.28.10+k0s.0
Sysinfo
`k0s sysinfo`
What happened?
I have started an upgrade from v1.28.10 to v1.29.5 using the following autopilot plan:
The first node seems to have been upgraded successfully:
After that nothing else happened; the plan has been as follows for over 30 minutes:
What is strange here is that we have three plan statuses but only one command. We could see the following in the k0scontroller journal on one of the nodes:
Steps to reproduce
I think this isn't deterministic, otherwise it would have been caught earlier, but this is how I reproduced it.
Expected behavior
Upgrade succeeds.
Actual behavior
The first node is upgraded but the second and third aren't, and we can see the "symmetry error" in one of the node logs. After 30 minutes in the same state, I decided to open this ticket.
Screenshots and logs
No response
Additional context
No response