502: Bad Gateway on PUT
when removing cluster / fleet agent overrides from an existing RKE2 cluster
#9012
Comments
As per our meeting, I couldn't reproduce this issue. Moving to test.
Still no luck here, even with your system, @slickwarren. Please try running this without the browser extensions on your system, like you suggested. I went through your system and wasn't able to reproduce it. 🙏
I'm able to reproduce this on both Chrome and Safari. Safari has a slightly different error message. DM'd you the exact payload.
Good day @slickwarren. Since this has been open for a few days on the UI side and it's a 502 (Bad Gateway) error, which indicates to me that this is most probably a backend issue, I would ask that this issue be reassigned to the backend team for further investigation. 🙏 On the UI side, not only could we not reproduce this, but we also couldn't find any indication that this is a UI/frontend issue. Thanks for taking the time to go over this with me in a couple of calls. 🤜 🙇
@slickwarren, could you please provide the details of the call that returned 502 from the "Network" tab of Chrome Developer Tools, for the complete picture? At a glance it seems the call from the UI didn't even reach the Rancher backend code, or Rancher was down at that exact moment for some reason. Similar issue: https://www.reddit.com/r/kubernetes/comments/oaxarg/intermittent_502_bad_gateway_issue/
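To rule out the UI entirely, one option is to replay the same update directly against the Rancher API and watch the response code. A minimal sketch, assuming a bearer token in `RANCHER_TOKEN`, the server URL in `RANCHER_URL`, the provisioned cluster living in the `fleet-default` namespace, and the edited spec saved to `cluster.json` (the endpoint path and names are assumptions and may differ on your setup):

```sh
# Replay the PUT that the UI sends when saving the cluster edit.
# Endpoint path, namespace, cluster name, and payload file are assumptions for illustration.
curl -sk -X PUT \
  -H "Authorization: Bearer $RANCHER_TOKEN" \
  -H "Content-Type: application/json" \
  -d @cluster.json \
  -o /dev/null -w 'HTTP %{http_code}\n' \
  "$RANCHER_URL/v1/provisioning.cattle.io.clusters/fleet-default/<cluster-name>"
```

If this also returns 502 while the change still lands, that would point at the ingress/proxy hop in front of the Rancher pods rather than at the browser.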
Transferred from
Here's the info I have right now; please lmk if you need more:
Outside of these screenshots, I don't have the network data at this time. If the above info doesn't include what you're looking for, please lmk and I can get it for you. @snasovich
@slickwarren Thanks for the info! From this description: "An error is shown in the UI and the user is kept on the cluster edit screen; however, the update was actually still applied and the overrides were removed. The cluster did not appear to go into an updating state."
If the update to remove the agent customization went through, then that tells me the cluster agent is still connected to Rancher, and Rancher's network connection failed for other reasons, such as local network or ingress issues. If the connection to the cluster agent had failed, you'd see something more like … Also, it's fine if you sometimes don't see the cluster go into an updating state, because the update can be fast. To verify the agent was redeployed with the update, check the Rancher logs for …
If you were having local network issues, I'd recommend trying to repro again today on a fresh install of Rancher. Rancher logs from the time of the API call may also help.
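Since a 502 from an ingress generally means the proxy couldn't get a valid response from the Rancher pods, logs from both sides around the time of the failing PUT are useful. A minimal sketch for collecting them, assuming a standard Helm install of Rancher in `cattle-system` and the RKE2-bundled ingress-nginx in `kube-system` (labels and namespaces are assumptions, adjust to your setup):

```sh
# Rancher server logs around the time of the failing PUT
# (label/namespace assume the standard Rancher Helm chart)
kubectl -n cattle-system logs -l app=rancher --since=1h --timestamps --tail=200 | grep -i error

# Ingress controller logs; a 502 logged here points at the proxy -> backend hop, not the UI
# (label is an assumption -- adjust to whatever your controller pods carry)
kubectl -n kube-system logs -l app.kubernetes.io/name=rke2-ingress-nginx --since=1h --tail=200 | grep ' 502 '
```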
@slickwarren When you see the request go through, do you see the overrides removed from the cluster agent, or can you not access the cluster anymore? Gotcha, I meant similar GH issues where a 502 was seen. We can also discuss offline.
The request does appear to update the spec / cluster / fleet agent appropriately; however, the user doesn't know that until they leave the 502 error page and go to edit the cluster again (or go and view the cluster / fleet agent deployments).
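One way to confirm the removal actually landed, independent of the UI, is to look at the agent deployments in the downstream cluster. A minimal sketch, assuming the standard agent names and namespaces (`cattle-cluster-agent` in `cattle-system`, `fleet-agent` in `cattle-fleet-system`); exact names may differ on your setup:

```sh
# Cluster agent in the downstream cluster: any leftover custom tolerations/resources?
kubectl -n cattle-system get deploy cattle-cluster-agent \
  -o jsonpath='{.spec.template.spec.tolerations}{"\n"}{.spec.template.spec.containers[0].resources}{"\n"}'

# Same check for the fleet agent
kubectl -n cattle-fleet-system get deploy fleet-agent \
  -o jsonpath='{.spec.template.spec.tolerations}{"\n"}{.spec.template.spec.containers[0].resources}{"\n"}'
```

Empty output for both fields would indicate the overrides were fully removed even though the UI reported an error.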
I've made a follow-up issue here: #9016
Tested on v2.7-head (f0d4078):
Notes:
@slickwarren, thank you for testing these. The warnings you mentioned should be covered as part of #9016.
Setup
Describe the bug
When updating an RKE2 cluster that has existing cluster and fleet agent overrides and removing all of those settings, the update resulted in a 502 (Bad Gateway) error.
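For anyone reproducing this, the overrides in question can be checked directly on the provisioning cluster object before and after the save. A minimal sketch, assuming the cluster lives in the `fleet-default` namespace and that the overrides map to the `clusterAgentDeploymentCustomization` / `fleetAgentDeploymentCustomization` spec fields (the field names are my assumption based on the agent customization feature, so verify against your object):

```sh
# Inspect the agent customization blocks on the provisioning cluster object.
# Namespace, cluster name, and field names are assumptions for illustration.
kubectl -n fleet-default get clusters.provisioning.cattle.io <cluster-name> \
  -o jsonpath='{.spec.clusterAgentDeploymentCustomization}{"\n"}{.spec.fleetAgentDeploymentCustomization}{"\n"}'
```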
To Reproduce
Result
An error is shown in the UI and the user is kept on the cluster edit screen; however, the update was actually still applied and the overrides were removed.
The cluster did not appear to go into an updating state.
Expected Result
If there's an error, I wouldn't expect the new spec to have been applied to the cluster.
Screenshots
Additional context
This did not happen for an RKE1 cluster.