Upgrading from v1.12.1 to v1.14.0 makes all GKE cluster and nodepool updates fail #242
Hi @tonybenchsci, thanks for reporting this. It's a known issue and we have a fix coming in the next release.
Thanks @jcanseco, in that case I will revert the upgrade and wait for your fix. Mind enlightening us a bit on the root cause and expected patch release date?
Sure, we recently introduced validation logic for many of our resources to better help our users avoid writing faulty configurations, and this is a case where the validation logic itself turned out to be faulty. See this for (somewhat) more details.
Though to clarify, the incoming fix will fix the validation issue with the ContainerCluster resource.
Gotcha, thanks @jcanseco. Just an FYI: reverting back to v1.12.1 resolved this issue, but now I'm seeing a (though not breaking) error related to the SecretManagerSecret CRD.
I assume there is a step missing in https://cloud.google.com/config-connector/docs/how-to/install-upgrade-uninstall#upgrading (manual upgrade), or that to downgrade I need to run an additional step.
@tonybenchsci ah, unfortunately it seems you followed the manual upgrade steps to perform an in-place downgrade? We don't officially support in-place downgrades currently. v1.14.0 introduced the SecretManagerSecret CRD, which the v1.12.1 controller does not support. Our recommended solution to handle this issue is to do a full uninstall and reinstall. Is this an option for you?
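For the pre-operator manual installation in use here, the uninstall/reinstall cycle is roughly the following. This is a sketch based on the install-bundle layout of that era; the directory name is an assumption, and the docs linked above are authoritative:

```sh
# Remove the currently installed KCC components (assumes the
# Workload Identity install bundle you originally applied).
kubectl delete -f install-bundle-workload-identity/

# Re-apply the bundle for the version you want to run.
kubectl apply -f install-bundle-workload-identity/
```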
That is too risky I'm afraid, and since the SecretManagerSecret issue isn't really breaking, I'm hoping to upgrade to whatever the v1.14.x fix is. When does that get released?
Gotcha, I'm very glad to hear that the issue is not breaking. I'm sorry for the trouble. The fix will be part of the next release, which will come out by the end of the week.
Not to keep harping on this @jcanseco, but I noticed that after the in-place downgrade, sqlinstance resources are no longer part of the reconciliation loop. The kubectl describe shows configs updating and up-to-date, but the GCP settings are not tracked/mutated. Is it something to do with the cnrm-lease label? I understand that downgrades are not fully supported, but I'm wondering if this is something that needs to be fixed in general and if it's specific to sqlinstance CRDs. Hoping there is a simple workaround for me to sync up KCC and GCP.
EDIT: I just re-installed (v1.12.1 to v1.12.1) and this is now resolved.
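For anyone debugging a similar symptom, one way to check whether lease-related labels are present on the affected resources is to list them with their labels. A sketch; the cnrm-lease name comes from the comment above, and the exact label keys on your resources may differ:

```sh
# List SQLInstance resources together with their labels so any
# cnrm-lease-* entries are visible at a glance.
kubectl get sqlinstances --all-namespaces \
  -o custom-columns='NAME:.metadata.name,LABELS:.metadata.labels'
```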
Hi @tonybenchsci, awesome, I'm glad the problem has been resolved! We also just released v1.15.0 which should fix the ContainerCluster validation issue. Please try it out and let us know if it fixes the problem. |
Thanks @jcanseco, v1.15.0 seems to have fixed the ContainerCluster issue (and obviously the minor SecretManager issue). I am seeing some odd but transient errors (which then quickly get corrected to an up-to-date status).
I'm ready to close this issue though, if you could explain both points quickly. To me, sqldatabase and containernodepool are both resources that reference other KCC resources (i.e. sqlinstance and containercluster), and I suspect the validation logic added in v1.13.1 might be too quick to detect and report errors when the referenced resource hasn't finished reconciling?
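For context, the reference pattern in question looks roughly like this. A sketch with hypothetical names and location; the clusterRef field follows the KCC resource-reference convention:

```yaml
# A ContainerNodePool that references its parent cluster by name.
# "example-nodepool", "example-cluster", and the location are
# placeholders, not values from this issue.
apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerNodePool
metadata:
  name: example-nodepool
spec:
  location: us-central1
  clusterRef:
    name: example-cluster   # must resolve to a ContainerCluster object
```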
Great, thanks for confirming the fix worked! And thanks for reporting these log messages and their transient behavior. These do look like something we should fix. I would say these don't really look like issues with the validation logic, but rather, we seem to not be properly detecting that the parent resource is not yet ready (e.g. the parent sqlinstance is still being created). We'll take a look and let you know when we have updates.
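In the meantime, one way to sidestep the transient errors when applying dependent resources in sequence is to wait for the parent's Ready condition before applying the child. A sketch; the resource name is a placeholder, and it relies on KCC resources exposing a Ready status condition:

```sh
# Block until the parent SQLInstance reports Ready before
# applying the SQLDatabase resources that reference it.
kubectl wait --for=condition=Ready sqlinstance/example-instance --timeout=10m
```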
Oh and if you're ok with it, I'll be closing the issue now since the original problem has been fixed, but please feel free to reopen if you have any further issues!
Hi @tonybenchsci, the issue that was causing resources like sqldatabase and containernodepool to report transient errors while their parent resources were still being created should now be fixed in the latest release. Please give it a try and let us know!
Thanks. Confirming that it did fix the errors. |
We're experiencing the same issue as in #242 (comment) here with the new ConfigConnector GKE add-on. When I check its version in the cluster, it still shows an older release. Is there a way to upgrade it?
Hi @dinvlad, yes, it takes about 3 weeks for the KCC GKE add-on to pick up a new KCC release. We're looking into separating the KCC GKE add-on upgrade schedule from the GKE upgrade schedule so that users could, for example, get KCC upgrades sooner. However, this is still very much a work-in-progress. If you want to get the latest KCC release, you could install the KCC operator as a standalone (which is really what the KCC GKE add-on uses under the hood to manage KCC installations). You can do so by following the instructions here.
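At the time of this thread, the standalone operator install looked roughly like the following. A sketch: the bucket path, file names, and the PROJECT_ID placeholder are assumptions, and the instructions linked above are authoritative:

```sh
# Download and unpack the operator release bundle.
gsutil cp gs://configconnector-operator/latest/release-bundle.tar.gz release-bundle.tar.gz
tar zxvf release-bundle.tar.gz

# Install the operator itself.
kubectl apply -f operator-system/configconnector-operator.yaml
```

```yaml
# Then create a ConfigConnector object so the operator installs KCC
# in cluster mode with Workload Identity. The operator requires this
# exact metadata.name; PROJECT_ID is a placeholder.
apiVersion: core.cnrm.cloud.google.com/v1beta1
kind: ConfigConnector
metadata:
  name: configconnector.core.cnrm.cloud.google.com
spec:
  mode: cluster
  googleServiceAccount: cnrm-system@PROJECT_ID.iam.gserviceaccount.com
```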
From our experience, the manual install using Workload Identity has been very smooth and reliable.
Thanks for the heads-up on the 3-week schedule, I think that works for us. The manual install is what we've been using for many months now, fairly smoothly. I've decided to test the new add-on, which is why I was wondering if it could be upgraded. The last "stable" version that worked for us smoothly was an earlier release.
Also, happy to report that I eliminated this particular error for now (since we aren't using SecretManager resources yet, though planning for them) by deleting the SecretManagerSecret CRD (along the lines of the sketch below) and then re-installing the add-on.
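A sketch of that cleanup, assuming the standard KCC CRD naming convention; the exact CRD name here is an assumption, not quoted from the original comment:

```sh
# Delete the CRD that the downgraded controller does not recognize.
# KCC CRDs are named <plural>.<group>.cnrm.cloud.google.com.
kubectl delete crd secretmanagersecrets.secretmanager.cnrm.cloud.google.com
```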
I just hit this issue on 1.15.1, but updating to 1.24.0 fixed it.
Describe the bug
All GKE clusters:
All GKE nodepools:
ConfigConnector Version
When going from v1.12.1 to v1.14.0
To Reproduce
Follow the manual upgrade instructions for the Workload Identity installation.
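For reference, the manual upgrade at that time amounted to roughly the following. A sketch: the bucket and directory names are assumptions based on the pre-operator install bundles, and the docs linked earlier in this thread are authoritative:

```sh
# Download and unpack the desired release bundle.
gsutil cp gs://cnrm/latest/release-bundle.tar.gz release-bundle.tar.gz
tar zxvf release-bundle.tar.gz

# Apply the Workload Identity flavor of the install bundle
# over the existing installation.
kubectl apply -f install-bundle-workload-identity/
```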
YAML snippets:
N/A