[Bug] Setting a proxy can break the reconciler #1295
Comments
I noted that the risk of this problem occurring is mentioned in the discussion of #1286 by @weisdd. The workaround of setting NO_PROXY in each namespace where the Grafana operator is used is not a great one, especially on clusters where the proxy setting is propagated to each namespace via global configuration. Would it be workable to add an env var, e.g. NO_PROXY_GRAFANA_API=true|false?
I think the best option is to call the API via the URL with the
It would be great to get a fix for this soon, because users like me, who install the operator from operatorhub.io, have no way to go back to the old version and are blocked at the moment.
@m-kay a couple of ideas:
2 and 3 seem cleaner to me, as the user doesn't have to change anything. BTW, if you are blocked, you can override HTTP(S)_PROXY in your operator config to "". That should get you past the problem.
Unfortunately I'm not able to override HTTP(S)_PROXY when installing the operator via operatorhub, because the global proxy configuration overrides my operator configuration. Option 3 would definitely fix the problem; however, after reading the k8s docs I think the correct DNS name would be the one including the suffix. If it is possible to discover that suffix somehow, it would certainly be cleaner to use the discovered suffix. Maybe the mounted
@bheading @m-kay Please, take a look at #1300 and let us know if it's something that meets your expectations. It basically follows what @bheading suggested in point 3 (use proxy for external instances). Test image is published here:
export KO_DOCKER_REPO=quay.io/weisdd/grafana-operator
ko build --sbom=none --bare --platform linux/arm64,linux/arm/v7,linux/amd64 -t v5.4.2-proxy-external
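To illustrate the "use proxy for external instances" idea mentioned above, here is a minimal sketch in Python. It is not the code from #1300 (the operator itself is written in Go); the function name `proxies_for_instance` and the `is_external` flag are hypothetical and only show the decision being made: in-cluster instances get no proxy at all, external ones get whatever the environment configures.

```python
from typing import Dict, Mapping


def proxies_for_instance(is_external: bool, env: Mapping[str, str]) -> Dict[str, str]:
    """Return the proxy mapping to use for one Grafana instance.

    Hypothetical helper: `is_external` stands for "this Grafana instance
    lives outside the cluster". In-cluster instances always get an empty
    mapping, so their API calls are never proxied, whatever NO_PROXY says.
    """
    if not is_external:
        return {}

    proxies: Dict[str, str] = {}
    for scheme in ("http", "https"):
        # Accept both the upper-case and lower-case variable names.
        value = env.get(f"{scheme.upper()}_PROXY") or env.get(f"{scheme}_proxy")
        if value:
            proxies[scheme] = value
    return proxies


# Example environment, using the proxy address from the reproduction steps.
env = {
    "HTTP_PROXY": "http://192.168.10.36:3128",
    "HTTPS_PROXY": "http://192.168.10.36:3128",
}
print(proxies_for_instance(False, env))
print(proxies_for_instance(True, env))
```

In Python, a mapping like this can be handed to `urllib.request.ProxyHandler`, where an empty dictionary disables proxying for that opener.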
Thanks @weisdd, yes this looks good to me.
In principle it looks good @weisdd, I hope it is OK if I left one comment on a minor coding style point on the MR - feel free to reject it if not appropriate. However, when I tried your container, it crashed (see stack trace below). I am guessing this is because the TLSClientConfig field is not initialized. That problem should be solved if you accept my coding style change ;)
Is there a chance to get this released as a hotfix?
Description of the bug
When a proxy is set in the operator (i.e. by setting HTTP_PROXY/HTTPS_PROXY/NO_PROXY in the Operator deployment), the Operator's interactions with the Grafana API may be routed through the proxy, causing reconciliation to fail. This will happen if the hostname of the Grafana API endpoint is not matched by the NO_PROXY setting.
Typically, users in disconnected environments (where using a proxy to reach the internet is mandatory) will have a standard set of proxy settings that they apply to each of their namespaces. Since the Grafana operator does not use the FQDN of the Grafana service when accessing the Grafana API, it is less likely that the NO_PROXY setting will avoid proxying the API calls.
This problem appears to have been introduced with the integration of #1286.
Version
v5.4.2
To Reproduce
I originally observed the issue on an OpenShift cluster but separately investigated how to reproduce the issue using kind. I'm using Fedora 38.
192.168.10.36:3128.
kind create cluster
kubectl create -k deploy/kustomize/overlays/namespace_scoped. This creates the grafana namespace and installs the operator/CRDs there.
kubectl config set-context --current --namespace grafana
kubectl -n grafana apply -k config/samples
Observed behaviour
The proxy (which is running outside the cluster) begins receiving Grafana API calls, as follows :
The Operator produces logs as follows (I've edited for brevity/readability, as the squid proxy returns a chunk of HTML)
Expected behavior
The Operator should not route Grafana API calls via the proxy.
If the running version of the operator is reverted to v5.3.0, the problem goes away, most likely because the proxy support is not present in that version.
One workaround is to set NO_PROXY to include the namespace (e.g. .grafana in the above example). However, this does not seem satisfactory: if the proxy settings are common and/or global, every namespace will need to be included.
Alternatively, the code could be changed so that Grafana API calls are never proxied. That will work provided there are no use cases where the Grafana API lives outside the cluster.
A third option would be to use the FQDN of the Grafana service, so that the URL in the above example would become http://grafana-a-service.grafana.svc.cluster.local:3000. The .svc, .svc.cluster and .svc.cluster.local suffixes are more likely to be present in a globally-configured NO_PROXY setting. However, that won't work for clusters where the suffix has been changed from the default; some way to detect the suffix might help here.
Suspect component/Location where the bug might be occurring
Introducing proxy support is likely to have led to this problem, see #1286.
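For the third option under Expected behavior (using the FQDN and detecting a non-default suffix), a rough sketch in Python follows. It assumes the pod's /etc/resolv.conf carries the usual Kubernetes search list (`<namespace>.svc.<domain> svc.<domain> <domain>`), which holds for the default DNS policy but may not hold everywhere. The function names are hypothetical and this is not how the operator does it.

```python
from typing import Optional

DEFAULT_CLUSTER_DOMAIN = "cluster.local"


def cluster_domain_from_resolv_conf(text: str) -> Optional[str]:
    """Guess the cluster domain from the contents of a resolv.conf file.

    Looks for a search entry of the form "svc.<domain>" and returns
    <domain>, or None if no such entry exists.
    """
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "search":
            continue
        for domain in fields[1:]:
            domain = domain.rstrip(".")
            if domain.startswith("svc."):
                return domain[len("svc."):]
    return None


def service_url(name: str, namespace: str, port: int,
                cluster_domain: str = DEFAULT_CLUSTER_DOMAIN) -> str:
    """Build the fully qualified in-cluster URL of a service."""
    return f"http://{name}.{namespace}.svc.{cluster_domain}:{port}"


# Example resolv.conf content with a non-default suffix (made-up values).
resolv_conf = (
    "search grafana.svc.example.internal svc.example.internal example.internal\n"
    "nameserver 10.96.0.10\n"
    "options ndots:5\n"
)
domain = cluster_domain_from_resolv_conf(resolv_conf) or DEFAULT_CLUSTER_DOMAIN
print(service_url("grafana-a-service", "grafana", 3000, domain))
```

Falling back to `cluster.local` when nothing is detected keeps the default case working, at the cost of still being wrong on clusters that customise both the suffix and the DNS configuration.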
Runtime (please complete the following information):