reloader: don't fail on envvar expansion errors #7429
d8b54d9
to
8a16f7b
Compare
pkg/reloader/reloader.go
Outdated
	return err
}
r.configApplyErrors.Inc()
level.Error(r.logger).Log("msg", "expand environment variables", "err", err)
So by only logging the error, the sidecar doesn't fail, but there would still be errors on the Prometheus side?
Ah, due to unset variables? Per the parent issue, the reloader should not crash in such cases. That is already how it behaves for every file modification (and reload) after the first, so this change aligns environment expansion errors with that behavior.
LGTM. Maybe @MichaHoffmann also wants to take a look?
Maybe we can make the behavior configurable? FWIW, most of the prometheus config reloader functionality comes from `pkg/reloader`, but as long as we have an option to create a Reloader instance that doesn't fail on env substitution, it is also fine to keep the current behavior as the default.
pkg/reloader/reloader.go
Outdated
@@ -701,7 +708,7 @@ func expandEnv(b []byte) (r []byte, err error) {

v, ok := os.LookupEnv(string(n))
if !ok {
	err = errors.Errorf("found reference to unset environment variable %q", n)
	err = errors.Wrapf(expandEnvError, "%s", n)
IIUC the first failed expansion will stop the processing of the config. I'd rather see this as best-effort (e.g. do all possible replacements and record any error).
9d05302
to
53b0a29
Compare
CI failures don't seem to stem from the patch.
CHANGELOG.md
Outdated
@@ -25,6 +25,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re

### Changed

- [#7429](https://github.com/thanos-io/thanos/pull/7429): Reloader: introduce `suppressEnvironmentVariablesExpansionErrors` to allow suppressing errors when expanding environment variables in the configuration file. When set, this will ensure that the reloader won't crash when an unset environment variable is encountered. Note that all unset environment variables are left as is, whereas all set environment variables are expanded as usual.
- [#7429](https://github.com/thanos-io/thanos/pull/7429): Reloader: introduce `suppressEnvironmentVariablesExpansionErrors` to allow suppressing errors when expanding environment variables in the configuration file. When set, this will ensure that the reloader won't crash when an unset environment variable is encountered. Note that all unset environment variables are left as is, whereas all set environment variables are expanded as usual.
- [#7429](https://github.com/thanos-io/thanos/pull/7429): Reloader: introduce `suppressEnvironmentVariablesExpansionErrors` to allow suppressing errors when expanding environment variables in the configuration file. When set, this will ensure that the reloader won't consider the operation to fail when an unset environment variable is encountered. Note that all unset environment variables are left as is, whereas all set environment variables are expanded as usual.
pkg/reloader/reloader.go
Outdated
cfgFile string
cfgOutputFile string
cfgDirs []CfgDirOption
suppressEnvironmentVariablesExpansionErrors bool
(suggestion)
suppressEnvironmentVariablesExpansionErrors bool
discardEnvVarExpansionErrors bool
pkg/reloader/reloader.go
Outdated
cfgFile string
cfgOutputFile string
cfgDirs []CfgDirOption
suppressEnvironmentVariablesExpansionErrors bool
It needs to be exposed in `Options` too?
pkg/reloader/reloader.go
Outdated
@@ -692,16 +693,24 @@ func RuntimeInfoURLFromBase(u *url.URL) *url.URL {

var envRe = regexp.MustCompile(`\$\(([a-zA-Z_0-9]+)\)`)

func expandEnv(b []byte) (r []byte, err error) {
func expandEnv(logger log.Logger, b []byte, suppressEnvironmentVariablesExpansionErrors bool, configApplyErrorsPtr *prometheus.Counter) (r []byte, err error) {
Rather than passing the additional parameters, why not make it a `*Reloader` function?
pkg/reloader/reloader.go
Outdated
n = n[2 : len(n)-1]

v, ok := os.LookupEnv(string(n))
if !ok {
	err = errors.Errorf("found reference to unset environment variable %q", n)
	(*configApplyErrorsPtr).Inc()
I'd rather see a new metric to track this: either an expansion errors counter or a gauge (the latter probably makes more sense as a counter needs constant increases to trigger an alert).
pkg/reloader/reloader.go
Outdated
(*configApplyErrorsPtr).Inc()
errStr := errors.Errorf("found reference to unset environment variable %q", n)
if suppressEnvironmentVariablesExpansionErrors {
	level.Debug(logger).Log("msg", "expand environment variable", "err", errStr)
I'd expect a Warn
80de75d
to
2d9509d
Compare
The test failure seems to indicate that the filesystem changes dropped significantly since the last commit, but I don't see any change in the reloader that would visibly warrant that. I'll take another look. Also, the documentation failure seems unrelated (I believe some URLs were moved)?
IIUC the failure on the |
It seems it wasn't moved, but failed to render due to a recent change.
My guess is they tried to integrate HubSpot, which didn't go as planned and crashed the DOM.
a28d14e
to
5bcbc33
Compare
pkg/reloader/reloader.go
Outdated
n = n[2 : len(n)-1]

v, ok := os.LookupEnv(string(n))
if !ok {
	err = errors.Errorf("found reference to unset environment variable %q", n)
	r.configEnvVarExpansionErrors.Inc()
you'd need to save the number of errors in a local variable and set the gauge before exiting expandEnv()
Ah, right! It makes sense to do this on a per-config basis, so as to not take into account increments from a previous config expansion. I'll make the change.
pkg/reloader/reloader.go
Outdated
	level.Warn(r.logger).Log("msg", "expand environment variable", "err", errStr)
	return m
}
level.Error(r.logger).Log("msg", "expand environment variable", "err", errStr)
hmm the error should be logged by the caller in this case?
f3a17f7
to
edf9a3f
Compare
To test similar changes in the past, I opened a PR against prometheus-operator/prometheus-operator replacing the upstream version by my fork. This way I made sure that it worked as expected (e.g. no need for back-and-forth changes between the 2 repos).
pkg/reloader/reloader.go
Outdated
@@ -348,8 +360,8 @@ func (r *Reloader) Watch(ctx context.Context) error {
	}
}

func normalize(logger log.Logger, inputFile, outputFile string) error {
	b, err := os.ReadFile(inputFile)
func (r *Reloader) normalize(input, output string) error {
(nit) I preferred inputFile and outputFile as it makes it obvious that we talk about files and not data.
Ah, an oversight on my part. Reverted.
@simonpasquier Makes sense, I remember doing a similar thing in I'll raise a PR downstream and hold off on merging this before that's been verified. |
Refer: thanos-io/thanos#7429 Fixes: prometheus-operator#6136 Signed-off-by: Pranshu Srivastava <[email protected]>
CHANGELOG.md
Outdated
@@ -25,6 +25,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re

### Changed

- [#7429](https://github.com/thanos-io/thanos/pull/7429): Reloader: introduce `suppressEnvironmentVariablesExpansionErrors` to allow suppressing errors when expanding environment variables in the configuration file. When set, this will ensure that the reloader won't consider the operation to fail when an unset environment variable is encountered. Note that all unset environment variables are left as is, whereas all set environment variables are expanded as usual.
the name doesn't match the code. Also I'm not sure if the change should be listed in the changelog since it's not user-visible.
cc @saswatamcode
Fixed. Also moved this to the Added section since we are adding to a, IIUC, user-visible API (`reloader.Options`)?
509d346
to
e2d158d
Compare
# 500 when requested by mdox in GH actions.
- regex: 'outshift\.cisco\.com'
- regex: 'outshift\.cisco\.com\/blog\/multi-cluster-monitoring'
The domain works just fine, only `/blog/multi-cluster-monitoring` has a problem.
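For reference, a quick check of how the two patterns differ, assuming they are evaluated with Go's `regexp` package and matched anywhere in the URL. The second URL and the `https://` scheme are made up for the comparison.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// domainWide is the broader pattern: it matches any URL on the domain.
	domainWide = regexp.MustCompile(`outshift\.cisco\.com`)
	// pathOnly is the narrowed pattern: it matches only the failing page.
	pathOnly = regexp.MustCompile(`outshift\.cisco\.com\/blog\/multi-cluster-monitoring`)
)

func main() {
	urls := []string{
		"https://outshift.cisco.com/blog/multi-cluster-monitoring",
		"https://outshift.cisco.com/blog/another-post", // hypothetical URL
	}
	for _, u := range urls {
		fmt.Printf("%s domainWide=%t pathOnly=%t\n", u, domainWide.MatchString(u), pathOnly.MatchString(u))
	}
}
```

The narrowed pattern leaves every other link on the domain subject to validation.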
Allow suppressing environment variables expansion errors when unset, and thus keep the reloader from crashing. Instead leave them as is. Signed-off-by: Pranshu Srivastava <[email protected]>
lgtm
Thanks! LGTM!
Allow suppressing environment variables expansion errors when unset, and thus keep the reloader from crashing. Instead leave them as is. Signed-off-by: Pranshu Srivastava <[email protected]> Signed-off-by: Tidhar Klein Orbach <[email protected]>
Don't fail the reloader on environment variable expansion errors.
Refer: prometheus-operator/prometheus-operator#6136
Changes
Verification