-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[query] Remote storages can warn rather than error on failure #1938
Conversation
) Remote { | ||
if override != nil { | ||
global = *override | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit(ignore if you'd like): Any reason to do this in the constructor, than perhaps just in the for loop creating the remotes?
Had to follow to next method call to work out what the ... defaultBehavior, nil)
meant (i.e. what is nil param used for).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah missed responding to this; tried it both ways and this looked a little cleaner
fmt.Fprintf(os.Stderr, "unable to load %s: %v", runOpts.ConfigFiles, err) | ||
os.Exit(1) | ||
} | ||
fmt.Fprintf(os.Stdout, "using %s config files: %v", serviceName, runOpts.ConfigFiles) | ||
|
||
fmt.Fprintf(os.Stdout, "using %s config files: %v", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I haven't seen this fmt.Printf before - should we perhaps not be printing out this out? Seems strange and don't think any of our other servers do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/query/server/server.go
Outdated
remotes, enabled, err := remoteClient(cfg, tagOptions, poolWrapper, | ||
readWorkerPool, instrumentOptions) | ||
opts := config.RemoteOptionsFromConfig(cfg.RPC) | ||
lookback := *cfg.LookbackDuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, isn't it safer to use cfg.LookbackDurationOrDefault()
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was pretty bleh about the way this whole thing is setup, but its guaranteed to be safe here since this happens just above: https://sourcegraph.com/github.com/m3db/m3@arnikola/remote-error-behavior/-/blob/src/query/server/server.go#L218:29
I didn't want to just pull it out since some stuff depends on that lookback duration being set; wouldn't mind refactoring it a bit but wanted to stay more focused here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although looks like I can use lookbackDuration
here at least
import "fmt" | ||
|
||
var ( | ||
// NB: Container behavior must not be defined in configs, so it does not count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Is it worth adding what a container behavior is? Haven't managed to grok what that is, might be worth mentioning here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, sounds good. Might change name to composite or something, basically it's only for fanout.Storage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sg
zap.Error(err), | ||
zap.String("store", store.Name()), | ||
zap.String("function", "FetchBlocks")) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make this slightly more go like:
if warning, err := storage.IsWarning(store, err); warning {
numWarning++
s.instrumentOpts.Logger().Warn(
"partial results: fanout to store returned warning",
zap.Error(err),
zap.String("store", store.Name()),
zap.String("function", "FetchBlocks"))
return
}
multiErr = multiErr.Add(err)
s.instrumentOpts.Logger().Warn(
"fanout to store returned error",
zap.Error(err),
zap.String("store", store.Name()),
zap.String("function", "FetchBlocks"))
return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, looks a little weird to me but if that's the convention easy times
zap.String("function", "SearchSeries")) | ||
|
||
return nil, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm thinking you're missing a "continue" for the warning case (since otherwise metrics = append(metrics, result.Metrics...)
will likely be a nil pointer error?).
Also to make it more go like, maybe this:
if warning, err := storage.IsWarning(store, err); warning {
s.instrumentOpts.Logger().Warn(
"partial results: fanout to store returned warning",
zap.Error(err),
zap.String("store", store.Name()),
zap.String("function", "SearchSeries"))
continue
}
s.instrumentOpts.Logger().Warn(
"fanout to store returned error",
zap.Error(err),
zap.String("store", store.Name()),
zap.String("function", "SearchSeries"))
return nil, err
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch
zap.String("function", "CompleteTags")) | ||
|
||
return nil, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing as above, think you're missing a continue
and may as well do the de-nesting "Go like" pattern/thing I typed up above.
BehaviorWarn | ||
// BehaviorContainer is for storages that contain substorages. It is necessary | ||
// to look at the returned error to determine if it's a failing error or | ||
// a warning error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, maybe a link to this or just a brief duplicate-like explanation at the other definition would be good too.
Codecov Report
@@ Coverage Diff @@
## master #1938 +/- ##
=========================================
- Coverage 60.5% 55.5% -5.1%
=========================================
Files 1105 1099 -6
Lines 104971 102648 -2323
=========================================
- Hits 63604 57011 -6593
- Misses 37102 41601 +4499
+ Partials 4265 4036 -229
Continue to review full report at Codecov.
|
|
||
// Remote is the configuration for a single remote host. | ||
type Remote struct { | ||
// RemoteConfiguration is the configuration for a single remote host. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Why do you need this? It's already in the config
package and in the config.go
file. So you end up with config.RemoteConfiguration
vs config.Remote
. I prefer the shorter version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly to avoid conflicts with the new more concrete Remote
type; ideally we shouldn't be using RemoteConfiguration
anywhere outside this package, but we have to make it public for yaml parsing
) | ||
|
||
// Remote is an option for remote storage. | ||
type Remote struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Why do you have both Remote
and RemoteConfiguration
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So RemoteConfiguration
is the parsed yaml, Remote
is the parsed structure
@@ -450,6 +458,11 @@ type RPCConfiguration struct { | |||
// RemoteListenAddresses will only allow for a single remote zone to be used. | |||
RemoteListenAddresses []string `yaml:"remoteListenAddresses"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is RemoteListenAddresses
deprecated in favour of Remotes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah; it used to limit to a single zone; with this PR, can technically do both but would recommend sticking to Remotes
At some point soon we'll update docs as well as our example configs to mirror this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// ErrorBehavior overrides the default error behavior for this host. | ||
// | ||
// NB: defaults to warning on error. | ||
ErrorBehavior *storage.ErrorBehavior `yaml:"errorBehavior"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ErrorBehavior
is uint8
how do you serialize it to yaml? Shouldn't you have some string representation instead of numbers? Also, I saw that the constants are using iota
assignment which means that if in future someone adds more values, these need to always be appended at the end. Otherwise, existing apps might break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we've defined a custom yaml unmarshall here, which allows us to use the plaintext version of the desired behavior in the yaml. The documentation is definitely missing at the moment, but you can look at some of the new tests added in remote_options_test.go to see how the options will look in yaml (in all honesty you'd probably be fine with just taking the default approach which sets all remotes to BehaviorWarn)
What this PR does / why we need it:
Previously any failing remote storage calls would cause the entire fanout query to fail. This PR sets default remote storage options to warn users but keep going, with an option to override it and set remotes as necessary.
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: