-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xds/outlierdetection: fix config handling #6361
Conversation
d7701d2
to
e9c6d27
Compare
// Shouldn't happen, registered through imported Cluster Resolver, | ||
// defensive programming. | ||
logger.Errorf("%q LB policy is needed but not registered", clusterresolver.Name) | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not actually legal to return nil, though. We should return an LB policy instance that is a nop for everything and provides a TF picker or something in this situation. Maybe add a package to internal/balancer with this trivial implementation.
// This is illegal and should never happen; we clear the balancerWrapper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Here and in Cluster Resolver.
if !ok { | ||
// Shouldn't happen, imported Cluster Resolver builder has this method. | ||
logger.Errorf("%q LB policy does not implement a config parser", clusterresolver.Name) | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Here and in Cluster Resolver.
// "In the cds LB policy, if the outlier_detection field is not set in | ||
// the Cluster resource, a "no-op" outlier_detection config will be | ||
// generated in the corresponding DiscoveryMechanism config, with all | ||
// fields unset." - A50 | ||
if odJSON == nil { | ||
// This will pick up top level defaults in Cluster Resolver | ||
// ParseConfig, but sre and fpe will be nil still so still a | ||
// "no-op" config. | ||
odJSON = json.RawMessage(`{}`) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this logic be moved to where we produce the JSON OD config from the proto instead? This is part of converting from xds OD config to OD's JSON config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, unfortunately not, because the language in the gRFC explicitly states "in the cds lb policy". The issue was I mapped that language to the paragraph following to. We triaged this, and this was the only behavior scoped to the cds lb policy. @murgatroid99
var cfg *LBConfig | ||
if err := json.Unmarshal(j, &cfg); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pointer to a pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is how my wrr_locality balancer works too. https://github.com/grpc/grpc-go/blob/master/xds/internal/balancer/wrrlocality/balancer.go#L93. If I switch it it hangs. I thinkkkk it's because if you declare a value type on the stack, it gets all the zero values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of stuff like child config etc.
for i, dm := range cfg.DiscoveryMechanisms { | ||
lbCfg, err := odParser.ParseConfig(dm.OutlierDetection) | ||
if err != nil { | ||
return nil, fmt.Errorf("error parsing Outlier Detection config: %v", dm.OutlierDetection) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to return err
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Good catch.
if envconfig.XDSOutlierDetection { | ||
for i, odCfg := range odCfgs { | ||
cfg.DiscoveryMechanisms[i].outlierDetection = odCfg | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this checking the env var, but parsing isn't? That seems wrong. We don't usually validate things we won't use. I think they both should be in this condition and then we don't need two different for loops, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right fair point. Changed.
@@ -72,7 +73,10 @@ var ( | |||
} | |||
|
|||
noopODCfg = outlierdetection.LBConfig{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does a noop config have fields set to specific numbers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't do anything about it. The gRFC specifically states "cds generates with all fields unset" for the no-op. Thus, the way our system is set up now, it is literally impossible to uphold the balancer API while keeping all fields unset (will always call ParseConfig() and get defaults). The important part is sre and fpe == nil, which keeps the picker from doing unnecessary counting on the critical rpc path. I talked to Michael about this and he says it's fine. I would like the fields to stay unset, but unfortunately can't uphold API and keep fields unset. Also can't set to zero values because that is different that the language of the gRFC, which states "all fields unset".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait how is this config used in these tests, then? It looked like they were inputs, which means they could be completely empty, right? Or are you trying to make the input test data match real-world input data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latter.
lbCfg := &LBConfig{ | ||
// Default top layer values as documented in A50. | ||
Interval: iserviceconfig.Duration(10 * time.Second), | ||
BaseEjectionTime: iserviceconfig.Duration(30 * time.Second), | ||
MaxEjectionTime: iserviceconfig.Duration(300 * time.Second), | ||
MaxEjectionPercent: 10, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the multi-level nature of this...there are now other defaults elsewhere. Can you move these defaults into config.go
so they are all together, by adding a LBConfig.UnmarshalJSON
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Needed to do same trick on that type to prevent infinite recursion and stack overflow.
type failurePercentageEjection FailurePercentageEjection | ||
|
||
// UnmarshalJSON unmarshals JSON into FailurePercentageEjection. If a | ||
// FailurePercentageEjection field is not set, that field will get it's default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/it's/its/
- 3x in this PR. Whenever I see "it's" my mind says "it is".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
SuccessRateEjection: sre, | ||
FailurePercentageEjection: fpe, | ||
} | ||
odLBCfgJSON, err := json.Marshal(odLBCfg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return json.Marshal(odLBCfg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Switched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pass! Got to all comments.
internal/balancer/nop/nop.go
Outdated
* | ||
*/ | ||
|
||
// Package nop implements a balancer with all of it's balancer operations as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:/. Haha switched.
internal/balancer/nop/nop.go
Outdated
"google.golang.org/grpc/connectivity" | ||
) | ||
|
||
// Balancer is a balancer with all of it's balancer operations as no-ops, other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:/. Haha switched.
internal/balancer/nop/nop.go
Outdated
// and a Connectivity State of TRANSIENT_FAILURE. | ||
func (b *Balancer) UpdateClientConnState(_ balancer.ClientConnState) error { | ||
b.cc.UpdateState(balancer.State{ | ||
Picker: base.NewErrPicker(errors.New("no-op balancer invoked")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error returned here should be an error passed to NewNOPBalancer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
internal/balancer/nop/nop.go
Outdated
|
||
// Balancer is a balancer with all of it's balancer operations as no-ops, other | ||
// than returning a Transient Failure Picker on a Client Conn update. | ||
type Balancer struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unexport, then you don't need all these comments. Only NewNOPBalancer
needs a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had it unexported, but it was throwing errors wrt lint. I had to change it to pass lint :/
internal/balancer/nop/nop.go
Outdated
} | ||
|
||
// NewNOPBalancer returns a no-op balancer. | ||
func NewNOPBalancer(cc balancer.ClientConn) *Balancer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NewBalancer
since it's in the nop
package? I.e. nop.NewBalancer()
vs nop.NewNOPBalancer()
... the latter slightly stutters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -72,7 +73,10 @@ var ( | |||
} | |||
|
|||
noopODCfg = outlierdetection.LBConfig{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait how is this config used in these tests, then? It looked like they were inputs, which means they could be completely empty, right? Or are you trying to make the input test data match real-world input data?
@@ -160,7 +160,7 @@ type exitIdle struct{} | |||
|
|||
// cdsBalancer implements a CDS based LB policy. It instantiates a | |||
// cluster_resolver balancer to further resolve the serviceName received from | |||
// CDS, into localities and endpoints. Implements the balancer.Balancer | |||
// CDS, into localities and endpoints. Implements the balancer.bal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops. Change back.
This PR fixes Outlier Detection configuration across the xDS System and the Outlier Detection ParseConfig.
This PR touches multiple components:
This fixes a few bugs:
RELEASE NOTES: