-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
orca: create ORCA producer for LB policies to use to receive OOB load reports #5669
Conversation
1b369f2
to
1cf3c60
Compare
@@ -20,8 +20,15 @@ | |||
// avoid polluting the godoc of the top-level orca package. | |||
package internal | |||
|
|||
import ibackoff "google.golang.org/grpc/internal/backoff" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this import to be renamed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't; I just like to keep things consistent, and we have a backoff
package and an internal/backoff
package, and it's confusing if we reference both using the same name in different files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Package names should ideally be self contained and self explanatory. If we are always going to rename this import, we should ideally rename the package itself.
// A Producer is a type shared among potentially many consumers. It is | ||
// associated with a SubConn, and an implementation will typically contain | ||
// other methods to provide additional functionality, e.g. configuration or | ||
// subscription registration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description feels very vague to me, in the sense that I can't get a good idea of when to use a Producer
, how to use one etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you suggest something better after our discussion today?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack. The first sentence is fine, but I don't get much technical understanding from: "and an implementation will typically contain other methods to provide additional functionality, e.g. configuration or subscription registration." Additional functionality with respect to what? There's no functionality defined before this. Do we expect this to ever be implemented by another concrete type? This comment is very specific to type producer in orca/producer. If so, why have this interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional functionality with respect to what? There's no functionality defined before this.
In addition to the builder that created it, I suppose, and the close
function the builder returns.
Do we expect this to ever be implemented by another concrete type?
Yes, definitely. This is intended to be generic and able to be implemented by anything. The other example we already have is the health checks, which are implemented in the other languages as a generic wrapper, but hard-coded into gRPC in Go. I'd like to move to a similar model there by using this mechanism.
This comment is very specific to type producer in orca/producer.
Not at all. If it was, it would talk about ORCA configuration. ORCA should not be part of grpc-go directly, but instead should only be a plugin which uses this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might have missed pushing a commit. I have more comments which I will add soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool PR bossman. Some comments.
// A Producer is a type shared among potentially many consumers. It is | ||
// associated with a SubConn, and an implementation will typically contain | ||
// other methods to provide additional functionality, e.g. configuration or | ||
// subscription registration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack. The first sentence is fine, but I don't get much technical understanding from: "and an implementation will typically contain other methods to provide additional functionality, e.g. configuration or subscription registration." Additional functionality with respect to what? There's no functionality defined before this. Do we expect this to ever be implemented by another concrete type? This comment is very specific to type producer in orca/producer. If so, why have this interface?
1cf3c60
to
4971070
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated; sorry for the force push; I messed up the synchronization between my laptop & workstation.
// A Producer is a type shared among potentially many consumers. It is | ||
// associated with a SubConn, and an implementation will typically contain | ||
// other methods to provide additional functionality, e.g. configuration or | ||
// subscription registration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional functionality with respect to what? There's no functionality defined before this.
In addition to the builder that created it, I suppose, and the close
function the builder returns.
Do we expect this to ever be implemented by another concrete type?
Yes, definitely. This is intended to be generic and able to be implemented by anything. The other example we already have is the health checks, which are implemented in the other languages as a generic wrapper, but hard-coded into gRPC in Go. I'd like to move to a similar model there by using this mechanism.
This comment is very specific to type producer in orca/producer.
Not at all. If it was, it would talk about ORCA configuration. ORCA should not be part of grpc-go directly, but instead should only be a plugin which uses this.
@@ -197,7 +200,7 @@ func (p *producer) runStream(ctx context.Context) (resetBackoff bool, err error) | |||
ReportInterval: durationpb.New(interval), | |||
}) | |||
if err != nil { | |||
return resetBackoff, err | |||
return false, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You liked this rather than using default value of bools more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was changed due to an earlier comment from @easwars. I said in my reply there I didn't feel strongly either way. I still don't.
// to avoid overloading a server experiencing problems. The attempt count | ||
// is incremented when stream errors occur and is reset when the stream | ||
// reports a result. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the last part of this comment is correct/like this comment. First of all, reset has two semantical meanings for this field, one for the timer, one for the attempt count, which gets plumbed into Exponential Backoff to determine what the backoff timer gets reset to I'm assuming your reset refers to attempt count. Also, there is no linkage or even local variable named request count. I see backoffAttempt, which isn't a field here, but the int in this function variable. The attempt count is not reset when the stream reports a result. Right now, the attempt count is reset on either a. the minInterval has changed after processing the receive from the stream compared to what was the stream was configured with. b. If the second+ stream.Recv() returns an error, this also feels wrong, I will leave a comment there.
return resetBackoff, err | ||
} | ||
resetBackoff = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? I don't get why it matters whether the error comes from the second+ stream.Recv() or the first one with respect to resetting backoff. Shouldn't it be io.EOF the conditional for a successful stream where we can infer the server is working properly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Deep down though, I would have still preferred if we could have added the methods required to make RPCs on the SubChannel
interface. I feel that would have been a more intuitive change.
Like, currently, the RegisterOOBListener
function in the orca
package calls GetOrBuildProducer
on the SubConn
and passes a builder defined locally. This seems quite roundabout. But, if we want the orca
package to handle everything locally, then we would need the functionality to make RPCs on the subConns, and also the functionality to uniquely identify subConns based on their channelz ID. If we don't think adding this functionality to the subConn interface is a good long term option, I agree with the approach taken in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All testing nits.
orca/producer_test.go
Outdated
// Stop listener 3. This makes the interval longer, with stream recreation | ||
// delayed until the next report is received. Reports only go to listener | ||
// 1 now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reports should only go to listener 1 now. Also explain why the interval is now longer, and perhaps tie that idea of a longer interval to stream recreation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping about stream recreation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a very high level understanding of the feature (ORCA OOB reporting) plus this comment as-is is sufficient to understand what's happening here. I could make a longer comment to make things painfully obvious, but sometimes less is more.
orca/producer_test.go
Outdated
|
||
lisOpts := orca.OOBListenerOptions{ReportInterval: 50 * time.Millisecond} | ||
lao := &listenerAndOptions{listener: oobLis, opts: lisOpts} | ||
r.InitialState(resolver.State{Addresses: []resolver.Address{setListenerAndOptions(resolver.Address{Addr: lis.Addr().String()}, lao)}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this split up still.
@@ -255,10 +258,14 @@ func (s) TestProducerBackoff(t *testing.T) { | |||
|
|||
// Provide a convenient way to expect backoff calls and return a minimal | |||
// value. | |||
expectedBackoff := -1 // -1 indicates any value is allowed. | |||
const backoffShouldNotBeCalled = 9999 // Use to assert backoff function is not called. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get that explanation, but it's probably just me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I explain in comments why we don't want the backoff function to be called when backoffShouldNotBeCalled
is assigned to expectedBackoff
below. (It's because the last stream was successful.)
|
||
// Register our fake ORCA service | ||
s := grpc.NewServer() | ||
fake := newFakeORCAService() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fos?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments.
// Stop listener 2. This does not affect the interval. The next update | ||
// goes to listeners 1 and 3. | ||
// Stop listener 2. This does not affect the interval as listener 3 is | ||
// still the shortest. The next update goes to listeners 1 and 3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and still has the shortest interval
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This level of review isn't important. The meaning of this comment is very clear, and it's an internal comment in a test, not in external documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't hurt though. But I see what you're saying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM outside my final responses, but again don't feel strongly about any of my nits.
You can go ahead and merge btw. |
RELEASE NOTES: