Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race between activeStreams and bdp window size #5494

Merged
merged 9 commits into from
Jul 13, 2022
Merged

Conversation

zasweq
Copy link
Contributor

@zasweq zasweq commented Jul 11, 2022

Fixes #5358.

The test case I added based off the issue test would successfully fail unexpected.EOF. This was caused by a race between when a stream was created and when updateFlowControl() was called. This fix makes sure any created streams has the correct window size on creation.

RELEASE NOTES:

  • Fixed dynamic BDP estimation causing unexpected EOF error by fixing race between activeStreams and bdp window size

@zasweq zasweq requested a review from dfawley July 11, 2022 17:27
@zasweq zasweq added this to the 1.49 Release milestone Jul 11, 2022
Comment on lines 751 to 757
if !checkForHeaderListSize(it) {
return false
}
if !checkForHeaderListSize(it) {
if !checkForStreamQuota(it) {
return false
}
return true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can all be simply return checkForHeaderListSize(it) && checkForStreamQuota(it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched.

Comment on lines 8071 to 8072
if e, ok := status.FromError(err); ok {
if e.Code() != codes.ResourceExhausted {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is simpler:

if code := status.Code(err); code != codes.ResourceExhausted {
	...
}

Note also that your test fails to detect if the returned error from UnaryCall is a non-status error (this isn't ever expected, of course, but is technically possible).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched and wrote a comment about your note.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrote a comment about your note.

I don't understand. If you switched to my method instead of yours, my note doesn't apply anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh duh true lol.

Copy link
Contributor

@easwars easwars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also probably need a better release note. Else, we will get back to this when we make the release.

Comment on lines 8069 to 8070
_, err := ss.Client.UnaryCall(ctx, &testpb.SimpleRequest{ResponseSize: 4194304})
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: combine the assignment and the conditional statements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 8080 to 8081
_, err = ss.Client.UnaryCall(ctx, &testpb.SimpleRequest{ResponseSize: 275075})
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: combine the assignment and the conditional statements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// this is testing for is an unexpected EOF with a status code
// INTERNAL so this is fine.
if code := status.Code(err); code != codes.ResourceExhausted {
t.Fatalf("unexpected err in UnaryCall: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you mention the received and expected codes in the error string.

	got, want := status.Code(err), codes.ResourceExhausted; got != want {
		t.Fatalf("UnaryCall RPC returned status code: %v, want %v", got, want)
	}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Printing the whole error is better than only printing the code, as there might be status message info that's useful:

		t.Fatalf("UnaryCall RPC returned error: %v, want status code %v", err, want)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to this:
if code := status.Code(err); code != codes.ResourceExhausted { t.Fatalf("UnaryCall RPC returned error: %v, want status code %v", err, codes.ResourceExhausted) }

// should work normally.
_, err = ss.Client.UnaryCall(ctx, &testpb.SimpleRequest{ResponseSize: 275075})
if err != nil {
t.Fatalf("unexpected err in UnaryCall: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer concise error messages:

t.Fatalf("UnaryCall RPC failed: %v", err)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched.

@easwars easwars assigned zasweq and unassigned dfawley Jul 12, 2022
@@ -719,6 +718,11 @@ func (t *http2Client) NewStream(ctx context.Context, callHdr *CallHdr) (*Stream,
t.nextID += 2
s.id = h.streamID
s.fc = &inFlow{limit: uint32(t.initialWindowSize)}
t.mu.Lock()
if t.activeStreams != nil { // Can be niled from Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if t.activeStreams == nil {
	return false
}
t.activeStreams[s.id] = s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched.

Copy link
Contributor Author

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments :D! I'm sure some tests will fail from this stream change though give it a second to look at again.

// this is testing for is an unexpected EOF with a status code
// INTERNAL so this is fine.
if code := status.Code(err); code != codes.ResourceExhausted {
t.Fatalf("unexpected err in UnaryCall: %v", err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to this:
if code := status.Code(err); code != codes.ResourceExhausted { t.Fatalf("UnaryCall RPC returned error: %v, want status code %v", err, codes.ResourceExhausted) }

@@ -719,6 +718,11 @@ func (t *http2Client) NewStream(ctx context.Context, callHdr *CallHdr) (*Stream,
t.nextID += 2
s.id = h.streamID
s.fc = &inFlow{limit: uint32(t.initialWindowSize)}
t.mu.Lock()
if t.activeStreams != nil { // Can be niled from Close()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched.

@zasweq zasweq assigned easwars and dfawley and unassigned zasweq Jul 12, 2022
@@ -1215,7 +1217,7 @@ func (t *http2Client) handleGoAway(f *http2.GoAwayFrame) {
default:
t.setGoAwayReason(f)
close(t.goAway)
t.controlBuf.put(&incomingGoAway{})
defer t.controlBuf.put(&incomingGoAway{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment explaining that we defer because t.mu is held

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added "Defer as t.mu is currently held".

Comment on lines 8070 to 8072
// This check doesn't fail on a non status error. However, the main
// this is testing for is an unexpected EOF with a status code
// INTERNAL so this is fine.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete now. It does fail on a non-status error since code would be Unknown in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops forgot deleted.

for i := 0; i < 10; i++ {
// exceeds grpc.DefaultMaxRecvMessageSize, this should error with
// RESOURCE_EXHAUSTED error.
if _, err := ss.Client.UnaryCall(ctx, &testpb.SimpleRequest{ResponseSize: 4194304}); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_, err := ss.Client.UnaryCall()
if got := status.Code(err); got != codes.ResourceExhausted {
	t.Fatal
}

This way you catch the case where the call doesn't fail (code will be OK), unexpectedly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point switched.

@dfawley dfawley assigned zasweq and unassigned dfawley Jul 12, 2022
Copy link
Contributor Author

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments :D!

Comment on lines 8070 to 8072
// This check doesn't fail on a non status error. However, the main
// this is testing for is an unexpected EOF with a status code
// INTERNAL so this is fine.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops forgot deleted.

for i := 0; i < 10; i++ {
// exceeds grpc.DefaultMaxRecvMessageSize, this should error with
// RESOURCE_EXHAUSTED error.
if _, err := ss.Client.UnaryCall(ctx, &testpb.SimpleRequest{ResponseSize: 4194304}); err != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point switched.

@@ -1215,7 +1217,7 @@ func (t *http2Client) handleGoAway(f *http2.GoAwayFrame) {
default:
t.setGoAwayReason(f)
close(t.goAway)
t.controlBuf.put(&incomingGoAway{})
defer t.controlBuf.put(&incomingGoAway{})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added "Defer as t.mu is currently held".

@zasweq zasweq assigned dfawley and unassigned zasweq Jul 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gRPC connection closing with unexpected EOF on high-latency connections
3 participants