Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: Check for context cancellation in more places #10487

Merged
merged 2 commits into from
Nov 7, 2016

Conversation

bdarnell
Copy link
Contributor

@bdarnell bdarnell commented Nov 6, 2016

#10427 saw a lot of commands that were canceled via their context while waiting in the command queue; it is possible that those contexts were canceled even earlier. If some process (like the GC queue) issued a bunch of commands on the same canceled context, those commands would all go into the command queue and have to wait on each other for O(n**2) worst-case performance (note that while this is a possibility, I don't find it a very convincing explanation for #10427 because I would expect more visible problems before the tipping point).

Guard against this by checking for context cancellation before the command queue. Also add a cancellation check to the unbounded retry loop in Store.


This change is Reviewable

@petermattis
Copy link
Collaborator

:lgtm:


Reviewed 1 of 1 files at r1.
Review status: 1 of 3 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


pkg/storage/replica.go, line 1249 at r2 (raw file):

      // command queue (and check again afterward). Once we're in the
      // command queue we're committed to waiting on our prerequisites
      // (which costs a goroutine, and slightly increases the cost of

@nvanbenschoten has a PR to fix this. Perhaps we should elevate its priority.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Nov 6, 2016

:lgtm:


Reviewed 1 of 1 files at r1, 2 of 2 files at r2.
Review status: all files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


pkg/storage/replica.go, line 1249 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

@nvanbenschoten has a PR to fix this. Perhaps we should elevate its priority.

https://github.com//pull/9448

pkg/storage/replica_test.go, line 2082 at r2 (raw file):

  }

  // Start a second command which will wait for the first.

"a second command" is confusing here, since this command is really the third. at least, this phrasing gave me pause, but maybe that's because i'm looking at a diff and not the whole test.

perhaps moving this before the starting of the already-canceled command would clarify things.


pkg/storage/store.go, line 2430 at r1 (raw file):

  // maximum retry count; return txn retry error for transactional
  // cases and the original error otherwise.
  if ctx.Err() != nil {

nit: slight preference for a binding scoped to this if


Comments from Reviewable

@bdarnell
Copy link
Contributor Author

bdarnell commented Nov 7, 2016

Review status: 0 of 3 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.


pkg/storage/replica.go, line 1249 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

#9448

That'll help keep us from running out of memory so quickly, but we're seeing so much contention on cmdQMu that I think we'd have problems even without the goroutine.

pkg/storage/replica_test.go, line 2082 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

"a second command" is confusing here, since this command is really the third. at least, this phrasing gave me pause, but maybe that's because i'm looking at a diff and not the whole test.

perhaps moving this before the starting of the already-canceled command would clarify things.

Changed second to third throughout.

pkg/storage/store.go, line 2430 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: slight preference for a binding scoped to this if

Done.

Comments from Reviewable

@bdarnell bdarnell merged commit 278e8b1 into cockroachdb:master Nov 7, 2016
@bdarnell bdarnell deleted the check-context branch November 7, 2016 03:56
@tamird
Copy link
Contributor

tamird commented Nov 7, 2016

Reviewed 3 of 3 files at r4.
Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


Comments from Reviewable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants