Search code, repositories, users, issues, pull requests...

	kvTxn := txn.Sender().NewChildTransaction()
	childTxn := NewTxnFromProto(ctx, txn.db, txn.gatewayNodeID, kvTxn.ReadTimestamp, RootTxn, kvTxn)

I'm not quite sure what's going on with txn.mu.userPriority in this method, but I'm not sure it's going to behave correctly if kvTxn has a min or max priority.

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 1153 at r3 (raw file):

	if paErr := (*roachpb.ParentAbortedError)(nil); errors.As(err, &paErr) &&
		paErr.ParentTxn.ID == txn.ID() {
		return roachpb.NewTransactionRetryWithProtoRefreshError(

Do we need to call txn.mu.sender.ManualRestart or GenerateForcedRetryableError to ensure that the parent txn's TxnCoordSender itself is restarted?

pkg/kv/txn.go, line 1332 at r3 (raw file):

	// a higher-level txn to be retried. We don't do this in any of the other
	// functions in DB; I guess we should.
	if errors.HasType(err, (*roachpb.TransactionRetryWithProtoRefreshError)(nil)) {

Shouldn't the child transaction have retried in childTxn.exec if it saw one of these errors?

Also, is errors.Wrapf enough to terminate the error? Won't errors.HasType still find the error cause?

pkg/kv/txn.go, line 1348 at r7 (raw file):

	// the child. This will also detect whether the child illegally wrote over
	// the parent's reads.
	return txn.Sender().ForwardToChild(ctx, childTxn.TestingCloneTxn())

nit: we shouldn't be using TestingCloneTxn in non-testing code.

pkg/kv/txn_external_test.go, line 229 at r3 (raw file):

// that requests issued in a child transaction carry the TxnMeta of the
// parent.
func TestChildTransactionMetadataPropagates(t *testing.T) {

Do we need another test for various retry conditions around child and parent txns?

pkg/kv/txn_external_test.go, line 362 at r4 (raw file):

	//
	// Without any transaction timeouts, the two possible outcomes should be
	// either that A and its child success or that B and its child success.

s/success/succeed/

pkg/kv/txn_external_test.go, line 365 at r4 (raw file):

	//
	// There are some extreme scenarios which can happen under transaction
	// expiration whereby A ends up getting aborted because of a timeout and A1

What kind of timeout?

pkg/kv/txn_external_test.go, line 369 at r4 (raw file):

	// could happen for both of them. We do not anticipate or allow these
	// scenarios.
	if (errA == nil && errB == nil) || (errA != nil && errB != nil) {

nit: (errA == nil) == (errB == nil)

	// TODO(ajwerner): Consider stripping information from the parent proto like
	// LockSpans or perhaps introducing a new proto altogether for parents. The
	// latter is likely to be onerous.

A TxnMeta doesn't work?

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 1172 at r3 (raw file):

	// deadlock. It seems that priority levels should only increase and thus
	// we shouldn't have these scenarios so long as the priority starts at at
	// least that of the parent.

I don't quite understand this comment. Is it assuming that priorities immediately resolve pushes one way or another? That isn't really the case – they mostly just impact who gets aborted once a deadlock has been detected.

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 1219 at r7 (raw file):

		for _, readSpan := range tc.interceptorAlloc.refreshFootprint.asSlice() {
			if writes.Overlaps(readSpan.AsRange()) {
				return errors.New("cannot forward provisional commit timestamp " +

If we refreshed the parent immediately to the new write timestamp, could we detect conflicts without all this?

Or does that not work for cases where the parent and child txns have the same timestamp? Actually, maybe that's fine for the same reason we can short circuit on child.WriteTimestamp.LessEq(tc.mu.txn.ReadTimestamp).

pkg/kv/kvserver/concurrency/concurrency_manager.go, line 268 at r6 (raw file):

			log.Fatalf(ctx, "%v", err)
		}
		for cur := g.Req.Txn; cur != nil && parentIntentError == nil; cur = cur.Parent {

Do we need this if we detect the conflict in the lock table? I guess we do in cases where the lock table is disabled (!added)?

If that's so, then should we detect this before even calling AddDiscoveredLock?

pkg/kv/kvserver/concurrency/lock_table.go, line 432 at r6 (raw file):

// the system.
//
//   case | parent | child

This is pretty interesting - and a little scary. How many of these combinations do we actually need to support? The first thing that jumps out to me is that this should probably be talking in terms of locking strength, not locking durability. It's understandable that you didn't do this because we only currently support a single locking strength.

Once we do that, I wonder if we can simplify this a little by saying that a lock held by a parent txn also implies that the lock is held by the child txn, up to the locking strength. So a child transaction only needs to acquire a lock that conflicts with one that its parent acquired if it needs a stronger lock.

Viewing this problem as one of lock promotion is nice because it pretty clearly raises the question of what happens when the parent shares a SHARED lock with another txn and then the child tries to upgrade to an exclusive lock. This creates the deadlock scenario that you've been thinking through.

It also raises the question of what to do if the parent already holds an exclusive lock (all four of these cases). In such cases, do we need to support the child being able to perform a write? It sounds like we don't for replicated/replicated. Do we need to for any other case?

pkg/kv/kvserver/concurrency/lock_table_test.go, line 50 at r6 (raw file):

  Creates a lockTable. The lockTable is initially enabled.

new-txn txn=<name> ts=<int>[,<int>] epoch=<int> [seq=<int>]

Add a comment about the new new-child-txn directive.

pkg/kv/kvserver/concurrency/lock_table_test.go, line 154 at r6 (raw file):

				guardsByReqName = make(map[string]lockTableGuard)
				return ""
			case "new-child-txn":

nit: drop this below new-txn

pkg/kv/kvserver/concurrency/testdata/lock_table/child_txn, line 2 at r6 (raw file):

# Test that we do not block on a parent's intent.
# Test that we do not block on a parent's read lock.

s/read/unreplicated exclusive/

pkg/kv/kvserver/concurrency/testdata/lock_table/child_txn, line 3 at r6 (raw file):

# Test that we do not block on a parent's intent.
# Test that we do not block on a parent's read lock.
# TODO(ajwerner): We need to test some of the more pernicious issues.

👍

pkg/kv/kvserver/txnwait/queue.go, line 140 at r4 (raw file):

	set := map[uuid.UUID]struct{}{}
	for _, push := range pt.waitingPushes {
		for cur := &push.req.PusherTxn; cur != nil; cur = cur.Parent {

Give this line a comment. It's pretty subtle.

pkg/kv/kvserver/txnwait/queue.go, line 484 at r4 (raw file):

		readyCh = make(chan struct{}, 1)
		queryPusherCh, queryPusherErrCh = q.startQueryPusherTxn(ctx, push, readyCh)
		// Ensure that the pusher querying goroutine is complete at exit.

Let's keep/adapt this comment.

pkg/kv/kvserver/txnwait/queue.go, line 500 at r4 (raw file):

		for i, cur := 0, &req.PusherTxn; cur != nil; cur = cur.Parent {
			if cur.ID == uuid.Nil || !cur.IsLocking() {

nit: consider pulling this into a helper and sharing that with getNumPusherTxnsToTrack.

pkg/kv/kvserver/txnwait/queue.go, line 653 at r4 (raw file):

					// not the immediate parent? Ideally we'd abort all of the parents.
					// I think this will work so long as we propagate this error and
					// only convert it to a retry at the relevant parent.

Yeah, that's how I figured this would work too. But we'll need to make sure of it in the client.

pkg/kv/kvserver/txnwait/queue.go, line 683 at r4 (raw file):

			// TODO(ajwerner): What should be the semantics here? Why would we release
			// if we're going to forcePush ourselves? Should we release if it isn't
			// the pusher which is updated but rather some parent?

I think we still want to release if it isn't the pusher which is updated, because we may still have noticed new dependents.

pkg/kv/kvserver/txnwait/queue.go, line 705 at r4 (raw file):

			}
			// Signal the pusher query txn loop to continue.
			// TODO(ajwerner): Do we need all of these to be notified?

It doesn't look like it. Only the goroutine that sent on queryPusherCh needs to be notified.

pkg/roachpb/data.proto, line 423 at r1 (raw file):

  //
  // TODO(ajwerner): come back and describe why we want a TxnMeta here as it
  // pertains to read visibility of the parent.

Do we want a TxnMeta or a Transaction?

pkg/roachpb/errors.proto, line 482 at r2 (raw file):

// behavior.
message ParentAbortedError {
  optional roachpb.Transaction parent_txn = 1 [(gogoproto.nullable) = false];

Is this the direct parent or the ancestor that was aborted? If the latter (which I suspect it is so that we can catch the retry error at the right level), should we rename this to AncestorAbortedError? Or at least spell this all out?

pkg/storage/mvcc_history_test.go, line 1007 at r5 (raw file):

}

func (e *evalCtx) newChildTxn(

nit: move this below newTxn

pkg/storage/pebble_mvcc_scanner.go, line 471 at r5 (raw file):

	if epoch < p.meta.Txn.Epoch {
		// 12. We're reading our own txn's intent but the current txn has

12?

pkg/storage/pebble_mvcc_scanner.go, line 489 at r5 (raw file):

// ownsIntent return true if the current intent is owned by the reading
// transaction or one of its parents. If it is owned by the reading transaction,

The second sentence makes it sound like the return values are only set if the intent is owned by the reading transaction and not one of its parents. Consider updating this.

ajwerner

Thanks for taking a look. Generally agree with your testing comments. Also agree with being more explicit about some properties in comments. Curious to explore some locking upgrade concepts to make that code clearer.

I'll come back and take another pass on the code after we discuss some of the things here.

I think the most controversial things were:

The protobuf design - using the transaction directly and chaining them
The forwarding APIs - it started as simply as you imagined but being able to detect the difference between a true refresh failure and a real problem due to parent-child conflicts drove me to this API
The various error types and handling in:
- lock table when encountering a parent's write in a write-write conflict (or read lock-write conflict I suppose)
- When the child detects an aborted ancestor in the txnwait.Queue

Another note is that we've come up with a case where we think we're going to want to use these child transactions in 21.1.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)

pkg/kv/sender.go, line 302 at r7 (raw file):
If the parent read the keys and then the child writes, it seems that the parent should be doomed to fail. In particular, we want to make sure that the parent never causally precedes the child by the time it commits. If the parent read, then the child wrote, the child will get pushed above the parent because of the timestamp cache. That will force the parent to refresh here leading to certain failure.

If so, does that simplify the lock table handling?

What do you mean?

pkg/kv/sender.go, line 303 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

My immediate reaction to this signature is "why isn't this ForwardTimestampAndRefresh?"

It originally was however there's a funny cases I ran into making sure we can detect bad client behavior.

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm not quite sure what's going on with txn.mu.userPriority in this method, but I'm not sure it's going to behave correctly if kvTxn has a min or max priority.

The child needs to have priority at least as high as its parent. Otherwise there are some funny live-lock cases that can arise where children get aborted but parents don't and these things end up looping.

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 1153 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we need to call txn.mu.sender.ManualRestart or GenerateForcedRetryableError to ensure that the parent txn's TxnCoordSender itself is restarted?

ManualRestart I think makes sense. It felt like the SQL layer was dealing with this properly but I may just be misunderstanding how this should work.

pkg/kv/txn.go, line 1332 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Shouldn't the child transaction have retried in childTxn.exec if it saw one of these errors?

Also, is errors.Wrapf enough to terminate the error? Won't errors.HasType still find the error cause?

I agree this is a problem. I don't really know what error should be returned here. I don't think the parent can handle this.

pkg/kv/txn.go, line 1348 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: we shouldn't be using TestingCloneTxn in non-testing code.

Agreed, I think I want to make it non-testing. Any issues with that or alternative suggestions?

pkg/kv/txn_external_test.go, line 229 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we need another test for various retry conditions around child and parent txns?

Surely

pkg/kv/txn_external_test.go, line 365 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What kind of timeout?

expiration like because heartbeats aren't working

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

A TxnMeta doesn't work?

Well, for one, we need the ignored sequence numbers and we need to propagate the entire chain of ancestors. My other concern was that it would make the code that walks the ancestor chain messier.

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 1172 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I don't quite understand this comment. Is it assuming that priorities immediately resolve pushes one way or another? That isn't really the case – they mostly just impact who gets aborted once a deadlock has been detected.

Correct, this is entirely in the context of deadlocks. Nevertheless, imagine the above scenario. When deciding to break a deadlock, it's the pusher that makes the determination to do the force push. If both pushers have lower priorities than the transactions which they are pushing, then there's a problem, right?

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 1219 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If we refreshed the parent immediately to the new write timestamp, could we detect conflicts without all this?

Or does that not work for cases where the parent and child txns have the same timestamp? Actually, maybe that's fine for the same reason we can short circuit on child.WriteTimestamp.LessEq(tc.mu.txn.ReadTimestamp).

I originally just pushed the parent's write timestamp. That will eventually force a refresh. The problem is that if that refresh fails, we won't know whether that's a legitimate serializable restart or a programming bug (this case). Because we can't know which transaction caused the refresh failure, we can't just refresh unless we want all refreshes due to child pushes always fail, which seems unacceptable.

A test that I wrote ran into this where the parent transaction just retried forever when a child had invalidated one of its reads.

One way out of this would be if we forced parents to run in a pessimistic mode with replicated locks on all reads. Then we could detect any such conflict before it happened rather than after. Of course, that doesn't seem like something that's going to happen.

pkg/kv/kvserver/concurrency/concurrency_manager.go, line 268 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we need this if we detect the conflict in the lock table? I guess we do in cases where the lock table is disabled (!added)?

If that's so, then should we detect this before even calling AddDiscoveredLock?

I feel like I'm missing something here.

pkg/kv/kvserver/concurrency/lock_table.go, line 432 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is pretty interesting - and a little scary. How many of these combinations do we actually need to support? The first thing that jumps out to me is that this should probably be talking in terms of locking strength, not locking durability. It's understandable that you didn't do this because we only currently support a single locking strength.

Once we do that, I wonder if we can simplify this a little by saying that a lock held by a parent txn also implies that the lock is held by the child txn, up to the locking strength. So a child transaction only needs to acquire a lock that conflicts with one that its parent acquired if it needs a stronger lock.

Viewing this problem as one of lock promotion is nice because it pretty clearly raises the question of what happens when the parent shares a SHARED lock with another txn and then the child tries to upgrade to an exclusive lock. This creates the deadlock scenario that you've been thinking through.

It also raises the question of what to do if the parent already holds an exclusive lock (all four of these cases). In such cases, do we need to support the child being able to perform a write? It sounds like we don't for replicated/replicated. Do we need to for any other case?

Fundamentally I think if the parent holds an exclusive lock then we should have an assertion failure. I had a hard time understanding how to make that happen so instead I allowed it to happen and then forced the assertion failure in other places. I'd love to make this interaction more explicit and fix any problems here.

pkg/roachpb/data.proto, line 423 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we want a TxnMeta or a Transaction?

Right, s/TxnMeta/Transaction/.

Discussed elsewhere but the issues are the ignored sequence numbers and also the chain of ancestors.

ajwerner

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 1219 at r7 (raw file):

Previously, ajwerner wrote…

I originally just pushed the parent's write timestamp. That will eventually force a refresh. The problem is that if that refresh fails, we won't know whether that's a legitimate serializable restart or a programming bug (this case). Because we can't know which transaction caused the refresh failure, we can't just refresh unless we want all refreshes due to child pushes always fail, which seems unacceptable.

A test that I wrote ran into this where the parent transaction just retried forever when a child had invalidated one of its reads.

One way out of this would be if we forced parents to run in a pessimistic mode with replicated locks on all reads. Then we could detect any such conflict before it happened rather than after. Of course, that doesn't seem like something that's going to happen.

Alright, thinking a bit more about this, it seems like if the parent is forced to do a refresh before the child commits and proves that that succeeds, then it feels like that could just work. It might be the case that the child gets pushed even further between the refresh and its commit but that's okay because we'll have a proof that the child didn't invalidate any of the parent reads. This also would be better because then the child wouldn't have to commit first.

This would require:

Building the machinery to force the parent to refresh eagerly
Make sure to propagate a reasonable error when that refresh hits a child's intent
Find a way to ensure that the EndTxn cannot happen before the parent does its stuff. Perhaps the way that happens is by having the child require that the read timestamp of the parent is at least as high as the timestamp of any intent it has written. I don't know how trivial this will be.

ajwerner

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 1219 at r7 (raw file):

Previously, ajwerner wrote…

Alright, thinking a bit more about this, it seems like if the parent is forced to do a refresh before the child commits and proves that that succeeds, then it feels like that could just work. It might be the case that the child gets pushed even further between the refresh and its commit but that's okay because we'll have a proof that the child didn't invalidate any of the parent reads. This also would be better because then the child wouldn't have to commit first.

This would require:

Building the machinery to force the parent to refresh eagerly

Make sure to propagate a reasonable error when that refresh hits a child's intent

Find a way to ensure that the EndTxn cannot happen before the parent does its stuff. Perhaps the way that happens is by having the child require that the read timestamp of the parent is at least as high as the timestamp of any intent it has written. I don't know how trivial this will be.

If we do want to push and refresh prior to committing the child, we'll need a way to do that for all of the ancestors. Perhaps this implies a need for tigher coupling of children to parents inside of the txnCoordSender.

nvanbenschoten

The protobuf design - using the transaction directly and chaining them

I'm 👍 on this.

The forwarding APIs - it started as simply as you imagined but being able to detect the difference between a true refresh failure and a real problem due to parent-child conflicts drove me to this API

Yeah, I'm still not clear what the best option for this is either. It's a tricky problem. I like the idea of using refreshing to detect the conflicts so that we don't need to rely on the client having a precise memory of parent reads and child writes, but getting that to work properly is tough.

The various error types and handling in:
lock table when encountering a parent's write in a write-write conflict (or read lock-write conflict I suppose)

Seems like this should just be an assertion failure, right?

When the child detects an aborted ancestor in the txnwait.Queue

I like the solution you've come up with here for this problem (ParentAbortedError + client-side unwinding).

Reviewed 1 of 1 files at r8, 3 of 3 files at r9.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)

pkg/kv/sender.go, line 302 at r7 (raw file):

What do you mean?

This was in regard to the discussion we were having in lock_table.go. I thought that we had to handle complex cases like Shared locks held by the parent being promoted to Exclusive locks by the child, but it sounds like we don't. In that case, we can throw an error in the lockTable whenever we find conflicts between ancestors, regardless of locking strength.

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 1153 at r3 (raw file):

Previously, ajwerner wrote…

The child needs to have priority at least as high as its parent. Otherwise there are some funny live-lock cases that can arise where children get aborted but parents don't and these things end up looping.

Right, so what's up with txn.mu.userPriority = roachpb.NormalUserPriority in NewTxnFromProto? Is that going to break things?

pkg/kv/txn.go, line 1348 at r7 (raw file):

Previously, ajwerner wrote…

Agreed, I think I want to make it non-testing. Any issues with that or alternative suggestions?

Nope, that SGTM.

pkg/kv/txn_external_test.go, line 365 at r4 (raw file):

Previously, ajwerner wrote…

expiration like because heartbeats aren't working

👍 maybe add that as an example here.

Previously, ajwerner wrote…

Well, for one, we need the ignored sequence numbers and we need to propagate the entire chain of ancestors. My other concern was that it would make the code that walks the ancestor chain messier.

Ack. I thought the ignored sequence numbers were in the TxnMeta, but it appears that they are not.

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 1172 at r3 (raw file):

Previously, ajwerner wrote…

Correct, this is entirely in the context of deadlocks. Nevertheless, imagine the above scenario. When deciding to break a deadlock, it's the pusher that makes the determination to do the force push. If both pushers have lower priorities than the transactions which they are pushing, then there's a problem, right?

Yeah, I see what you're saying. So it is not safe for a child to have a lower priority than its parent. I guess that intuitively makes sense from a behavioral standpoint, but it's a good catch that this could actually cause a stall in deadlock resolution.

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 1219 at r7 (raw file):

If we do want to push and refresh prior to committing the child, we'll need a way to do that for all of the ancestors.

We have this problem with the current approach as well, right? We're calling ForwardToChild on the child's immediate parent, but we should really be doing this on all of the child's ancestors.

The idea of refreshing each of a child's ancestors before the child commits seems pretty complex, but your point about infinite retries is valid. Is the idea that refreshing earlier gives us the opportunity to detect conflicts while the child's writes are still intents (i.e. still contain its ID) and therefore can still be distinguished from other transactions' intents?

pkg/kv/kvserver/concurrency/concurrency_manager.go, line 268 at r6 (raw file):

Previously, ajwerner wrote…

I feel like I'm missing something here.

Me too. My thought was that if we detect the illegal conflict in the lock table already, why do so here as well? We just added the intent to the lock table and we're just about to go back and scan it again.

pkg/kv/kvserver/concurrency/lock_table.go, line 432 at r6 (raw file):

Previously, ajwerner wrote…

Fundamentally I think if the parent holds an exclusive lock then we should have an assertion failure. I had a hard time understanding how to make that happen so instead I allowed it to happen and then forced the assertion failure in other places. I'd love to make this interaction more explicit and fix any problems here.

Well all locks today are exclusive, regardless of whether they're replicated or not. I agree that we should try to make this interaction more explicit.

ajwerner

Thanks for the look. I'm thinking that the biggest sticking point now is the forwarding behavior. I've commented on that thread. I think we should rework it to do a preemptive refresh of all ancestors prior to the committing of the child. We'll still need to forward the immediate ancestor's read timestamp after, just in case the commit still gets pushed, but we don't need that to be preemptive. Any other thoughts before I move in that direction?

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 1219 at r7 (raw file):
Yeah, that seems right. We currently aren't detecting issues where a grandchild invalidates a grand-parent's read. This seems like a hint that this deferred checking isn't going to cut it.

Is the idea that refreshing earlier gives us the opportunity to detect conflicts while the child's writes are still intents (i.e. still contain its ID) and therefore can still be distinguished from other transactions' intents?

Exactly. I agree it's pretty complex but it feels like the right thing to do. I'm okay with the child committing and the parent not, there's no way to avoid that. However, I'm not that keen on the possibility of the child committing and the parent not in the case of this egregious assertion failure. I'm starting to think that the right way to spin this is to do some coupling of the child TxnCoordSender to its parent.

Are you okay with my moving forward with the child TxnCoordSender holding a reference to the parent and having an interceptor which grabs the EndTxn and forces the parents to refresh? That will simplify the lifecycle handling in the kv layer and ensure that any of these bogus conflicts are detected.

nvanbenschoten

Any other thoughts before I move in that direction?

I don't think so. Let's move this PR in that direction and see how it looks.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 1219 at r7 (raw file):

Previously, ajwerner wrote…

Yeah, that seems right. We currently aren't detecting issues where a grandchild invalidates a grand-parent's read. This seems like a hint that this deferred checking isn't going to cut it.

Is the idea that refreshing earlier gives us the opportunity to detect conflicts while the child's writes are still intents (i.e. still contain its ID) and therefore can still be distinguished from other transactions' intents?

Exactly. I agree it's pretty complex but it feels like the right thing to do. I'm okay with the child committing and the parent not, there's no way to avoid that. However, I'm not that keen on the possibility of the child committing and the parent not in the case of this egregious assertion failure. I'm starting to think that the right way to spin this is to do some coupling of the child TxnCoordSender to its parent.

Are you okay with my moving forward with the child TxnCoordSender holding a reference to the parent and having an interceptor which grabs the EndTxn and forces the parents to refresh? That will simplify the lifecycle handling in the kv layer and ensure that any of these bogus conflicts are detected.

So is the proposal that we'll augment RefreshRequest and RefreshRangeRequest with a new "invalid intent ID" field? Or introduce a structured error for it to return in cases where a conflicting value/intent is detected (with a way to distinguish between the two) and then detect the illegal conflicts in the client? All things considered, the second option seems cleaner to me.

Other than that, I think child TxnCoordSenders holding a reference to their parent seems ok. Or at least, some slim handle to avoid the possibility of too much coupling.

ajwerner

I don't think so. Let's move this PR in that direction and see how it looks.

👍 let's see how it goes

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 1219 at r7 (raw file):

Or introduce a structured error for it to return in cases where a conflicting value/intent is detected (with a way to distinguish between the two) and then detect the illegal conflicts in the client?

A structured error for sure.

Or at least, some slim handle to avoid the possibility of too much coupling.

Yep, I was thinking the child would hold onto an interface that allows it to preemptively push the parent and run a refresh.

ajwerner

@nvanbenschoten I reworked the pushing bits to have the child push its ancestors. It's honestly pretty messy. I haven't yet gone back to deal with the lock table. I ended up augmenting the refresh requests to include the set of descendants. It's not terrible. I'd love to get your take on this as it is before making another pass. The interfaces are not making me very happy. You tend to have a way with these things.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)

pkg/kv/kvserver/concurrency/lock_table_test.go, line 154 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Right, so what's up with txn.mu.userPriority = roachpb.NormalUserPriority in NewTxnFromProto? Is that going to break things?

Done.

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: drop this below new-txn

Done.

pkg/storage/mvcc_history_test.go, line 1007 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: move this below newTxn

Done.

pkg/storage/pebble_mvcc_scanner.go, line 471 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

12?

Done.

pkg/storage/pebble_mvcc_scanner.go, line 489 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The second sentence makes it sound like the return values are only set if the intent is owned by the reading transaction and not one of its parents. Consider updating this.

Done.

nvanbenschoten

One thing I'm trying to determine is whether we need to take into consideration server-side refreshes, which could raise the commit timestamp of a child transaction after the point at which we refresh all ancestors. If that's an issue, then this kind of pre-commit hook isn't going to work out.

I see why we arrived where we did here, but it's all starting to become pretty heavyweight. This makes me think you may have had things right earlier on, when you compared the write footprint of the child transaction after committing with the refresh spans of each ancestor transaction and called it a day. Since this is all in service of detecting programming errors, it feels a little wrong to be doing something so expensive and to add so much new logic in. Which way are you leaning on all of this? It sounds like you aren't too happy with the recursion in the txn_coord_sender.

Reviewed 1 of 29 files at r10, 3 of 6 files at r12, 3 of 4 files at r15, 3 of 4 files at r17, 3 of 4 files at r18, 1 of 1 files at r19, 1 of 2 files at r20.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 183 at r20 (raw file):

		return nil
	}
	updated, err := tc.interceptorAlloc.forceRefreshLocked(ctx, txn, childrenIDs...)

nit here and below: calling the method/access fields on the interceptorAlloc itself is pretty confusing. Consider referencing the specific interceptor directly, like we do elsewhere in this file.

pkg/roachpb/api.proto, line 1591 at r20 (raw file):

  // their descendants. Encountering such an intent must trigger an assertion
  // failure as it is a programming error.
  repeated bytes descendant_txns = 4 [(gogoproto.customtype) = "github.com/cockroachdb/cockroach/pkg/util/uuid.UUID"];

If we're already returning the conflicting intent in a structured RefreshFailedError, why do we also need to send this in the request itself? Can't we make a determination about this on the client by looking at the RefreshFailedError?

ajwerner

One thing I'm trying to determine is whether we need to take into consideration server-side refreshes, which could raise the commit timestamp of a child transaction after the point at which we refresh all ancestors. If that's an issue, then this kind of pre-commit hook isn't going to work out.

Makes sense. I was worried about something like that and was thinking about splitting off the EndTxn to deal with it. This is especially a problem if the final batch of contains a write that would do the invalidating. I'm somewhat unhappy with the post-commit hook to forward the parent. I'm not upset about the recursion per se, but I'm upset that the recursion potentially runs recursive RPCs.

What if we did a hybrid of all of this where the child hangs on to a handle to its parent (and thus all of its ancestors) and then, upon commit EndTxn, ensures that the set of writes do not invalidate any parent reads. It's almost the same as the old approach except that it removes the crufty API from the kv.TxnSender.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)

pkg/roachpb/api.proto, line 1591 at r20 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If we're already returning the conflicting intent in a structured RefreshFailedError, why do we also need to send this in the request itself? Can't we make a determination about this on the client by looking at the RefreshFailedError?

The issue is that there might be multiple violating intents and we want to make sure to discover the right one. That might just be premature optimization given this is all about detecting a programming error.

nvanbenschoten

What if we did a hybrid of all of this where the child hangs on to a handle to its parent (and thus all of its ancestors) and then, upon commit EndTxn, ensures that the set of writes do not invalidate any parent reads. It's almost the same as the old approach except that it removes the crufty API from the kv.TxnSender.

This approach sounds reasonable to me.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)

pkg/roachpb/api.proto, line 1591 at r20 (raw file):

Previously, ajwerner wrote…

The issue is that there might be multiple violating intents and we want to make sure to discover the right one. That might just be premature optimization given this is all about detecting a programming error.

Yeah, I would think that we would only need to reject due to a programming error if we only ran into our child's intent. Otherwise, we're also seeing a real conflict, so we're either not in an infinite retry loop (meaning we'll find the programming error after a retry) or we have other real issues.

ajwerner

Alright, this is RFAL.

The lock table stuff is now simplified and more explicit but it still deserves more scrutiny. Maybe I can get @sumeerbhola to sign off.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

pkg/kv/txn.go, line 1332 at r3 (raw file):

Previously, ajwerner wrote…

I agree this is a problem. I don't really know what error should be returned here. I don't think the parent can handle this.

I just removed this. It wasn't doing anything valuable.

pkg/kv/txn_external_test.go, line 362 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/success/succeed/

Done.

pkg/kv/txn_external_test.go, line 365 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

👍 maybe add that as an example here.

Done.

pkg/kv/txn_external_test.go, line 369 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: (errA == nil) == (errB == nil)

Done.

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 1172 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Yeah, I see what you're saying. So it is not safe for a child to have a lower priority than its parent. I guess that intuitively makes sense from a behavioral standpoint, but it's a good catch that this could actually cause a stall in deadlock resolution.

This has been pulled into the TxnCoordSender.

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 183 at r20 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit here and below: calling the method/access fields on the interceptorAlloc itself is pretty confusing. Consider referencing the specific interceptor directly, like we do elsewhere in this file.

No longer relevant.

pkg/kv/kvserver/concurrency/concurrency_manager.go, line 268 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Me too. My thought was that if we detect the illegal conflict in the lock table already, why do so here as well? We just added the intent to the lock table and we're just about to go back and scan it again.

I moved this above adding the discovered lock. I do think this still needs to be here as the lock table doesn't have a complete view.

pkg/kv/kvserver/concurrency/lock_table.go, line 432 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Well all locks today are exclusive, regardless of whether they're replicated or not. I agree that we should try to make this interaction more explicit.

Okay, made it all more explicit.

pkg/kv/kvserver/concurrency/lock_table_test.go, line 50 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Add a comment about the new new-child-txn directive.

Done.

pkg/kv/kvserver/concurrency/testdata/lock_table/child_txn, line 3 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

👍

Added some more testing and clarified the behavior here.

pkg/kv/kvserver/txnwait/queue.go, line 140 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Give this line a comment. It's pretty subtle.

Done.

pkg/kv/kvserver/txnwait/queue.go, line 484 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Let's keep/adapt this comment.

Done.

pkg/kv/kvserver/txnwait/queue.go, line 500 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: consider pulling this into a helper and sharing that with getNumPusherTxnsToTrack.

Done.

pkg/kv/kvserver/txnwait/queue.go, line 683 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think we still want to release if it isn't the pusher which is updated, because we may still have noticed new dependents.

Removed the comment.

pkg/kv/kvserver/txnwait/queue.go, line 705 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It doesn't look like it. Only the goroutine that sent on queryPusherCh needs to be notified.

Done.

pkg/roachpb/api.proto, line 1591 at r20 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Yeah, I would think that we would only need to reject due to a programming error if we only ran into our child's intent. Otherwise, we're also seeing a real conflict, so we're either not in an infinite retry loop (meaning we'll find the programming error after a retry) or we have other real issues.

This is gone.

pkg/roachpb/errors.proto, line 482 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this the direct parent or the ancestor that was aborted? If the latter (which I suspect it is so that we can catch the retry error at the right level), should we rename this to AncestorAbortedError? Or at least spell this all out?

Done.

sumeerbhola

Reviewed 1 of 4 files at r25, 1 of 7 files at r26, 1 of 6 files at r27.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)

pkg/kv/txn.go, line 1291 at r28 (raw file):

// from recoverable internal errors, and is automatically committed
// otherwise. The retryable function should have no side effects which could
// cause problems in the event it must be run more than once.

does this "run more than once" include "run and committed more than once", i.e., do the writes need to be idempotent?

pkg/kv/txn.go, line 1300 at r28 (raw file):

//
//   * No concurrent operations may be performed on the parent transaction.
//   * The child transaction must not invalidate any reads of the parent.

so it can't write to any keys that the parent has already read? Is this beyond the protection that the ts cache gives us? I am assuming this includes the case where the child txn is writing at a higher timestamp, which the ts cache will allow, but we are explicitly saying it should not do so.

But it could write to keys that the parent will read after the child commits?

Without some documented constraints on the "read set compression", I am not sure how the child txn can do any writes. Can we document them here?

What if the parent has acquired an unreplicated exclusive lock that has been forgotten. We can't guarantee an assertion failure.

In practice do we need many levels of ancestors or will this be used with only 2-levels? If the latter, can the overall code be simplified under that constraint.

pkg/kv/txn.go, line 1318 at r28 (raw file):

//     if the parent is aborted during deadlock detection and the child observes
//     that it has been aborted, the child will also be aborted and the parent
//     will encounter a retry.

This behavior is not needed for correctness, yes?

pkg/kv/kvserver/concurrency/concurrency_manager.go, line 268 at r6 (raw file):

//     timestamp of the parent.
//   * The child may commit values which the parent subsequently reads. This is
//     fine and can be desirable.

These committed values are observable by other unrelated txns that are concurrent, yes?
So is the only thing that makes the parent-child relationship special the one-way information transfer from the parent to the child of writes by the parent being observed by the child? The other direction of information is just whether the child committed or not, which the parent would know about if it used a non-child transaction?

(I'm asking things that were probably documented in the RFC, but I've swapped out some of that state, and I am guessing some things may have changed since the RFC)

Previously, ajwerner wrote…

I moved this above adding the discovered lock. I do think this still needs to be here as the lock table doesn't have a complete view.

This seems reasonable.
Will the parentIntentError cause this Request to stop evaluating and call FinishReq? Do we want the call to AddDiscoveredLock when this error occurs?

pkg/kv/kvserver/concurrency/lock_table.go, line 454 at r28 (raw file):

// will result in an assertion failure error. This is consistent with the
// rules of child transactions which state that descendants my not acquire
// exclusive locks or write underneath the reads or writes of any ancestors.

What if the descendent is writing above a read of the ancestor? We can't detect it here, yes?

Since the lock table is incomplete wrt both replicated and unreplicated locks, and this write-write conflict seems mainly an error check against our internal use of child transactions, I am wondering whether we need to make any changes to the lock table at all and just treat an ancestor txn lock as equivalent to the child txn already holding the lock. Then the evaluation will pick up any errors via the parentIntentError path.

We're also introducing optimistic locking where the checking happens after evaluation #58670, so I'm trying to see what changes are essential to the lock table and what are optional (to reduce its complexity as much as possible).

I'm probably missing some subtlety here.

pkg/storage/pebble_mvcc_scanner.go, line 92 at r28 (raw file):

// Go port of mvccScanner in libroach/mvcc.h. Stores all variables relating to
// one MVCCGet / MVCCScan call.
type pebbleMVCCScanner struct {

Do we only need changes to pebbleMVCCScanner in the MVCC path since the only special handling for child txns is the write => read data transfer from parent to child?
I keep forgetting that even MVCCGet uses pebbleMVCCScanner. The code comment for this struct is both dated and insufficient. Could you add another sentence saying that pebbleMVCCScanner must be used for all Gets and Scans since it correctly handles child transactions.

pkg/storage/pebble_mvcc_scanner.go, line 373 at r28 (raw file):

	}

	ownIntent, epoch, seqNum, ignoredSeqNums := p.ownsIntent()

How does the epoch and seqNum of the child txn relate to that of the parent? I am guessing it is derived from the parent since we are continuing to do comparisons like seqNum >= p.meta.Txn.Sequence and epoch == p.meta.Txn.Epoch. Is this already documented elsewhere in the code (I didn't look at the whole PR)?

And this works because of the "No concurrent operations may be performed on the parent transaction"?

ajwerner

@sumeerbhola I'm replying to your comments before addressing the ones which indicate the clear action to keep the conversation going.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, and @sumeerbhola)

pkg/kv/txn.go, line 1291 at r28 (raw file):

Previously, sumeerbhola wrote…

does this "run more than once" include "run and committed more than once", i.e., do the writes need to be idempotent?

No, it does not mean run and commit more than once. It's trying to express the same behavior as (*kv.DB).Txn().

pkg/kv/txn.go, line 1300 at r28 (raw file):

Without some documented constraints on the "read set compression", I am not sure how the child txn can do any writes. Can we document them here?

This is a fair and valid point. If the refresh_span_bytes were to be set to zero, child transactions which get pushed can never commit. If the child never gets pushed, there's no problem. If the child were to invalidate the write of an ancestor, it would have to be pushed.

If I read the span condensing logic correctly, we'll never add span bounds which we've not added. For example, we won't pick up spans which are outside of the spans which were read. Say we had AB-AC,AD-AE,AF-AG We might compress that to AB-AG but I don't think we'll ever change the bounds to A-B.

The principle this ends up with is that if you want to be safe, never write to any of the same table spans in child transactions that the parent has read from and even then don't read from tables which might be in-between tables that the parent has read from. This ends up being both quite restrictive and yet good enough for the use-cases I currently envision. We could probably tighten this up to change the span compression logic to not cross table boundaries.

In practice do we need many levels of ancestors or will this be used with only 2-levels? If the latter, can the overall code be simplified under that constraint.

I don't have an immediate use-case for more than one layer. @nvanbenschoten proposed the generalization. I'm not sure that it does too much to simplify any of the code.

pkg/kv/txn.go, line 1318 at r28 (raw file):

Previously, sumeerbhola wrote…

This behavior is not needed for correctness, yes?

Correct.

pkg/kv/kvserver/concurrency/concurrency_manager.go, line 268 at r6 (raw file):

These committed values are observable by other unrelated txns that are concurrent, yes?

Right

So is the only thing that makes the parent-child relationship special the one-way information transfer from the parent to the child of writes by the parent being observed by the child? The other direction of information is just whether the child committed or not, which the parent would know about if it used a non-child transaction?

Yes, well that and the parent will be pushed to the child's commit timestamp so that it cannot commit under the parent. Ideally if code needs information to transfer from the child back to the parent, it will do it through the closure.

Previously, sumeerbhola wrote…

This seems reasonable.
Will the parentIntentError cause this Request to stop evaluating and call FinishReq? Do we want the call to AddDiscoveredLock when this error occurs?

Before this most recent revision, we were calling AddDiscoveredLock. I don't think it's a correctness problem either way, right?

pkg/kv/kvserver/concurrency/lock_table.go, line 454 at r28 (raw file):

Previously, sumeerbhola wrote…

What if the descendent is writing above a read of the ancestor? We can't detect it here, yes?

Since the lock table is incomplete wrt both replicated and unreplicated locks, and this write-write conflict seems mainly an error check against our internal use of child transactions, I am wondering whether we need to make any changes to the lock table at all and just treat an ancestor txn lock as equivalent to the child txn already holding the lock. Then the evaluation will pick up any errors via the parentIntentError path.

We're also introducing optimistic locking where the checking happens after evaluation #58670, so I'm trying to see what changes are essential to the lock table and what are optional (to reduce its complexity as much as possible).

I'm probably missing some subtlety here.

We detect the descendant writing over an ancestor's read at commit time by comparing the writes of the descendant to the read spans of its ancestors. All of this is just a protection against programming errors so I'd like it to be minimally disruptive.

pkg/storage/pebble_mvcc_scanner.go, line 92 at r28 (raw file):

Previously, sumeerbhola wrote…

Do we only need changes to pebbleMVCCScanner in the MVCC path since the only special handling for child txns is the write => read data transfer from parent to child?
I keep forgetting that even MVCCGet uses pebbleMVCCScanner. The code comment for this struct is both dated and insufficient. Could you add another sentence saying that pebbleMVCCScanner must be used for all Gets and Scans since it correctly handles child transactions.

Will do.

pkg/storage/pebble_mvcc_scanner.go, line 373 at r28 (raw file):
I'll comment ownsIntent() to better explain what's going on. The idea is that the descendant is viewing the parent's writes as though it were the ancestor as of the snapshot of the ancestor when the child is created.

And this works because of the "No concurrent operations may be performed on the parent transaction"?

Indeed.

nvanbenschoten

The approach looks good to me now! Thanks for iteratively refining this.

Reviewed 2 of 32 files at r21, 6 of 6 files at r22, 3 of 7 files at r23, 3 of 4 files at r24, 4 of 4 files at r25, 6 of 7 files at r26, 5 of 6 files at r27, 2 of 2 files at r28.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, and @sumeerbhola)

Previously, ajwerner wrote…

ManualRestart I think makes sense. It felt like the SQL layer was dealing with this properly but I may just be misunderstanding how this should work.

Did you get a chance to confirm that the SQL layer is handling this correctly?

pkg/kv/txn.go, line 1318 at r28 (raw file):

Previously, sumeerbhola wrote…

This behavior is not needed for correctness, yes?

I don't think so. It would be impossible to guarantee.

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 159 at r28 (raw file):

Previously, sumeerbhola wrote…

These committed values are observable by other unrelated txns that are concurrent, yes?
So is the only thing that makes the parent-child relationship special the one-way information transfer from the parent to the child of writes by the parent being observed by the child? The other direction of information is just whether the child committed or not, which the parent would know about if it used a non-child transaction?

(I'm asking things that were probably documented in the RFC, but I've swapped out some of that state, and I am guessing some things may have changed since the RFC)

The other thing that makes the relationship special is its impact on deadlock detection between transactions. Parent transactions have an implicit dependency on their children transactions.

pkg/kv/txn_external_test.go, line 233 at r28 (raw file):

func TestChildTransactionMetadataPropagates(t *testing.T) {
	defer leaktest.AfterTest(t)()
	//defer log.Scope(t).Close(t)

Did you mean to uncomment this?

both in the txnSpanRefresher

This part isn't true anymore, right?

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 170 at r28 (raw file):

//     error.
//
// Note that if the commitCommitTimestamp is equal to the parent's current

childCommitTimestamp

pkg/kv/kvclient/kvcoord/txn_coord_sender.go, line 828 at r28 (raw file):

		tc.mu.txn.Update(br.Txn)
		if justCommitted {
			return roachpb.NewError(tc.pushParentUponCommitLocked(ctx))

It seems like this should live near maybeSleepForLinearizable, maybe in some method that handles post-commit actions. And commitReadOnlyTxnLocked should also call this post-commit hook.

pkg/kv/kvclient/kvcoord/txn_coord_sender_test.go, line 2415 at r28 (raw file):

	require.NotNil(t, child.Parent)
	require.Equal(t, parent.ID, child.Parent.ID)
	require.Equal(t, parent.Name+" child", child.Name)

nit: are we ok with the recursive " child"s on grandchildren?

pkg/kv/kvserver/concurrency/lock_table.go, line 275 at r28 (raw file):

	ts      hlc.Timestamp
	spans   *spanset.SpanSet
	readTS  hlc.Timestamp

I think these are leftovers from a bad rebase. They were removed a few weeks go.

pkg/kv/kvserver/concurrency/lock_table.go, line 453 at r28 (raw file):

// ReadWrite requests due to descendants seeking to acquire locks over ancestors
// will result in an assertion failure error. This is consistent with the
// rules of child transactions which state that descendants my not acquire

"may"

pkg/kv/kvserver/concurrency/lock_table_test.go, line 55 at r28 (raw file):

 Creates a Transaction.

new-child-txn parent=<name> txn=<name> ts=<int>[,<int>] epoch=<int> [seq=<int>]

Feel free to ignore, but would it be easier and result in less duplicate code to add an optional parent field to new-txn?

pkg/kv/kvserver/txnwait/queue_test.go, line 491 at r28 (raw file):

}

// noopT implements assert.TestingT so that its tests can be used without

:expandingbrain:

pkg/roachpb/data.proto, line 410 at r28 (raw file):

  // STAGING to COMMITTED. Because of this, the set will only ever contain
  // entries when the transaction is STAGING. For more, see txnCommitter. It
  // may be the case that InFlightWrites is populated in a transaction which

Mind changing "in a transaction which" to "in a transaction object which"? This took me a second to understand.

pkg/roachpb/data.proto, line 431 at r28 (raw file):

  // parent with this metadata.
  //
  // TODO(ajwerner): come back and describe why we want a TxnMeta here as it

Do you intend to address this TODO here?

sumeerbhola

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, and @sumeerbhola)

pkg/kv/kvserver/concurrency/concurrency_manager.go, line 268 at r6 (raw file):

Previously, ajwerner wrote…

Before this most recent revision, we were calling AddDiscoveredLock. I don't think it's a correctness problem either way, right?

Yes, its harmless to call AddDiscoveredLock even if this request is never going to go and wait at that lock.

pkg/kv/kvserver/concurrency/lock_table.go, line 454 at r28 (raw file):

Previously, ajwerner wrote…

We detect the descendant writing over an ancestor's read at commit time by comparing the writes of the descendant to the read spans of its ancestors. All of this is just a protection against programming errors so I'd like it to be minimally disruptive.

Are you saying we don't actually need any changes to lock_table.go other than making the parent txn appear equivalent to the child wrt the child not waiting on the parent's locks? That is, is my supposition that the detection of bad behavior via parentIntentError, actually sufficient?

ajwerner

Seems like the sticking point here is going to be the lock table changes. I don't feel strongly either way. I think the error propagation changes in the PR are sort of ugly, but they do at least make some of the interactions more explicit. Until we offer lock strengths weaker than exclusive, I think just letting children pass through under locks held by ancestors works well. If the ancestor were to hold a shared lock, I think the interactions get less clear.

Dismissed @sumeerbhola from a discussion.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did you get a chance to confirm that the SQL layer is handling this correctly?

Added a test that shows that it behaves properly.