
kvcoord: merge budgets for in-flight writes and lock spans #66915

Merged
3 commits merged on Jul 2, 2021

Conversation

andreimatei
Contributor

Before this patch, the txnPipeliner maintained two different memory
budgets:

  1. kv.transaction.max_intents_bytes - a limit on a txn's lock spans
  2. kv.transaction.write_pipelining_max_outstanding_size - a limit on a
    txn's in-flight writes

Besides limiting memory usage, these budgets also prevent the commit's
Raft command from becoming too big.

Having two budgets for such closely related things is unnecessary.
In-flight writes frequently turn into lock spans, so reasoning about how
one budget feeds into the other is confusing. The exhaustion of the
in-flight budget compounded this, by immediately turning in-flight
writes into locks.

This patch makes kv.transaction.write_pipelining_max_outstanding_size a
no-op. kv.transaction.max_intents_bytes now also tracks in-flight
writes. A request whose async-consensus writes would push this budget
over the limit is not allowed to perform async consensus, on the
argument that performing sync writes is better because the locks from
those writes can be condensed.
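
The merged-budget check described above can be sketched roughly as follows. This is a simplified, hypothetical model, not CockroachDB's actual types or API; the field and method names are illustrative only.

```go
package main

import "fmt"

// pipeliner is a stand-in for the txnPipeliner's accounting state.
type pipeliner struct {
	lockFootprintBytes int64 // memory used by tracked lock spans
	inFlightBytes      int64 // memory used by in-flight (pipelined) writes
}

// canPipeline returns whether a batch that would add reqBytes of
// in-flight tracking may use async consensus under the single shared
// budget. If the combined footprint would exceed maxIntentsBytes, the
// writes are performed synchronously instead: sync writes produce lock
// spans, which can be condensed to stay under budget, while in-flight
// writes cannot.
func (p *pipeliner) canPipeline(reqBytes, maxIntentsBytes int64) bool {
	return p.lockFootprintBytes+p.inFlightBytes+reqBytes <= maxIntentsBytes
}

func main() {
	p := &pipeliner{lockFootprintBytes: 300, inFlightBytes: 500}
	fmt.Println(p.canPipeline(100, 1000)) // 900 <= 1000: pipelining allowed
	fmt.Println(p.canPipeline(300, 1000)) // 1100 > 1000: fall back to sync
}
```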

This patch is also done with an eye towards offering an option to reject
transactions that are about to go over budget. Having a single budget to
think about makes that conceptually simpler.

Release note (general change): The setting
kv.transaction.write_pipelining_max_outstanding_size becomes a no-op.
Its function is folded into the kv.transaction.max_intents_bytes
setting.

cc @cockroachdb/kv

@cockroach-teamcity
Member

This change is Reviewable

@andreimatei
Contributor Author

The commit kvcoord: refactor: extract pipeliner function is #66908.

@andreimatei
Contributor Author

One thing I've been thinking is that we might want to take this occasion to introduce a limit on the number of in-flight keys that we attach to a parallel-commit EndTxn, for stability purposes: an abandoned STAGING txn record forces conflicting txns to perform recovery, which consists of work on the order of the record's in-flight writes. This seems like a potentially major unexpected burden on that other txn, externalized by the committing txn. Thoughts?

Contributor

@aliher1911 aliher1911 left a comment


Does it mean, roughly, that we will check the limit in txn_interceptor_committer.canCommitInParallel and stop parallel commits if a certain limit is exceeded? That makes sense to me, as it will only affect large transactions, which benefit less from parallel commits.

Just some nits from my side regarding readability, otherwise looks good.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)


pkg/kv/kvclient/kvcoord/condensable_span_set.go, line 80 at r3 (raw file):

//
// maxBytes may be zero, or even negative. In that case, each range will be
// maximally condensed.

Since there's no difference between 0 and a negative size, maybe we can say that maxBytes less than 1 forces maximum possible condensing? Otherwise it reads as if 0 and negative values behave differently.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 390 at r3 (raw file):

"cannot perform async consensus because memory budget limit exceeded"

pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 1424 at r3 (raw file):

// along range boundaries when they exceed the maximum intent bytes threshold.
//
// TODO(andrei): Merge this test into TestTxnPipelinerCondenseLockSpans2, which

Is it a long-term TODO? I think it would be helpful to at least highlight the difference between the checks the two tests make, in case one fails and someone needs to debug it.

Member

@nvanbenschoten nvanbenschoten left a comment


One thing I've been thinking is that we might want to take this occasion to introduce a limit on the number of in-flight keys that we attach to a parallel-commit EndTxn, for stability purposes - an abandoned STAGING txn record forces conflicting txns to perform recovery, which consists of work on the order of record's in-flight writes. This seems like potentially a major unexpected burden on that other txn, externalized by the committing txn. Thoughts?

I thought that's what kv.transaction.write_pipelining_max_batch_size did, but it looks like you're right that it's not. The limits in the pipeline do help bound the effective maximum because we'll only let the inFlightWriteSet get so large, but that doesn't help in the case of a single large batch with an EndTxn. So I agree that we should limit parallel commits to some maximum number of in-flight writes to avoid an unbounded fanout factor during txn recovery. I'm thinking somewhere around 1024 keys.

Does it mean roughly that we will check limit in txn_interceptor_committer.canCommitInParallel and stop that if certain limit is exceeded?

Yes, I think that's right. We'll compare len(et.InFlightWrites) against a new cluster setting.
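
The check agreed on here can be sketched as below. This is a hypothetical illustration of the shape of the proposed change, not the eventual implementation; the constant stands in for the new cluster setting, and the types are simplified stand-ins for the real roachpb ones.

```go
package main

import "fmt"

// maxParallelCommitWrites stands in for the proposed cluster setting;
// 1024 is the ballpark suggested in this thread.
const maxParallelCommitWrites = 1024

// inFlightWrite is a simplified stand-in for a pipelined write record.
type inFlightWrite struct{ key string }

// canCommitInParallel mirrors the shape of the proposed check: refuse a
// parallel (STAGING) commit when the EndTxn would carry too many
// in-flight writes, bounding the fanout of any later txn recovery.
func canCommitInParallel(inFlightWrites []inFlightWrite) bool {
	return len(inFlightWrites) <= maxParallelCommitWrites
}

func main() {
	small := make([]inFlightWrite, 10)
	large := make([]inFlightWrite, 2000)
	fmt.Println(canCommitInParallel(small)) // true: under the cap
	fmt.Println(canCommitInParallel(large)) // false: fall back to a non-parallel commit
}
```

Transactions over the cap would simply commit without the parallel-commit optimization, so the limit only costs latency on very large batches.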

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 4 of 4 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @andreimatei)


pkg/kv/kvclient/kvcoord/condensable_span_set.go, line 170 at r3 (raw file):

}

func (s *condensableSpanSet) asSortedSlice() []roachpb.Span {

Consider moving this to condensable_span_set_test.go with a comment that we'll want to make this more efficient if it ever needs to be used in production code.
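
For reference, a test-only helper of this shape might look like the following sketch. The span and set types are simplified stand-ins for the real ones; the point is that it sorts a copy rather than mutating the set, at O(n log n) per call, which is fine for tests but would warrant optimizing if ever used in production code.

```go
package main

import (
	"fmt"
	"sort"
)

// span is a simplified stand-in for roachpb.Span.
type span struct{ key, endKey string }

// condensableSpanSet is a simplified stand-in for the real set type.
type condensableSpanSet struct{ s []span }

// asSortedSlice returns the set's spans sorted by start key, without
// mutating the set's own (possibly unsorted) slice.
func (s *condensableSpanSet) asSortedSlice() []span {
	out := make([]span, len(s.s))
	copy(out, s.s)
	sort.Slice(out, func(i, j int) bool { return out[i].key < out[j].key })
	return out
}

func main() {
	set := condensableSpanSet{s: []span{{"c", "d"}, {"a", "b"}}}
	fmt.Println(set.asSortedSlice()[0].key) // "a": sorted copy
	fmt.Println(set.s[0].key)               // "c": original order untouched
}
```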


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 514 at r1 (raw file):

	ctx context.Context, ba roachpb.BatchRequest, br *roachpb.BatchResponse,
) {
	// After adding new writes to the lock footprint, check whether we need to

Did you mean to remove this?


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 251 at r2 (raw file):

	}

	ba.AsyncConsensus = tp.canUseAsyncConsensus(ba)

Want to rebase this second commit away?


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 35 at r3 (raw file):

	true,
)
var _ = settings.RegisterByteSizeSetting(

Can't we just delete this and add it to retiredSettings?


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 340 at r3 (raw file):

// canUseAsyncConsensus checks the conditions necessary for this batch to be
// allowed to set the AsyncConsensus flag.
func (tp *txnPipeliner) canUseAsyncConsensus(ctx context.Context, ba roachpb.BatchRequest) bool {

While we're here, should we short-circuit this method on if _, hasET := ba.GetArg(roachpb.EndTxn); hasET { return false }? Now that this is factored out, it seems like a valuable fast-path.
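
The suggested fast-path could look roughly like this. The batch type and its lookup method are simplified stand-ins mimicking the shape of the real roachpb types, not the actual API.

```go
package main

import "fmt"

// request and batchRequest are simplified stand-ins for roachpb types.
type request struct{ method string }

type batchRequest struct{ reqs []request }

// getArg mimics ba.GetArg: return the first request with the given method.
func (ba *batchRequest) getArg(method string) (*request, bool) {
	for i := range ba.reqs {
		if ba.reqs[i].method == method {
			return &ba.reqs[i], true
		}
	}
	return nil, false
}

// canUseAsyncConsensus sketches the proposed short-circuit: a batch that
// carries an EndTxn can never use async consensus, so bail out before
// any per-request or budget checks.
func canUseAsyncConsensus(ba *batchRequest) bool {
	if _, hasET := ba.getArg("EndTxn"); hasET {
		return false
	}
	// ... per-request and budget checks would follow here ...
	return true
}

func main() {
	commit := &batchRequest{reqs: []request{{"Put"}, {"EndTxn"}}}
	write := &batchRequest{reqs: []request{{"Put"}}}
	fmt.Println(canUseAsyncConsensus(commit)) // false: fast-path taken
	fmt.Println(canUseAsyncConsensus(write))  // true: falls through to other checks
}
```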


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 390 at r3 (raw file):

Previously, aliher1911 (Oleg) wrote…
"cannot perform async consensus because memory budget limit exceeded"

Also, does v4 seem too low? I was thinking v2.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 81 at r3 (raw file):

//
// iter is the iterator to use for condensing the lock spans. It can be nil, in
// which case the pipeliner will panic if it ever needs to condense.

"needs to condense intent spans"


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 84 at r3 (raw file):

func makeMockTxnPipeliner(iter condensableSpanSetRangeIterator) (txnPipeliner, *mockLockedSender) {
	mockSender := &mockLockedSender{}
	metrics := MakeTxnMetrics(time.Hour)

nit: just inline these.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 583 at r3 (raw file):

	tp, mockSender := makeMockTxnPipeliner(nil /* iter */)

	// Disable write_pipelining_max_outstanding_size, and max_intents_bytes limits.

nit: no comma


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 970 at r3 (raw file):

// TestTxnPipelinerMaxInFlightSize tests that batches are not pipelined if doing
// so would push the memory used to track locks and in-flight writes over the
// limit allowed by the kv.transaction.max_intents_bytes.

nit: remove the period and re-wrap.

Lift a defer() into a wrapper function.

Release note: None
This patch short-circuits canUseAsyncConsensus for EndTxn batches.

Release note: None
@andreimatei andreimatei force-pushed the intents.track-inflight-writes branch from 7ab7d96 to 4dc342e Compare July 1, 2021 18:55
Contributor Author

@andreimatei andreimatei left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @nvanbenschoten)


pkg/kv/kvclient/kvcoord/condensable_span_set.go, line 80 at r3 (raw file):

Previously, aliher1911 (Oleg) wrote…

Since there's no difference between 0 and negative size maybe we can say maxBytes less than 1 will force maximum possible condencing? Otherwise I read it as 0 and negative have some difference.

done


pkg/kv/kvclient/kvcoord/condensable_span_set.go, line 170 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Consider moving this to condensable_span_set_test.go with a comment that we'll want to make this more efficient if it ever needs to be used in production code.

moved to a helpers_test.go


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 514 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did you mean to remove this?

yes. I had done it in the wrong commit. Done.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 251 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Want to rebase this second commit away?

done


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 35 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can't we just delete this and add it to retiredSettings?

done


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 340 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

While we're here, should we short-circuit this method on if _, hasET := ba.GetArg(roachpb.EndTxn); hasET { return false }? Now that this is factored out, it seems like a valuable fast-path.

done in a separate commit


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go, line 390 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Also, does v4 seem too low? I was thinking v2.

all done


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 81 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"needs to condense intent spans"

done


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 84 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: just inline these.

I can't because I need to take their address.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 583 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: no comma

done


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 970 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: remove the period and re-wrap.

done


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go, line 1424 at r3 (raw file):

Is it a long term TODO?

More of a "never TODO" kind of thing, as most are :). Its only purpose is to discourage extensions to this test and to point at the other one.

I think it would be helpful to at least highlight the difference between the checks the two tests make, in case one fails and someone needs to debug it.

I don't know what to write exactly, because I don't want to put the other test in a little box by describing what it currently has.

Member

@nvanbenschoten nvanbenschoten left a comment


:lgtm:

Reviewed 6 of 6 files at r6.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911)

@andreimatei andreimatei force-pushed the intents.track-inflight-writes branch from 4dc342e to 0149495 Compare July 1, 2021 21:43
Contributor Author

@andreimatei andreimatei left a comment


TFTRs!

bors r+

One thing I've been thinking is that we might want to take this occasion to introduce a limit on the number of in-flight keys that we attach to a parallel-commit EndTxn, for stability purposes - an abandoned STAGING txn record forces conflicting txns to perform recovery, which consists of work on the order of record's in-flight writes. This seems like potentially a major unexpected burden on that other txn, externalized by the committing txn. Thoughts?

I thought that's what kv.transaction.write_pipelining_max_batch_size did, but it looks like you're right that it's not. The limits in the pipeline do help bound the effective maximum because we'll only let the inFlightWriteSet get so large, but that doesn't help in the case of a single large batch with an EndTxn. So I agree that we should limit parallel commits to some maximum number of in-flight writes to avoid an unbounded fanout factor during txn recovery. I'm thinking somewhere around 1024 keys.

Does it mean roughly that we will check limit in txn_interceptor_committer.canCommitInParallel and stop that if certain limit is exceeded?

Yes, I think that's right. We'll compare len(et.InFlightWrites) against a new cluster setting.

I'll send a separate PR for this.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aliher1911)

@craig
Contributor

craig bot commented Jul 2, 2021

Build succeeded:
