Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: resolve only intents in rangefeed's own range from txnPushAttempt #66746

Merged

Conversation

nvanbenschoten
Copy link
Member

Fixes #66741.

Before this change, (*txnPushAttempt).pushOldTxns would resolve all of
a committed or aborted transaction's intents that it discovered during a
push, instead of just those in the current range.

This was unnecessary for the purposes of the rangefeed, which only needs
a pushed transaction's intents in its range resolved. Worse, it could
result in quadratic behavior in cases where a large transaction wrote
many intents across many ranges that each had an active rangefeed
processor. In such cases, the rangefeed processor on each of the ranges
that the transaction touched would try to resolve its intents across all
ranges that the transaction touched. This could lead to a herd of
redundant intent resolution, which was especially disruptive if the
transaction had exceeded its precise intent span tracking and degraded
to ranged intent resolution.

This commit fixes this issue by having each rangefeed processor resolve
only those intents that are within the bounds of its own range. This
avoids the quadratic behavior that, in its worst form, could create a
pileup of ranged intent resolution across an entire table and starve out
foreground traffic.

Release note (bug fix): Changefeeds no longer interact poorly with
large, abandoned transactions. It was previously possible for this
combination to result in a cascade of work during transaction cleanup
that could starve out foreground traffic.

/cc. @cockroachdb/kv

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@aliher1911 aliher1911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)


pkg/kv/kvserver/rangefeed/task.go, line 256 at r1 (raw file):

// range boundary will be returned.
//
// NOTE: a rangefeed Processor is only configured to watch the global keyspace

Does this note explain what is happening in filtering? I think it explains why we called this function with AsRawSpanWithNoLocals(), maybe it should belong there?


pkg/roachpb/data_test.go, line 1161 at r1 (raw file):

	}
	for _, test := range testData {
		require.Equal(t, test.expect, test.s1.Intersect(test.s2))

nit: Doc says that we'll get an invalid range if intersection is empty/impossible, but test actually expects to see implementation detail, not validity or the range. I'd rather have it as

if test.expect.Valid() {
		require.Equal(t, test.expect, test.s1.Intersect(test.s2))
} else {
		require.False(t, test.s1.Intersect(test.s2).Valid())
}

or something similar and have Span{} as expectation.

Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me aside from the existing comments.

For other new people, I found this useful for understanding the bits about global vs local keys: https://github.com/cockroachdb/cockroach/blob/master/pkg/keys/doc.go

return Span{}
}

if len(s.EndKey) == 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: An empty end key means this span contains a single key. Overlaps already has special code for the signle-key cases, so here we return whichever key is the single key. If they are both a single key, we know they are equal anyway so the order doesn't matter.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/rangefeedPushOld branch 2 times, most recently from 81c40c7 to e7a2d1f Compare June 23, 2021 15:03
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @aliher1911, @andreimatei, and @stevendanna)


pkg/kv/kvserver/rangefeed/task.go, line 256 at r1 (raw file):

Previously, aliher1911 (Oleg) wrote…

Does this note explain what is happening in filtering? I think it explains why we called this function with AsRawSpanWithNoLocals(), maybe it should belong there?

Good point. It wasn't mentioning the macro-level reason to do this. Done.


pkg/roachpb/data.go, line 2171 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Note to self: An empty end key means this span contains a single key. Overlaps already has special code for the signle-key cases, so here we return whichever key is the single key. If they are both a single key, we know they are equal anyway so the order doesn't matter.

Sounds like a useful comment too. Added.


pkg/roachpb/data_test.go, line 1161 at r1 (raw file):

Previously, aliher1911 (Oleg) wrote…

nit: Doc says that we'll get an invalid range if intersection is empty/impossible, but test actually expects to see implementation detail, not validity or the range. I'd rather have it as

if test.expect.Valid() {
		require.Equal(t, test.expect, test.s1.Intersect(test.s2))
} else {
		require.False(t, test.s1.Intersect(test.s2).Valid())
}

or something similar and have Span{} as expectation.

Done.

Copy link
Contributor

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911, @andreimatei, @nvanbenschoten, and @stevendanna)


pkg/roachpb/data.go, line 2164 at r1 (raw file):

// Intersect returns the intersection of the key space covered by the two spans.
// If there is no intersection between the twp spans, an invalid span (see Valid)

two*

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/rangefeedPushOld branch from e7a2d1f to 427e840 Compare June 23, 2021 15:06
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @aliher1911, @andreimatei, and @stevendanna)


pkg/roachpb/data.go, line 2164 at r1 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

two*

Done.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 3 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @aliher1911, @nvanbenschoten, and @stevendanna)


pkg/kv/kvserver/rangefeed/task.go, line 228 at r2 (raw file):

			})

			// If the txn happens to have its LockSpans populated, then lets clean up

it'd be useful if this comment made a nod to the code above to explain why this is only an optimization; why the rangefeed was already unblocked even without resolving these intents.

Btw, is the code around here gonna change when the lock table learns how to do "virtual resolution" - will we be able to unblock the rangefeed before replicating the intent resolution?


pkg/kv/kvserver/rangefeed/task.go, line 268 at r2 (raw file):

// for a range. It is also only informed about logical operations on global keys
// (see OpLoggerBatch.logLogicalOp). So even if this transaction has LockSpans
// in the range's global and local keyspace, we only need to resolve those in

I think this comment about the local space should live in the caller. This function doesn't deal with local vs global keys.


pkg/kv/kvserver/rangefeed/task.go, line 270 at r2 (raw file):

// in the range's global and local keyspace, we only need to resolve those in
// the global keyspace.
func intentsInSpan(txn *roachpb.Transaction, bound roachpb.Span) []roachpb.LockUpdate {

nit: if you rename bound->span, it'll match the function name


pkg/kv/kvserver/rangefeed/task_test.go, line 343 at r2 (raw file):

		require.Equal(t, txn4LockSpans[0], intents[2].Span)
		require.Equal(t, txn4LockSpans[1], intents[3].Span)
		require.Equal(t, func() roachpb.Span {

consider adding a nod here to the lock span truncation

Copy link
Contributor

@aliher1911 aliher1911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @aliher1911, @nvanbenschoten, and @stevendanna)


pkg/roachpb/data_test.go, line 1134 at r2 (raw file):

		expect Span
	}{
		{sA, sA, sA},

Since you changed the assert below and build failed, maybe we can change expectations to Span{} for first invalid range cases as well?

Fixes cockroachdb#66741.

Before this change, `(*txnPushAttempt).pushOldTxns` would resolve all of
a committed or aborted transaction's intents that it discovered during a
push, instead of just those in the current range.

This was unnecessary for the purposes of the rangefeed, which only needs
a pushed transaction's intents in its range resolved. Worse, it could
result in quadratic behavior in cases where a large transaction wrote
many intents across many ranges that each had an active rangefeed
processor. In such cases, the rangefeed processor on each of the ranges
that the transaction touched would try to resolve its intents across all
ranges that the transaction touched. This could lead to a herd of
redundant intent resolution, which was especially disruptive if the
transaction had exceeded its precise intent span tracking and degraded
to ranged intent resolution.

This commit fixes this issue by having each rangefeed processor resolve
only those intents that are within the bounds of its own range. This
avoids the quadratic behavior that, in its worst form, could create a
pileup of ranged intent resolution across an entire table and starve out
foreground traffic.

Release note (bug fix): Changefeeds no longer interact poorly with
large, abandoned transactions. It was previously possible for this
combination to result in a cascade of work during transaction cleanup
that could starve out foreground traffic.
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/rangefeedPushOld branch from 427e840 to 3e6b834 Compare June 23, 2021 16:55
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @aayushshah15, @aliher1911, @andreimatei, and @stevendanna)


pkg/kv/kvserver/rangefeed/task.go, line 228 at r2 (raw file):

it'd be useful if this comment made a nod to the code above to explain why this is only an optimization; why the rangefeed was already unblocked even without resolving these intents.

Done.

Btw, is the code around here gonna change when the lock table learns how to do "virtual resolution" - will we be able to unblock the rangefeed before replicating the intent resolution?

It depends on what we use to drive rangefeed updates. I'd imagine that we still want to drive it with the application of intent resolution, because that's the only thing that is replicated, so it allows us to handle things identically regardless of whether a rangefeed is attached to a leaseholder or to a follower.


pkg/kv/kvserver/rangefeed/task.go, line 268 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think this comment about the local space should live in the caller. This function doesn't deal with local vs global keys.

I considered it, but there are two callers, so I'd rather keep it centralized.


pkg/kv/kvserver/rangefeed/task.go, line 270 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: if you rename bound->span, it'll match the function name

Done.


pkg/kv/kvserver/rangefeed/task_test.go, line 343 at r2 (raw file):

// truncated at beginning

I figured this was enough of a nod.


pkg/roachpb/data_test.go, line 1134 at r2 (raw file):

Previously, aliher1911 (Oleg) wrote…

Since you changed the assert below and build failed, maybe we can change expectations to Span{} for first invalid range cases as well?

The build flake wasn't due to this test, it was due to a flake in sql's TestGenerateParse. So unrelated. This test is passing.

@nvanbenschoten
Copy link
Member Author

TFTRs!

bors r+

@craig
Copy link
Contributor

craig bot commented Jun 23, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kv: txnPushAttempt should only resolve intents in current range
6 participants