
kvserver: write to system.rangelog async #102813

Merged · 1 commit · Jun 7, 2023

Conversation

@miraradeva (Contributor) commented May 5, 2023

Previously, writing to the system.rangelog table was done as part of the
transaction that triggered the range change (e.g. split, merge,
rebalance). If the logging failed for some reason (e.g. JSON being
logged was too large), the entire transaction needed to be retried.
This has caused at least one major incident.

This change introduces the option to write to system.rangelog async, and
not as part of the original transaction; this option is also now used
by default for all writes to system.rangelog. When logging async, the
actual write to system.rangelog is done in an async task executed as a
commit trigger after the original transaction ends.

Epic: CRDB-16517

Fixes: #82538

Informs: #104075

Release note: None
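In code, the new flow looks roughly like the sketch below (a simplified sketch with illustrative names; writeToRangeLog is a stand-in for the actual system.rangelog write, not the exact implementation):

```go
import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/kv"
	"github.com/cockroachdb/cockroach/pkg/util/log"
	"github.com/cockroachdb/cockroach/pkg/util/stop"
	"github.com/cockroachdb/logtags"
)

// logAsync registers a commit trigger on the range-change transaction that
// performs the rangelog write in an async task once the transaction commits.
func logAsync(
	txn *kv.Txn, stopper *stop.Stopper, writeToRangeLog func(context.Context) error,
) {
	txn.AddCommitTrigger(func(ctx context.Context) {
		// Detach from the transaction's context but carry over its log tags.
		asyncCtx := logtags.AddTags(context.Background(), logtags.FromContext(ctx))
		if err := stopper.RunAsyncTask(asyncCtx, "rangelog-async", func(ctx context.Context) {
			// Stop writing when the server shuts down.
			ctx, cancel := stopper.WithCancelOnQuiesce(ctx)
			defer cancel()
			// Logging is best-effort: a failure no longer aborts the
			// triggering transaction.
			if err := writeToRangeLog(ctx); err != nil {
				log.Warningf(ctx, "error logging to system.rangelog: %v", err)
			}
		}); err != nil {
			log.Warningf(ctx, "could not start rangelog-async task: %v", err)
		}
	})
}
```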

@cockroach-teamcity (Member) commented: This change is Reviewable

@miraradeva miraradeva force-pushed the mira-82538 branch 6 times, most recently from 7583723 to bba6c94 Compare May 17, 2023 18:25
@miraradeva miraradeva changed the title kv: write to system.rangelog non-transactionally (WIP) kvserver: write to system.rangelog async May 17, 2023
@miraradeva miraradeva marked this pull request as ready for review May 18, 2023 15:08
@miraradeva miraradeva requested review from a team as code owners May 18, 2023 15:08
@miraradeva miraradeva requested a review from nvanbenschoten May 18, 2023 15:09
@nvanbenschoten (Member) left a comment

Reviewed 1 of 9 files at r2, 10 of 10 files at r3, 10 of 10 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva)


pkg/kv/txn.go line 1088 at r3 (raw file):

	br, pErr := txn.db.sendUsingSender(ctx, ba, sender)

	// Invoking the commit triggers here ensures they run even in the case when a

nit: To avoid redundancy, we could move this check into the if pErr == nil branch below.


pkg/kv/kvserver/range_log.go line 234 at r4 (raw file):

	asyncLogFn := func(ctx context.Context) {
		const perAttemptTimeout time.Duration = 5 * time.Second

nit: time.Second is typed as a time.Duration, so this can be const perAttemptTimeout = 5 * time.Second.
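i.e.:

```go
// The explicit type annotation is redundant: 5 * time.Second is already a
// typed constant of type time.Duration.
const perAttemptTimeout = 5 * time.Second
```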


pkg/kv/kvserver/range_log.go line 237 at r4 (raw file):

		const maxAttempts = 5
		retryOpts := base.DefaultRetryOptions()
		retryOpts.Closer = ctx.Done()

Do we want to attach the caller's context to this retryOpts, even though the caller is async?


pkg/kv/kvserver/range_log.go line 243 at r4 (raw file):

			context.Background(), "rangelog-async", func(ctx context.Context) {
				// Stop writing when the server shuts down.
				ctx, stopCancel := stopper.WithCancelOnQuiesce(ctx)

Should we lift this out of the async task and then transfer the logtags over from the caller like:

ctx := logtags.AddTags(context.Background(), logtags.FromContext(ctx))

pkg/kv/kvserver/replica_command.go line 214 at r4 (raw file):

	// Log the split into the range event log.
	if err := store.logSplit(ctx, txn, *leftDesc, *rightDesc, reason, true); err != nil {

Here and below: true /* logAsync */.


pkg/kv/kvserver/replica_command.go line 214 at r4 (raw file):

	// Log the split into the range event log.
	if err := store.logSplit(ctx, txn, *leftDesc, *rightDesc, reason, true); err != nil {

Was the plan to make this configurable with a cluster setting?


pkg/kv/txn_test.go line 819 at r3 (raw file):

			endTxnFn: func(txn *Txn) error {
				b := txn.NewBatch()
				b.AddRawRequest(&kvpb.EndTxnRequest{})

nit: {Commit: false} to make this case explicit.
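i.e., something like:

```go
b := txn.NewBatch()
// Commit: false makes it explicit that this EndTxn is a rollback.
b.AddRawRequest(&kvpb.EndTxnRequest{Commit: false})
```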


pkg/kv/kvserver/range_log_test.go line 429 at r4 (raw file):

// logging is best-effort and the rest of the transaction commits. In the case
// where logAsync is false, the failed logging forces the transaction to retry.
func TestAsyncLogging(t *testing.T) {

Nice test!

@miraradeva (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/range_log.go line 237 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we want to attach the caller's context to this retryOpts, even though the caller is async?

Ok, I moved things around based on your next comment as well. I think now we have:

  • WithCancelOnQuiesce(ctx) based on the original caller context.
  • A new asyncCtx in asyncLogFn with the logtags from the original context transferred over.
  • asyncCtx is used everywhere within the async task, including in the retryOpts (see the sketch below).
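Inside the async task, the retry wiring then looks roughly like this (a sketch; writeEvent is a stand-in for the actual rangelog write):

```go
// ctx here is the task's context: detached from the caller, carrying its
// log tags, and canceled on server quiesce.
retryOpts := base.DefaultRetryOptions()
retryOpts.Closer = ctx.Done() // stop retrying on shutdown
retryOpts.MaxRetries = maxAttempts - 1
for r := retry.StartWithCtx(ctx, retryOpts); r.Next(); {
	attemptCtx, cancel := context.WithTimeout(ctx, perAttemptTimeout)
	err := writeEvent(attemptCtx)
	cancel()
	if err == nil {
		return
	}
	log.Warningf(ctx, "failed to write to system.rangelog: %v", err)
}
```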

pkg/kv/kvserver/replica_command.go line 214 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Was the plan to make this configurable with a cluster setting?

It wasn't part of the original plan but I think it really depends on when we want to be able to do transactional logging:

  1. Turn on transactional logging to help with debugging because the default async logging is failing for some reason. Then it seems like a cluster setting would make sense. Though it deviates a bit from the plan to have best-effort logging.
  2. Log transactionally for some code paths and async for others. The code in this PR is set up to do this by passing in the logAsync flag.
  3. Anything else?

I don't know of any use cases for either 1 or 2. Maybe we make this decision when we have a specific use case in mind?

Is there any general guidance on what should be a cluster setting? Presumably, we don't want to pollute the space of cluster settings too much to where customers are overwhelmed.


pkg/kv/kvserver/range_log_test.go line 429 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Nice test!

Thanks, it didn't come easy.

@miraradeva (Contributor, Author) left a comment

Am I making a mess of revisions? I'm still figuring out Reviewable with multiple commits and multiple revisions, and it doesn't look very easy to review. But maybe it's just me not clicking the right revisions.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

@miraradeva miraradeva requested review from knz, nvanbenschoten and a team May 23, 2023 15:10
@knz (Contributor) left a comment

The first commit feels like a bug fix (for a long-standing bug), valuable beyond the scope of the motivating project for this PR.
It seems to me it would be valuable to extract that first commit into a separate PR, link it to a relevant issue, and include a release note.

Reviewed 12 of 12 files at r5, 10 of 10 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva and @nvanbenschoten)


pkg/kv/kvserver/rangelog/rangelog.go line 90 at r6 (raw file):

		)
	}
	log.KvDistribution.Infof(ctx, "logging RangeLogEvent %v to system.rangelog", event)

What is the purpose of this extra Info call? The structured event should make its way to the relevant log channel already. Have you observed otherwise? It might be a bug in which case this is not the right fix.


pkg/kv/kvserver/rangelog/internal_executor_writer_test.go line 78 at r6 (raw file):

	txn, ok := runner.(*kv.Txn)
	if !ok {
		return errors.Errorf("No transaction provided")

nit: error messages don't start with a capital letter.

@miraradeva (Contributor, Author) left a comment

I can definitely separate out the first commit; is there a particular issue I can link it to? @nvanbenschoten do you know of any? If not I can probably create a new one.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)


pkg/kv/kvserver/rangelog/rangelog.go line 90 at r6 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

What is the purpose of this extra Info call? The structured event should make its way to the relevant log channel already. Have you observed otherwise? It might be a bug in which case this is not the right fix.

Do you mean via the eventlog infrastructure, where a structured event is defined as an EventPayload and logged to a specified channel? If so, I don't think a RangeLogEvent goes through that infra, and until now it wasn't logged anywhere other than in system.rangelog (i.e. not in any log channels).

@knz (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva and @nvanbenschoten)


pkg/kv/kvserver/rangelog/rangelog.go line 90 at r6 (raw file):

Previously, miraradeva (Mira Radeva) wrote…

Do you mean via the eventlog infrastructure, where a structured event is defined as an EventPayload and logged to a specified channel? If so, don't think a RangeLogEvent goes through that infra, and until now it wasn't logged anywhere other than in system.rangelog (i.e. not in any log channels).

TIL. That is something we can improve in your PR. It will help in many ways.
We can embed kvserverpb.RangeLogEvent inside a new event type in util/log/eventpb.

@nvanbenschoten (Member) left a comment

@nvanbenschoten do you know of any?

No, I'm not aware of any issue for this. I agree that the first commit can be split out into a separate PR and quickly merged as a fix for a latent bug.

Reviewed 2 of 12 files at r5, 8 of 10 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva)


pkg/kv/kvserver/range_log.go line 236 at r7 (raw file):

	stopper := txn.DB().Context().Stopper
	// Stop writing when the server shuts down.
	_, stopCancel := stopper.WithCancelOnQuiesce(ctx)

This function returns a new context which we need to use to be notified of the cancellation.

Also, shouldn't this be performed in the asyncLogFn?
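i.e., roughly:

```go
// Use the context returned by WithCancelOnQuiesce; otherwise the
// cancellation on quiesce is silently dropped.
ctx, stopCancel := stopper.WithCancelOnQuiesce(ctx)
defer stopCancel()
```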


pkg/kv/kvserver/replica_command.go line 214 at r4 (raw file):

Previously, miraradeva (Mira Radeva) wrote…

It wasn't part of the original plan but I think it really depends on when we want to be able to do transactional logging:

  1. Turn on transactional logging to help with debugging because the default async logging is failing for some reason. Then it seems like a cluster setting would make sense. Though it deviates a bit from the plan to have best-effort logging.
  2. Log transactionally for some code paths and async for others. The code in this PR is set up to do this by passing in the logAsync flag.
  3. Anything else?

I don't know of any use cases for either 1 or 2. Maybe we make this decision when we have a specific use case in mind?

Is there any general guidance on what should be a cluster setting? Presumably, we don't want to pollute the space of cluster settings too much to where customers are overwhelmed.

Cluster settings serve two main roles:

  1. they provide cluster-wide configurability to users
  2. they provide crdb debuggers with tools to manipulate a cluster during support cases

A cluster setting here would probably only serve this second role, so we would keep it private to avoid polluting the cluster setting space. But that raises the question of when the setting would be helpful to debuggers. If the default was to log synchronously with an option to log async, then I could envision scenarios where this might be useful to debuggers. However, since the default is async, I think I agree that there are few cases where a debugger would reach for this — if the async logging leads to a lost event, it's already lost by the time a debugger gets involved.

So I agree with you that we can forgo the cluster setting.

@miraradeva miraradeva requested reviews from multiple teams (some as code owners) May 26, 2023 17:51
@miraradeva (Contributor, Author) left a comment

Ok, so I removed the commit trigger changes and merged those in as a separate PR. In this PR, there are now 3 commits:

  1. The previous async writing to rangelog changes. Only one new change there, based on @nvanbenschoten's comment to get the contexts right in the async function.
  2. Based on @knz's comment about using structured events, I pulled RangeLogEvent into a new package and bazel target, so referring to it in util/log/eventpb doesn't introduce a dependency on kv/kvserver/kvserverpb. This mostly worked out, but there is still an indirect dependency from util/log/eventpb to roachpb (that's where RangeDescriptor lives). This commit builds successfully, though it's a pretty sprawling change, and I'm not 100% convinced it's the right way to go given the remaining dependency.
  3. The last commit is WIP. I tried adding the new structured event that references the newly moved rangelogpb.RangeLogEvent, but I'm running into an error when generating the eventpb autogen files:
ERROR: template: json_encode_go:167:6: executing "json_encode_go" at <error .FieldType>: error calling error: template error: cockroach.kv.rangelog.rangelogpb.RangeLogEvent

I'm happy to try to get this all to work but I'm also mindful of whether it's worth the large change and added dependency.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)

@miraradeva (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)


pkg/kv/kvserver/rangelog/rangelog.go line 90 at r6 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

TIL. That is something we can improve in your PR. It will help in many ways.
We can embed kvserverpb.RangeLogEvent inside a new event type in util/log/eventpb.

I gave this a try, see my comment in the review discussion.

@miraradeva miraradeva force-pushed the mira-82538 branch 2 times, most recently from 3a798a0 to b31388e Compare May 26, 2023 19:43
@miraradeva miraradeva changed the title kvserver: write to system.rangelog async kvserver,eventpb: write to system.rangelog async May 26, 2023
@miraradeva (Contributor, Author) left a comment

Follow-up: I got the third commit to work as well.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)

@miraradeva miraradeva changed the title kvserver,eventpb: write to system.rangelog async kvserver: write to system.rangelog async May 30, 2023
@miraradeva (Contributor, Author) left a comment

After discussing with Raphael in Slack, we decided to:

  1. Not log a structured event as part of this PR because the refactoring necessary to make that work is non-trivial.
  2. Not log an unstructured event either because we don't have a good way to process unstructured events, and it would only give us false confidence. Since we're ok with best-effort logging, I think this should be ok until we find a better way to log structured events.

I created a new issue (#104075) to address those problems.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)

@nvanbenschoten (Member) left a comment

Apologies for the delay on this. Just one more comment thread to resolve.

Thanks for discussing the structured logging decision with Raphael. Your decision SGTM.

Reviewed 7 of 41 files at r11, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @miraradeva)


pkg/kv/kvserver/range_log.go line 237 at r4 (raw file):

WithCancelOnQuiesce(ctx) based on the original caller context.

What does this accomplish? The asyncCtx won't inherit this cancellation. Do we want to do this on the asyncCtx instead?

@miraradeva (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)


pkg/kv/kvserver/range_log.go line 237 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

WithCancelOnQuiesce(ctx) based on the original caller context.

What does this accomplish? The asyncCtx won't inherit this cancellation. Do we want to do this on the asyncCtx instead?

Yes, you're right. My understanding was that asyncCtx would inherit the cancellation as part of AddTags on the next line, but after reading through the context code, I see cancellations are stored separately from tags. I'm still a little confused about the idiomatic way to use contexts, but I'll go through the coding guide section on this again. If you have any other good reading material with examples, send it my way.
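For posterity, a tiny example of what I got wrong (tags and cancellation live in different layers of the context chain, so copying the tags does not copy the cancellation):

```go
parent, cancel := context.WithCancel(context.Background())
tagged := logtags.AddTag(parent, "range", 42)

// Copies only the tags onto a fresh background context; the cancelable
// parent is not part of the new chain.
detached := logtags.AddTags(context.Background(), logtags.FromContext(tagged))

cancel()
fmt.Println(tagged.Err())   // context.Canceled
fmt.Println(detached.Err()) // <nil>: tags carried over, cancellation not
```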

@nvanbenschoten (Member) left a comment

:lgtm:

Reviewed 1 of 41 files at r11, 1 of 1 files at r13, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @knz)

@miraradeva (Contributor, Author) commented:
bors r=nvanbenschoten

@craig (bot) commented Jun 7, 2023

Build succeeded:
