kvserver: write to system.rangelog async #102813
Conversation
Force-pushed from 7583723 to bba6c94.
Reviewed 1 of 9 files at r2, 10 of 10 files at r3, 10 of 10 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva)
pkg/kv/txn.go
line 1088 at r3 (raw file):
br, pErr := txn.db.sendUsingSender(ctx, ba, sender)
// Invoking the commit triggers here ensures they run even in the case when a
nit: To avoid redundancy, we could move this check into the `if pErr == nil` branch below.
pkg/kv/kvserver/range_log.go
line 234 at r4 (raw file):
asyncLogFn := func(ctx context.Context) {
	const perAttemptTimeout time.Duration = 5 * time.Second
nit: `time.Second` is typed as a `time.Duration`, so this can be `const perAttemptTimeout = 5 * time.Second`.
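For reference, a minimal standalone sketch of the typing point behind this nit (assumes only `import "time"`; the constant names mirror the one in the quoted code):

```go
// time.Second is already typed as time.Duration, so the explicit type on the
// first constant is redundant; the second declaration yields the same type.
const perAttemptTimeoutExplicit time.Duration = 5 * time.Second
const perAttemptTimeoutInferred = 5 * time.Second

var _ = perAttemptTimeoutExplicit == perAttemptTimeoutInferred // both are time.Duration
```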
pkg/kv/kvserver/range_log.go
line 237 at r4 (raw file):
const maxAttempts = 5
retryOpts := base.DefaultRetryOptions()
retryOpts.Closer = ctx.Done()
Do we want to attach the caller's context to this `retryOpts`, even though the caller is async?
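For readers following the thread, a rough sketch of the retry shape being discussed (assuming pkg/base and pkg/util/retry as used elsewhere in the codebase; `writeEvent` is a placeholder, not the PR's function):

```go
retryOpts := base.DefaultRetryOptions()
retryOpts.MaxRetries = maxAttempts
// Closer stops the retry loop when the given channel closes. The open question
// is whether that should be the caller's ctx.Done() (which may fire as soon as
// the caller returns) or the async task's own context.
retryOpts.Closer = ctx.Done()

for r := retry.StartWithCtx(ctx, retryOpts); r.Next(); {
	if err := writeEvent(ctx); err != nil { // placeholder for the rangelog write
		continue // retry with backoff
	}
	break
}
```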
pkg/kv/kvserver/range_log.go
line 243 at r4 (raw file):
context.Background(), "rangelog-async", func(ctx context.Context) {
	// Stop writing when the server shuts down.
	ctx, stopCancel := stopper.WithCancelOnQuiesce(ctx)
Should we lift this out of the async task and then transfer the logtags over from the caller, like:
ctx := logtags.AddTags(context.Background(), logtags.FromContext(ctx))
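In other words (a sketch of the suggested pattern, using the github.com/cockroachdb/logtags package; the helper name is made up):

```go
// Build a context for the async task that is detached from the caller's
// cancellation but still carries the caller's log tags, so log lines from the
// async write remain attributable to the originating range/store.
func detachedCtxWithTags(ctx context.Context) context.Context {
	return logtags.AddTags(context.Background(), logtags.FromContext(ctx))
}
```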
pkg/kv/kvserver/replica_command.go
line 214 at r4 (raw file):
// Log the split into the range event log.
if err := store.logSplit(ctx, txn, *leftDesc, *rightDesc, reason, true); err != nil {
Here and below: `true /* logAsync */`.
pkg/kv/kvserver/replica_command.go
line 214 at r4 (raw file):
// Log the split into the range event log.
if err := store.logSplit(ctx, txn, *leftDesc, *rightDesc, reason, true); err != nil {
Was the plan to make this configurable with a cluster setting?
pkg/kv/txn_test.go
line 819 at r3 (raw file):
endTxnFn: func(txn *Txn) error {
	b := txn.NewBatch()
	b.AddRawRequest(&kvpb.EndTxnRequest{})
nit: `{Commit: false}` to make this case explicit.
pkg/kv/kvserver/range_log_test.go
line 429 at r4 (raw file):
// logging is best-effort and the rest of the transaction commits. In the case
// where logAsync is false, the failed logging forces the transaction to retry.
func TestAsyncLogging(t *testing.T) {
Nice test!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/range_log.go
line 237 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do we want to attach the caller's context to this `retryOpts`, even though the caller is async?
Ok, I moved things around based on your next comment as well. I think now we have:
- `WithCancelOnQuiesce(ctx)` based on the original caller context.
- A new `asyncCtx` in `asyncLogFn` with the logtags from the original context transferred over. `asyncCtx` is used everywhere within the async task, including in the `retryOpts`.
pkg/kv/kvserver/replica_command.go
line 214 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Was the plan to make this configurable with a cluster setting?
It wasn't part of the original plan but I think it really depends on when we want to be able to do transactional logging:
1. Turn on transactional logging to help with debugging because the default async logging is failing for some reason. Then it seems like a cluster setting would make sense. Though it deviates a bit from the plan to have best-effort logging.
2. Log transactionally for some code paths and async for others. The code in this PR is set up to do this by passing in the `logAsync` flag.
3. Anything else?
I don't know of any use cases for either 1 or 2. Maybe we make this decision when we have a specific use case in mind?
Is there any general guidance on what should be a cluster setting? Presumably, we don't want to pollute the space of cluster settings too much to where customers are overwhelmed.
pkg/kv/kvserver/range_log_test.go
line 429 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Nice test!
Thanks, it didn't come easy.
Am I making a mess of revisions? I'm still figuring out Reviewable with multiple commits and multiple revisions, and it doesn't look very easy to review. But maybe it's just me not clicking the right revisions.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
The first commit feels like it is a bug fix (for a long-standing bug), valuable beyond the scope of the motivating project for this PR.
It seems to me it would be valuable to extract that 1st commit into a separate PR, link it to a relevant issue, and include a release note.
Reviewed 12 of 12 files at r5, 10 of 10 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva and @nvanbenschoten)
pkg/kv/kvserver/rangelog/rangelog.go
line 90 at r6 (raw file):
	)
}
log.KvDistribution.Infof(ctx, "logging RangeLogEvent %v to system.rangelog", event)
What is the purpose of this extra `Info` call? The structured event should make its way to the relevant log channel already. Have you observed otherwise? It might be a bug in which case this is not the right fix.
pkg/kv/kvserver/rangelog/internal_executor_writer_test.go
line 78 at r6 (raw file):
txn, ok := runner.(*kv.Txn)
if !ok {
	return errors.Errorf("No transaction provided")
nit: error messages don't start with a capital letter.
I can definitely separate out the first commit; is there a particular issue I can link it to? @nvanbenschoten do you know of any? If not I can probably create a new one.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)
pkg/kv/kvserver/rangelog/rangelog.go
line 90 at r6 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
What is the purpose of this extra `Info` call? The structured event should make its way to the relevant log channel already. Have you observed otherwise? It might be a bug in which case this is not the right fix.
Do you mean via the eventlog infrastructure, where a structured event is defined as an `EventPayload` and logged to a specified channel? If so, I don't think a `RangeLogEvent` goes through that infra, and until now it wasn't logged anywhere other than in system.rangelog (i.e. not in any log channels).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva and @nvanbenschoten)
pkg/kv/kvserver/rangelog/rangelog.go
line 90 at r6 (raw file):
Previously, miraradeva (Mira Radeva) wrote…
Do you mean via the eventlog infrastructure, where a structured event is defined as an `EventPayload` and logged to a specified channel? If so, I don't think a `RangeLogEvent` goes through that infra, and until now it wasn't logged anywhere other than in system.rangelog (i.e. not in any log channels).
TIL. That is something we can improve in your PR. It will help in many ways.
We can embed kvserverpb.RangeLogEvent inside a new event type in util/log/eventpb.
@nvanbenschoten do you know of any?
No, I'm not aware of any issue for this. I agree that the first commit can be split out into a separate PR and quickly merged as a fix for a latent bug.
Reviewed 2 of 12 files at r5, 8 of 10 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva)
pkg/kv/kvserver/range_log.go
line 236 at r7 (raw file):
stopper := txn.DB().Context().Stopper
// Stop writing when the server shuts down.
_, stopCancel := stopper.WithCancelOnQuiesce(ctx)
This function returns a new context which we need to use to be notified of the cancellation.
Also, shouldn't this be performed in the `asyncLogFn`?
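To spell out the point about the return value (sketch only; `stopper` is the pkg/util/stop Stopper already in scope in the quoted code):

```go
// The quoted form discards the derived context, so nothing actually observes
// the cancellation on quiesce:
//
//	_, stopCancel := stopper.WithCancelOnQuiesce(ctx)
//
// Using the returned context, inside the async task, ties the write's
// lifetime to server shutdown:
ctx, stopCancel := stopper.WithCancelOnQuiesce(ctx)
defer stopCancel()
```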
pkg/kv/kvserver/replica_command.go
line 214 at r4 (raw file):
Previously, miraradeva (Mira Radeva) wrote…
It wasn't part of the original plan but I think it really depends on when we want to be able to do transactional logging:
1. Turn on transactional logging to help with debugging because the default async logging is failing for some reason. Then it seems like a cluster setting would make sense. Though it deviates a bit from the plan to have best-effort logging.
2. Log transactionally for some code paths and async for others. The code in this PR is set up to do this by passing in the `logAsync` flag.
3. Anything else?
I don't know of any use cases for either 1 or 2. Maybe we make this decision when we have a specific use case in mind?
Is there any general guidance on what should be a cluster setting? Presumably, we don't want to pollute the space of cluster settings too much to where customers are overwhelmed.
Cluster settings serve two main roles:
- they provide cluster-wide configurability to users
- they provide crdb debuggers with tools to manipulate a cluster during support cases
A cluster setting here would probably only serve this second role, so we would keep it private to avoid polluting the cluster setting space. But that raises the question of when the setting would be helpful to debuggers. If the default was to log synchronously with an option to log async, then I could envision scenarios where this might be useful to debuggers. However, since the default is async, I think I agree that there are few cases where a debugger would reach for this — if the async logging leads to a lost event, it's already lost by the time a debugger gets involved.
So I agree with you that we can forgo the cluster setting.
Ok, so I removed the commit trigger changes and merged those in as a separate PR. In this PR, there are now 3 commits:
- The previous async writing to rangelog changes. Only one new change there, based on @nvanbenschoten's comment to get the contexts right in the async function.
- Based on @knz's comment about using structured events, I pulled `RangeLogEvent` into a new package and bazel target, so referring to it in `util/log/eventpb` doesn't introduce a dependency on `kv/kvserver/kvserverpb`. This mostly worked out, but there is still an indirect dependency from `util/log/eventpb` to `roachpb` (that's where RangeDescriptor lives). This commit builds successfully, though it's a pretty sprawling change, and I'm not 100% convinced it's the right way to go given the remaining dependency.
- The last commit is WIP. I tried adding the new structured event that references the newly moved `rangelogpb.RangeLogEvent`, but I'm running into an error when generating the eventpb autogen files:
ERROR: template: json_encode_go:167:6: executing "json_encode_go" at <error .FieldType>: error calling error: template error: cockroach.kv.rangelog.rangelogpb.RangeLogEvent
I'm happy to try to get this all to work, but I'm also mindful of whether it's worth the large change and added dependency.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)
pkg/kv/kvserver/rangelog/rangelog.go
line 90 at r6 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
TIL. That is something we can improve in your PR. It will help in many ways.
We can embed kvserverpb.RangeLogEvent inside a new event type in util/log/eventpb.
I gave this a try, see my comment in the review discussion.
Force-pushed from 3a798a0 to b31388e.
Followup: I got the third commit to work as well.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)
After discussing with Raphael in slack, we decided to:
- Not log a structured event as part of this PR because the refactoring necessary to make that work is non-trivial.
- Not log an unstructured event either because we don't have a good way to process unstructured events, and it would only give us false confidence. Since we're ok with best-effort logging, I think this should be ok until we find a better way to log structured events.
I created a new issue (#104075) to address those problems.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)
Apologies for the delay on this. Just one more comment thread to resolve.
Thanks for discussing the structured logging decision with Raphael. Your decision SGTM.
Reviewed 7 of 41 files at r11, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @miraradeva)
pkg/kv/kvserver/range_log.go
line 237 at r4 (raw file):
`WithCancelOnQuiesce(ctx)` based on the original caller context.
What does this accomplish? The `asyncCtx` won't inherit this cancellation. Do we want to do this on the `asyncCtx` instead?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)
pkg/kv/kvserver/range_log.go
line 237 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
`WithCancelOnQuiesce(ctx)` based on the original caller context.
What does this accomplish? The `asyncCtx` won't inherit this cancellation. Do we want to do this on the `asyncCtx` instead?
Yes, you're right. My understanding was that `asyncCtx` would inherit the cancellation as part of `AddTags` on the next line, but after reading through the context code, I see cancellations are stored separately from tags. I'm still a little confused on the idiomatic way to use contexts but I'll go through the coding guide section on this again. If you have any other good reading material with examples, send it my way.
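For the record, a sketch of the distinction that tripped this up (same logtags/stop packages as in the PR):

```go
// asyncCtx starts from context.Background(), so it inherits no cancellation
// from the caller; AddTags only copies the log tags across.
asyncCtx := logtags.AddTags(context.Background(), logtags.FromContext(ctx))

// Cancellation must therefore be attached to asyncCtx itself; deriving a
// cancelable context from the caller's ctx would not affect asyncCtx.
asyncCtx, stopCancel := stopper.WithCancelOnQuiesce(asyncCtx)
defer stopCancel()
```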
Reviewed 1 of 41 files at r11, 1 of 1 files at r13, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz)
bors r=nvanbenschoten
Build succeeded.
Previously, writing to the system.rangelog table was done as part of the
transaction that triggered the range change (e.g. split, merge,
rebalance). If the logging failed for some reason (e.g. JSON being
logged was too large), the entire transaction needed to be retried.
This has caused at least one major incident.
This change introduces the option to write to system.rangelog async, and
not as part of the original transaction; this option is also now used
by default for all writes to system.rangelog. When logging async, the
actual write to system.rangelog is done in an async task executed as a
commit trigger after the original transaction ends.
Epic: CRDB-16517
Fixes: #82538
Informs: #104075
Release note: None
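For orientation, a simplified sketch of the mechanism the commit message describes (the real code lives in pkg/kv/kvserver/range_log.go; `logRangeLogEvent` and `event` are placeholders):

```go
// With logAsync, the rangelog write is no longer part of the triggering
// transaction. A commit trigger fires after the transaction commits and
// hands the write off to an async, best-effort task.
txn.AddCommitTrigger(func(ctx context.Context) {
	// Detach from the caller's cancellation but keep its log tags.
	asyncCtx := logtags.AddTags(context.Background(), logtags.FromContext(ctx))
	if err := stopper.RunAsyncTask(asyncCtx, "rangelog-async", func(ctx context.Context) {
		// Stop writing when the server shuts down.
		ctx, cancel := stopper.WithCancelOnQuiesce(ctx)
		defer cancel()
		// A failure here no longer forces the original transaction to retry.
		if err := logRangeLogEvent(ctx, event); err != nil {
			log.Warningf(ctx, "failed to write to system.rangelog: %v", err)
		}
	}); err != nil {
		log.Warningf(asyncCtx, "could not start rangelog write task: %v", err)
	}
})
```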