cli: add `debug send-kv-batch` command to run arbitrary BatchRequest #72732
Conversation
Nice, LGTM and I'll defer to @knz about making sure this is documented and set up in the right way.
I did poke around a bit and ended up rewriting the test, pushed as a separate commit. Take it or leave it, fine either way.
Reviewed 15 of 15 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @erikgrinaker, @knz, and @tbg)
docs/generated/eventlog.md, line 168 at r1 (raw file):
| `Timestamp` | The timestamp of the event. Expressed as nanoseconds since the Unix epoch. | no |
| `EventType` | The type of the event. | no |
| `NodeID` | The node ID where the event was originated. | no |
where the event originated
pkg/cli/debug.go, line 1505 at r1 (raw file):
}

var debugSendKVBatchCmd = &cobra.Command{
mind putting this and the impl + tests in new files `debug_send_kv_batch{,_test}.go`?
pkg/cli/debug.go, line 1511 at r1 (raw file):
RunE: runSendKVBatch,
Long: `  Submits a JSON-formatted roachpb.BatchRequest, given as a file or via stdin,
The spaces at the beginning of each line shouldn't be there IMO
pkg/cli/debug_test.go, line 522 at r1 (raw file):
require.Contains(t, output, "ERROR: invalid JSON") // Unknown JSON field should error
.
pkg/server/admin.go, line 2627 at r1 (raw file):
}
nodeID := int32(s.server.NodeID())
event := &eventpb.DebugSendKvBatch{
customname `DebugSendKVBatch`?
pkg/server/admin.go, line 2644 at r1 (raw file):
})
if err != nil {
	log.Warningf(ctx, "failed to log SendKVBatch to event log, emitting below: %v", err)
I think structured events are unlikely to hit the same log as a Warning, so the "below" may not make sense. Maybe "emitting as structured event"?
pkg/util/log/eventpb/debug_events.proto, line 32 at r1 (raw file):
// CommonDebugEventDetails contains the fields common to all debugging events.
message CommonDebugEventDetails {
  // The node ID where the event was originated.
-was
Heh, your test hit what I expect will be a common mistake: Protobuf structs cannot be (un)marshalled using the standard Golang JSON marshaller. You must use the JSON marshaller that's shipped with the Protobuf library, e.g. https://pkg.go.dev/github.com/gogo/protobuf/jsonpb.
I'll update the test to use this, with a big fat warning to anyone following its example.
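For readers following along, here is a minimal sketch of the pattern being described — not code from this PR, and assuming the gogo `jsonpb` API with `roachpb.BatchRequest` as the message type:

```go
package main

import (
	"fmt"
	"strings"

	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/gogo/protobuf/jsonpb"
)

func main() {
	var ba roachpb.BatchRequest

	// Marshal with the gogo jsonpb marshaller, not encoding/json: jsonpb
	// understands oneof fields and gogoproto custom types, while the
	// standard library marshaller produces JSON the server-side jsonpb
	// decoder may not accept.
	out, err := (&jsonpb.Marshaler{Indent: "  "}).MarshalToString(&ba)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)

	// Round-trip back into a BatchRequest, again via jsonpb.
	var decoded roachpb.BatchRequest
	if err := jsonpb.Unmarshal(strings.NewReader(out), &decoded); err != nil {
		panic(err)
	}
}
```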
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @knz, and @tbg)
pkg/cli/debug.go, line 1505 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
mind putting this and the impl + tests in new files `debug_send_kv_batch{,_test}.go`?
Yup, done.
pkg/cli/debug.go, line 1511 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
The spaces at the beginning of each line shouldn't be there IMO
Ah, my bad -- thought I'd seen other commands indent it.
pkg/server/admin.go, line 2627 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
customname `DebugSendKVBatch`?
I don't think `customname` can be applied to messages, only fields. Unfortunately, we need the `Kv` here to get the snake-cased event type generated correctly.
pkg/server/admin.go, line 2644 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I think structured events are unlikely to hit the same log as a Warning, so the "below" may not make sense. Maybe "emitting as structured event"?
Right, good catch.
Reviewed 14 of 15 files at r1, 9 of 9 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @erikgrinaker, and @tbg)
pkg/server/admin.go, line 2637 at r1 (raw file):
	BatchRequest: string(baJSON),
}
err = contextutil.RunWithTimeout(ctx, "log-event", 10*time.Second, func(ctx context.Context) error {
Can you explain to me the wisdom of using system.eventlog here, while we're generally trying to get away from using this table (also, observing that if one finds themselves in need to reach out to this debug facility, chances are this write may be impossible).
pkg/server/admin.go, line 2645 at r1 (raw file):
if err != nil {
	log.Warningf(ctx, "failed to log SendKVBatch to event log, emitting below: %v", err)
	log.StructuredEvent(ctx, event)
The structured event should be logged in any case.
I think you're relying on the idea that InsertEventRecord will also call log.StructuredEvent above? But then that needs to be clarified.
(I'd argue all this could be simplified by only calling log.StructuredEvent here.)
Also, the structured logging needs a test.
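For illustration, a rough sketch of the simplification being suggested — the type and field names below are assumptions pieced together from the snippets quoted in this review, not the final code:

```go
package server

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/util/log"
	"github.com/cockroachdb/cockroach/pkg/util/log/eventpb"
)

// logSendKVBatch is a hypothetical helper illustrating the suggestion above:
// always emit the structured event, and skip the system.eventlog write.
func logSendKVBatch(ctx context.Context, nodeID int32, baJSON []byte) {
	event := &eventpb.DebugSendKvBatch{
		CommonDebugEventDetails: eventpb.CommonDebugEventDetails{NodeID: nodeID},
		BatchRequest:            string(baJSON),
	}
	log.StructuredEvent(ctx, event)
}
```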
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @erikgrinaker, @knz, and @tbg)
pkg/server/admin.go, line 2637 at r1 (raw file):
Previously, knz (kena) wrote…
Can you explain to me the wisdom of using system.eventlog here, while we're generally trying to get away from using this table (also, observing that if one finds themselves in need to reach out to this debug facility, chances are this write may be impossible).
I was under the impression that system.eventlog is part of our event/audit logging facilities. You seem to be saying that it's unnecessary or deprecated here, so I'll remove it.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @knz, and @tbg)
pkg/server/admin.go, line 2645 at r1 (raw file):
Previously, knz (kena) wrote…
The structured event should be logged in any case.
I think you're relying on the idea that InsertEventRecord will also call log.StructuredEvent above? But then that needs to be clarified.
(I'd argue all this could be simplified by only calling log.StructuredEvent here.) Also, the structured logging needs a test.
Done.
@knz Any further thoughts on this?
Sorry for the tardiness!
Reviewed 18 of 18 files at r4, 17 of 17 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @dt)
`jsonpb.Marshal` is used for JSON-marshaling of Protobuf structs in accordance with the Protobuf spec.

Release note: None
This patch adds a CLI command `debug send-kv-batch` that takes a JSON `BatchRequest` as input, proxies it into KV via the connected node's admin server, and outputs a JSON `BatchResponse`. An admin user is required, typically the root user, and the request is logged to the system event log for auditability.

This is a useful backdoor to access arbitrary KV functionality for debugging and surgery, e.g. during support escalations.

Release note: None
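To make the workflow concrete, here is an illustrative sketch (not taken from the PR) that builds a small batch in Go and prints the jsonpb-encoded JSON one would feed to the new command; the request types and encoding used here are assumptions based on the code quoted in this review:

```go
package main

import (
	"fmt"

	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/gogo/protobuf/jsonpb"
)

func main() {
	// A BatchRequest with a Put followed by a Get on the same key.
	var ba roachpb.BatchRequest
	ba.Add(&roachpb.PutRequest{
		RequestHeader: roachpb.RequestHeader{Key: roachpb.Key("hello")},
		Value:         roachpb.MakeValueFromString("world"),
	})
	ba.Add(&roachpb.GetRequest{
		RequestHeader: roachpb.RequestHeader{Key: roachpb.Key("hello")},
	})

	// Emit jsonpb-encoded JSON; per the command description, the output can
	// then be passed to `cockroach debug send-kv-batch` as a file argument
	// or on stdin.
	out, err := (&jsonpb.Marshaler{Indent: "  "}).MarshalToString(&ba)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```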
TFTRs! bors r=knz,tbg
Btw, demo about this on next weekly? This is cool stuff.
(sent from mobile, please excuse my brevity)
Sure thing. It's pretty trivial tbh, but extremely useful.
Build succeeded.
This commit introduces a new KV request type `ProbeRequest`. This request performs a mutationless round-trip through the replication layer.

The proximate reason to add this request is cockroachdb#33007, where we want to fail-fast requests on a Replica until a `ProbeRequest` succeeds. The alternative to introducing a new request type is using a `Put` on `keys.RangeProbeKey` (a key introduced for [kvprober]). However, when the Replica circuit breakers perform a probe, they have to make sure that this probe itself doesn't get caught up in the circuit breakers. In particular, the circuit breaker's request needs special casing with respect to lease requests (which would otherwise also bounce on the circuit breaker). But even with that added (not too large a change), the lease would be required, so we wouldn't be able to apply the circuit breakers anywhere but the leaseholder (i.e. we would have to heal the breaker when the lease is "not obtainable"). What we are after is really a per-Replica concept that is disjoint from the lease, where we check that the replica can manage to get a command into the log and observe it coming out of the replication layer.

`ProbeRequest` bypasses lease checks and can thus be processed by any Replica. Upon application, they are guaranteed to result in a forced error, and we suppress this error above the replication layer (i.e. at the waiting request). This means that receiving a nil error from the probe implies that the probe successfully made it through latching (where it doesn't declare any key, i.e. should not conflict with long-evaluating requests), evaluation, and application.

While no concrete plans exist for adoption of `ProbeRequest` elsewhere, this might be of interest for [kvprober] and as a general building block for quickly assessing overall KV health; for example we may extend cockroachdb#72732 to facilitate sending a probe to the leaseholders of all Ranges in the system.

`ProbeRequest` is a point request. A ranged version is possible but since `DistSender` will not return partial responses, it is not introduced at this point. This can be done when a concrete need arises. `ProbeRequest` by default will be routed to the leaseholder, but it can also be used with the `NEAREST` [routing policy].

Some details helpful for reviewers follow.

- To enable testing of this new request type, `TestingApplyFilter` was augmented with a `TestingApplyForcedErrFilter` which gets to inspect and mutate forced errors below raft. The semantics of `TestingApplyFilter` remain unchanged; the newly added field `ForcedErr` is always nil when handed to such a filter.
- `ProbeRequest` is the first request type that skips the lease check and is not itself a lease check. This prompted a small refactor to `GetPrevLeaseForLeaseRequest`, which can now return false when the underlying request is a `Probe` and not `{Transfer,Request}Lease`.

[kvprober]: https://github.com/cockroachdb/cockroach/tree/2ad2bee257e78970ce2c457ddd6996099ed6727a/pkg/kv/kvprober
[routing policy]: https://github.com/cockroachdb/cockroach/blob/e1c7d6a55ed147aeb20d6d5c564d3d9bf73d2624/pkg/roachpb/api.proto#L48-L60

Release note: None
72972: kvserver: introduce ProbeRequest r=erikgrinaker a=tbg

This commit introduces a new KV request type `ProbeRequest`. This request performs a mutationless round-trip through the replication layer.

The proximate reason to add this request is #33007, where we want to fail-fast requests on a Replica until a `ProbeRequest` succeeds. The alternative to introducing a new request type is using a `Put` on `keys.RangeProbeKey` (a key introduced for [kvprober]). However, when the Replica circuit breakers perform a probe, they have to make sure that this probe doesn't get caught up in the circuit breakers themselves. In particular, the circuit breaker's request needs special casing with respect to lease requests (which would otherwise also bounce on the circuit breaker), and we would also like the probe to be proposed (via the local raft instance) when the local replica is not the leaseholder, as this allows the circuit breakers to cover more ground and ascertain that the Replica has access to the replication layer regardless of its status.

`ProbeRequest` bypasses lease checks and can thus be processed by any Replica. Upon application, they are guaranteed to result in a forced error, and we suppress this error above the replication layer (i.e. at the waiting request). This means that receiving a nil error from the probe implies that the probe successfully made it through latching, evaluation, and application.

While no concrete plans exist for adoption of `ProbeRequest` elsewhere, this might be of interest for [kvprober] and as a general building block for quickly assessing overall KV health; for example we may extend #72732 to facilitate sending a probe to the leaseholders of all Ranges in the system.

`ProbeRequest` is a point request. A ranged version is possible but since `DistSender` will not return partial responses, it is not introduced at this point. This can be done when a concrete need arises. `ProbeRequest` by default will be routed to the leaseholder, but it can also be used with the `NEAREST` [routing policy].

[kvprober]: https://github.com/cockroachdb/cockroach/tree/2ad2bee257e78970ce2c457ddd6996099ed6727a/pkg/kv/kvprober
[routing policy]: https://github.com/cockroachdb/cockroach/blob/e1c7d6a55ed147aeb20d6d5c564d3d9bf73d2624/pkg/roachpb/api.proto#L48-L60

Release note: None

Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
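For context, a hedged sketch of what issuing such a probe from Go might look like; `ProbeRequest`, the `RoutingPolicy` header field, and the `kv.SendWrappedWith` helper are assumptions based on the description above and may not match the final API exactly:

```go
package probe

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/kv"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
)

// probeReplica sends a ProbeRequest for key through db's non-transactional
// sender. A nil return implies the probe made it through latching,
// evaluation, and application on a replica of the range containing key.
func probeReplica(ctx context.Context, db *kv.DB, key roachpb.Key) error {
	probe := &roachpb.ProbeRequest{
		RequestHeader: roachpb.RequestHeader{Key: key},
	}
	// NEAREST routing (assumed field name) lets a nearby replica serve the
	// probe rather than forcing it to the leaseholder.
	_, pErr := kv.SendWrappedWith(ctx, db.NonTransactionalSender(), roachpb.Header{
		RoutingPolicy: roachpb.RoutingPolicy_NEAREST,
	}, probe)
	return pErr.GoError()
}
```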
This patch adds a CLI command `debug send-kv-batch` that takes a JSON `BatchRequest` as input, proxies it into KV via the connected node's admin server, and outputs a JSON `BatchResponse`. An admin user is required, typically the root user, and the request is logged to the system event log for auditability.

This is a useful backdoor to access arbitrary KV functionality for debugging and surgery, e.g. during support escalations.
Resolves #66829.
Resolves #67001.
Release note: None
@knz It does not seem like admin commands generally allow specifying a user (e.g. via `--user`), and implicitly require the root user, so I followed that convention here as well. Let me know if we need to extend this.