Skip to content

Commit

Permalink
kvserver: introduce ProbeRequest
Browse files Browse the repository at this point in the history
This commit introduces a new KV request type `ProbeRequest`. This
request performs a mutationless round-trip through the replication
layer. The proximate reason to add this request is #33007, where we want
to fail-fast requests on a Replica until a `ProbeRequest` succeeds.

The alternative to introducing a new request type is using a `Put` on
`keys.RangeProbeKey` (a key introduced for [kvprober]). However, when
the Replica circuit breakers perform a probe, they have to make sure
that this probe itself doesn't get caught up in the circuit breakers. In
particular, the circuit breaker's request needs special casing with
respect to lease requests (which would otherwise also bounce on the
circuit breaker). But even with that added (not too large a change),
the lease would be required, so we wouldn't be able to apply the
circuit breakers anywhere but the leaseholder (i.e. we would have
to heal the breaker when the lease is "not obtainable". What we are
after is really a per-Replica concept that is disjoint from the lease,
where we check that the replica can manage to get a command into the
log and observe it coming out of the replication layer.

`ProbeRequest` bypasses lease checks and can thus be processed by any
Replica. Upon application, they are guaranteed to result in a forced
error, and we suppress this error above the replication layer (i.e. at
the waiting request). This means that receiving a nil error from the
probe implies that the probe successfully made it through latching
(where it doesn't declare any key, i.e. should not conflict with long-
evaluating requests), evaluation, and application.

While no concrete plans exist for adoption of `ProbeRequest` elsewhere,
this might be of interest for [kvprober] and as a general building block
for quickly assessing overall KV health; for example we may extend #72732
to facilitate sending a probe to the leaseholders of all Ranges in the
system.

`ProbeRequest` is a point request. A ranged version is possible but
since `DistSender` will not return partial responses, it is not
introduced at this point. This can be done when a concrete need
arises.

`ProbeRequest` by default will be routed to the leaseholder, but it
can also be used with the `NEAREST` routing policy.

Some details helpful for reviewers follow.

- to enable testing of this new request type, `TestingApplyFilter` was
  augmented with a `TestingApplyForcedErrFilter` which gets to inspect
  and mutate forced errors below raft. The semantics of
  `TestingApplyFilter` remain unchainged; the newly added field
  `ForcedErr` is always nil when handed to such a filter.
- `ProbeRequest` is the first request type that skips the lease check
  and is not itself a lease check. This prompted a small refactor to
  `GetPrevLeaseForLeaseRequest`, which can now return false when the
  underlying request is a `Probe` and not `{Transfer,Request}Lease`.

[kvprober]: https://github.com/cockroachdb/cockroach/tree/2ad2bee257e78970ce2c457ddd6996099ed6727a/pkg/kv/kvprober
[routing policy]: https://github.com/cockroachdb/cockroach/blob/e1c7d6a55ed147aeb20d6d5c564d3d9bf73d2624/pkg/roachpb/api.proto#L48-L60

Release note: None
  • Loading branch information
tbg committed Dec 9, 2021
1 parent 3bed3a6 commit afb50d6
Show file tree
Hide file tree
Showing 21 changed files with 1,748 additions and 897 deletions.
1 change: 1 addition & 0 deletions pkg/kv/kvserver/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -263,6 +263,7 @@ go_test(
"replica_learner_test.go",
"replica_lease_renewal_test.go",
"replica_metrics_test.go",
"replica_probe_test.go",
"replica_proposal_buf_test.go",
"replica_protected_timestamp_test.go",
"replica_raft_test.go",
Expand Down
1 change: 1 addition & 0 deletions pkg/kv/kvserver/batcheval/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ go_library(
"cmd_lease_transfer.go",
"cmd_merge.go",
"cmd_migrate.go",
"cmd_probe.go",
"cmd_push_txn.go",
"cmd_put.go",
"cmd_query_intent.go",
Expand Down
49 changes: 49 additions & 0 deletions pkg/kv/kvserver/batcheval/cmd_probe.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
// Copyright 2021 The Cockroach Authors.
//
// Use of this software is governed by the Business Source License
// included in the file licenses/BSL.txt.
//
// As of the Change Date specified in that file, in accordance with
// the Business Source License, use of this software will be governed
// by the Apache License, Version 2.0, included in the file
// licenses/APL.txt.

package batcheval

import (
"context"

"github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval/result"
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverpb"
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset"
"github.com/cockroachdb/cockroach/pkg/roachpb"
"github.com/cockroachdb/cockroach/pkg/storage"
)

func declareKeysProbe(
_ ImmutableRangeState, _ roachpb.Header, _ roachpb.Request, _, _ *spanset.SpanSet,
) {
// Declare no keys. This means that we're not even serializing with splits
// (i.e. a probe could be directed at a key that will become the right-hand
// side of the split, and the split races ahead of the probe though the probe
// will still execute on the left-hand side). This is acceptable; we want the
// probe to bypass as much of the above-raft machinery as possible so that it
// gives us a signal on the replication layer alone.
}

func init() {
RegisterReadWriteCommand(roachpb.Probe, declareKeysProbe, Probe)
}

// Probe causes an effectless round-trip through the replication layer,
// i.e. it is a write that does not change any kv pair. It declares a
// write on the targeted key (but no lock).
func Probe(
ctx context.Context, readWriter storage.ReadWriter, cArgs CommandArgs, resp roachpb.Response,
) (result.Result, error) {
return result.Result{
Replicated: kvserverpb.ReplicatedEvalResult{
IsProbe: true,
},
}, nil
}
5 changes: 5 additions & 0 deletions pkg/kv/kvserver/batcheval/declare_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,11 @@ func TestRequestsSerializeWithAllKeys(t *testing.T) {
declareAllKeys(&allLatchSpans)

for method, command := range cmds {
if method == roachpb.Probe {
// Probe is special since it's a no-op round-trip through the replication
// layer. It does not declare any keys.
continue
}
t.Run(method.String(), func(t *testing.T) {
var otherLatchSpans, otherLockSpans spanset.SpanSet

Expand Down
5 changes: 5 additions & 0 deletions pkg/kv/kvserver/batcheval/result/result.go
Original file line number Diff line number Diff line change
Expand Up @@ -304,6 +304,11 @@ func (p *Result) MergeAndDestroy(q Result) error {
}
q.Replicated.PriorReadSummary = nil

if !p.Replicated.IsProbe {
p.Replicated.IsProbe = q.Replicated.IsProbe
}
q.Replicated.IsProbe = false

if p.Local.EncounteredIntents == nil {
p.Local.EncounteredIntents = q.Local.EncounteredIntents
} else {
Expand Down
16 changes: 7 additions & 9 deletions pkg/kv/kvserver/kvserverbase/base.go
Original file line number Diff line number Diff line change
Expand Up @@ -79,10 +79,11 @@ type ProposalFilterArgs struct {
// ApplyFilterArgs groups the arguments to a ReplicaApplyFilter.
type ApplyFilterArgs struct {
kvserverpb.ReplicatedEvalResult
CmdID CmdIDKey
RangeID roachpb.RangeID
StoreID roachpb.StoreID
Req *roachpb.BatchRequest // only set on the leaseholder
CmdID CmdIDKey
RangeID roachpb.RangeID
StoreID roachpb.StoreID
Req *roachpb.BatchRequest // only set on the leaseholder
ForcedError *roachpb.Error
}

// InRaftCmd returns true if the filter is running in the context of a Raft
Expand Down Expand Up @@ -110,11 +111,8 @@ type ReplicaCommandFilter func(args FilterArgs) *roachpb.Error
// from proposals after a request is evaluated but before it is proposed.
type ReplicaProposalFilter func(args ProposalFilterArgs) *roachpb.Error

// A ReplicaApplyFilter can be used in testing to influence the error returned
// from proposals after they apply. The returned int is treated as a
// storage.proposalReevaluationReason and will only take an effect when it is
// nonzero and the existing reason is zero. Similarly, the error is only applied
// if there's no error so far.
// A ReplicaApplyFilter is a testing hook into raft command application.
// See StoreTestingKnobs.
type ReplicaApplyFilter func(args ApplyFilterArgs) (int, *roachpb.Error)

// ReplicaResponseFilter is used in unittests to modify the outbound
Expand Down
Loading

0 comments on commit afb50d6

Please sign in to comment.