62435: cli: make --global flag for demo public r=knz a=otan

See individual commits for details.

Refs: #62025

63971: kv: commit-wait before running commit triggers and resolving intents r=nvanbenschoten a=nvanbenschoten

Fixes a serious bug revealed by #63747.

This commit fixes a bug revealed by kvnemesis where a range-merge watcher on the
right-hand side of a range merge could incorrectly determine that a range merge
txn had succeeded when, in reality, it had failed. The watcher would then put
the RHS leaseholder replica into a stalled state by setting `r.mu.destroyStatus`
to `destroyReasonMergePending`, effectively stalling any operation on the range
indefinitely.

The setup for this bug was that a range was operating with a `global_reads` zone
configuration attribute, so it was pushing all writers into the future. The
range was split and then rapidly merged back together. During the merge txn, a
range-merge watcher (see `maybeWatchForMergeLocked`) began monitoring the state
of the range merge txn. The problem was that, at the time the range merge txn
committed, neither the meta descriptor version written by the merge nor even
the one written by the split was visible to the watcher's follow-up query.
Because the watcher read below the split txn's descriptor, it came to the wrong
conclusion about the merge.

It is interesting to think about what is going wrong here, because it's not
immediately obvious who is at fault. If a transaction has a commit timestamp in
the future of present time, it will need to commit-wait before acknowledging the
client. Typically, this is performed in the TxnCoordSender after the transaction
has committed and resolved its intents (see TxnCoordSender.maybeCommitWait). It
is safe to wait after a future-time transaction has committed and resolved
intents without compromising linearizability because the uncertainty interval of
concurrent and later readers ensures atomic visibility of the effects of the
transaction. In other words, all of the transaction's intents will become
visible and will remain visible at once, which is sometimes called "monotonic
reads". This is true even if the resolved intents are at a high enough timestamp
such that they are not visible to concurrent readers immediately after they are
resolved, but only become visible sometime during the writer's commit-wait
sleep. This property is central to the correctness of non-blocking transactions.
See: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20200811_non_blocking_txns.md
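
To make that concrete, here is a minimal, self-contained toy model of the
client-side rule in Go. The names are illustrative assumptions, not the actual
TxnCoordSender API:

package main

import (
	"fmt"
	"time"
)

// commitWait models TxnCoordSender-style commit-wait: if the transaction
// committed at a timestamp in the future of the local clock, sleep until
// that timestamp is in the past before acknowledging the client.
func commitWait(commitTS time.Time) time.Duration {
	wait := time.Until(commitTS)
	if wait > 0 {
		time.Sleep(wait)
	}
	return wait
}

func main() {
	// A future-time commit (e.g. one pushed ahead by a global_reads zone
	// configuration) forces a short wait; a past-time commit returns
	// immediately.
	fmt.Println("waited:", commitWait(time.Now().Add(250*time.Millisecond)))
}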

However, if a transaction has a commit trigger, the side-effects of the trigger
will go into effect immediately upon applying the corresponding Raft log entry.
This poses a problem, because we do not want part of a transaction's effects
(e.g. its commit trigger) to become visible to onlookers before the rest of its
effects do (e.g. its intent writes).

To avoid this problem, this commit adds special server-side logic to perform the
commit-wait stage of a transaction with a commit trigger early, before its
EndTxn evaluates and its commit trigger fires. This results in the transaction
waiting longer to commit, run its commit trigger, and resolve its intents, but
it is otherwise safe and effective.
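
The new ordering can be sketched by extending the toy package above. These
types and names are stand-ins; the real logic lives in the KV server's EndTxn
evaluation path:

// endTxnRequest is a toy stand-in for the real EndTxn request proto.
type endTxnRequest struct {
	commitTimestamp  time.Time
	hasCommitTrigger bool // e.g. a split or merge trigger
}

// maybeCommitWaitBeforeTrigger sleeps out a future-time commit timestamp
// before the EndTxn evaluates, so that a commit trigger's side effects
// cannot become visible ahead of the transaction's intent writes.
// Ordinary transactions skip this and commit-wait on the client after
// their intents are resolved, as in commitWait above.
func maybeCommitWaitBeforeTrigger(req endTxnRequest) {
	if !req.hasCommitTrigger {
		return
	}
	if wait := time.Until(req.commitTimestamp); wait > 0 {
		time.Sleep(wait)
	}
	// Only now evaluate the EndTxn and fire the commit trigger.
}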

Interestingly, this is quite similar to how Spanner handles its commit-wait rule:

> Before allowing any coordinator replica to apply the commit record, the
> coordinator leader waits until TT.after(s), so as to obey the commit-wait rule
> described in Section 4.1.2. Because the coordinator leader chose s based on
> TT.now().latest, and now waits until that timestamp is guaranteed to be in the
> past, the expected wait is at least 2 ∗ ε̄. This wait is typically overlapped with
> Paxos communication. After commit wait, the coordinator sends the commit
> timestamp to the client and all other participant leaders. Each participant
> leader logs the transaction’s outcome through Paxos. All participants apply at
> the same timestamp and then release locks.

Of course, the whole point of non-blocking transactions is that we release locks
early and use clocks (through uncertainty intervals + a reader-side commit-wait
rule) to enforce consistency, so we don't want to make this change for standard
transactions.
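
For contrast, a toy version of the reader-side rule that standard non-blocking
reads rely on instead, as another fragment of the sketch package above (not
CockroachDB's actual uncertainty-interval machinery):

// readerWait models the reader-side commit-wait rule: a reader whose
// clock reads now and that encounters a value written at valueTS inside
// its uncertainty window (bounded by maxOffset) must wait until the
// value's timestamp is in the past before serving the read.
func readerWait(valueTS, now time.Time, maxOffset time.Duration) time.Duration {
	if valueTS.After(now) && valueTS.Sub(now) <= maxOffset {
		return valueTS.Sub(now)
	}
	return 0
}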

Before this change, I could hit the bug in about 5 minutes of stressing
kvnemesis on a roachprod cluster. After this change, I've been able to run
kvnemesis for a few hours without issue.

Release note (bug fix): Fixed a rare bug present in betas where rapid range
splits and merges on a GLOBAL table could lead to a stuck leaseholder replica.
The situation is no longer possible.

cc. @cockroachdb/kv 

64012: roachtest: update expected failures for pgjdbc r=RichardJCai a=rafiss

Where it was easy, I also added the number of the issue that is causing the
failure.

fixes #63890 

Release note: None

Co-authored-by: Oliver Tan <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
4 people committed Apr 27, 2021
4 parents: f3286c1 + a9ace7f + 0abe89a + 79e8ebe · commit 29305bf
Showing 25 changed files with 1,773 additions and 1,402 deletions.
9 changes: 6 additions & 3 deletions pkg/cli/demo.go
@@ -196,12 +196,12 @@ func checkDemoConfiguration(

 	// Make sure the number of nodes is valid.
 	if demoCtx.nodes <= 0 {
-		return nil, errors.Newf("--nodes has invalid value (expected positive, got %d)", demoCtx.nodes)
+		return nil, errors.Newf("--%s has invalid value (expected positive, got %d)", cliflags.DemoNodes.Name, demoCtx.nodes)
 	}

 	// If artificial latencies were requested, then the user cannot supply their own localities.
 	if demoCtx.simulateLatency && demoCtx.localities != nil {
-		return nil, errors.New("--global cannot be used with --demo-locality")
+		return nil, errors.Newf("--%s cannot be used with --%s", cliflags.Global.Name, cliflags.DemoNodeLocality.Name)
 	}

 	demoCtx.disableTelemetry = cluster.TelemetryOptOut()
@@ -301,7 +301,10 @@ func runDemo(cmd *cobra.Command, gen workload.Generator) (err error) {

 	if demoCtx.simulateLatency {
 		printfUnlessEmbedded(
-			"#\n# WARNING: the use of --%s is experimental. Some features may not work as expected.\n",
+			`# Communication between nodes will simulate real world latencies.
+#
+# WARNING: the use of --%s is experimental. Some features may not work as expected.
+`,
 			cliflags.Global.Name,
 		)
 	}
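
With the flag public, it can be exercised directly from the CLI. For example
(the 9-node count mirrors the topology used in the test added below; it is an
assumption here, not a documented requirement):

cockroach demo --global --nodes 9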
107 changes: 107 additions & 0 deletions pkg/cli/demo_test.go
@@ -13,14 +13,20 @@ package cli
 import (
 	"context"
 	"fmt"
+	"io/ioutil"
 	"testing"
+	"time"

 	"github.com/cockroachdb/cockroach/pkg/base"
 	"github.com/cockroachdb/cockroach/pkg/roachpb"
 	"github.com/cockroachdb/cockroach/pkg/server"
+	"github.com/cockroachdb/cockroach/pkg/testutils/skip"
 	"github.com/cockroachdb/cockroach/pkg/util/leaktest"
 	"github.com/cockroachdb/cockroach/pkg/util/log"
 	"github.com/cockroachdb/cockroach/pkg/util/stop"
+	"github.com/cockroachdb/cockroach/pkg/util/timeutil"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
 )

func TestTestServerArgsForTransientCluster(t *testing.T) {
@@ -113,3 +119,104 @@ func TestTestServerArgsForTransientCluster(t *testing.T) {
		})
	}
}

func TestTransientClusterSimulateLatencies(t *testing.T) {
	defer leaktest.AfterTest(t)()
	defer log.Scope(t).Close(t)

	// This is slow under race as it starts a 9-node cluster which
	// has a very high simulated latency between each node.
	skip.UnderRace(t)

	// Ensure flags are reset on start, and also make sure
	// they are reset at the end of the test.
	TestingReset()
	defer TestingReset()

	// Set up an empty 9-node cluster with simulated latencies.
	demoCtx.simulateLatency = true
	demoCtx.nodes = 9

	certsDir, err := ioutil.TempDir("", "cli-demo-test")
	require.NoError(t, err)

	cleanupFunc := createTestCerts(certsDir)
	defer func() {
		if err := cleanupFunc(); err != nil {
			t.Fatal(err)
		}
	}()

	ctx := context.Background()

	// Set up the transient cluster.
	c := transientCluster{
		stopper:              stop.NewStopper(),
		demoDir:              certsDir,
		stickyEngineRegistry: server.NewStickyInMemEnginesRegistry(),
	}
	// Stop the cluster when the test exits, including when it fails.
	// This also calls the Stop() method on the stopper, and thus
	// cancels everything controlled by the stopper.
	defer c.cleanup(ctx)

	// Also ensure the context gets canceled when the stopper
	// terminates above.
	ctx, _ = c.stopper.WithCancelOnQuiesce(ctx)

	require.NoError(t, c.start(ctx, demoCmd, nil /* gen */))

	for _, tc := range []struct {
		desc    string
		nodeIdx int
		region  string
	}{
		{
			desc:    "from us-east1",
			nodeIdx: 0,
			region:  "us-east1",
		},
		{
			desc:    "from us-west1",
			nodeIdx: 3,
			region:  "us-west1",
		},
		{
			desc:    "from europe-west1",
			nodeIdx: 6,
			region:  "europe-west1",
		},
	} {
		t.Run(tc.desc, func(t *testing.T) {
			url, err := c.getNetworkURLForServer(tc.nodeIdx, nil /* gen */, true /* includeAppName */)
			require.NoError(t, err)
			conn := makeSQLConn(url)
			defer conn.Close()
			// Find the maximum latency in the cluster from the current node.
			var maxLatency time.Duration
			for _, latencyMS := range regionToRegionToLatency[tc.region] {
				if d := time.Duration(latencyMS) * time.Millisecond; d > maxLatency {
					maxLatency = d
				}
			}

			// Attempt to make a query that talks to every node.
			// This should take at least maxLatency.
			startTime := timeutil.Now()
			_, _, err = runQuery(
				conn,
				makeQuery(`SHOW ALL CLUSTER QUERIES`),
				false,
			)
			totalDuration := timeutil.Since(startTime)
			require.NoError(t, err)
			require.Truef(
				t,
				totalDuration >= maxLatency,
				"expected duration at least %s, got %s",
				maxLatency,
				totalDuration,
			)
		})
	}
}
2 changes: 0 additions & 2 deletions pkg/cli/flags.go
@@ -766,9 +766,7 @@ func init() {
 		"For details, see: "+build.MakeIssueURL(53404))

 	boolFlag(f, &demoCtx.disableLicenseAcquisition, cliflags.DemoNoLicense)
-	// Mark the --global flag as hidden until we investigate it more.
 	boolFlag(f, &demoCtx.simulateLatency, cliflags.Global)
-	_ = f.MarkHidden(cliflags.Global.Name)
 	// The --empty flag is only valid for the top level demo command,
 	// so we use the regular flag set.
 	boolFlag(demoCmd.Flags(), &demoCtx.noExampleDatabase, cliflags.UseEmptyDatabase)