storage: unexpected Raft re-proposals during split transaction #10160

Closed
spencerkimball opened this issue Oct 23, 2016 · 9 comments

@spencerkimball (Member)

@bdarnell this simple test (just add it to the pkg/sql directory) exhibits behavior I'm not understanding. I start a triplicated cluster and create a table, then wait for the table to be split along the expected boundary. Most times I run it, it takes 3-5s waiting on Raft re-proposals. I've done a fair bit of digging, and what happens is very consistent: the lost Raft batch includes just the start of the txn, which adjusts the LHS RangeDescriptor. After a few hours of tracking it this far, I felt it would be more reasonable to turn this over to the expert.

Sometimes it takes about 10s to run (seems to be a race related to not adding to the split queue), and other times it takes 25s to run (not sure about that case as it's rare and I lost the logs).

package sql_test

import (
    "context"
    "testing"

    "github.com/cockroachdb/cockroach/pkg/base"
    "github.com/cockroachdb/cockroach/pkg/keys"
    "github.com/cockroachdb/cockroach/pkg/testutils/sqlutils"
    "github.com/cockroachdb/cockroach/pkg/testutils/testcluster"
    "github.com/cockroachdb/cockroach/pkg/util"
    "github.com/cockroachdb/cockroach/pkg/util/leaktest"
    "github.com/cockroachdb/cockroach/pkg/util/log"
    "github.com/pkg/errors"
)

func TestSlowSplit(t *testing.T) {
    defer leaktest.AfterTest(t)()

    // The new table in a fresh cluster is expected to get table ID 51 (the initial table ID).
    tableStartKey := keys.MakeTablePrefix(51 /* initial table ID */)
    testClusterArgs := base.TestClusterArgs{
        ReplicationMode: base.ReplicationAuto,
    }
    tc := testcluster.StartTestCluster(t, 3, testClusterArgs)
    defer tc.Stopper().Stop()
    if err := tc.WaitForFullReplication(); err != nil {
        t.Error(err)
    }

    sqlDB := sqlutils.MakeSQLRunner(t, tc.Conns[0])

    _ = sqlDB.Exec(`CREATE DATABASE test`)
    _ = sqlDB.Exec(`CREATE TABLE test.t (k SERIAL PRIMARY KEY, v INT)`)
    log.Infof(context.TODO(), "created table")

    // Wait for new table to split.
    util.SucceedsSoon(t, func() error {
        desc, err := tc.LookupRange(keys.MakeRowSentinelKey(tableStartKey))
        if err != nil {
            t.Fatal(err)
        }
        if !desc.StartKey.Equal(tableStartKey) {
            log.Infof(context.TODO(), "waiting on split results")
            return errors.Errorf("expected range start key %s; got %s", tableStartKey, desc.StartKey)
        }
        return nil
    })
}
@tbg (Member) commented Oct 23, 2016

You've ruled out WaitForFullReplication as the culprit, right? That thing is slow.

@petermattis (Collaborator)

The time is spent in the util.SucceedsSoon call. I'll take a look.

@petermattis (Collaborator)

Looks like we're splitting r7 off from r6 before the Raft group for r6 has initialized, which causes the initial proposal for the split to not go through. I would have expected eager campaigning of r6 to take effect here. Still digging into what is going wrong.

@petermattis (Collaborator)

Well, one part of the problem is definitely due to not campaigning the idle r6 replica. If I force that campaigning to happen, the test time in the good case drops to <0.5s.

@petermattis (Collaborator)

Eager campaigning of idle replicas is not being performed because the initial store used for bootstrapping the cluster is only used temporarily in node.go:bootstrapCluster and then another store is created. Thus, the store on node 1 used by the test cluster never has Store.Bootstrap called and so we don't set Store.idleReplicaElectionTime.at to the bootstrap time.
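
For readers who don't know that code path, here is a minimal, self-contained sketch of the gate being described (hypothetical names and simplified logic, not the actual Store implementation): if Store.Bootstrap never records the bootstrap time, the field keeps its zero value and eager campaigning of idle replicas is disabled entirely, so the first proposal has to wait out a full election timeout.

package main

import (
    "fmt"
    "time"
)

// store loosely mirrors the fields discussed above; the real Store also
// consults its clock and other state, which is elided here.
type store struct {
    idleReplicaElectionTime struct {
        at time.Time
    }
}

// canCampaignIdleReplica is a hypothetical stand-in for the real check:
// if Store.Bootstrap was never called, at stays at its zero value and
// idle replicas are never eagerly campaigned.
func (s *store) canCampaignIdleReplica(now time.Time) bool {
    if s.idleReplicaElectionTime.at.IsZero() {
        return false
    }
    return !now.Before(s.idleReplicaElectionTime.at)
}

func main() {
    var s store
    fmt.Println(s.canCampaignIdleReplica(time.Now())) // false: Bootstrap never ran on this store

    s.idleReplicaElectionTime.at = time.Now() // what Store.Bootstrap would record
    fmt.Println(s.canCampaignIdleReplica(time.Now())) // true: eager campaigning allowed
}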

petermattis added a commit to petermattis/cockroach that referenced this issue Oct 23, 2016
Extend the change in cockroachdb#9550 to allow eagerly campaigning replicas after
startup to include clusters created via TestCluster.

Fixes cockroachdb#10160
@spencerkimball (Member, Author)

@petermattis I'm reopening this issue. Recall from the original description that we sometimes see 10s delays due to split-queue mishaps. This is the cause of @tamird's issue #10184.

What's happening is as follows:

  • Splits for the first range [M{in-ax}] (i.e. the Min-Max span) at keys [/Table/11/0 /Table/12/0 /Table/13/0 /Table/14/0] go into the split queue.
  • Within the split queue:
    • Split proceeds at /Table/11/0
    • Split proceeds at /Table/12/0
    • Split proceeds at /Table/13/0
    • Split proceeds at /Table/14/0
  • Splits for the range /Table/14-Max at keys [/Table/50/0] go into the split queue
    • Split starts at /Table/50/0
  • Table 51 gets created and the config is gossiped, causing the range to be added to the split queue again, but the replica is already active in the queue, so the addition is ignored
  • Split finishes at /Table/50/0, replica exits split queue

So then we need to wait for the next scan interval for the range to be reconsidered for splitting.
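
To make the "already active, so it's ignored" step above concrete, here is a rough, self-contained sketch (hypothetical names, not the actual baseQueue/splitQueue code) of the queueing behavior being described: a range that is already queued or mid-processing is not re-added, so a split reason that arrives while the /Table/50 split is in flight gets dropped until the next scanner pass.

package main

import (
    "fmt"
    "sync"
)

// splitQueue is an illustrative stand-in; "active" covers ranges that are
// queued as well as the one currently being processed.
type splitQueue struct {
    mu     sync.Mutex
    active map[int64]bool
}

// MaybeAdd mimics the dedup step described above: if the range is already
// active in the queue, the new reason to split is silently dropped.
func (q *splitQueue) MaybeAdd(rangeID int64) bool {
    q.mu.Lock()
    defer q.mu.Unlock()
    if q.active[rangeID] {
        return false
    }
    q.active[rangeID] = true
    return true
}

func main() {
    q := &splitQueue{active: map[int64]bool{}}
    fmt.Println(q.MaybeAdd(6)) // true: the range is queued for the /Table/50/0 split
    fmt.Println(q.MaybeAdd(6)) // false: the later request triggered by table 51's gossip is ignored
}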

petermattis changed the title from "Unexpected Raft re-proposals during split transaction" to "storage: unexpected Raft re-proposals during split transaction" on Oct 26, 2016
petermattis self-assigned this issue and unassigned bdarnell on Oct 26, 2016
@petermattis (Collaborator)

The unexpected reproposals were fixed. Now to fix the slow splits.

petermattis added a commit to petermattis/cockroach that referenced this issue Oct 26, 2016
After splitting a range, further split work might be required if the
zone config changed or a table was created concurrently or if the range
was much larger than the target size.

Before this change, TestSplitAtTableBoundary had a wide variance in how
long it would take. Sometimes it would complete in less than a second
while other times it would take 10-20 seconds. With this change it
reliably completes in less than a second.

Fixes cockroachdb#10160
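
A rough sketch of the pattern the commit message describes (hypothetical names and placeholder logic, not the actual splitQueue change): once a split finishes, immediately re-check whether the resulting range still needs splitting, for example because a table was created or the zone config changed while the split was in flight, and requeue it rather than waiting for the next scanner cycle.

package main

import "fmt"

// queue and its helpers are placeholders for illustration only.
type queue struct{ pending []int64 }

func (q *queue) add(rangeID int64)      { q.pending = append(q.pending, rangeID) }
func (q *queue) runSplit(rangeID int64) { fmt.Println("split processed for range", rangeID) }

// needsSplit stands in for re-checking zone configs and range size after
// the split; here it always reports more work so the requeue path is shown.
func (q *queue) needsSplit(rangeID int64) bool { return true }

// process performs one split and, per the fix, immediately requeues the
// range if further split work remains instead of waiting for the scanner.
func (q *queue) process(rangeID int64) {
    q.runSplit(rangeID)
    if q.needsSplit(rangeID) {
        q.add(rangeID)
    }
}

func main() {
    q := &queue{}
    q.process(6)
    fmt.Println("pending after processing:", q.pending)
}
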
@tamird (Contributor) commented Oct 26, 2016

Is this also the same as #9624, #9673, and (the current blocker in) #8057?

@petermattis (Collaborator)

This doesn't help #9624. That seems to be a different problem. I'll take a look.
