storage: unexpected Raft re-proposals during split transaction #10160

Closed
spencerkimball opened this issue Oct 23, 2016 · 9 comments

@spencerkimball (Member)

@bdarnell this simple test (just add it to the pkg/sql directory) exhibits behavior I'm not understanding. I start a triplicated cluster and create a table, then wait for the table to be split along the expected boundary. Most times I run it, it takes 3-5s waiting on Raft re-proposals. I've done a fair bit of digging, and what happens is very consistent: the lost Raft batch includes just the start of the txn, which adjusts the LHS RangeDescriptor. After a few hours of tracking it this far, I felt it would be more reasonable to turn this over to the expert.

Sometimes it takes about 10s to run (seems to be a race related to not adding to the split queue), and other times it takes 25s to run (not sure about that case as it's rare and I lost the logs).

package sql_test

import (
    "context"
    "testing"

    "github.com/cockroachdb/cockroach/pkg/base"
    "github.com/cockroachdb/cockroach/pkg/keys"
    "github.com/cockroachdb/cockroach/pkg/testutils/sqlutils"
    "github.com/cockroachdb/cockroach/pkg/testutils/testcluster"
    "github.com/cockroachdb/cockroach/pkg/util"
    "github.com/cockroachdb/cockroach/pkg/util/leaktest"
    "github.com/cockroachdb/cockroach/pkg/util/log"
    "github.com/pkg/errors"
)

func TestSlowSplit(t *testing.T) {
    defer leaktest.AfterTest(t)()

    // The new table in a fresh cluster is expected to get table ID 51 (the initial table ID).
    tableStartKey := keys.MakeTablePrefix(51 /* initial table ID */)
    testClusterArgs := base.TestClusterArgs{
        ReplicationMode: base.ReplicationAuto,
    }
    tc := testcluster.StartTestCluster(t, 3, testClusterArgs)
    defer tc.Stopper().Stop()
    if err := tc.WaitForFullReplication(); err != nil {
        t.Error(err)
    }

    sqlDB := sqlutils.MakeSQLRunner(t, tc.Conns[0])

    _ = sqlDB.Exec(`CREATE DATABASE test`)
    _ = sqlDB.Exec(`CREATE TABLE test.t (k SERIAL PRIMARY KEY, v INT)`)
    log.Infof(context.TODO(), "created table")

    // Wait for new table to split.
    util.SucceedsSoon(t, func() error {
        desc, err := tc.LookupRange(keys.MakeRowSentinelKey(tableStartKey))
        if err != nil {
            t.Fatal(err)
        }
        if !desc.StartKey.Equal(tableStartKey) {
            log.Infof(context.TODO(), "waiting on split results")
            return errors.Errorf("expected range start key %s; got %s", tableStartKey, desc.StartKey)
        }
        return nil
    })
}
@tbg (Member) commented Oct 23, 2016

You've ruled out WaitForFullReplication as the culprit, right? That thing is slow.

@petermattis (Collaborator)

The time is spent in the util.SucceedsSoon call. I'll take a look.

@petermattis (Collaborator)

Looks like we're splitting r7 off from r6 before the Raft group for r6 has initialized, which causes the initial proposal for the split to not go through. I would have expected eager campaigning of r6 to take effect here. Still digging into what is going wrong.

@petermattis (Collaborator)

Well, one part of the problem is definitely due to not campaigning the idle r6 replica. If I force that campaigning to happen, the test time in the good case drops to <0.5s.

@petermattis (Collaborator)

Eager campaigning of idle replicas is not being performed because the initial store used for bootstrapping the cluster is only used temporarily in node.go:bootstrapCluster and then another store is created. Thus, the store on node 1 used by the test cluster never has Store.Bootstrap called and so we don't set Store.idleReplicaElectionTime.at to the bootstrap time.
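
For readers who don't know that code path, here is a minimal, self-contained sketch of the gate being described (hypothetical names and simplified logic, not the actual Store implementation): if Store.Bootstrap never records the bootstrap time, the field keeps its zero value and eager campaigning of idle replicas is disabled entirely, so the first proposal has to wait out a full election timeout.

package main

import (
    "fmt"
    "time"
)

// store loosely mirrors the fields discussed above; the real Store also
// consults its clock and other state, which is elided here.
type store struct {
    idleReplicaElectionTime struct {
        at time.Time
    }
}

// canCampaignIdleReplica is a hypothetical stand-in for the real check:
// if Store.Bootstrap was never called, at stays at its zero value and
// idle replicas are never eagerly campaigned.
func (s *store) canCampaignIdleReplica(now time.Time) bool {
    if s.idleReplicaElectionTime.at.IsZero() {
        return false
    }
    return !now.Before(s.idleReplicaElectionTime.at)
}

func main() {
    var s store
    fmt.Println(s.canCampaignIdleReplica(time.Now())) // false: Bootstrap never ran on this store

    s.idleReplicaElectionTime.at = time.Now() // what Store.Bootstrap would record
    fmt.Println(s.canCampaignIdleReplica(time.Now())) // true: eager campaigning allowed
}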

petermattis added a commit to petermattis/cockroach that referenced this issue Oct 23, 2016
Extend the change in cockroachdb#9550 to allow eagerly campaigning replicas after
startup to include clusters created via TestCluster.

Fixes cockroachdb#10160
@spencerkimball (Member, Author)

@petermattis I'm reopening this issue. Recall from the original description that we sometimes see 10s delays due to split-queue mishaps. This is the cause of @tamird's issue #10184.

What's happening is as follows:

  • Splits for the first range [M{in-ax}] (i.e. the Min-Max span) at keys [/Table/11/0 /Table/12/0 /Table/13/0 /Table/14/0] go into the split queue.
  • Within the split queue:
    • Split proceeds at /Table/11/0
    • Split proceeds at /Table/12/0
    • Split proceeds at /Table/13/0
    • Split proceeds at /Table/14/0
  • Splits for the range /Table/14-Max at keys [/Table/50/0] go into the split queue
    • Split starts at /Table/50/0
  • Table 51 gets created and the config is gossiped, causing the range to be added to the split queue again, but the replica is already active in the queue, so the addition is ignored
  • Split finishes at /Table/50/0, replica exits split queue

So then we need to wait for the next scan interval for the range to be reconsidered for splitting.
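
To make the "already active, so it's ignored" step above concrete, here is a rough, self-contained sketch (hypothetical names, not the actual baseQueue/splitQueue code) of the queueing behavior being described: a range that is already queued or mid-processing is not re-added, so a split reason that arrives while the /Table/50 split is in flight gets dropped until the next scanner pass.

package main

import (
    "fmt"
    "sync"
)

// splitQueue is an illustrative stand-in; "active" covers ranges that are
// queued as well as the one currently being processed.
type splitQueue struct {
    mu     sync.Mutex
    active map[int64]bool
}

// MaybeAdd mimics the dedup step described above: if the range is already
// active in the queue, the new reason to split is silently dropped.
func (q *splitQueue) MaybeAdd(rangeID int64) bool {
    q.mu.Lock()
    defer q.mu.Unlock()
    if q.active[rangeID] {
        return false
    }
    q.active[rangeID] = true
    return true
}

func main() {
    q := &splitQueue{active: map[int64]bool{}}
    fmt.Println(q.MaybeAdd(6)) // true: the range is queued for the /Table/50/0 split
    fmt.Println(q.MaybeAdd(6)) // false: the later request triggered by table 51's gossip is ignored
}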

petermattis changed the title from "Unexpected Raft re-proposals during split transaction" to "storage: unexpected Raft re-proposals during split transaction" on Oct 26, 2016
petermattis self-assigned this issue and unassigned bdarnell on Oct 26, 2016
@petermattis (Collaborator)

The unexpected reproposals were fixed. Now to fix the slow splits.

petermattis added a commit to petermattis/cockroach that referenced this issue Oct 26, 2016
After splitting a range, further split work might be required if the
zone config changed or a table was created concurrently or if the range
was much larger than the target size.

Before this change, TestSplitAtTableBoundary had a wide variance in how
long it would take. Sometimes it would complete in less than a second
while other times it would take 10-20 seconds. With this change it
reliably completes in less than a second.

Fixes cockroachdb#10160
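
A rough sketch of the pattern the commit message describes (hypothetical names and placeholder logic, not the actual splitQueue change): once a split finishes, immediately re-check whether the resulting range still needs splitting, for example because a table was created or the zone config changed while the split was in flight, and requeue it rather than waiting for the next scanner cycle.

package main

import "fmt"

// queue and its helpers are placeholders for illustration only.
type queue struct{ pending []int64 }

func (q *queue) add(rangeID int64)      { q.pending = append(q.pending, rangeID) }
func (q *queue) runSplit(rangeID int64) { fmt.Println("split processed for range", rangeID) }

// needsSplit stands in for re-checking zone configs and range size after
// the split; here it always reports more work so the requeue path is shown.
func (q *queue) needsSplit(rangeID int64) bool { return true }

// process performs one split and, per the fix, immediately requeues the
// range if further split work remains instead of waiting for the scanner.
func (q *queue) process(rangeID int64) {
    q.runSplit(rangeID)
    if q.needsSplit(rangeID) {
        q.add(rangeID)
    }
}

func main() {
    q := &queue{}
    q.process(6)
    fmt.Println("pending after processing:", q.pending)
}
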
@tamird (Contributor) commented Oct 26, 2016

Is this also the same as #9624, #9673, and (the current blocker in) #8057?

@petermattis (Collaborator)

This doesn't help #9624. That seems to be a different problem. I'll take a look.
