Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass in cell to EnsureVSchema and GetOrCreateShard #5930

Merged

Conversation

zmagg
Copy link
Contributor

@zmagg zmagg commented Mar 17, 2020

Description

Modifies EnsureVSchema so that we always pass in a cell and it rebuilds SrvVSchema only in the provided cell. This removes its auto-rebuild of all cells automatically when you're creating a new shard or keyspace.

Background

During a setup of a new region, we had a misconfigured topology and spinning up a new shard caused us to corrupt all cells in the topology, just as part of booting up a new tablet. For safety reasons, it seems better to decouple spinning up a new shard from rebuilding SrvVSchema on all cells. As talked about in person a few weeks ago--we'll keep the rebuild for the simple case where there's only one cell.

Signed-off-by: Maggie Zhou [email protected]

@zmagg zmagg requested a review from sougou as a code owner March 17, 2020 06:18
@zmagg zmagg force-pushed the zmagg_rebuild_srv_schema_no_multicell branch 2 times, most recently from 6109866 to 29b8e36 Compare March 19, 2020 17:56
@@ -82,10 +83,21 @@ func (ts *Server) EnsureVSchema(ctx context.Context, keyspace string) error {
}
}

err = ts.RebuildSrvVSchema(ctx, []string{} /* cells */)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we change EnsureVSchema to accept a cells list to rebuild, and we have each tablet rebuild only its own cell? Would that still have caused the problem you had?

Copy link
Contributor Author

@zmagg zmagg Mar 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I like that a lot more. It still does the nice thing for single cell setups, with the benefit of being easier to reason about overall. I'll switch to that approach!

That would've been fine. The problem was that the tablet in the init loop wasn't just rebuilding its own cell it was rebuilding all the other cells.

@zmagg zmagg force-pushed the zmagg_rebuild_srv_schema_no_multicell branch 8 times, most recently from a70acbf to 97268cb Compare April 15, 2020 18:01
Copy link
Member

@rafael rafael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maggie this is looking great. I added minor comments.

@@ -54,7 +54,6 @@ func RestoreTablet(t *testing.T, localCluster *cluster.LocalProcessCluster, tabl
_, err := localCluster.VtctlProcess.ExecuteCommandWithOutput("CreateKeyspace",
"-keyspace_type=SNAPSHOT", "-base_keyspace="+keyspaceName,
"-snapshot_time", tm.Format(time.RFC3339), restoreKSName)
require.Nil(t, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it should not be removed right?

if err != nil {
log.Errorf("could not rebuild SrvVschema after creating keyspace: %v", err)
return err
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - remove extra new line.

@zmagg zmagg changed the title Skip RebuildSrvVschema in InitTablet for multi-cell configurations Pass in cell to EnsureVSchema and GetOrCreateShard Apr 15, 2020
Copy link
Member

@enisoc enisoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one optional suggestion.

@@ -1620,7 +1620,12 @@ func commandCreateKeyspace(ctx context.Context, wr *wrangler.Wrangler, subFlags
}

if !*allowEmptyVSchema {
err = wr.TopoServer().EnsureVSchema(ctx, keyspace)
cells, err := wr.TopoServer().GetKnownCells(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could also just pass an empty cells list to mean "all cells", since RebuildSrvVSchema already knows how to expand that by doing this same thing.

@zmagg zmagg closed this Apr 15, 2020
@zmagg zmagg reopened this Apr 15, 2020
@zmagg zmagg force-pushed the zmagg_rebuild_srv_schema_no_multicell branch from f927513 to c2deff9 Compare April 16, 2020 06:16
@zmagg zmagg force-pushed the zmagg_rebuild_srv_schema_no_multicell branch from 4f5ded2 to 72e75c1 Compare April 16, 2020 07:00
@deepthi
Copy link
Member

deepthi commented Apr 16, 2020

local.sharded_recovery                  	FAIL (3 tries)

This test hasn't been flaky. Does it pass for you locally?

@zmagg zmagg force-pushed the zmagg_rebuild_srv_schema_no_multicell branch from c0f1f4e to aadd900 Compare April 16, 2020 21:58
@zmagg zmagg force-pushed the zmagg_rebuild_srv_schema_no_multicell branch from aadd900 to f8edec4 Compare April 16, 2020 23:04
replicaTabletArgs := commonTabletArg

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zmagg and I discussed this. It seems like this was a flaky test. The following change addresses that problem.

@rafael rafael merged commit 924824b into vitessio:master Apr 20, 2020
@rafael rafael deleted the zmagg_rebuild_srv_schema_no_multicell branch April 20, 2020 17:57
ajm188 pushed a commit to tinyspeck/vitess that referenced this pull request Apr 28, 2020
…ema_no_multicell

Pass in cell to EnsureVSchema and GetOrCreateShard
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants