tablet picker cell alias fallback with local cell preference #11771

Closed
wants to merge 9 commits into from
7 changes: 7 additions & 0 deletions doc/releasenotes/16_0_0_summary.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,13 @@

In [PR #11103](https://github.com/vitessio/vitess/pull/11103) we introduced the ability to resume a `VTGate` [`VStream` copy operation](https://vitess.io/docs/design-docs/vreplication/vstream/vscopy/). This is useful when a [`VStream` copy operation](https://vitess.io/docs/design-docs/vreplication/vstream/vscopy/) is interrupted due to e.g. a network failure or a server restart. The `VStream` copy operation can be resumed by specifying each table's last seen primary key value in the `VStream` request. Please see the [`VStream` docs](https://vitess.io/docs/16.0/reference/vreplication/vstream/) for more details.

### New TabletPicker Options and Default Cell Behavior

In [PR 11771](https://github.com/vitessio/vitess/pull/11771) we allow for default cell alias fallback during tablet selection for VStreams when the client
does not specify a list of cells. In addition, we add the option for local cell preference during tablet selection.
The local cell preference takes precedence over tablet type. See the PR description for examples. If a client wants to specify local cell preference in the gRPC request,
they can pass in a new "local:" tag with the rest of the cells under VStreamFlags, e.g. "local:,cella,cellb".
Comment on lines +20 to +23
@mattlord (Contributor), Dec 12, 2022:

Suggested tweaks:

In [PR 11771](https://github.com/vitessio/vitess/pull/11771) we modify the default [TabletPicker](https://vitess.io/docs/16.0/reference/vreplication/tablet_selection/)
behavior during tablet selection for [`VStreams`](https://vitess.io/docs/16.0/concepts/vstream/):
  - OLD: look for candidate tablets in the local cell
  - NEW: look for candidate tablets in the local cell, if none are found, use the local cell's cell alias — if it has one — as a fallback

In addition, we add support for the `local` notation when the client *does* specify a list of cells, e.g.: `--cells="local:zone1a,zone1b,zone1c"`
with `vtctldclient` commands and `VStreamFlags.Cells = "local:zone1a,zone1b,zone1c"` in the
[VStreamFlags](https://pkg.go.dev/vitess.io/vitess/go/vt/proto/vtgate#VStreamFlags) with the vtgate VStream RPC.
The local cell will then always be searched first and takes precedence over any others specified.
See [the PR](https://github.com/vitessio/vitess/pull/11771) description for examples.
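
To make the `local:` notation concrete, here is a minimal sketch of how the hint could be split off a cell list. It follows the logic this PR adds to `NewTabletPicker` (strip the prefix from the first entry, then append the local cell so it is still searched), but `splitLocalPreference` is a hypothetical helper name, and splitting the flag value on commas is an assumption about how the caller prepares the list.

```go
package main

import (
	"fmt"
	"strings"
)

// localPreferenceHint mirrors the "local:" prefix introduced in this PR.
const localPreferenceHint = "local:"

// splitLocalPreference is a hypothetical helper (not part of Vitess).
// If the first cell carries the "local:" hint, it returns the local cell
// name and a new cell list with the local cell appended at the end, so it
// is still searched. Otherwise it returns "" and the list unchanged.
func splitLocalPreference(cells []string) (local string, rest []string) {
	if len(cells) == 0 || !strings.HasPrefix(cells[0], localPreferenceHint) {
		return "", cells
	}
	local = strings.TrimPrefix(cells[0], localPreferenceHint)
	// Copy so the caller's slice is not modified.
	rest = append([]string{}, cells[1:]...)
	rest = append(rest, local)
	return local, rest
}

func main() {
	// Assumption: the flag value is split on commas before parsing.
	cells := strings.Split("local:zone1a,zone1b,zone1c", ",")
	local, rest := splitLocalPreference(cells)
	fmt.Println(local, rest) // zone1a [zone1b zone1c zone1a]
}
```

The local cell can end up listed twice if it is also covered by a cell alias, which is why the PR dedupes cells later during tablet selection.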


### Tablet throttler

The tablet throttler can now be configured dynamically. Configuration is now found in the topo service, and applies to all tablets in all shards and cells of a given keyspace. For backwards compatibility, `v16` still supports `vttablet`-based command line flags for throttler configuration.
Expand Down
119 changes: 94 additions & 25 deletions go/vt/discovery/tablet_picker.go
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ var (
muTabletPickerRetryDelay sync.Mutex
globalTPStats *tabletPickerStats
inOrderHint = "in_order:"
localPreferenceHint = "local:"
)

// GetTabletPickerRetryDelay synchronizes changes to tabletPickerRetryDelay. Used in tests only at the moment
Expand All @@ -64,12 +65,13 @@ func SetTabletPickerRetryDelay(delay time.Duration) {

// TabletPicker gives a simplified API for picking tablets.
type TabletPicker struct {
ts *topo.Server
cells []string
keyspace string
shard string
tabletTypes []topodatapb.TabletType
inOrder bool
ts *topo.Server
cells []string
keyspace string
shard string
tabletTypes []topodatapb.TabletType
inOrder bool
localPreference string
}

// NewTabletPicker returns a TabletPicker.
Expand All @@ -92,19 +94,60 @@ func NewTabletPicker(ts *topo.Server, cells []string, keyspace, shard, tabletTyp
return nil, vterrors.Errorf(vtrpcpb.Code_FAILED_PRECONDITION,
fmt.Sprintf("Missing required field(s) for tablet picker: %s", strings.Join(missingFields, ", ")))
}

localPreference := ""
if strings.HasPrefix(cells[0], localPreferenceHint) {
localPreference = cells[0][len(localPreferenceHint):]
cells = cells[1:]
// Add the local cell to the list of cells
// This may result in the local cell appearing twice if it already exists as part of an alias,
// but cells will get deduped during tablet selection. See GetMatchingTablets() -> tp.dedupeCells()
cells = append(cells, localPreference)
}

return &TabletPicker{
ts: ts,
cells: cells,
keyspace: keyspace,
shard: shard,
tabletTypes: tabletTypes,
inOrder: inOrder,
ts: ts,
cells: cells,
keyspace: keyspace,
shard: shard,
tabletTypes: tabletTypes,
inOrder: inOrder,
localPreference: localPreference,
}, nil
}

func (tp *TabletPicker) prioritizeTablets(candidates []*topo.TabletInfo) (sameCell, allOthers []*topo.TabletInfo) {
for _, c := range candidates {
if c.Alias.Cell == tp.localPreference {
sameCell = append(sameCell, c)
} else {
allOthers = append(allOthers, c)
}
}

return sameCell, allOthers
}

func (tp *TabletPicker) orderByTabletType(candidates []*topo.TabletInfo) []*topo.TabletInfo {
// Sort candidates slice such that tablets appear in same tablet type order as in tp.tabletTypes
orderMap := map[topodatapb.TabletType]int{}
for i, t := range tp.tabletTypes {
orderMap[t] = i
}
sort.Slice(candidates, func(i, j int) bool {
if orderMap[candidates[i].Type] == orderMap[candidates[j].Type] {
// identical tablet types: randomize order of tablets for this type
return rand.Intn(2) == 0 // 50% chance
}
return orderMap[candidates[i].Type] < orderMap[candidates[j].Type]
})

return candidates
}

// PickForStreaming picks an available tablet.
// All tablets that belong to tp.cells are evaluated and one is
// chosen at random.
// chosen at random, unless local preference is given
func (tp *TabletPicker) PickForStreaming(ctx context.Context) (*topodatapb.Tablet, error) {
rand.Seed(time.Now().UnixNano())
// keep trying at intervals (tabletPickerRetryDelay) until a tablet is found
Expand All @@ -116,19 +159,28 @@ func (tp *TabletPicker) PickForStreaming(ctx context.Context) (*topodatapb.Table
default:
}
candidates := tp.GetMatchingTablets(ctx)
if tp.inOrder {
// Sort candidates slice such that tablets appear in same tablet type order as in tp.tabletTypes
orderMap := map[topodatapb.TabletType]int{}
for i, t := range tp.tabletTypes {
orderMap[t] = i
// we'd like to prioritize same cell tablets
if tp.localPreference != "" {
sameCellCandidates, allOtherCandidates := tp.prioritizeTablets(candidates)

// order same cell and all others by tablet type separately
// combine with same cell in front
if tp.inOrder {
sameCellCandidates = tp.orderByTabletType(sameCellCandidates)
allOtherCandidates = tp.orderByTabletType(allOtherCandidates)
} else {
// Randomize same cell candidates
rand.Shuffle(len(sameCellCandidates), func(i, j int) {
sameCellCandidates[i], sameCellCandidates[j] = sameCellCandidates[j], sameCellCandidates[i]
})
// Randomize all other candidates
rand.Shuffle(len(allOtherCandidates), func(i, j int) {
allOtherCandidates[i], allOtherCandidates[j] = allOtherCandidates[j], allOtherCandidates[i]
})
}
sort.Slice(candidates, func(i, j int) bool {
if orderMap[candidates[i].Type] == orderMap[candidates[j].Type] {
// identical tablet types: randomize order of tablets for this type
return rand.Intn(2) == 0 // 50% chance
}
return orderMap[candidates[i].Type] < orderMap[candidates[j].Type]
})
candidates = append(sameCellCandidates, allOtherCandidates...)
} else if tp.inOrder {
candidates = tp.orderByTabletType(candidates)
} else {
// Randomize candidates
rand.Shuffle(len(candidates), func(i, j int) {
Expand Down Expand Up @@ -204,6 +256,10 @@ func (tp *TabletPicker) GetMatchingTablets(ctx context.Context) []*topo.TabletIn
actualCells = append(actualCells, cell)
}
}
// Just in case a cell was passed in addition to its alias.
// Can happen if localPreference is not "". See NewTabletPicker
actualCells = tp.dedupeCells(actualCells)

for _, cell := range actualCells {
shortCtx, cancel := context.WithTimeout(ctx, topo.RemoteOperationTimeout)
defer cancel()
Expand Down Expand Up @@ -246,6 +302,19 @@ func (tp *TabletPicker) GetMatchingTablets(ctx context.Context) []*topo.TabletIn
return tablets
}

func (tp *TabletPicker) dedupeCells(cells []string) []string {
keys := make(map[string]bool)
dedupedCells := []string{}

for _, c := range cells {
if _, value := keys[c]; !value {
keys[c] = true
dedupedCells = append(dedupedCells, c)
}
}
return dedupedCells
}

func init() {
// TODO(sougou): consolidate this call to be once per process.
rand.Seed(time.Now().UnixNano())
Expand Down
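
The ordering that `PickForStreaming` builds in this diff (local-cell tablets first, then tablet type order within each group, with ties randomized) can be sketched in a standalone form. This is not the PR's implementation: the `tablet` struct and `orderCandidates` are hypothetical stand-ins for `topo.TabletInfo` and the picker methods, and the sketch swaps in a different technique for the randomization, shuffling first and then applying a stable sort, instead of returning a random result from the comparator.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// tablet is a simplified, hypothetical stand-in for topo.TabletInfo.
type tablet struct {
	uid  uint32
	typ  string
	cell string
}

// orderCandidates returns a copy of candidates with tablets in localCell
// first. Within each group, tablets follow typeOrder; tablets of the same
// type keep the random order produced by the initial shuffle.
// Pass an empty typeOrder to get a plain shuffle within each group.
func orderCandidates(candidates []tablet, localCell string, typeOrder []string) []tablet {
	rank := make(map[string]int, len(typeOrder))
	for i, t := range typeOrder {
		rank[t] = i
	}
	ordered := append([]tablet{}, candidates...)
	// Shuffle first so that ties are broken randomly.
	rand.Shuffle(len(ordered), func(i, j int) {
		ordered[i], ordered[j] = ordered[j], ordered[i]
	})
	// A stable sort with a deterministic comparator preserves the shuffled
	// order among equal elements.
	sort.SliceStable(ordered, func(i, j int) bool {
		li, lj := ordered[i].cell == localCell, ordered[j].cell == localCell
		if li != lj {
			return li // local-cell tablets sort first
		}
		return rank[ordered[i].typ] < rank[ordered[j].typ]
	})
	return ordered
}

func main() {
	candidates := []tablet{
		{101, "primary", "cell1"},
		{102, "replica", "cell1"},
		{103, "replica", "cell2"},
		{104, "replica", "cell2"},
	}
	ordered := orderCandidates(candidates, "cell2", []string{"primary", "replica"})
	fmt.Println(ordered[0].cell) // cell2
}
```

With the input above, the first candidate is always a `cell2` replica even though `primary` ranks higher, matching the "primary not local" test case in this PR: local cell preference wins over tablet type.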
158 changes: 158 additions & 0 deletions go/vt/discovery/tablet_picker_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,9 +18,12 @@ package discovery

import (
"context"
"os"
"testing"
"time"

_flag "vitess.io/vitess/go/internal/flag"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"google.golang.org/protobuf/proto"
Expand Down Expand Up @@ -364,6 +367,156 @@ func TestPickUsingCellAlias(t *testing.T) {
assert.True(t, picked2)
}

func TestPickLocalPreferences(t *testing.T) {
type tablet struct {
id uint32
typ topodatapb.TabletType
cell string
}

type testCase struct {
name string

//inputs
tablets []tablet
inCells []string
inTabletTypes string

//expected
tpLocalPreference string
tpCells []string
wantTablets []uint32
}

tcases := []testCase{
{
"local preference",
Review comment (Contributor):

IMO this would be much easier to read if we had the struct field names, e.g.:

name: "local preference",
...
wantTablets: []uint32{102, 103},

Otherwise the reader needs to keep the struct's field indexes in their head.
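
As a fuller illustration of this suggestion, the first test case could be written with keyed fields roughly as follows. This is only a sketch: the struct definitions are copied from the test in this diff, except that `typ` is a plain string here (an assumption, to keep the example free of the `topodatapb` import), and `firstCase` is a hypothetical wrapper.

```go
package main

import "fmt"

// tablet and testCase mirror the types declared in TestPickLocalPreferences.
// Assumption: typ is a string here instead of topodatapb.TabletType.
type tablet struct {
	id   uint32
	typ  string
	cell string
}

type testCase struct {
	name string

	// inputs
	tablets       []tablet
	inCells       []string
	inTabletTypes string

	// expected
	tpLocalPreference string
	tpCells           []string
	wantTablets       []uint32
}

// firstCase is a hypothetical helper returning the "local preference" case
// with every field named, so the reader does not have to count positions.
func firstCase() testCase {
	return testCase{
		name: "local preference",
		tablets: []tablet{
			{id: 101, typ: "replica", cell: "cell1"},
			{id: 102, typ: "replica", cell: "cell2"},
			{id: 103, typ: "replica", cell: "cell2"},
		},
		inCells:           []string{"local:cell2", "cell1"},
		inTabletTypes:     "replica",
		tpLocalPreference: "cell2",
		tpCells:           []string{"cell1", "cell2"},
		wantTablets:       []uint32{102, 103},
	}
}

func main() {
	tc := firstCase()
	fmt.Println(tc.name, tc.wantTablets) // local preference [102 103]
}
```

Keyed literals also keep compiling if fields are later reordered or added to the struct, which positional literals do not.
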

[]tablet{
{101, topodatapb.TabletType_REPLICA, "cell1"},
{102, topodatapb.TabletType_REPLICA, "cell2"},
{103, topodatapb.TabletType_REPLICA, "cell2"},
},
[]string{"local:cell2", "cell1"},
"replica",
"cell2",
[]string{"cell1", "cell2"},
[]uint32{102, 103},
},
{
"local preference with cell alias",
[]tablet{
{101, topodatapb.TabletType_REPLICA, "cell1"},
{102, topodatapb.TabletType_REPLICA, "cell2"},
},
[]string{"local:cell2", "cella"},
"replica",
"cell2",
[]string{"cella", "cell2"},
[]uint32{102},
},
{
"local preference with tablet type ordering, replica",
[]tablet{
{101, topodatapb.TabletType_REPLICA, "cell1"},
{102, topodatapb.TabletType_REPLICA, "cell1"},
{103, topodatapb.TabletType_PRIMARY, "cell2"},
{104, topodatapb.TabletType_REPLICA, "cell2"},
},
[]string{"local:cell2", "cella"},
"in_order:replica,primary",
"cell2",
[]string{"cella", "cell2"},
[]uint32{104},
},
{
"no local preference with tablet type ordering, primary",
[]tablet{
{101, topodatapb.TabletType_REPLICA, "cell1"},
{102, topodatapb.TabletType_PRIMARY, "cell1"},
{103, topodatapb.TabletType_REPLICA, "cell2"},
{104, topodatapb.TabletType_REPLICA, "cell2"},
},
[]string{"cell2", "cella"},
"in_order:primary,replica",
"",
[]string{"cella", "cell2"},
[]uint32{102},
},
{
"local preference with tablet type ordering, primary in local",
[]tablet{
{101, topodatapb.TabletType_REPLICA, "cell1"},
{102, topodatapb.TabletType_REPLICA, "cell1"},
{103, topodatapb.TabletType_PRIMARY, "cell2"},
{104, topodatapb.TabletType_REPLICA, "cell2"},
},
[]string{"local:cell2", "cella"},
"in_order:primary,replica",
"cell2",
[]string{"cella", "cell2"},
[]uint32{103},
},
{
"local preference with tablet type ordering, primary not local",
[]tablet{
{101, topodatapb.TabletType_PRIMARY, "cell1"},
{102, topodatapb.TabletType_REPLICA, "cell1"},
{103, topodatapb.TabletType_REPLICA, "cell2"},
{104, topodatapb.TabletType_REPLICA, "cell2"},
},
[]string{"local:cell2", "cella"},
"in_order:primary,replica",
"cell2",
[]string{"cella", "cell2"},
[]uint32{103, 104}, // replicas are picked because primary is not in the local cell/cell alias
},
{
"local preference with tablet type ordering, primary in local's alias",
[]tablet{
{101, topodatapb.TabletType_PRIMARY, "cell1"},
{102, topodatapb.TabletType_REPLICA, "cell1"},
},
[]string{"local:cell2", "cella"},
"in_order:primary,replica",
"cell2",
[]string{"cella", "cell2"},
[]uint32{101}, // primary found since there are no tablets in cell/cell alias
},
}

ctx := context.Background()
for _, tcase := range tcases {
t.Run(tcase.name, func(t *testing.T) {
cells := []string{"cell1", "cell2"}
te := newPickerTestEnv(t, cells)
var testTablets []*topodatapb.Tablet
for _, tab := range tcase.tablets {
testTablets = append(testTablets, addTablet(te, int(tab.id), tab.typ, tab.cell, true, true))
}
defer func() {
for _, tab := range testTablets {
deleteTablet(t, te, tab)
}
}()
tp, err := NewTabletPicker(te.topoServ, tcase.inCells, te.keyspace, te.shard, tcase.inTabletTypes)
require.NoError(t, err)
require.Equal(t, tcase.tpLocalPreference, tp.localPreference)
require.ElementsMatch(t, tcase.tpCells, tp.cells)
var selectedTablets []uint32
selectedTabletMap := make(map[uint32]bool)
for i := 0; i < 20; i++ {
tab, err := tp.PickForStreaming(ctx)
require.NoError(t, err)
selectedTabletMap[tab.Alias.Uid] = true
}
for uid := range selectedTabletMap {
selectedTablets = append(selectedTablets, uid)
}
require.ElementsMatch(t, tcase.wantTablets, selectedTablets)
})
}
}

func TestTabletAppearsDuringSleep(t *testing.T) {
te := newPickerTestEnv(t, []string{"cell"})
tp, err := NewTabletPicker(te.topoServ, te.cells, te.keyspace, te.shard, "replica")
Expand Down Expand Up @@ -428,6 +581,11 @@ type pickerTestEnv struct {
topoServ *topo.Server
}

func TestMain(m *testing.M) {
_flag.ParseFlagsForTest()
os.Exit(m.Run())
}

func newPickerTestEnv(t *testing.T, cells []string) *pickerTestEnv {
ctx := context.Background()

Expand Down
2 changes: 1 addition & 1 deletion go/vt/discovery/topology_watcher.go
Original file line number Diff line number Diff line change
Expand Up @@ -161,7 +161,7 @@ func (tw *TopologyWatcher) loadTablets() {
return
default:
}
log.Errorf("cannot get tablets for cell: %v: %v", tw.cell, err)
log.Errorf("cannot get tablets for cell:%v: %v", tw.cell, err)
Review comment (Contributor):

IMO this would be slightly better:

log.Errorf("cannot get tablets for cell %q: %v", tw.cell, err)
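
A short sketch of why `%q` may read better here: it wraps the cell name in quotes, so an empty or whitespace-only cell name stays visible in the log line. The example uses `fmt.Sprintf` from the standard library rather than the Vitess `log` package; `formatCellError` and `errSample` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errSample is a made-up error used only for this illustration.
var errSample = errors.New("node doesn't exist")

// formatCellError is a hypothetical helper that builds the message with the
// format string suggested in the review comment.
func formatCellError(cell string, err error) string {
	return fmt.Sprintf("cannot get tablets for cell %q: %v", cell, err)
}

func main() {
	fmt.Println(formatCellError("zone1", errSample))
	// cannot get tablets for cell "zone1": node doesn't exist
	fmt.Println(formatCellError("", errSample))
	// cannot get tablets for cell "": node doesn't exist
}
```
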

return
}

Expand Down