mcs: add balancer for keyspace group #6274
pkg/keyspace/tso_keyspace_group.go (outdated)
    for _, item := range resp.Kvs {
        nodes = append(nodes, string(item.Value))
    }
    m.nodesBalancer.Reset(nodes)
Is the nodesBalancer thread safe?
yes
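For readers following the thread-safety question, here is a minimal sketch of how a round-robin balancer can be made safe for concurrent `Reset` and `Next` calls. The `RoundRobin` type below is hypothetical and simplified; it only borrows the method names `Reset` and `Next` from the snippet above and is not the PR's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// RoundRobin is a hypothetical, simplified balancer guarded by a mutex.
type RoundRobin[T any] struct {
	mu    sync.Mutex
	nodes []T
	next  int
}

// Reset replaces the node list and restarts the rotation.
func (r *RoundRobin[T]) Reset(nodes []T) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.nodes = append([]T(nil), nodes...) // copy so callers cannot mutate our slice
	r.next = 0
}

// Next returns the next node in rotation, or the zero value if there are none.
func (r *RoundRobin[T]) Next() T {
	r.mu.Lock()
	defer r.mu.Unlock()
	var zero T
	if len(r.nodes) == 0 {
		return zero
	}
	n := r.nodes[r.next]
	r.next = (r.next + 1) % len(r.nodes)
	return n
}

func main() {
	b := &RoundRobin[string]{}
	b.Reset([]string{"tso-1", "tso-2"})

	// Concurrent callers share one cursor; the mutex keeps it consistent.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = b.Next()
		}()
	}
	wg.Wait()
	fmt.Println(b.Next(), b.Next(), b.Next())
}
```

A single mutex around both methods is the simplest choice here; whether the real balancer uses a mutex, an RWMutex, or atomics is not stated in this conversation.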
f1cba93 to befd1b3
Codecov Report
Coverage diff, master vs. #6274:
             master    #6274     +/-
Coverage     75.04%   75.22%  +0.17%
Files           405      407      +2
Lines         40117    40311    +194
Hits          30106    30322    +216
Misses         7372     7358     -14
Partials       2639     2631      -8
... and 40 files with indirect coverage changes
Left a few comments. @rleungx, any further thoughts on my comments?
pkg/keyspace/tso_keyspace_group.go (outdated)
        revision = wresp.CompactRevision
        break
    }
    if wresp.Err() != nil {
My understanding is that wresp.Err() != nil implies two things here:
- It is not the same as ctx being cancelled. If we return here and exit watchLoop while the API service is still serving and trying to recover from a failure, then once it has recovered there is no loop left to watch for primary address changes.
- Calling m.client.Watch again to retry does not help, because watch.go already retries Watch() as much as it can until it decides that retrying is pointless.
So it seems we can't exit the watch loop here. Instead, we should wait a while, create a new watcher (watcher := clientv3.NewWatcher(m.client)), watch again, and repeat until ctx is cancelled. + @rleungx for any comments on this.
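The retry shape proposed in this comment can be sketched without etcd. In the sketch below, `watchOnce` is a hypothetical stand-in for "create a new watcher and consume its channel until it fails"; `watchLoop`, `demoWatchLoop`, and the retry interval are likewise assumptions made for illustration and are not the fix that landed.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// watchOnce is a hypothetical stand-in for one watch session; it returns
// when the session breaks or ctx is cancelled.
type watchOnce func(ctx context.Context) error

// watchLoop re-establishes the watch after every failure and only exits
// when ctx is cancelled. It returns the number of attempts made.
func watchLoop(ctx context.Context, watch watchOnce, retryInterval time.Duration) int {
	attempts := 0
	for {
		attempts++
		if err := watch(ctx); err != nil && ctx.Err() == nil {
			fmt.Println("watch failed, will retry:", err)
		}
		select {
		case <-ctx.Done():
			return attempts
		case <-time.After(retryInterval): // wait a while before re-watching
		}
	}
}

// demoWatchLoop simulates two failed sessions followed by a shutdown.
func demoWatchLoop() int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	calls := 0
	return watchLoop(ctx, func(ctx context.Context) error {
		calls++
		if calls < 3 {
			return errors.New("simulated watch error")
		}
		cancel() // third session: the service is shutting down
		return ctx.Err()
	}, time.Millisecond)
}

func main() {
	fmt.Println("attempts:", demoWatchLoop())
}
```

The key design point is that a watch error alone never ends the loop; only context cancellation does.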
I will make a fix similar to #6279.
    policy balancer.Policy

    nodesBalancer balancer.Balancer[string]
It seems that we just select nodes round-robin. We need two-level scheduling here: the first level uses the following policies to place keyspace groups according to their UserKind, and the second level uses round-robin within each UserKind. We probably can't skip this two-level scheduling for now. + @rleungx, thoughts?
- The TSO system tries to distribute keyspaces with the User Kind "business" or "enterprise" as evenly as possible across different keyspace groups.
- The TSO system tries to distribute the primaries/secondaries of keyspace groups with the User Kind "business" or "enterprise" as evenly as possible across the TSO nodes.
- The TSO system tries to assign the primaries of 'basic' keyspace groups to as few TSO nodes as possible.
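To make the two-level idea concrete, here is a rough sketch under strong assumptions: the `UserKind` type, the `TwoLevel` scheduler, and its `Place` method are all hypothetical and only illustrate "pick a policy by user kind, then round-robin within the kind". It covers primary placement only and is not code from this PR.

```go
package main

import "fmt"

// UserKind mirrors the kinds named in the comment; the type is hypothetical.
type UserKind string

const (
	Basic      UserKind = "basic"
	Business   UserKind = "business"
	Enterprise UserKind = "enterprise"
)

// TwoLevel is a hypothetical scheduler used only for illustration.
type TwoLevel struct {
	nodes  []string
	cursor map[UserKind]int // one round-robin cursor per user kind
}

func NewTwoLevel(nodes []string) *TwoLevel {
	return &TwoLevel{nodes: nodes, cursor: make(map[UserKind]int)}
}

// Place returns the node for the primary of a new keyspace group.
func (s *TwoLevel) Place(kind UserKind) string {
	if len(s.nodes) == 0 {
		return ""
	}
	// Level 1: choose the placement policy from the user kind.
	if kind == Basic {
		return s.nodes[0] // pack basic groups onto as few nodes as possible
	}
	// Level 2: round-robin within the kind to spread groups evenly.
	i := s.cursor[kind]
	s.cursor[kind] = (i + 1) % len(s.nodes)
	return s.nodes[i]
}

func main() {
	s := NewTwoLevel([]string{"tso-1", "tso-2"})
	fmt.Println(s.Place(Basic), s.Place(Basic))
	fmt.Println(s.Place(Business), s.Place(Business), s.Place(Enterprise))
}
```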
So where will the correspondence between tso node and user kind be found?
The balancer assigns and specifies.
Let's ignore this comment and just use round-robin as it is in this PR for now.
We can improve it incrementally later.
I will add a TODO for it.
    PolicyRoundRobin Policy = iota
    // PolicyLeast is the policy to return the least used node.
    // TODO: move indexed heap to pkg and use it.
    PolicyLeast
where is it used?
That will be done in another PR.
Why not add it in another PR? Maybe replace it with a TODO for now?
+1 to remove PolicyLeast if we don't use it
Don't worry. It will be added after #6268 is merged.
    switch policy {
    case PolicyRoundRobin:
        return NewRoundRobin[T]()
    default: // only round-robin is supported now.
why not return round robin directly?
LGTM after dealing with the current comments as discussed.
pkg/keyspace/tso_keyspace_group.go (outdated)
        m.wg.Wait()
    }

    func (m *GroupManager) startWatchLoop(revision int64) {
Do we need the watch loop? My understanding is that we just need to read the registered TSO nodes once, when the balance API is invoked. The operator (a human) will call the balance API.
- @rleungx for his thoughts
just so
pkg/keyspace/tso_keyspace_group.go (outdated)
    members := m.AllocNodesForGroup(keyspaceGroup.Replica)
    if len(members) == 0 {
        // directly return error if no available node.
        // It means that the number of nodes is reducing between the check of controller and the execution of this function.
split it into two lines?
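As an illustration of the allocation step in the snippet above, the sketch below picks a given number of distinct nodes from a round-robin source and fails when the node set is too small or shrinks midway. `allocNodes` and `roundRobin` are hypothetical helpers; the signature of the PR's `AllocNodesForGroup` is only partly visible here, so nothing below should be read as its actual behavior.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotEnoughNodes = errors.New("not enough available nodes")

// roundRobin returns a hypothetical Next()-style function over a fixed list.
func roundRobin(nodes []string) func() string {
	i := 0
	return func() string {
		if len(nodes) == 0 {
			return ""
		}
		n := nodes[i]
		i = (i + 1) % len(nodes)
		return n
	}
}

// allocNodes picks `replica` distinct nodes using next(). `total` is the
// node count seen by the caller's earlier check; the set may have shrunk
// since then, which is reported as an error rather than a short result.
func allocNodes(next func() string, total, replica int) ([]string, error) {
	if replica < 1 || total < replica {
		return nil, errNotEnoughNodes
	}
	seen := make(map[string]struct{}, replica)
	members := make([]string, 0, replica)
	// One full rotation is enough to visit every node once.
	for i := 0; i < total && len(members) < replica; i++ {
		addr := next()
		if addr == "" { // no node left
			return nil, errNotEnoughNodes
		}
		if _, dup := seen[addr]; dup {
			continue
		}
		seen[addr] = struct{}{}
		members = append(members, addr)
	}
	if len(members) < replica {
		return nil, errNotEnoughNodes
	}
	return members, nil
}

func main() {
	members, err := allocNodes(roundRobin([]string{"tso-1", "tso-2", "tso-3"}), 3, 2)
	fmt.Println(members, err)
}
```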
pkg/keyspace/tso_keyspace_group.go (outdated)
        case <-ticker.C:
        }
        addr := m.nodesBalancer.Next()
        if addr == "" { // no node
Suggested change:
    - if addr == "" { // no node
    + if len(addr) == 0 { // no node
They are the same.
@@ -44,6 +45,10 @@ type CreateKeyspaceGroupParams struct {

    // CreateKeyspaceGroups creates keyspace groups.
    func CreateKeyspaceGroups(c *gin.Context) {
It seems that we are adding the distribution/balance feature to CreateKeyspaceGroups. We'd better add a separate balance API that reads the TSO nodes and the distribution of existing keyspace groups, then does the balancing.
We can abstract the balance logic, with well-defined inputs and outputs, and add a new balance API.
We can do it in a separate pr.
ok, I will split it.
@@ -73,6 +83,8 @@ type KeyspaceGroup struct {
    Members []KeyspaceGroupMember `json:"members"`
    // Keyspaces are the keyspace IDs which belong to the keyspace group.
    Keyspaces []uint32 `json:"keyspaces"`
    // Replica is the tso replica count of the keyspace group.
    Replica int `json:"replica"`
We don't need the Replica field. It can be a const value starting from 2. In the future, we might want to add replica to endpoint.KeyspaceGroup to specify the desired replica count.
Signed-off-by: lhy1024 <[email protected]>
pkg/keyspace/tso_keyspace_group.go (outdated)
                break
            }
        }
        if err != nil {
Will the err be shadowed?
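For context on the shadowing question, the following self-contained sketch shows the general Go pitfall: using `:=` inside an inner scope declares a new `err`, so the outer `err` checked after the loop stays nil. The `step`, `shadowed`, and `fixed` functions are illustrative only; the surrounding code in the PR is not fully visible here.

```go
package main

import (
	"errors"
	"fmt"
)

// step fails on its third call (i == 2).
func step(i int) error {
	if i == 2 {
		return errors.New("step 2 failed")
	}
	return nil
}

// shadowed loses the failure: `err :=` declares a new variable that is
// scoped to the if statement, so the outer err is never assigned.
func shadowed() error {
	var err error
	for i := 0; i < 3; i++ {
		if err := step(i); err != nil {
			break
		}
	}
	return err // still nil
}

// fixed assigns to the outer err with `=`, so the failure is visible
// after the loop.
func fixed() error {
	var err error
	for i := 0; i < 3; i++ {
		if err = step(i); err != nil {
			break
		}
	}
	return err
}

func main() {
	fmt.Println("shadowed:", shadowed())
	fmt.Println("fixed:   ", fixed())
}
```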
        c.AbortWithStatusJSON(http.StatusBadRequest, errs.ErrBindJSON.Wrap(err).GenWithStackByCause())
        return
    }
    if manager.GetNodesNum() < allocParams.Replica || allocParams.Replica < 1 {
Assume the group has a specific user kind; here we only use GetNodesNum for the check. Does that make sense?
Mostly, LGTM
/merge
@JmPotato: It seems you want to merge this PR, I will help you trigger all the tests: /run-all-tests
This pull request has been accepted and is ready to merge. Commit hash: cf2ff20
@lhy1024: Your PR was out of date, I have automatically updated it for you. If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.
* client: refine serviceModeKeeper code (tikv#6201) ref tikv#5895 Some code refinements for `serviceModeKeeper`. Signed-off-by: JmPotato Co-authored-by: Ti Chi Robot
* *: use revision for watch test (tikv#6205) ref tikv#6071 Signed-off-by: Ryan Leung Co-authored-by: Ti Chi Robot
* *: remove unnecessary rand init (tikv#6207) close tikv#6134 Signed-off-by: Ryan Leung Co-authored-by: Ti Chi Robot
* Refactor TSO forward/dispatcher to be shared by both PD and TSO (tikv#6175) ref tikv#5895 Add general tso forward/dispatcher for independent pd(tso)/tso services and cross cluster forwarding. Signed-off-by: Bin Shi Signed-off-by: lhy1024
* Add basic multi-keyspace-group management (tikv#6214) ref tikv#5895 Support basic functions of multi-keyspace-group management Signed-off-by: Bin Shi Signed-off-by: lhy1024
* *: support keyspace group RESTful API (tikv#6229) ref tikv#6231 Signed-off-by: Ryan Leung Signed-off-by: lhy1024
* mcs: add more tso tests (tikv#6184) ref tikv#5836 Signed-off-by: Ryan Leung Co-authored-by: Ti Chi Robot
* client: fix compatibility problem of pd client (tikv#6244) close tikv#6243 Signed-off-by: Ryan Leung
* *: unify the key prefix (tikv#6248) ref tikv#5836 Signed-off-by: Ryan Leung Signed-off-by: lhy1024
* *: remove cluster dependency from keyspace (tikv#6249) ref tikv#6231 Signed-off-by: Ryan Leung Co-authored-by: Ti Chi Robot Signed-off-by: lhy1024
* *: make code clear by rename `isServing` to `isRunning` (tikv#6258) ref tikv#4399 Signed-off-by: lhy1024 Co-authored-by: Ti Chi Robot Signed-off-by: lhy1024
* cgroup: fix the path problem due to special container name (tikv#6267) close tikv#6266 Signed-off-by: Ryan Leung Co-authored-by: Ti Chi Robot
* server: fix watch keyspace revision (tikv#6251) ref tikv#5895 Signed-off-by: Ryan Leung Co-authored-by: Ti Chi Robot
* tso, server: refine the TSO allocator manager parameters (tikv#6269) ref tikv#5895 - Refine the TSO allocator manager parameters. - Always run `tsoAllocatorLoop` to advance the Global TSO. Signed-off-by: JmPotato Signed-off-by: lhy1024
* tso: unify the TSO ServiceConfig and ConfigProvider interfaces (tikv#6272) ref tikv#5895 Unify the TSO `ServiceConfig` and `ConfigProvider` interfaces. Signed-off-by: JmPotato
* Load initial assignment and dynamically watch/apply keyspace groups' membership/distribution change (tikv#6247) ref tikv#6232 Load initial keyspace group assignment. Dynamically watch/apply keyspace groups' membership/distribution change. Signed-off-by: Bin Shi
* *: define user kind for keyspace group (tikv#6241) ref tikv#6231 Signed-off-by: Ryan Leung Signed-off-by: lhy1024
* Add more failure tests when tso service loading initial keyspace groups assignment (tikv#6280) ref tikv#6232 Add more failure tests when tso service loading initial keyspace groups assignment Signed-off-by: Bin Shi
* Apply multi-keyspace-group membership to tso service and handle inconsistency issue (tikv#6282) ref tikv#6232 Apply multi-keyspace-group membership to tso service and handle inconsistency issue. 1. Add KeyspaceLookupTable to endpoint.KeyspaceGroup type KeyspaceGroup struct { ... // KeyspaceLookupTable is for fast lookup if a given keyspace belongs to this keyspace group. // It's not persisted and will be built when loading from storage. KeyspaceLookupTable map[uint32]struct{} `json:"-"` } 2. After loading keyspace groups, the Keyspace Group Manager builds KeyspaceLookupTable for every keyspace groups. 3. When Keyspace Group Manager handles tso requests, it uses the keyspaceLookupTable to check if the required keypsace still belongs to the required keyspace group. If not, returns the current keyspace group id in the tso response header. Signed-off-by: Bin Shi
* *: auto assign keyspace group (tikv#6268) close tikv#6231 Signed-off-by: Ryan Leung Co-authored-by: Ti Chi Robot Signed-off-by: lhy1024
* keyspace, api: support the keyspace group split (tikv#6293) ref tikv#6232 Support the keyspace group split and add related tests. Signed-off-by: JmPotato Co-authored-by: Ti Chi Robot Signed-off-by: lhy1024
* Improve lock mechanism in tso.KeyspaceGroupManager (tikv#6305) ref tikv#6232 Use the RWMutex instead of individual atomic types to better protect the state of the keyspace group manager Signed-off-by: Bin Shi
* keyspace: add split-from field for endpoint.KeyspaceGroup (tikv#6309) ref tikv#6232 Add `split-from` field for `endpoint.KeyspaceGroup`. Signed-off-by: JmPotato Co-authored-by: Ti Chi Robot
* Add read lock at one place for protection and better structure (tikv#6310) ref tikv#6232, ref tikv#6305 follow-up tikv#6305 Add read lock at one place for protection and better structure Signed-off-by: Bin Shi Co-authored-by: Ti Chi Robot
* tso: optimize function signatures to reduce parameter passing (tikv#6315) ref tikv#6232 Optimize function signatures to reduce parameter passing. Signed-off-by: JmPotato
* *: bootstrap keyspace group when server is in API mode (tikv#6308) ref tikv#6231 Signed-off-by: Ryan Leung Co-authored-by: Ti Chi Robot Signed-off-by: lhy1024
* keyspace: avoid keyspace being updated during the split (tikv#6316) ref tikv#6232 Prevent keyspace from being updated during the split. Signed-off-by: JmPotato
* *: fix `TestConcurrentlyReset` (tikv#6318) close tikv#6275 Signed-off-by: Ryan Leung Co-authored-by: Ti Chi Robot
* bootstrap default keyspace group in the tso service (tikv#6306) ref tikv#6232 Changes: 1. Introduce the initialization logic of the default keyspace group. - If the default keyspace group isn't configured in the etcd, every tso node/pod should initialize it and join the election for the primary of this group. - If the default keyspace group is configured in the etcd, the tso nodes/pods which are assigned with this group will initialize it and join the election for the primary of this group. 2. Introduce the keyspace group membership restriction -- default keyspace always belongs to default keyspace group. Signed-off-by: Bin Shi
* *: change the log level (tikv#6324) ref tikv#6232 Signed-off-by: Ryan Leung
* *: fix the missing log panic (tikv#6325) close tikv#6257 Signed-off-by: Ryan Leung Signed-off-by: lhy1024
* mcs: fix watch primary address revision and update cache when meets not leader (tikv#6279) ref tikv#5895 Signed-off-by: lhy1024 Co-authored-by: Ti Chi Robot Signed-off-by: lhy1024
* tso, member: support TSO split based on keyspace group split (tikv#6313) ref tikv#6232 Support TSO split based on keyspace group split. Signed-off-by: JmPotato Co-authored-by: Ti Chi Robot Signed-off-by: lhy1024
* mcs: support metrics HTTP interface for tso/resource manager server (tikv#6329) ref tikv#5895 Signed-off-by: Ryan Leung Co-authored-by: Ti Chi Robot Signed-off-by: lhy1024
* tso: put finishSplitKeyspaceGroup into the critical section (tikv#6331) ref tikv#6232 Put `finishSplitKeyspaceGroup` into the critical section. Signed-off-by: JmPotato Co-authored-by: Ti Chi Robot
* *: make `TestServerRegister` stable (tikv#6337) close tikv#6334 Signed-off-by: Ryan Leung
* tests: divide all tests into the CI chunks including submodule tests (tikv#6198) ref tikv#6181, ref tikv#6183 Divide all tests into the CI chunks including submodule tests. Signed-off-by: JmPotato Signed-off-by: lhy1024
* tests: introduce TSO TestCluster in the test (tikv#6333) ref tikv#6232 Introduce TSO `TestCluster` in the test. Signed-off-by: JmPotato Co-authored-by: Ti Chi Robot Signed-off-by: lhy1024
* mcs: add balancer for keyspace group (tikv#6274) close tikv#6233 Signed-off-by: lhy1024 Co-authored-by: Ti Chi Robot Signed-off-by: lhy1024
* Fixed bugs in tso service registry watching loop. (tikv#6346) ref tikv#6343 Fixed the following two bugs: 1. When re-watch a range, to continue from what left by the last watch, the revision is wresp.Header.Revision + 1 instead of wresp.Header.Revision, where wresp.Header.Revision is the revision indicated in the response of the last watch. Because of this bug, it was processing the same event endless as you can see from the log below. 2. In tso service watch loop in /Users/binshi/code/pingcap/my-pd/pkg/keyspace/tso_keyspace_group.go, If this is delete event, the json.Unmarshal(event.Kv.Value, s) will fail with the error "unexpected end of JSON input", so there is no way to get s.serviceAddr from the result of json.Unmarshal. Signed-off-by: Bin Shi
* mcs: fix double compression of prom handler (tikv#6339) ref prometheus/client_golang#622, ref tikv#5895 Signed-off-by: Ryan Leung
* tests, tso: add more TSO split tests (tikv#6338) ref tikv#6232 Add more TSO split tests. Signed-off-by: JmPotato Co-authored-by: Ti Chi Robot
* keyspace, tso: fix next revision to watch after watch/Get/RangeScan (tikv#6353) ref tikv#6232 The next revision to watch should always be Header.Revision + 1 where header is response header of watch/Get/RangeScan Signed-off-by: Bin Shi
* mcs, tests: use TSO cluster to do the failover test (tikv#6356) ref tikv#5895 Use TSO cluster to do the failover test. Signed-off-by: JmPotato
* fix startWatchLoop leak (tikv#6352) Signed-off-by: Ryan Leung Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
* mcs: update client when meet transport is closing (tikv#6341)
  * mcs: update client when meet transport is closing Signed-off-by: lhy1024
  * address comments Signed-off-by: lhy1024
  * add retry Signed-off-by: lhy1024
  --------- Signed-off-by: lhy1024 Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Signed-off-by: lhy1024
* add bootstrap test (tikv#6347) Signed-off-by: Ryan Leung Co-authored-by: Ti Chi Robot Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Signed-off-by: lhy1024
* Fix flaky TestLoadKeyspaceGroupsAssignment test (tikv#6365) Reduce the count of keyspace groups from 4096 to 512 in TestKeyspaceGroupManagerTestSuite/TestLoadKeyspaceGroupsAssignment to avoid timeout when test running slow. Signed-off-by: Bin Shi
* mcs, tso: fix ts fallback caused by multi-primary of the same keyspace group (tikv#6362)
  * Change participant election-prifix from listen-addr to advertise-listen-addr to gurantee uniqueness. Signed-off-by: Bin Shi
  * Add TestPariticipantStartWithAdvertiseListenAddr Signed-off-by: Bin Shi
  * Add comments to fix go fmt errors Signed-off-by: Bin Shi
  --------- Signed-off-by: Bin Shi Co-authored-by: Ryan Leung Signed-off-by: lhy1024
* fix log output (tikv#6364) Signed-off-by: Ryan Leung Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
* mcs: fix duplicate start of RaftCluster. (tikv#6358)
  * Using double-checked locking to avoid duplicate start of RaftCluster. Signed-off-by: Bin Shi
  * Handle feedback Signed-off-by: Bin Shi
  * improve locking Signed-off-by: Bin Shi
  * handle feedback Signed-off-by: Bin Shi
  --------- Signed-off-by: Bin Shi Co-authored-by: Ryan Leung
* Add retry mechanism for updating keyspace group (tikv#6372) Signed-off-by: JmPotato
* mcs: add set handler for balancer and alloc node for default keyspace group (tikv#6342) ref tikv#6233 Signed-off-by: lhy1024 Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
* mcs, tso: fix Nil pointer deference when (*AllocatorManager).GetMember (tikv#6383) close tikv#6381 If the desired keyspace group fall back to the default keyspace group and the AM isn't initialized, return not served error. Signed-off-by: Bin Shi
* mcs, tso: support multi-keyspace-group and its service discovery in E2E path (tikv#6321) ref tikv#6232 Support multi-keyspace-group in PD(TSO) client Signed-off-by: Bin Shi Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Signed-off-by: lhy1024
* client: add `NewClientWithKeyspaceName` for client (tikv#6380) ref tikv#5895 Signed-off-by: Ryan Leung
* keyspace, tso: check the replica count before the split (tikv#6382) ref tikv#6233 Check the replica count before the split. Signed-off-by: JmPotato Co-authored-by: lhy1024 Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
* tso: fix bugs to make split test case to pass (tikv#6389) ref tikv#6232 fix bugs to make split test case to pass Signed-off-by: Bin Shi Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
* keyspace: patrol keyspace assignment before the first split (tikv#6388) ref tikv#6232 Patrol the keyspace assignment before the first split to make sure every keyspace has its group assignment. Signed-off-by: JmPotato Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
* tests: fix Flaky TestMicroserviceTSOServer/TestConcurrentlyReset (tikv#6396) close tikv#6385 Get a copy of now then call base.add, because now is shared by all goroutines and now.add() will add to itself which isn't atomic and multi-goroutine safe. Signed-off-by: Bin Shi
* keyspace, slice: improve code efficiency in membership ops (tikv#6392) ref tikv#6231 Improve code efficiency in membership ops Signed-off-by: Bin Shi
* tests: enable TestTSOKeyspaceGroupSplitClient (tikv#6398) ref tikv#6232 Enable `TestTSOKeyspaceGroupSplitClient`. Signed-off-by: JmPotato Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
* tests: add more tests for multiple keyspace groups (tikv#6395) ref tikv#5895 Add CheckMultiKeyspacesTSO() and WaitForMultiKeyspacesTSOAvailable in test utility. Add TestTSOKeyspaceGroupManager/TestKeyspacesServedByNonDefaultKeyspaceGroup. Cover TestGetTS, TestGetTSAsync, TestUpdateAfterResetTSO in TestMicroserviceTSOClient for multiple keyspace groups. Signed-off-by: Bin Shi
* tests: fix failpoint disable (tikv#6401) ref tikv#4399 Signed-off-by: lhy1024 Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Signed-off-by: lhy1024
* client: retry load keyspace meta when creating a new client (tikv#6402) ref tikv#5895 Signed-off-by: Ryan Leung Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
* Fix test issue in TestRandomResignLeader. (tikv#6410) close tikv#6404 We need to make sure the selected keyspaces are from different keyspace groups, otherwise multiple goroutines below could try to resign the primary of the same keyspace group and cause race condition. Signed-off-by: Bin Shi
* keyspace, api2: fix the keyspace assignment patrol consistency (tikv#6397) ref tikv#6232 Fix the keyspace assignment patrol consistency. Signed-off-by: JmPotato Co-authored-by: Ryan Leung Signed-off-by: lhy1024
* election, tso: fix data race in lease.go (tikv#6379) close tikv#6378 fix data race in lease.go Signed-off-by: Bin Shi Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
* mcs: fix forward test with pd mode client (tikv#6290) ref tikv#5895, ref tikv#6279, close tikv#6289 Signed-off-by: lhy1024
* keyspace: patrol the keyspace assignment in batch (tikv#6411) ref tikv#6232 Patrol the keyspace assignment in batch. Signed-off-by: JmPotato
* etcdutil: add watch loop (tikv#6390) close tikv#6391 Signed-off-by: lhy1024
* mcs, tso: add API interface to obtain the TSO keyspace group member info (tikv#6373) ref tikv#6232 Add API interface to obtain the TSO keyspace group member info. Signed-off-by: JmPotato Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
* pkg: move operator_check out from test_util tikv#6162 Signed-off-by: lhy1024
* keysapce: wait region split when creating keyspace (tikv#6414) ref tikv#6231 Signed-off-by: zeminzhou Signed-off-by: lhy1024 Co-authored-by: zzm Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Signed-off-by: lhy1024
* mcs: use getClusterInfo to check whether api service is ready (tikv#6422) ref tikv#5836 Signed-off-by: lhy1024 Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Signed-off-by: lhy1024
* fix data race by replace clone (tikv#6242) close tikv#6230 Signed-off-by: bufferflies Co-authored-by: Ti Chi Robot Signed-off-by: lhy1024
* fix test and git mod tidy Signed-off-by: lhy1024
* revert makefile Signed-off-by: lhy1024
* fix ctx in watch loop Signed-off-by: lhy1024
* delete pd-tests.yaml Signed-off-by: lhy1024
* pd-ctl, tests: add the keyspace group commands (tikv#6423) ref tikv#6232 Add the keyspace group commands to show and split keyspace groups. Signed-off-by: JmPotato Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
* Handle compatibility issue in GetClusterInfo RPC (tikv#6434) ref tikv#5895, close tikv#6448 Handle the compatibility issue in the GetClusterInfo RPC Signed-off-by: Bin Shi Signed-off-by: lhy1024
* Provide GetMinTS API to solve the compatibility issue brought by multi-timeline tso (tikv#6421) ref tikv#6142 1. Import kvproto change to introduce GetMinTS rpc in the TSO service. 6. Add server side implementation for GetMinTS rpc. 7. Add client side implementation for GetMinTS rpc. 8. Add unit test Signed-off-by: Bin Shi Signed-off-by: lhy1024
* tso: use less interval when waiting api service (tikv#6451) close tikv#6449 Signed-off-by: lhy1024
* etcdutil: fix ctx in watch loop (tikv#6445) close tikv#6439 Signed-off-by: lhy1024 Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Signed-off-by: lhy1024
* Fix "non-default keyspace groups use the same timestamp path by mistake" (tikv#6457) close tikv#6453, close tikv#6465 The tso servers are loading keyspace groups asynchronously. Make sure all keyspace groups are available for serving tso requests from corresponding keyspaces by querying IsKeyspaceServing(keyspaceID, the Desired KeyspaceGroupID). if use default keyspace group id in the query, it will always return true as the keyspace will be served by default keyspace group before the keyspace groups are loaded. Signed-off-by: Bin Shi Signed-off-by: lhy1024
* TSO microservice discovery fallback path shouldn't call FindGroupByKeyspaceID (tikv#6473) close tikv#6472 TSO microservice discovery fallback path shouldn't call FindGroupByKeyspaceID Signed-off-by: Bin Shi
* Revert "cgroup: fix the path problem due to special container name (tikv#6267)" This reverts commit 0c4cf7f947799e5c45d6e37448475b921044bdde.
* *: rm debug file (tikv#6458) ref tikv#4399 Signed-off-by: lhy1024 Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
* Revert "*: remove unnecessary rand init (tikv#6207)" This reverts commit 7383ded7581c417a3866da271eb2ec0a27b5a6c8.
* mcs, tso: handle null keyspace (tikv#6476) ref tikv#5895 For API V1 and legacy path (NewClientWithContext w/o keyspace id/name), using Null Keypsace ID (uint32max) instead of default keyspace id and make sure it can be served by the default keyspace group's timeline. Modifying test accordingly. Signed-off-by: Bin Shi
* mcs, tso: print TSO service discovery fallback log just once (tikv#6478) ref tikv#5895 Print TSO service discovery fallback log just once Signed-off-by: Bin Shi
* client: return error if the keyspace meta cannot be found (tikv#6479) ref tikv#6142 Signed-off-by: Ryan Leung Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
* client: support use API context to create client (tikv#6482) ref tikv#6142 Signed-off-by: Ryan Leung
---------
Signed-off-by: Bin Shi
Signed-off-by: lhy1024
Signed-off-by: Ryan Leung
Signed-off-by: JmPotato
Co-authored-by: JmPotato
Co-authored-by: Ti Chi Robot
Co-authored-by: Ryan Leung
Co-authored-by: Bin Shi
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Co-authored-by: zzm
Co-authored-by: buffer
What problem does this PR solve?
Issue Number: Close #6233
What is changed and how does it work?
Check List
Tests
Release note