-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ddl: decouple job scheduler from 'ddl' and make it run/exit as owner changes #53548
Conversation
Hi @D3Hunter. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
pkg/ddl/backfilling_dist_executor.go
Outdated
// TODO getTableByTxn is using DDL ctx which is never cancelled except when shutdown. | ||
// we should move this heavy operation out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tangenta ptal
@@ -89,7 +88,7 @@ func (sch *BackfillingSchedulerExt) OnNextSubtasksBatch( | |||
return nil, err | |||
} | |||
job := &backfillMeta.Job | |||
tblInfo, err := getTblInfo(sch.d, job) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should framework context
@@ -382,8 +386,6 @@ type ddlCtx struct { | |||
*waitSchemaSyncedController | |||
*schemaVersionManager | |||
|
|||
runningJobs *runningJobs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to job scheduler
// SeqNum is the total order in all DDLs, it's used to identify the order of | ||
// moving the job into DDL history, not the order of the job execution. | ||
// fast create table doesn't honor this field, there might duplicate seq_num in this case. | ||
// TODO: deprecated it, as it forces 'moving jobs into DDL history' part to be serial. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wjhuang2016 ptal
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #53548 +/- ##
================================================
+ Coverage 72.4973% 74.5399% +2.0425%
================================================
Files 1506 1506
Lines 430821 430910 +89
================================================
+ Hits 312334 321200 +8866
+ Misses 99116 89797 -9319
- Partials 19371 19913 +542
Flags with carried forward coverage won't be shown. Click here to find out more.
|
/retest |
@D3Hunter: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will review soon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7 / 19 files viewed
/retest |
@D3Hunter: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/retest |
@D3Hunter: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
the failure of |
@D3Hunter: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tangenta, tiancaiamao The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
20 / 29 files viewed
reviewing
@@ -289,11 +289,6 @@ func (w *worker) runReorgJob(reorgInfo *reorgInfo, tblInfo *model.TableInfo, | |||
if err != nil { | |||
return errors.Trace(err) | |||
} | |||
case <-w.ctx.Done(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that we rely on doneCh to find w.ctx.Done(). How about change the signature of f
to have a context parameter, and in runReorgJob use w.ctx
as argument. So we have more confident to remove this case <-w.ctx.Done():
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all current f
s also calls through worker
, it should be fine except that we still uses background
context inside, will change it later
logutil.DDLLogger().Info("run reorg job quit") | ||
d.removeReorgCtx(job.ID) | ||
// We return dbterror.ErrWaitReorgTimeout here too, so that outer loop will break. | ||
return dbterror.ErrWaitReorgTimeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the error returned to caller is also changed, not sure about its effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previous ctx only cancells when shutdown, this error shouldn't matter much
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rest lgtm. You can unhold yourself after address comments.
/unhold |
In response to a cherrypick label: new pull request created to branch |
Signed-off-by: ti-chi-bot <[email protected]>
In response to a cherrypick label: new pull request created to branch |
Signed-off-by: ti-chi-bot <[email protected]>
What problem does this PR solve?
Issue Number: ref #53246
Problem Summary:
What changed and how does it work?
ddl
partially, some fields are still coupled, and make it a separate and cancellable module. it should make further test easier.isOwner
check, so the duration of co-exist of 2 owner might be quite long.SyncJobSchemaVerLoop
only on owner node.add-index
disttask scheduler uses the 'ddlCtx.ctx', it should use the context we pass in.Check List
Tests
/tidb/ddl/fg/owner/
, to force owner changeSide effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.