addindex/disttask: adjust add index task concurrency & add check when submit task #49403
Hi @D3Hunter. Thanks for your PR.
pkg/ddl/backfilling_operators.go
Outdated
    if err != nil {
        return 0, nil
    }
-   return writerMemSize / uint64(concurrency) / 10, nil
+   return writerMemSize / uint64(workerCntLimit) / 10, nil
@ywqzzy Should this use taskConcurrency instead? This worker count might be greater than the CPU count.
Yes
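To make the concern concrete, here is a minimal sketch of how the divisor changes each writer's share. Only the expression `writerMemSize / uint64(n) / 10` comes from the diff above; the helper name `perWriterMemSize`, the guard for non-positive `n`, and the 8 GiB budget are assumptions made for illustration.

```go
package main

import "fmt"

// perWriterMemSize mirrors the expression in the diff:
// writerMemSize / uint64(n) / 10.
// The function name and the n <= 0 guard are assumptions for this sketch.
func perWriterMemSize(writerMemSize uint64, n int) uint64 {
	if n <= 0 {
		return 0
	}
	return writerMemSize / uint64(n) / 10
}

func main() {
	const writerMemSize uint64 = 8 << 30 // hypothetical 8 GiB budget

	// Divisor bounded by the CPU count, e.g. the task concurrency.
	fmt.Println(perWriterMemSize(writerMemSize, 8))

	// Divisor taken from a worker count that exceeds the CPU count.
	fmt.Println(perWriterMemSize(writerMemSize, 111))
}
```

A larger divisor yields a smaller per-writer share, which is why the choice between the worker count and the task concurrency matters here.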
require.Equal(t, 1, task.Concurrency)
taskMeta := ddl.BackfillTaskMeta{}
require.NoError(t, json.Unmarshal(task.Meta, &taskMeta))
require.Equal(t, 111, taskMeta.WorkerCntLimit)
Why set workerCntLimit greater than the CPU count? If task concurrency != workerCnt, the reserved resources are less than the resources actually consumed.
One add-index worker might not consume a full core, so GetDDLReorgWorkerCounter can be larger than the core count; see https://docs.pingcap.com/tidb/stable/system-variables#tidb_ddl_reorg_worker_cnt.
If the user does set tidb_ddl_reorg_worker_cnt > core count, we reserve resources using the core count, which is the most we can reserve.
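The reservation rule described in this reply can be sketched as a simple cap. This is an illustration rather than the PR's implementation: `reservedConcurrency` is a hypothetical helper, and the lower bound of 1 is an assumption.

```go
package main

import (
	"fmt"
	"runtime"
)

// reservedConcurrency returns how many slots to reserve for a task:
// the configured worker count, capped at the number of cores.
// The name and the lower bound of 1 are assumptions for this sketch.
func reservedConcurrency(workerCnt, cpuCount int) int {
	if workerCnt < 1 {
		return 1
	}
	if workerCnt > cpuCount {
		return cpuCount
	}
	return workerCnt
}

func main() {
	cpus := runtime.NumCPU()
	// A worker count above the core count reserves at most the core count.
	fmt.Println(reservedConcurrency(111, cpus))
	// A worker count within the core count is reserved as configured.
	fmt.Println(reservedConcurrency(1, cpus))
}
```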
// ┌──────┐ │ ┌───────┐ ┌──┴───┐
// │failed│ │ ┌────────►│pausing├──────►│paused│
// └──────┘ │ │ └───────┘ └──────┘
// ▲ ▼ │
// ┌──┴────┐ ┌───┴───┐ ┌────────┐
// │pending├────►│running├────►│succeed │
// └──┬────┘ └───┬───┘ └────────┘
// ▼ │ ┌──────────┐
// ┌──────┐ ├────────►│cancelling│
// │failed│ │ └────┬─────┘
// └──────┘ │ ▼
// │ ┌─────────┐ ┌────────┐
// └────────►│reverting├────►│reverted│
// └────┬────┘ └────────┘
// │ ┌─────────────┐
// └─────────►│revert_failed│
// └─────────────┘
// └──┬────┘ └──┬┬───┘ └────────┘
When does the task state transition from pending to failed?
When the task type is invalid or dispatcher initialization fails.
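The transitions discussed here can be expressed as a lookup table. The sketch below is one reading of the diagram in the diff, whose spacing was lost, so it lists only the edges that are legible there plus the pending-to-failed edge confirmed in this thread. The type and function names are hypothetical, and any edge not clearly visible (for example, resuming from paused) is deliberately omitted.

```go
package main

import "fmt"

// TaskState is a hypothetical type for this sketch; the state names are
// taken from the diagram in the diff.
type TaskState string

const (
	StatePending      TaskState = "pending"
	StateRunning      TaskState = "running"
	StateSucceed      TaskState = "succeed"
	StateFailed       TaskState = "failed"
	StatePausing      TaskState = "pausing"
	StatePaused       TaskState = "paused"
	StateCancelling   TaskState = "cancelling"
	StateReverting    TaskState = "reverting"
	StateReverted     TaskState = "reverted"
	StateRevertFailed TaskState = "revert_failed"
)

// transitions lists only the edges legible in the diagram; it is an
// assumption-laden reading, not the framework's authoritative table.
var transitions = map[TaskState][]TaskState{
	// pending -> failed: invalid task type or dispatcher init failure.
	StatePending:    {StateRunning, StateFailed},
	StateRunning:    {StateSucceed, StatePausing, StateCancelling, StateReverting},
	StatePausing:    {StatePaused},
	StateCancelling: {StateReverting},
	StateReverting:  {StateReverted, StateRevertFailed},
}

// canTransition reports whether the table allows moving from one state to another.
func canTransition(from, to TaskState) bool {
	for _, s := range transitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(StatePending, StateFailed))
	fmt.Println(canTransition(StateSucceed, StateRunning))
}
```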
pkg/ddl/backfilling_merge_sort.go
Outdated
@@ -41,21 +40,23 @@ type mergeSortExecutor struct {
    cloudStoreURI       string
    mu                  sync.Mutex
    subtaskSortedKVMeta *external.SortedKVMeta
+   workerCntLimit      int
In local add index, the workerCnt can be adjusted dynamically.
Need to discuss whether we will support the feature.
In local add index, the workerCnt can be adjusted dynamically. Need to discuss whether we will support the feature.
ok for now
In local add index, the workerCnt can be adjusted dynamically.
Is this an exposed feature, or is it just that the previous code happens to use the newest workerCnt for later subtasks?
@tangenta cc
As it's an exposed behavior, I will revert the part that uses the saved worker count.
https://docs.pingcap.com/tidb/stable/ddl-introduction#balance-the-physical-ddl-execution-speed-and-the-impact-on-application-load-through-system-variables
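The difference between a saved worker count and a dynamically adjustable one can be sketched as follows. This is a simplified illustration under stated assumptions: `reorgWorkerCnt` is a hypothetical stand-in for a setting such as tidb_ddl_reorg_worker_cnt, and `workerCntForSubtask` is an invented helper, not a function from the PR.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// reorgWorkerCnt is a hypothetical stand-in for a runtime-adjustable
// setting such as tidb_ddl_reorg_worker_cnt.
var reorgWorkerCnt atomic.Int64

// workerCntForSubtask reads the current setting each time it is called,
// so a change made mid-task applies to subtasks that start later.
func workerCntForSubtask() int {
	return int(reorgWorkerCnt.Load())
}

func main() {
	reorgWorkerCnt.Store(4)

	// Saved approach: capture the value once when the task is submitted.
	saved := workerCntForSubtask()

	// The user adjusts the setting while the task is running.
	reorgWorkerCnt.Store(16)

	// The saved value is stale; reading at subtask start sees the new one.
	fmt.Println(saved, workerCntForSubtask())
}
```

Reading the value at the start of each subtask preserves the adjustable behavior, whereas a value stored in the task metadata at submit time would not reflect later changes.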
Codecov Report
Additional details and impacted files
@@               Coverage Diff                @@
##             master     #49403        +/-   ##
================================================
+ Coverage   71.0560%   71.7312%   +0.6752%
================================================
  Files          1368       1414        +46
  Lines        401362     414964     +13602
================================================
+ Hits         285192     297659     +12467
- Misses        96344      98503      +2159
+ Partials      19826      18802      -1024
/retest
@D3Hunter: Cannot trigger testing until a trusted user reviews the PR.
/hold
ccac642 to b971889
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: tangenta, ywqzzy
/unhold
What problem does this PR solve?
Issue Number: ref #49008
Problem Summary:
What changed and how does it work?