ddl: make read and write async during adding index #39249
Conversation
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. The full list of commands accepted by this bot can be found here. Reviewers can indicate their review by submitting an approval review.
/hold WIP
/run-check_dev_2
/unhold
@@ -273,7 +273,7 @@ func (w *backfillWorker) handleBackfillTask(d *ddlCtx, task *reorgBackfillTask,
 	rc.increaseRowCount(int64(taskCtx.addedCount))
 	rc.mergeWarnings(taskCtx.warnings, taskCtx.warningsCount)

-	if num := result.scanCount - lastLogCount; num >= 30000 {
+	if num := result.scanCount - lastLogCount; num >= 90000 {
If we print one log per 90000 rows, how long will the interval between logs be?
Normally, the throughput of adding an index using ingest is about 200k rows per second, so 90000 rows corresponds to an interval of roughly 0.45 seconds.
How about printing one log every 200k rows?
Maybe 200k is too large for the old implementation; its average throughput is only about 20k rows per second.
I think we can create a sliding window to track the throughput statistics, so the row threshold can adapt to machine performance.
Luckily, I have already created a sliding window in util/window in this PR.
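As a rough illustration of the idea, a sliding-window throughput counter could look like the sketch below (hypothetical Go code; the actual util/window package added in this PR may expose a different API):

```go
package window

import (
	"sync"
	"time"
)

// Window counts events in per-second buckets over the most recent `size`
// of time, so callers can estimate the current throughput.
type Window struct {
	mu      sync.Mutex
	size    time.Duration
	buckets []bucket
}

type bucket struct {
	start time.Time
	count int64
}

func NewWindow(size time.Duration) *Window {
	return &Window{size: size}
}

// Add records that n rows were processed now, dropping expired buckets first.
func (w *Window) Add(n int64) {
	w.mu.Lock()
	defer w.mu.Unlock()
	now := time.Now()
	for len(w.buckets) > 0 && now.Sub(w.buckets[0].start) > w.size {
		w.buckets = w.buckets[1:]
	}
	if len(w.buckets) == 0 || now.Sub(w.buckets[len(w.buckets)-1].start) >= time.Second {
		w.buckets = append(w.buckets, bucket{start: now})
	}
	w.buckets[len(w.buckets)-1].count += n
}

// RowsPerSecond returns the average throughput over the buckets still in the window.
func (w *Window) RowsPerSecond() float64 {
	w.mu.Lock()
	defer w.mu.Unlock()
	var total int64
	for _, b := range w.buckets {
		total += b.count
	}
	elapsed := w.size.Seconds()
	if len(w.buckets) > 0 {
		elapsed = time.Since(w.buckets[0].start).Seconds()
	}
	if elapsed < 1 {
		elapsed = 1
	}
	return float64(total) / elapsed
}
```

The backfill loop could then derive the log threshold from RowsPerSecond() instead of hard-coding 90000.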
LGTM
Any tests for changing the worker size while adding the index?
if scheduler.copReqSenderPool != nil {
	scheduler.copReqSenderPool.sendTask(task)
}
scheduler.taskCh <- task
Should the task be sent to two different channels?
I think so. This is because I want to keep the old backfill implementation and introduce as few changes as possible:
Old backfill:

scheduler.taskCh -> backfill worker -> read rows -> write indexes -> resultCh

New backfill in this PR:

scheduler.taskCh        -> backfill worker -> write indexes -> resultCh
                                  ^
                                  |
copReqSenderPool.taskCh -> cop-req sender -> read rows

If we only use one channel, then it will be:

scheduler.taskCh -> backfill worker -> write indexes -> resultCh
                        |                   ^
                        v                   |
                        -> cop-req sender -> read rows
The backfill worker cannot process the next task until the current task is complete. However, this is not the behavior we want in the cop-request sender pool: to gain better performance, the sender pool should always be filled with tasks.
Besides, with the one-channel solution, it is not easy to set the concurrency of the backfill workers and the cop-request senders separately.
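For context, here is a simplified, self-contained sketch of the two-channel dispatch described above (not the actual TiDB code; the function name dispatch and the pool sizes are illustrative). The same task is handed to both pools, and each pool's concurrency is configured independently:

```go
package main

import (
	"fmt"
	"sync"
)

type task struct{ id int }

// dispatch sends every task to both the cop-request sender pool (readers)
// and the backfill worker pool (writers); the two pools run with separate,
// independently tunable goroutine counts.
func dispatch(tasks []task, copReqSenders, backfillWorkers int) {
	copReqTaskCh := make(chan task, len(tasks))
	backfillTaskCh := make(chan task, len(tasks))

	var wg sync.WaitGroup
	for i := 0; i < copReqSenders; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range copReqTaskCh {
				fmt.Printf("sender: read rows for task %d\n", t.id)
			}
		}()
	}
	for i := 0; i < backfillWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range backfillTaskCh {
				fmt.Printf("worker: write index KVs for task %d\n", t.id)
			}
		}()
	}

	for _, t := range tasks {
		copReqTaskCh <- t   // keep the sender pool filled with tasks
		backfillTaskCh <- t // backfill workers consume at their own pace
	}
	close(copReqTaskCh)
	close(backfillTaskCh)
	wg.Wait()
}

func main() {
	dispatch([]task{{0}, {1}, {2}}, 8, 4)
}
```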
If you mean the test for the backfill worker, you can search
/merge
This pull request has been accepted and is ready to merge. Commit hash: ea47941
TiDB MergeCI notify 🔴 Bad News! [1] CI still failing after this PR merged.
What problem does this PR solve?
Issue Number: ref #35983
Problem Summary:
This is a further step from #39191.
What is changed and how it works?
This PR introduces the cop-request sender pool for coprocessor reads. Each sender lives in a separate goroutine, builds a coprocessor request for a task range, holds the index column values from the coprocessor response, converts them to index KVs, and sends the index KVs to a channel in batches.
The backfill workers keep reading index KVs from the channel and write them to Lightning's local engines.
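As a conceptual sketch of that data path (hypothetical types and helper names, not the real implementation): sender goroutines turn fetched rows into index KV batches and push them to a shared channel, while backfill workers drain the channel and write the KVs out.

```go
package main

import "sync"

type indexKV struct{ key, value []byte }

// runPipeline wires producers (cop-request senders) to consumers (backfill
// workers) through a buffered channel of index KV batches.
func runPipeline(rowBatches [][]indexKV, senders, workers int) {
	in := make(chan []indexKV, len(rowBatches))
	for _, b := range rowBatches {
		in <- b
	}
	close(in)

	kvBatchCh := make(chan []indexKV, senders)
	var sendWg sync.WaitGroup
	for i := 0; i < senders; i++ {
		sendWg.Add(1)
		go func() {
			defer sendWg.Done()
			for batch := range in {
				// In the real PR, a sender would build a coprocessor request
				// here and decode the response into index KVs before sending.
				kvBatchCh <- batch
			}
		}()
	}
	go func() { sendWg.Wait(); close(kvBatchCh) }()

	var writeWg sync.WaitGroup
	for i := 0; i < workers; i++ {
		writeWg.Add(1)
		go func() {
			defer writeWg.Done()
			for batch := range kvBatchCh {
				writeToLocalEngine(batch) // stand-in for the Lightning write
			}
		}()
	}
	writeWg.Wait()
}

// writeToLocalEngine is a placeholder for writing KVs to a local engine.
func writeToLocalEngine(batch []indexKV) {}

func main() {
	runPipeline([][]indexKV{{{[]byte("k"), []byte("v")}}}, 2, 2)
}
```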
Compared with the synchronous version (before this PR), it has the following advantages:
As a quick verification, here is the add index performance:
The commit before #39191:
After this PR:
By using this patch to record the time spent on each phase of adding index, we have the following log:
The commit before #39191:
After this PR:
The time of backfill txn read is reduced from 9.11 seconds to 0.10 seconds; it is almost eliminated.

Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.