executor: inject random panic to AggExec #23139
@@ -293,7 +294,7 @@ func (e *HashAggExec) initForParallelExec(ctx sessionctx.Context) {
 	finalConcurrency := sessionVars.HashAggFinalConcurrency()
 	partialConcurrency := sessionVars.HashAggPartialConcurrency()
 	e.isChildReturnEmpty = true
-	e.finalOutputCh = make(chan *AfFinalResult, finalConcurrency)
+	e.finalOutputCh = make(chan *AfFinalResult, finalConcurrency+partialConcurrency+1)
Why change this?
Not only does finalWorker send data/errors to the channel; fetchChildWorker and partialWorker also send errors to it.
If the buffer is too small, a send will block, which may lead to a deadlock.
What's the +1 for?
fetchChildData() also uses the channel.
executor/aggregate.go
waitGroup.Wait()
waitGroup2.Wait()
Why change this?
When any error occurs, HashAggExec closes finishCh and all goroutines try to exit. We can't guarantee that fetchChildWorker and partialWorker will exit before finalWorker.
If all finalWorkers exit first and close finalOutputCh, recoveryHashAgg() in fetchChildWorker/partialWorker will try to send a message to a closed channel if a panic occurs.
I found this data race in my random panic test.
executor/aggregate_test.go
}

fpName := "github.com/pingcap/tidb/executor/ConsumeRandomPanic"
c.Assert(failpoint.Enable(fpName, `1%panic("ERROR 1105 (HY000): Out Of Memory Quota![conn_id=1]")`), IsNil)
Why is this a random panic?
I use 1% to control the panic frequency.
I have now changed the panic probability to 5% to reduce the number of retries.
/run-all-tests
/run-all-tests
/lgtm
/merge
This pull request has been accepted and is ready to merge. Commit hash: f81a1d5
/run-tics-test
@wshwsh12: Your PR was out of date, I have automatically updated it for you. At the same time I will also trigger all tests for you: /run-all-tests
If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.
/run-unit-test
/run-all-tests
What problem does this PR solve?
Issue Number: close #xxx
Problem Summary:
In the Tracking_Aggregate_Memory project, there are many tracker.Consume calls, which trigger a panic when memory usage is too large. We should make sure that TiDB can still cancel the SQL statement successfully and does not leak resources.
What is changed and how it works?
Proposal: xxx
What's Changed:
How it Works:
Related changes
Check List
Tests
Side effects
Release note