Coroutine leaking after one TiKV got stuck #10965
Comments
How did you find the problem? Can pending futures also be added to memory trace? Can yatp support memory trace? /cc @sticnarf
Through the auto-generated jemalloc profile (start TiKV with jeprof.25133.25.i25.heap.svg.zip
Does the memory profile from the dashboard work? Why couldn't it capture the problem before?
The memory profile from the dashboard or wget is not strong enough to help in this case. I'm working on improving it now.
The |
close #10965, ref #10965, ref #10976 Signed-off-by: ti-srebot <[email protected]> Signed-off-by: Yilin Chen <[email protected]> Co-authored-by: 5kbpers <[email protected]> Co-authored-by: Yilin Chen <[email protected]> Co-authored-by: Ti Chi Robot <[email protected]>
Bug Report
Affected versions
v5.1, v5.2, master
Description
To advance the resolved TS, we need to verify the leadership of Raft leaders by calling a
check_leader
RPC every 1s to all the other TiKVs (learners are not filtered out here, so the RPC is also sent to TiFlash). However, we use
futures::join_all
to wait for all responses to arrive. So once one TiKV gets stuck and the timeout is too long, a new round of coroutines is spawned every second while the old ones are still pending; they accumulate and can eventually lead to OOM.
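To make the accumulation concrete, here is a rough model in plain Rust (standard library only). It is not TiKV's code and does not call `futures::join_all`. It only imitates the behaviour described above: a round of `check_leader` calls starts every interval, and a round finishes when its slowest peer answers or times out. The function name `peak_in_flight_rounds`, the latencies, and the 60s and 2s timeouts are assumptions made for illustration; the issue does not state the real timeout.

```rust
use std::collections::VecDeque;

/// Hypothetical model, not TiKV code: returns the peak number of rounds that
/// are pending at the same time. A round behaves like a join over all peers,
/// so it lasts as long as its slowest (timeout-capped) peer.
fn peak_in_flight_rounds(
    peer_latencies_ms: &[u64],
    timeout_ms: u64,
    interval_ms: u64,
    rounds: u64,
) -> usize {
    // Each RPC is capped by the timeout; the round waits for the slowest one.
    let round_duration_ms = peer_latencies_ms
        .iter()
        .map(|&latency| latency.min(timeout_ms))
        .max()
        .unwrap_or(0);

    // Finish times of pending rounds, oldest first.
    let mut in_flight: VecDeque<u64> = VecDeque::new();
    let mut peak = 0;

    for i in 0..rounds {
        let now_ms = i * interval_ms;
        // Drop every round that has completed by now.
        while let Some(&finish_ms) = in_flight.front() {
            if finish_ms <= now_ms {
                in_flight.pop_front();
            } else {
                break;
            }
        }
        // Start a new round at this tick.
        in_flight.push_back(now_ms + round_duration_ms);
        peak = peak.max(in_flight.len());
    }
    peak
}

fn main() {
    const STUCK: u64 = u64::MAX; // assumed stand-in for a peer that never answers

    // All peers healthy: each round finishes before the next one starts.
    println!("{}", peak_in_flight_rounds(&[5, 8, 3], 60_000, 1_000, 120)); // prints 1

    // One stuck peer, assumed 60s timeout: about 60 rounds pile up.
    println!("{}", peak_in_flight_rounds(&[5, STUCK, 3], 60_000, 1_000, 120)); // prints 60

    // Same stuck peer, assumed 2s timeout: the backlog stays bounded at 2.
    println!("{}", peak_in_flight_rounds(&[5, STUCK, 3], 2_000, 1_000, 120)); // prints 2
}
```

In this model the backlog is roughly the timeout divided by the interval, and each pending round holds one pending call per peer. That is why either a shorter per-RPC timeout or not starting a new round while the previous one is still pending would keep memory bounded.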