-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increase timeout to reduce the flakyness of rpc signature receving test #27008
Conversation
I understand this PR does not fix the underlying problem with the test, but does this PR meaningfully reduce the frequency this test fails? I'd guess my CI probably fails due to this test 1 in 10 times. If this PR improves that ratio, then I'm onboard for approving this change. |
Optionally, could the test be fixed by adding in an "ACK"? After Thread 1 says "READY", can it wait until Thread 2 responds with an "ACK" before starting the timeout countdown? |
Yes, this definitely reduces the flakiness down to almost 0 error rate. I cannot reproduce this failure any more. |
I thought about adding an intermediate sync step like you suggested. But then that "wait until Thread 2 responds with an ACK" is another wait for which a timeout need to be set up and handled. It is pretty much the same as waiting in the original wait with an increased timeout. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving for the sake of reducing CI failures. If this test starts failing frequently again, I do not think we should indefinitely increase the timeout, but instead fix outright.
Heh, just hit this again... A proper fix is likely required. https://buildkite.com/solana-labs/solana/builds/79632#01829260-52a4-4e55-bb0f-cd67d72b439b |
I've been running into this as well on |
…st (backport #27008) (#27156) Increase timeout to reduce the flakyness of rpc signature receving test (#27008) * Increase timeout to reduce the flakyness of rpc signature receving test * Minor fmt fix (cherry picked from commit 52d8a20) Co-authored-by: Xiang Zhu <[email protected]>
…st (solana-labs#27008) * Increase timeout to reduce the flakyness of rpc signature receving test * Minor fmt fix
I believe this is the next wait in the same test (not the initial issue mitigated by Xiang). Need to increase that timeout as well (CC #27214) |
Problem
The test sends the signatures and then a ready message all in tokio async tasks. The problem is that tokio async tasks are not run in order, so receiving the ready message does not guarantee that the signature sending is done.
This test is too flaky at 15-second timeout. Debugging show occasional multi-second delay which could come from multiple sources -- other tokio tasks, tokio scheduler, OS scheduler. The async nature makes it hard to track down the origin of the delay.
Summary of Changes
Without fundamentally changing the test design, this change only increases the timeout to reduce the flakiness of the test.
Fixes # 16970
thread 'test_rpc_subscriptions' panicked at 'recv_timeout, 569/1000 signatures remaining', core/tests/rpc.rs:354:17