core::test_rpc_subscriptions and core::test_rpc_slot_updates are flaky #16970

Closed
ruuda opened this issue Apr 30, 2021 · 5 comments

@ruuda
Contributor

ruuda commented Apr 30, 2021

Problem

The following two commands eventually fail. My estimate is that the tests fail less than 1 in 10 iterations.

(export RUST_LOG=INFO; set -e; while true; do cargo test --package solana-core --release --test rpc test_rpc_subscriptions; done)
(export RUST_LOG=INFO; set -e; while true; do cargo test --package solana-core --release --test rpc test_rpc_slot_updates; done)

It produces the following errors

---- test_rpc_subscriptions stdout ----
thread 'test_rpc_subscriptions' panicked at 'recv_timeout, 569/1000 signatures remaining', core/tests/rpc.rs:354:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

(also observed with different values than 569)

---- test_rpc_slot_updates stdout ----
thread 'test_rpc_slot_updates' panicked at 'assertion failed: `(left == right)`
  left: `"Frozen"`,
 right: `"Completed"`', core/tests/rpc.rs:206:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', core/tests/rpc.rs:171:55

Proposed Solution

I don’t know enough about what these tests do to propose a solution here.

@mvines mvines added this to the The Future! milestone May 10, 2021
@xiangzhu70 xiangzhu70 self-assigned this Jun 22, 2022
@xiangzhu70
Contributor

Starting to work on it. The debug note doc is https://docs.google.com/document/d/17kJ122_wZbEA8SWqke6tUAyKTksAnYYGSubV-ytn2rw/edit?usp=sharing

@xiangzhu70
Contributor

xiangzhu70 commented Jun 28, 2022

I think I now know the cause, but I am not sure about the fix yet.

Running on the main thread, this test starts a tokio async runtime on a separate thread and runs tokio tasks on it. The tasks send data (signatures, accounts, etc.) to several crossbeam channels, and the last task sends a ready message on a dedicated ready crossbeam channel. The main thread waits for the ready message and then receives the data from the crossbeam channels.

The problem is that the tokio tasks do not run in order, so the ready message is not guaranteed to be sent after all the signatures have been sent. The ready message is therefore not a reliable way to tell the main thread that the signatures are ready to be received. In most cases the signatures still arrive before the main thread tries to receive them, but occasionally the sending stalls and the main thread times out. Hence the flakiness.

    for handle in task_handles {
        handle.await.unwrap();
    }

    ready_sender.send(()).unwrap();

Waiting for all previous tokio tasks to finish before sending the ready message, as above, should be the correct fix. I tested this in an isolated environment and verified that it works.
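A minimal, self-contained sketch of this ordering fix is below. It is an analog, not the test's actual code: plain `std::thread` workers stand in for the tokio tasks, `std::sync::mpsc` channels stand in for the crossbeam channels, and the function names are hypothetical. Because every worker is joined before the ready message is sent, the receiver can rely on all items already being queued once it sees "ready".

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the test setup: each worker thread plays the role
/// of one tokio task pushing "signatures" into a shared channel.
/// Returns (item receiver, ready receiver).
fn spawn_workers_then_signal_ready(
    num_workers: usize,
    items_per_worker: usize,
) -> (Receiver<usize>, Receiver<()>) {
    let (item_sender, item_receiver) = mpsc::channel();
    let (ready_sender, ready_receiver) = mpsc::channel();

    // Coordinator thread: spawn the workers, wait for all of them,
    // and only then send the ready message.
    thread::spawn(move || {
        let handles: Vec<_> = (0..num_workers)
            .map(|worker| {
                let item_sender = item_sender.clone();
                thread::spawn(move || {
                    for i in 0..items_per_worker {
                        item_sender.send(worker * items_per_worker + i).unwrap();
                    }
                })
            })
            .collect();

        // Analog of `for handle in task_handles { handle.await.unwrap(); }`.
        for handle in handles {
            handle.join().unwrap();
        }

        // Every item has been sent by now, so "ready" really means ready.
        ready_sender.send(()).unwrap();
    });

    (item_receiver, ready_receiver)
}

/// Waits for the ready signal, then drains the items that are already queued.
fn collect_after_ready(num_workers: usize, items_per_worker: usize) -> Vec<usize> {
    let (item_receiver, ready_receiver) =
        spawn_workers_then_signal_ready(num_workers, items_per_worker);
    ready_receiver
        .recv_timeout(Duration::from_secs(10))
        .expect("ready signal should arrive");
    // try_iter() never blocks: it only yields messages already in the channel.
    item_receiver.try_iter().collect()
}
```

For example, `collect_after_ready(8, 125)` should yield all 1000 items regardless of scheduling, since the non-blocking drain only starts after every worker has been joined. If the ready message were sent without the joins, the same drain could come up short, which mirrors the flaky failure.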

However, in this test, for some reason this “ready_sender.send(())” causes sig_notifications.next() to fail, so the signature tasks fail to send the signatures.

I could debug further to find out why it fails. Another way to synchronize is simply to poll status_receiver.len() in the main thread and wait until all signatures are ready to be received.
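The polling alternative could look roughly like the helper below, which takes the length check as a closure. This is a sketch under assumptions, not code from the test: `wait_for_queued` is a hypothetical name, and with crossbeam the closure could be `|| status_receiver.len()`, since `crossbeam_channel::Receiver::len` returns the number of messages currently queued. The helper gives up at an overall deadline and reports how many items are still missing, similar in spirit to the `569/1000 signatures remaining` panic message.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: polls `queued_len` until it reports at least `expected`
/// items or `timeout` elapses. Returns how many items were still missing
/// (0 means everything arrived in time).
fn wait_for_queued<F: Fn() -> usize>(queued_len: F, expected: usize, timeout: Duration) -> usize {
    let deadline = Instant::now() + timeout;
    loop {
        let current = queued_len();
        if current >= expected {
            return 0;
        }
        if Instant::now() >= deadline {
            return expected - current;
        }
        // Back off briefly instead of spinning on the length check.
        thread::sleep(Duration::from_millis(10));
    }
}
```

Sleeping between checks keeps the main thread from busy-waiting; the trade-off is up to about 10 ms of extra latency after the last item arrives, which is negligible next to a multi-second test timeout.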

@xiangzhu70
Contributor

xiangzhu70 commented Aug 9, 2022

Increased the signature waiting timeout with #27008

@brooksprumo
Contributor

This test failed for me in CI again 😢

#27008 (comment)

@yihau
Member

yihau commented Aug 26, 2022

core::test_rpc_slot_updates #26593
core::test_rpc_subscriptions #27394
