Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle host ports with emulator tests #1464

Merged

Conversation

elizabethengelman
Copy link
Contributor

@elizabethengelman elizabethengelman commented Jul 16, 2024

Troubleshooting emulator tests

Closes #1461 🤞

What

We were still seeing emulator test failures.

Why/Issue Description

The speculos emulator dockerfile exposes a bunch of ports that we don't necessarily need for our tests. We relied on testcontainer's port publishing, which publishes all ports by default. I think that there were conflicts in which ports were being used for which tests. Since the tests are run in parallel, sometimes a test would try to access the API/UI endpoint on the GH runner host, and another test container was also using that port. 😵

This PR explicitly only publishes the two ports that we need, and makes sure to get two open ports per test.

Known limitations

We are on an older version of testcontainers, which we can hopefully update as part of #1471. I'm just pointing this out here, because our current version of testcontainers doesn't make it easy to publish just the ports with want, at dynamic ports on the host. Maybe we can clean this up when testcontainers is updated.

Shout out to @Ifropc for digging into this too!

@elizabethengelman elizabethengelman force-pushed the fix/troubleshooting-emulator-tests branch from 641111a to 831a2a9 Compare July 16, 2024 23:04
@elizabethengelman elizabethengelman force-pushed the fix/troubleshooting-emulator-tests branch 2 times, most recently from 61f56f4 to 49fb30b Compare July 30, 2024 18:35
also make sure to randomize host port so that tests don't clobber each
other when ran in parallel
@elizabethengelman elizabethengelman force-pushed the fix/troubleshooting-emulator-tests branch from 49fb30b to cba0bdc Compare July 30, 2024 18:44
@elizabethengelman elizabethengelman changed the title Add nocapture flag to tests Handle host ports with emulator tests Jul 30, 2024
@elizabethengelman elizabethengelman marked this pull request as ready for review July 30, 2024 19:11
@elizabethengelman elizabethengelman force-pushed the fix/troubleshooting-emulator-tests branch from 045e905 to a15e7c7 Compare July 30, 2024 19:13
@elizabethengelman elizabethengelman force-pushed the fix/troubleshooting-emulator-tests branch from a15e7c7 to 01b0096 Compare July 30, 2024 19:16
@Ifropc
Copy link
Contributor

Ifropc commented Jul 30, 2024

The instability of tests seems to still be there 😵‍💫😵‍💫

failures:

---- emulator_tests::test_sign_tx::when_the_device_is_nanox stdout ----
listener1: TcpListener { addr: 127.0.0.1:36175, fd: 84 }
Retry count: 0
Wait time: 1s
Error occurred while exchanging with Ledger device: Error connecting to ledger device
thread 'emulator_tests::test_sign_tx::when_the_device_is_nanox' panicked at cmd/crates/stellar-ledger/tests/test/emulator_tests.rs:185:13:
assertion failed: false
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- emulator_tests::test_sign_tx::when_the_device_is_nanos_plus stdout ----
listener1: TcpListener { addr: 127.0.0.1:36975, fd: 63 }
Retry count: 0
Wait time: 1s
Retry count: 1
Wait time: 2s
Error occurred while exchanging with Ledger device: Error connecting to ledger device
thread 'emulator_tests::test_sign_tx::when_the_device_is_nanos_plus' panicked at cmd/crates/stellar-ledger/tests/test/emulator_tests.rs:185:13:
assertion failed: false

---- emulator_tests::test_sign_tx_hash_when_hash_signing_is_enabled::when_the_device_is_nanos stdout ----
listener1: TcpListener { addr: 127.0.0.1:42571, fd: 93 }
Retry count: 0
Wait time: 1s
thread 'emulator_tests::test_sign_tx_hash_when_hash_signing_is_enabled::when_the_device_is_nanos' panicked at cmd/crates/stellar-ledger/tests/test/emulator_tests.rs:266:13:
Unexpected result: Error occurred while exchanging with Ledger device: Error connecting to ledger device

---- emulator_tests::test_sign_tx::when_the_device_is_nanos stdout ----
listener1: TcpListener { addr: 127.0.0.1:33041, fd: 30 }
Retry count: 0
Wait time: 1s
Retry count: 1
Wait time: 2s
Error occurred while exchanging with Ledger device: Error connecting to ledger device
thread 'emulator_tests::test_sign_tx::when_the_device_is_nanos' panicked at cmd/crates/stellar-ledger/tests/test/emulator_tests.rs:185:13:
assertion failed: false


failures:
    emulator_tests::test_sign_tx::when_the_device_is_nanos
    emulator_tests::test_sign_tx::when_the_device_is_nanos_plus
    emulator_tests::test_sign_tx::when_the_device_is_nanox
    emulator_tests::test_sign_tx_hash_when_hash_signing_is_enabled::when_the_device_is_nanos

test result: FAILED. 11 passed; 4 failed; 0 ignored; 0 measured; 0 filtered out; finished in 77.78s

Good thing is that it's only Error connecting to ledger device error now, no port issues anymore 🎉

@elizabethengelman
Copy link
Contributor Author

Bummer! Thanks for checking this out - I am struggling to repro this locally, even when running the test a bunch of times in short order. 🤔

Looking at it again this morning - it looks like we are waiting for the UI port to be running, but are not running for the emulated device itself. I will push an update for that shortly.

@Ifropc
Copy link
Contributor

Ifropc commented Jul 31, 2024

Still get the failures on the latest HEAD...

failures:

---- emulator_tests::test_sign_tx::when_the_device_is_nanos stdout ----
Error occurred while exchanging with Ledger device: Error connecting to ledger device
thread 'emulator_tests::test_sign_tx::when_the_device_is_nanos' panicked at cmd/crates/stellar-ledger/tests/test/emulator_tests.rs:190:13:
assertion failed: false
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    emulator_tests::test_sign_tx::when_the_device_is_nanos

test result: FAILED. 14 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 41.10s

error: test failed, to rerun pass `--test test`
ifro@nixos:~/projects/stellar-cli/ > git log
commit 0b8850348d168ffac72a44b0d46f5e4b18003413 (HEAD, AhaLabs/fix/troubleshooting-emulator-tests)
Author: Elizabeth Engelman <[email protected]>
Date:   Wed Jul 31 12:29:21 2024 -0400

    Make sure that ports are not conflicting between test threads

Seems to be passing on CI though 😅
Do you think you could add a dumb of container logs just before the failure? Maybe it will give us some ideas.
Worst case we can maybe we restart the test all together if getting Error connecting to ledger device 🤔

@Ifropc
Copy link
Contributor

Ifropc commented Jul 31, 2024

Ran the test 30 times locally, the only flaky test is emulator_tests::test_sign_tx::when_the_device_is_nanos and it failed 28/30 times. Not sure what makes this test special 🤔

@elizabethengelman
Copy link
Contributor Author

elizabethengelman commented Jul 31, 2024

Ran the test 30 times locally, the only flaky test is emulator_tests::test_sign_tx::when_the_device_is_nanos and it failed 28/30 times. Not sure what makes this test special 🤔

I don't know why I didn't see these failures before, but i am seeing it now!

I think what's happening now is that even though I was checking if the port is in use, I was checking at 127.0.0.1, but it looks like docker is publishing ports at all network interfaces, not just 127.0.0.1. I'll push up a change shortly to check if the port is open on all interfaces (0.0.0.0).

Oh yeah, and why that test is so special... I wonder if it is just the order in which the tests are run, that one tends to be the first one to reuse a port that was already in use? Not 100% sure.

@elizabethengelman elizabethengelman force-pushed the fix/troubleshooting-emulator-tests branch from ebfa99b to c1f8951 Compare July 31, 2024 19:51
@Ifropc
Copy link
Contributor

Ifropc commented Jul 31, 2024

I was actually trying to run just this test before (in isolation), and it was still flaky (I was also thinking that maybe there was some environment issue). It was before the changes introduced in this PR though

@elizabethengelman elizabethengelman force-pushed the fix/troubleshooting-emulator-tests branch from a5bcc72 to 3749430 Compare August 1, 2024 14:28
@elizabethengelman
Copy link
Contributor Author

I was able to get this to pass a few times in CI on my fork, with the workflow running on a couple of branches at the same time. Are we comfortable merging this in, and then I can reenable and test it out for real? @Ifropc @leighmcculloch

If we start seeing issues again, we can disable again, and reevaluate.

@leighmcculloch leighmcculloch merged commit aba5bdd into stellar:main Aug 1, 2024
22 of 24 checks passed
@elizabethengelman elizabethengelman deleted the fix/troubleshooting-emulator-tests branch August 1, 2024 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Emulator tests still flakey
3 participants