Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests flakiness #1771

Open
sangaman opened this issue Jul 30, 2020 · 17 comments
Open

Tests flakiness #1771

sangaman opened this issue Jul 30, 2020 · 17 comments
Assignees
Labels

Comments

@sangaman
Copy link
Collaborator

This issue is for tracking intermittent test failures that cause our tests to be flaky and require manual restarts even when there is no underlying issue in the xud code itself. Since there seem to be quite a few, I figured it would be easier to track them all in one issue. These primarily effect simulation tests but also occasionally our mocha/jest tests.

@sangaman
Copy link
Collaborator Author

Failure at the start of running simulation tests, a new one I haven't seen before.

 === RUN   TestIntegration
2020/07/30 13:23:37 Connext node reachability failure: invalid response at http://172.17.0.1:8080/config: [123 34 115 116 97 116 117 115 67 111 100 101 34 58 52 48 52 44 34 109 101 115 115 97 103 101 34 58 34 67 97 110 110 111 116 32 71 69 84 32 47 99 111 110 102 105 103 34 44 34 101 114 114 111 114 34 58 34 78 111 116 32 70 111 117 110 100 34 125]
exit status 1
FAIL	github.com/ExchangeUnion/xud-simulation	0.083s
~/work/xud/xud/test/simulation/temp/indra ~/work/xud/xud/test/simulation

@kilrau kilrau added the P2 mid priority label Jul 30, 2020
@sangaman
Copy link
Collaborator Author

sangaman commented Jul 31, 2020

EDIT: Should be fixed by #1790.

This happens occasionally with the Mocha tests, I've seen it several times both locally and on GitHub Actions.

   1) P2P Pool Tests
       should update a node on new handshake:

      AssertionError: expected '123.123.123.123' to equal '86.75.30.9'
      + expected - actual

      -123.123.123.123
      +86.75.30.9
      
      at /home/runner/work/xud/xud/test/integration/Pool.spec.ts:103:49
      at Generator.next (<anonymous>)
      at fulfilled (test/integration/Pool.spec.ts:24:58)

  2) P2P Pool Tests
       reconnect logic
         "before each" hook for "should not reconnect upon shutdown inbound":
     node 022a49046f252821dfcb3a3f7c8e203af97fb2b8223d8b9204380ccdce6910755d at 321.321.321.321:1337 already connected

@sangaman
Copy link
Collaborator Author

sangaman commented Aug 3, 2020

EDIT: This should be fixed by #1791.

This one I've seen for a while, it happens intermittently and more often locally. I've tried looking into this a bit before but was stumped.

  1) Parser
       test TCP segmentation/concatenation support
         should parse encrypted Ping SessionInit Ping concatenated and split on byte 15 from each packet beginning:

      AssertionError: expected '"{\\"body\\":{\\"ephemeralPubKey\\":\\"73b236f0-aa26-4391-8b44-57df28796f0b\\",\\"nodePubKey\\":\\"354bf50b-d846-443f-bc7d-8bd708fe74f2\\",\\"nodeState\\":{\\"addresses\\":[{\\"host\\":\\"1.1.1.1\\",\\"port\\":8885},{\\"host\\":\\"2.2.2.2\\",\\"port\\":8885}],\\"connextIdentifier\\":\\"95734182-9ed5-46e5-b993-22d280f210ec\\",\\"lndPubKeys\\":{\\"BTC\\":\\"775106c5-79e5-48bd-984b-52d53a3b604d\\",\\"LTC\\":\\"197262b9-bd75-4082-8e85-46ad3b7ecc5f\\"},\\"lndUris\\":{\\"BTC\\":[\\"\\"],\\"LTC\\":[\\"\\"]},\\"pairs\\":[\\"78f62c27-c4e1-41eb-8ceb-5400c1fa932c\\"],\\"raidenAddress\\":\\"fe024285-1e1d-44ec-bb7c-e5ccf4df0389\\",\\"tokenIdentifiers\\":{\\"BTC\\":\\"bitcoin-testnet\\",\\"LTC\\":\\"litecoin-testnet\\"}},\\"peerPubKey\\":\\"25ee301d-a407-4619-b68c-e8dc2b63f340\\",\\"sign\\":\\"f6eeed07-00d0-43ad-b6a5-d48d686718c4\\",\\"version\\":\\"1.0.0\\"},\\"header\\":{\\"id\\":\\"5efe7f02-d548-11ea-befd-ef7b6a3706c2\\"}}"' to equal '"{\\"header\\":{\\"id\\":\\"5efe7f01-d548-11ea-befd-ef7b6a3706c2\\"}}"'
      + expected - actual

      -"{\"body\":{\"ephemeralPubKey\":\"73b236f0-aa26-4391-8b44-57df28796f0b\",\"nodePubKey\":\"354bf50b-d846-443f-bc7d-8bd708fe74f2\",\"nodeState\":{\"addresses\":[{\"host\":\"1.1.1.1\",\"port\":8885},{\"host\":\"2.2.2.2\",\"port\":8885}],\"connextIdentifier\":\"95734182-9ed5-46e5-b993-22d280f210ec\",\"lndPubKeys\":{\"BTC\":\"775106c5-79e5-48bd-984b-52d53a3b604d\",\"LTC\":\"197262b9-bd75-4082-8e85-46ad3b7ecc5f\"},\"lndUris\":{\"BTC\":[\"\"],\"LTC\":[\"\"]},\"pairs\":[\"78f62c27-c4e1-41eb-8ceb-5400c1fa932c\"],\"raidenAddress\":\"fe024285-1e1d-44ec-bb7c-e5ccf4df0389\",\"tokenIdentifiers\":{\"BTC\":\"bitcoin-testnet\",\"LTC\":\"litecoin-testnet\"}},\"peerPubKey\":\"25ee301d-a407-4619-b68c-e8dc2b63f340\",\"sign\":\"f6eeed07-00d0-43ad-b6a5-d48d686718c4\",\"version\":\"1.0.0\"},\"header\":{\"id\":\"5efe7f02-d548-11ea-befd-ef7b6a3706c2\"}}"
      +"{\"header\":{\"id\":\"5efe7f01-d548-11ea-befd-ef7b6a3706c2\"}}"
      
      at /home/daniel/Documents/GitHub/xud/test/unit/Parser.spec.ts:52:46

@sangaman
Copy link
Collaborator Author

sangaman commented Aug 4, 2020

Another simulation test error in GitHub actions that went away on retry:

 === RUN   TestSecurityUnsettledChannels
2020/08/03 11:52:51 xud: creating network
2020/08/03 11:52:51 ltcd: launching node...
2020/08/03 11:52:52 ltcd: 200 blocks generated
2020/08/03 11:52:52 lnd-ltc: launching network...
2020/08/03 11:53:05 btcd: launching node...
2020/08/03 11:53:05 btcd: 200 blocks generated
2020/08/03 11:53:05 lnd-btc: launching network...
2020/08/03 11:53:16 connext: launching network...
2020/08/03 11:53:20 xud: launching network...
2020/08/03 11:53:42 cannot start xud network: timeout waiting for xud to be ready
exit status 1
FAIL	github.com/ExchangeUnion/xud-simulation	248.021s
~/work/xud/xud/test/simulation/temp/indra ~/work/xud/xud/test/simulation

Note that this same error happens not just on the security simulation tests, but also the integration and instability simulation tests as well.

@sangaman
Copy link
Collaborator Author

sangaman commented Aug 4, 2020

EDIT: This was a recently introduced edge case bug and is fixed by #1781.

This jest error I believe I've seen only locally:

 FAIL  test/jest/Pool.spec.ts
  ● P2P Pool › it connects exactly once if two peers attempt connections to each other simultaneously

    read ECONNRESET

@kilrau
Copy link
Contributor

kilrau commented Aug 4, 2020

Two different failures observed in https://github.com/ExchangeUnion/xud/pull/1769/checks?check_run_id=927870407 :

ONE

 docker network rm indra indra_cf_tester indra_node_tester indra_test_store 2> /dev/null || true
indra
docker secret rm indra_database_dev 2> /dev/null || true
indra_database_dev
docker volume rm indra_chain_1337 indra_chain_1338 indra_database_dev  2> /dev/null || true
indra_database_dev
docker volume rm `docker volume ls -q -f name=indra_database_test_*` 2> /dev/null || true
rm -rf .chaindata/*
rm -rf .flags/deployed-contracts
~/work/xud/xud/test/simulation
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] test:sim:run:integration: `(cd test/simulation && ./docker-run.sh TestIntegration)`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the [email protected] test:sim:run:integration script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/runner/.npm/_logs/2020-07-30T14_17_09_849Z-debug.log
##[error]Process completed with exit code 1.

TWO

The indra stack has been deployed, waiting for the proxy to start responding..
Timed out waiting for proxy to respond..

@sangaman
Copy link
Collaborator Author

sangaman commented Aug 4, 2020

    --- FAIL: TestIntegration/p2p_already_connected (0.00s)
        actions.go:132: 
            	Error Trace:	actions.go:132
            	            				tests-integration.go:160
            	            				harnessTest.go:95
            	            				xud_test.go:86
            	Error:      	Received unexpected error:
            	            	rpc error: code = AlreadyExists desc = node 037c5ed36a34091b37cf2603e7f59bb1ebc670d328ec292d08ecda8dfb0d4ecc53 at 127.0.0.1:40821 already connected
            	Test:       	TestIntegration/p2p_already_connected

@sangaman
Copy link
Collaborator Author

sangaman commented Aug 5, 2020

EDIT: Should be fixed by #1790

Mocha tests:

P2P Pool Tests
    ✓ should open a connection with a peer
    ✓ should close a peer
  actual: {
    ✓ should throw error when connecting to tor node with tor disabled
    message: 'node 0390015423b2466f1d07b09cd0023241a3190de3ff928028e5df50fa3687ed53d4 at 321.321.321.321:1337 already connected',
    1) should update a node on new handshake
    code: '2.1',
    reconnect logic
    stack: undefined
2020-07-31 02:37:41.4141 [P2P] warn: could not open connection to outbound peer (0390015423b2466f1d07b09cd0023241a3190de3ff928028e5df50fa3687ed53d4 (undefined)): node 0390015423b2466f1d07b09cd0023241a3190de3ff928028e5df50fa3687ed53d4 at 321.321.321.321:1337 already connected
      2) "before each" hook for "should not reconnect upon shutdown inbound"
  },

  expected: undefined
  API Service
}

@sangaman
Copy link
Collaborator Author

sangaman commented Aug 5, 2020

 --- FAIL: TestSecurityUnsettledChannels (610.89s)
    --- PASS: TestSecurityUnsettledChannels/network_initialization (0.75s)
    --- PASS: TestSecurityUnsettledChannels/maker_shutdown_after_1st_htlc (165.99s)
    --- FAIL: TestSecurityUnsettledChannels/taker_stalling_after_2nd_htlc (395.75s)
        tests-security.go:72: 
            	Error Trace:	tests-security.go:72
            	            				tests-security.go:495
            	            				harnessTest.go:95
            	            				xud_test.go:270
            	Error:      	Not equal: 
            	            	expected: 1
            	            	actual  : 0
            	Test:       	TestSecurityUnsettledChannels/taker_stalling_after_2nd_htlc
FAIL
exit status 1

@kilrau
Copy link
Contributor

kilrau commented Aug 13, 2020

Improved quite a bunch, moving to the next project.

@sangaman
Copy link
Collaborator Author

I saw the parser error again, but it seems to be far less common. I'll go back and check and at some point if there's anything I missed from the first time I addressed it.

1) Parser
     test TCP segmentation/concatenation support
       should parse encrypted Ping SessionInit Ping concatenated and split on byte 16 from each packet beginning:

    AssertionError: expected '"{\\"body\\":{\\"ephemeralPubKey\\":\\"3e798f0f-728a-4fdd-b434-fd5199a7de87\\",\\"nodePubKey\\":\\"e757d34c-27cc-4a13-8298-c830f4712d81\\",\\"nodeState\\":{\\"addresses\\":[{\\"host\\":\\"1.1.1.1\\",\\"port\\":8885},{\\"host\\":\\"2.2.2.2\\",\\"port\\":8885}],\\"connextIdentifier\\":\\"d1f1ca10-2c54-4224-8bf8-1f14f930a65e\\",\\"lndPubKeys\\":{\\"BTC\\":\\"73ca86e7-b234-476a-a496-d67871ef7ebd\\",\\"LTC\\":\\"edc6c883-35dc-4f6b-b1d2-2a6aa36dfe47\\"},\\"lndUris\\":{\\"BTC\\":[\\"\\"],\\"LTC\\":[\\"\\"]},\\"pairs\\":[\\"898c3cdb-b43a-404f-90c4-233b642e76d7\\"],\\"tokenIdentifiers\\":{\\"BTC\\":\\"bitcoin-testnet\\",\\"LTC\\":\\"litecoin-testnet\\"}},\\"peerPubKey\\":\\"638f05fb-317b-4b9d-8673-fab1317535f1\\",\\"sign\\":\\"e1660403-9324-4f09-b0c9-9106376f1d99\\",\\"version\\":\\"1.0.0\\"},\\"header\\":{\\"id\\":\\"ceb98102-e231-11ea-9cbe-0de5bfbc72ec\\"}}"' to equal '"{\\"header\\":{\\"id\\":\\"ceb98101-e231-11ea-9cbe-0de5bfbc72ec\\"}}"'
    + expected - actual

    -"{\"body\":{\"ephemeralPubKey\":\"3e798f0f-728a-4fdd-b434-fd5199a7de87\",\"nodePubKey\":\"e757d34c-27cc-4a13-8298-c830f4712d81\",\"nodeState\":{\"addresses\":[{\"host\":\"1.1.1.1\",\"port\":8885},{\"host\":\"2.2.2.2\",\"port\":8885}],\"connextIdentifier\":\"d1f1ca10-2c54-4224-8bf8-1f14f930a65e\",\"lndPubKeys\":{\"BTC\":\"73ca86e7-b234-476a-a496-d67871ef7ebd\",\"LTC\":\"edc6c883-35dc-4f6b-b1d2-2a6aa36dfe47\"},\"lndUris\":{\"BTC\":[\"\"],\"LTC\":[\"\"]},\"pairs\":[\"898c3cdb-b43a-404f-90c4-233b642e76d7\"],\"tokenIdentifiers\":{\"BTC\":\"bitcoin-testnet\",\"LTC\":\"litecoin-testnet\"}},\"peerPubKey\":\"638f05fb-317b-4b9d-8673-fab1317535f1\",\"sign\":\"e1660403-9324-4f09-b0c9-9106376f1d99\",\"version\":\"1.0.0\"},\"header\":{\"id\":\"ceb98102-e231-11ea-9cbe-0de5bfbc72ec\"}}"
    +"{\"header\":{\"id\":\"ceb98101-e231-11ea-9cbe-0de5bfbc72ec\"}}"

@sangaman
Copy link
Collaborator Author

Still seeing this one, I'll have to revisit it:

  1) Parser
       test TCP segmentation/concatenation support
         should parse encrypted Ping SessionInit Ping concatenated and split on byte 17 from each packet beginning:

      AssertionError: expected '"{\\"body\\":{\\"ephemeralPubKey\\":\\"a2d3794f-428f-4211-90e6-28f5131d714e\\",\\"nodePubKey\\":\\"69e9716a-38b4-4558-8f50-312540e4bbfa\\",\\"nodeState\\":{\\"addresses\\":[{\\"host\\":\\"1.1.1.1\\",\\"port\\":8885},{\\"host\\":\\"2.2.2.2\\",\\"port\\":8885}],\\"connextIdentifier\\":\\"3992aa06-aae6-4771-9f0c-20963f17a30b\\",\\"lndPubKeys\\":{\\"BTC\\":\\"2748ce07-9d2f-4e00-97ed-9758257251bb\\",\\"LTC\\":\\"30114f06-2e60-4c58-a46d-52a999af3bf0\\"},\\"lndUris\\":{\\"BTC\\":[\\"\\"],\\"LTC\\":[\\"\\"]},\\"pairs\\":[\\"d989343d-5272-421a-b07b-4bca07fe923c\\"],\\"tokenIdentifiers\\":{\\"BTC\\":\\"bitcoin-testnet\\",\\"LTC\\":\\"litecoin-testnet\\"}},\\"peerPubKey\\":\\"671d3ba0-7112-4d39-af78-9ae7adda121f\\",\\"sign\\":\\"5d866ac5-f840-4753-a255-a45e27f3cc08\\",\\"version\\":\\"1.0.0\\"},\\"header\\":{\\"id\\":\\"c1cb3ed9-e631-11ea-8329-1d1f40df5a05\\"}}"' to equal '"{\\"header\\":{\\"id\\":\\"c1cb3ed8-e631-11ea-8329-1d1f40df5a05\\"}}"'
      + expected - actual

      -"{\"body\":{\"ephemeralPubKey\":\"a2d3794f-428f-4211-90e6-28f5131d714e\",\"nodePubKey\":\"69e9716a-38b4-4558-8f50-312540e4bbfa\",\"nodeState\":{\"addresses\":[{\"host\":\"1.1.1.1\",\"port\":8885},{\"host\":\"2.2.2.2\",\"port\":8885}],\"connextIdentifier\":\"3992aa06-aae6-4771-9f0c-20963f17a30b\",\"lndPubKeys\":{\"BTC\":\"2748ce07-9d2f-4e00-97ed-9758257251bb\",\"LTC\":\"30114f06-2e60-4c58-a46d-52a999af3bf0\"},\"lndUris\":{\"BTC\":[\"\"],\"LTC\":[\"\"]},\"pairs\":[\"d989343d-5272-421a-b07b-4bca07fe923c\"],\"tokenIdentifiers\":{\"BTC\":\"bitcoin-testnet\",\"LTC\":\"litecoin-testnet\"}},\"peerPubKey\":\"671d3ba0-7112-4d39-af78-9ae7adda121f\",\"sign\":\"5d866ac5-f840-4753-a255-a45e27f3cc08\",\"version\":\"1.0.0\"},\"header\":{\"id\":\"c1cb3ed9-e631-11ea-8329-1d1f40df5a05\"}}"
      +"{\"header\":{\"id\":\"c1cb3ed8-e631-11ea-8329-1d1f40df5a05\"}}"

@sangaman
Copy link
Collaborator Author

Ran into this twice while trying to run npm version (which runs through tests).

--- PASS: TestSecurity (168.59s)
    --- PASS: TestSecurity/network_initialization (0.52s)
    --- PASS: TestSecurity/taker_stalling_on_swapAccepted (15.62s)
    --- PASS: TestSecurity/maker_stalling_after_1st_htlc (100.68s)
    --- PASS: TestSecurity/taker_stalling_after_swap_succeeded (6.53s)
=== RUN   TestSecurityUnsettledChannels
2020/08/24 18:15:45 xud: creating network
2020/08/24 18:15:45 ltcd: launching node...
2020/08/24 18:15:45 ltcd: 200 blocks generated
2020/08/24 18:15:45 lnd-ltc: launching network...
2020/08/24 18:15:54 btcd: launching node...
2020/08/24 18:15:54 btcd: 200 blocks generated
2020/08/24 18:15:54 lnd-btc: launching network...
2020/08/24 18:16:01 connext: launching network...
2020/08/24 18:16:03 xud: launching network...
2020/08/24 18:16:08 Running 4 unsettled-channels security tests
=== RUN   TestSecurityUnsettledChannels/network_initialization
=== RUN   TestSecurityUnsettledChannels/maker_shutdown_after_1st_htlc
=== RUN   TestSecurityUnsettledChannels/taker_stalling_after_2nd_htlc
=== RUN   TestSecurityUnsettledChannels/taker_shutdown_after_2nd_htlc
2020/08/24 18:23:47 xud process (33-Alice) did not shutdown gracefully. process (13584) killed

Not sure if this is something you might have an idea about @LePremierHomme, why the taker_shutdown_after_2nd_htlc test doesn't shutdown gracefully.

@sangaman
Copy link
Collaborator Author

sangaman commented Aug 26, 2020

I'm seeing this cause failures at various points in the simulation tests, it may be related to the 2048 bit TLS cert change in #1692 although I've also seen it happen without that commit.

Failed to read TLS certificate: credentials: failed to append certificates, remaining tries: 4
Failed to read TLS certificate: credentials: failed to append certificates, remaining tries: 3
Failed to read TLS certificate: credentials: failed to append certificates, remaining tries: 2
Failed to read TLS certificate: credentials: failed to append certificates, remaining tries: 1
Failed to read TLS certificate: credentials: failed to append certificates, remaining tries: 0

Edit: This appears to be fixed by the reversion in #1841.

@sangaman
Copy link
Collaborator Author

sangaman commented Nov 23, 2020

Here's a new one I've seen a couple of times only recently, both locally and in GitHub Actions:

  1) P2P Pool Tests
       "after all" hook for "should update a node on new handshake":
     Error: Timeout of 10000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/home/daniel/Documents/GitHub/xud/test/integration/Pool.spec.ts)
      at listOnTimeout (internal/timers.js:549:17)
      at processTimers (internal/timers.js:492:7)

Edit: Still seeing this after the fix to the stall timer.

@sangaman
Copy link
Collaborator Author

sangaman commented Dec 4, 2020

Looks like there may be some jest tests that don't always terminate properly, at least in GitHub Actions. #1938 (comment)

@ghost
Copy link

ghost commented Dec 22, 2020

This error is becoming increasingly annoying:

   1) P2P Pool Tests
       "after all" hook for "should update a node on new handshake":
     Error: Timeout of 10000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/home/runner/work/xud/xud/test/integration/Pool.spec.ts)
      at listOnTimeout (internal/timers.js:554:17)
      at processTimers (internal/timers.js:497:7)
  • does not reproduce often when using a "fast" CPU
  • does reproduce often in CI (GitHub actions)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants