Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make test errors on fresh checkout #3148

Open
bnewbold opened this issue Nov 30, 2024 Discussed in #3040 · 10 comments
Open

Make test errors on fresh checkout #3148

bnewbold opened this issue Nov 30, 2024 Discussed in #3040 · 10 comments
Labels
bug Something isn't working

Comments

@bnewbold
Copy link
Collaborator

Discussed in #3040

Originally posted by zacuke November 18, 2024
Hello, this test below fails every time I run make test. I've been trying quite a few things:

  • reducing the 3000 parameter on line 1209,
  • increasing the ulimit -s value,
  • setting v8.setFlagsFromString('--stack-size=#####');

But nothing seems to help. If I lower the 3000 value too much, then the string seems to be too short, and it properly makes a round trip without 400 error and the test fails for a different reason: it expects the server to reject the too-long string.

packages/pds test: FAIL PDS tests/crud.test.ts (43.036 s)
packages/pds test:   ● crud operations › compare-and-swap › writes fail on values that can't reliably transform between cbor to lex
packages/pds test:     RangeError: Maximum call stack size exceeded
packages/pds test:         at JSON.stringify (<anonymous>)
packages/pds test:       1196 |     it("writes fail on values that can't reliably transform between cbor to lex", async () => {
packages/pds test:       1197 |       const passthroughBody = (data: unknown) =>
packages/pds test:     > 1198 |         ui8ToArrayBuffer(new TextEncoder().encode(JSON.stringify(data)))
packages/pds test:            |                                                        ^
packages/pds test:       1199 |
packages/pds test:       1200 |       const result = aliceAgent.call(
packages/pds test:       1201 |         'com.atproto.repo.createRecord',
packages/pds test:       at stringify (tests/crud.test.ts:1198:56)
packages/pds test:       at Object.passthroughBody (tests/crud.test.ts:1203:9)

Line 1209 with the seemingly arbitrary 3000 parameter value was recently changed and the commit message itself seems to indicate node v20 support, but I get better results using v18 (although this Maximum call stack error happens either way)

deepObject: createDeepObject(3000),

So, what is everyone doing to get a succesful make test run?

@bnewbold bnewbold added the bug Something isn't working label Nov 30, 2024
@zacuke
Copy link

zacuke commented Dec 1, 2024

Note

Starting with fresh installs of Ubuntu Server 22.04 and 24.04

Get latest of the code

  • sudo apt install jq make curl
  • make nvm-setup doesn't work (says nvm not found) so run commands individually
  • nvm install 18
  • nvm use 18
  • corepack enable
  • make deps
  • make build

Now ready to run make test

  • Occasionally

    • packages/crypto/tests/random.test.ts line 13 will fail because the value is slightly greater than 1.05
  • Fairly often fails

    • error: "Exceeded timeout of 60000 ms for a hook. Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."
  • Reliably fails if more than 4 vcpu cores

    • error: listen EADDRINUSE: address already in use
    • error: sorry, too many clients already
  • Additionally

    • if test was interrupted, postgresql may already contain data that causes errors like: email [email protected] already taken
      • running again seems to fix
    • not sure if matters but this shows a lot: Possible EventEmitter memory leak detected. 11 error listeners added to [Socket]. Use emitter.setMaxListeners() to increase limit
  • The good news

    • make test completes on Ubuntu Server with 4 vcpu cores
    • testTimeout: 120000 in jest.config.js helps
    • 8 GB of memory helps

Note

Starting with fresh installs of Ubuntu Desktop 22.04 and 24.04

  • Install docker using instructions https://docs.docker.com/engine/install/ubuntu/

    • sudo addgroup docker
    • sudo usermod -aG docker $USER
  • Ubuntu Desktop 24.04 additional config

  • Get latest of the code

    • sudo apt install jq make curl
    • Install nvm
    • make nvm-setup doesn't work (says nvm not found) so run commands individually
    • nvm install 18
    • nvm use 18
    • corepack enable
    • make deps
    • make build
  • Even with 4 vcpu cores sometimes get this error

    • error: listen EADDRINUSE: address already in use
    • this seems to be a known issue with get-port dependency
    • also intermittently fails with Actor not found
  • In summary:

    • Fresh install resolved Maximum call stack size exceeded and desktop crashing
      • Playing with Line 1209 in tests/crud.test.ts brings the error back, so I suspect it won't be the last we see of it
      • There must be something in existing setups that has reduced call stack size, perhaps upgraded OS?
    • 4 vcpu helped the most
    • testTimeout: 120000 was needed too
    • get-port dependency will cause intermittent EADDRINUSE: address already in use

@OracPrime
Copy link

That has got me further, thanks. It now fails on the xrpc server tests, I suspect because I am running the tests inside a VNC session, which is bumping the instance count by one...

packages/xrpc-server test: FAIL XRPC Server tests/stream.test.ts
packages/xrpc-server test:   ● Stream › kills handler and closes client disconnect.
packages/xrpc-server test:     expect(received).toBe(expected) // Object.is equality
packages/xrpc-server test:     Expected: 5
packages/xrpc-server test:     Received: 6
packages/xrpc-server test:       116 |     const currentCount = i
packages/xrpc-server test:       117 |     await wait(5)
packages/xrpc-server test:     > 118 |     expect(i).toBe(currentCount)
packages/xrpc-server test:           |               ^
packages/xrpc-server test:       119 |
packages/xrpc-server test:       120 |     httpServer.close()
packages/xrpc-server test:       121 |   })
packages/xrpc-server test:       at Object.toBe (tests/stream.test.ts:118:15)

However if I comment that test out it then starts failing on EADDRINUSE in the packages/bsky test
And in packages/pds it fails with "sorry, too many clients already" when trying to access the database

packages/pds test:  › 3 snapshots failed from 1 test suite. Inspect your code changes or run `npm test -- -u` to update them.
packages/pds test: Test Suites: 4 failed, 33 passed, 37 total
packages/pds test: Tests:       25 failed, 2 skipped, 315 passed, 342 total
packages/pds test: Snapshots:   3 failed, 24 passed, 27 total
packages/pds test: Time:        100.098 s
packages/pds test: Ran all test suites.

@OracPrime
Copy link

The EADDRINUSE seems to be intermittent, but the too many clients problem remains.

@OracPrime
Copy link

Is there a reason why all these steps are necessary? Can the repo not include docker files to build a test environment container with the correct security tweaks in place?

@zacuke
Copy link

zacuke commented Dec 2, 2024

The real mystery to me is how the Github actions run the tests so reliably. The instructions above are meant to try to setup a local environment as closely as possible to what Github actions uses.

One thing of note is that the Github actions run as 8 separate shards and also includes extra parametes. I've also tried this approach local but it doesn't seem to make much difference with respect to the intermittent failures.

for i = 1 to 8
   Run pnpm test:withFlags --maxWorkers=1 --shard=i/8 --passWithNoTests

But not only that - it appears the entire environment is torn down in between each shard. Which I have not tried local ... since it would require, as you suggest, an entire new docker layer (and then it would have docker within docker, because the tests already use docker). However, their tests do have a non-docker mode I haven't explored very much (as the Github actions run the docker based tests).

I also see these intermittent errors when running the code in production. On a host with very little traffic, requests intermittently fail with Internal server error, but they succeed when retried the second time.

To add to the complexity of trying to grasp all this, their official pds install instructions uses a docker file which seems to pull a special image they have prepared ghcr.io/bluesky-social/pds:0.4 - so if it does appear stable in that case, we would need to pry open that image to find out what tweaks have been performed and then also apply them to development and hosting environments.

But going back to the real mystery, I don't see any special tweaks being applied during the Github actions.

@OracPrime
Copy link

Do you not get postgresql rejecting the number of connections?

@zacuke
Copy link

zacuke commented Dec 2, 2024

I've seen postgresql rejecting number of connections, but only as a side effect of the timeout or other errors occurring first.

@OracPrime
Copy link

I've been attempting to use pgbouncer to resolve it, but setup is a nightmare :(

@OracPrime
Copy link

OracPrime commented Dec 5, 2024

I think maxworkers is the key. Without it too much runs in parallel and various things fall over.
pnpm test:withFlags --maxWorkers=1
works but is quite slow, and just after all the tests pass I get an out of memory error.
To run a single test you need something like
pnpm test:withFlags --maxWorkers=1 packages/bsky/tests/admin/admin-auth.test.ts --passWithNoTests

You can also use this with debugger auto-attach and a breakpoint to debug a test.

Of course for a single test you can also go into the right package's directory and run test from there:

~/github/atproto/packages/pds$ pnpm test packages/pds/tests/proxied/admin.test.ts

@zacuke
Copy link

zacuke commented Dec 5, 2024

Summary so far of things that help, but don't fully resolve. By not fully resolve, that means make test might complete, but it's roughly a 10% chance of completion (in my testing).

  • Reinstalling Ubuntu
  • 4 vcpu (exactly, not more or less)
  • testTimeout: 120000 in jest.config.js files
  • --maxWorkers=1
  • --shard #/8 (in a loop to do all 8)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants