`Make test` errors on fresh checkout #3148

bnewbold · 2024-11-30T18:14:26Z

Discussed in #3040

^{Originally posted by zacuke November 18, 2024}
Hello, this test below fails every time I run make test. I've been trying quite a few things:

reducing the 3000 parameter on line 1209,
increasing the ulimit -s value,
setting v8.setFlagsFromString('--stack-size=#####');

But nothing seems to help. If I lower the 3000 value too much, then the string seems to be too short, and it properly makes a round trip without 400 error and the test fails for a different reason: it expects the server to reject the too-long string.

packages/pds test: FAIL PDS tests/crud.test.ts (43.036 s)
packages/pds test:   ● crud operations › compare-and-swap › writes fail on values that can't reliably transform between cbor to lex
packages/pds test:     RangeError: Maximum call stack size exceeded
packages/pds test:         at JSON.stringify (<anonymous>)
packages/pds test:       1196 |     it("writes fail on values that can't reliably transform between cbor to lex", async () => {
packages/pds test:       1197 |       const passthroughBody = (data: unknown) =>
packages/pds test:     > 1198 |         ui8ToArrayBuffer(new TextEncoder().encode(JSON.stringify(data)))
packages/pds test:            |                                                        ^
packages/pds test:       1199 |
packages/pds test:       1200 |       const result = aliceAgent.call(
packages/pds test:       1201 |         'com.atproto.repo.createRecord',
packages/pds test:       at stringify (tests/crud.test.ts:1198:56)
packages/pds test:       at Object.passthroughBody (tests/crud.test.ts:1203:9)

Line 1209 with the seemingly arbitrary 3000 parameter value was recently changed and the commit message itself seems to indicate node v20 support, but I get better results using v18 (although this Maximum call stack error happens either way)

deepObject: createDeepObject(3000),

So, what is everyone doing to get a succesful make test run?

The text was updated successfully, but these errors were encountered:

zacuke · 2024-12-01T08:56:02Z

Note

Starting with fresh installs of Ubuntu Server 22.04 and 24.04

docker selected during initial setup
install puppeteer https://s55ma.radioamater.si/2023/05/30/ubuntu-22-04-install-puppeteer/
For Ubuntu Server 24.04
- libasound2 needs to be libasound2t64
- https://chromium.googlesource.com/chromium/src/+/main/docs/security/apparmor-userns-restrictions.md
fix docker for non-root
- sudo addgroup docker
- sudo usermod -aG docker $USER
- sudo apt install acl
- sudo setfacl -m "g:docker:rw" /var/run/docker.sock

Get latest of the code

sudo apt install jq make curl
make nvm-setup doesn't work (says nvm not found) so run commands individually
nvm install 18
nvm use 18
corepack enable
make deps
make build

Now ready to run make test

Occasionally
- packages/crypto/tests/random.test.ts line 13 will fail because the value is slightly greater than 1.05
Fairly often fails
- error: "Exceeded timeout of 60000 ms for a hook. Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."
Reliably fails if more than 4 vcpu cores
- error: listen EADDRINUSE: address already in use
- error: sorry, too many clients already
Additionally
- if test was interrupted, postgresql may already contain data that causes errors like: email [email protected] already taken
  - running again seems to fix
- not sure if matters but this shows a lot: Possible EventEmitter memory leak detected. 11 error listeners added to [Socket]. Use emitter.setMaxListeners() to increase limit
The good news
- make test completes on Ubuntu Server with 4 vcpu cores
- testTimeout: 120000 in jest.config.js helps
- 8 GB of memory helps

Note

Starting with fresh installs of Ubuntu Desktop 22.04 and 24.04

Install docker using instructions https://docs.docker.com/engine/install/ubuntu/
- sudo addgroup docker
- sudo usermod -aG docker $USER
Ubuntu Desktop 24.04 additional config
- https://chromium.googlesource.com/chromium/src/+/main/docs/security/apparmor-userns-restrictions.md
Get latest of the code
- sudo apt install jq make curl
- Install nvm
- make nvm-setup doesn't work (says nvm not found) so run commands individually
- nvm install 18
- nvm use 18
- corepack enable
- make deps
- make build
Even with 4 vcpu cores sometimes get this error
- error: listen EADDRINUSE: address already in use
- this seems to be a known issue with get-port dependency
- also intermittently fails with Actor not found
In summary:
- Fresh install resolved Maximum call stack size exceeded and desktop crashing
  - Playing with Line 1209 in tests/crud.test.ts brings the error back, so I suspect it won't be the last we see of it
  - There must be something in existing setups that has reduced call stack size, perhaps upgraded OS?
- 4 vcpu helped the most
- testTimeout: 120000 was needed too
- get-port dependency will cause intermittent EADDRINUSE: address already in use

OracPrime · 2024-12-01T16:40:17Z

That has got me further, thanks. It now fails on the xrpc server tests, I suspect because I am running the tests inside a VNC session, which is bumping the instance count by one...

packages/xrpc-server test: FAIL XRPC Server tests/stream.test.ts
packages/xrpc-server test:   ● Stream › kills handler and closes client disconnect.
packages/xrpc-server test:     expect(received).toBe(expected) // Object.is equality
packages/xrpc-server test:     Expected: 5
packages/xrpc-server test:     Received: 6
packages/xrpc-server test:       116 |     const currentCount = i
packages/xrpc-server test:       117 |     await wait(5)
packages/xrpc-server test:     > 118 |     expect(i).toBe(currentCount)
packages/xrpc-server test:           |               ^
packages/xrpc-server test:       119 |
packages/xrpc-server test:       120 |     httpServer.close()
packages/xrpc-server test:       121 |   })
packages/xrpc-server test:       at Object.toBe (tests/stream.test.ts:118:15)

However if I comment that test out it then starts failing on EADDRINUSE in the packages/bsky test
And in packages/pds it fails with "sorry, too many clients already" when trying to access the database

packages/pds test:  › 3 snapshots failed from 1 test suite. Inspect your code changes or run `npm test -- -u` to update them.
packages/pds test: Test Suites: 4 failed, 33 passed, 37 total
packages/pds test: Tests:       25 failed, 2 skipped, 315 passed, 342 total
packages/pds test: Snapshots:   3 failed, 24 passed, 27 total
packages/pds test: Time:        100.098 s
packages/pds test: Ran all test suites.

OracPrime · 2024-12-01T18:40:30Z

The EADDRINUSE seems to be intermittent, but the too many clients problem remains.

OracPrime · 2024-12-02T14:33:01Z

Is there a reason why all these steps are necessary? Can the repo not include docker files to build a test environment container with the correct security tweaks in place?

zacuke · 2024-12-02T15:56:55Z

The real mystery to me is how the Github actions run the tests so reliably. The instructions above are meant to try to setup a local environment as closely as possible to what Github actions uses.

One thing of note is that the Github actions run as 8 separate shards and also includes extra parametes. I've also tried this approach local but it doesn't seem to make much difference with respect to the intermittent failures.

for i = 1 to 8
   Run pnpm test:withFlags --maxWorkers=1 --shard=i/8 --passWithNoTests

But not only that - it appears the entire environment is torn down in between each shard. Which I have not tried local ... since it would require, as you suggest, an entire new docker layer (and then it would have docker within docker, because the tests already use docker). However, their tests do have a non-docker mode I haven't explored very much (as the Github actions run the docker based tests).

I also see these intermittent errors when running the code in production. On a host with very little traffic, requests intermittently fail with Internal server error, but they succeed when retried the second time.

To add to the complexity of trying to grasp all this, their official pds install instructions uses a docker file which seems to pull a special image they have prepared ghcr.io/bluesky-social/pds:0.4 - so if it does appear stable in that case, we would need to pry open that image to find out what tweaks have been performed and then also apply them to development and hosting environments.

But going back to the real mystery, I don't see any special tweaks being applied during the Github actions.

OracPrime · 2024-12-02T18:36:30Z

Do you not get postgresql rejecting the number of connections?

zacuke · 2024-12-02T20:13:12Z

I've seen postgresql rejecting number of connections, but only as a side effect of the timeout or other errors occurring first.

OracPrime · 2024-12-02T22:53:50Z

I've been attempting to use pgbouncer to resolve it, but setup is a nightmare :(

OracPrime · 2024-12-05T14:08:41Z

I think maxworkers is the key. Without it too much runs in parallel and various things fall over.
pnpm test:withFlags --maxWorkers=1
works but is quite slow, and just after all the tests pass I get an out of memory error.
To run a single test you need something like
pnpm test:withFlags --maxWorkers=1 packages/bsky/tests/admin/admin-auth.test.ts --passWithNoTests

You can also use this with debugger auto-attach and a breakpoint to debug a test.

Of course for a single test you can also go into the right package's directory and run test from there:

~/github/atproto/packages/pds$ pnpm test packages/pds/tests/proxied/admin.test.ts

zacuke · 2024-12-05T17:56:05Z

Summary so far of things that help, but don't fully resolve. By not fully resolve, that means make test might complete, but it's roughly a 10% chance of completion (in my testing).

Reinstalling Ubuntu
4 vcpu (exactly, not more or less)
testTimeout: 120000 in jest.config.js files
--maxWorkers=1
--shard #/8 (in a loop to do all 8)

bnewbold added the bug Something isn't working label Nov 30, 2024

zacuke mentioned this issue Dec 1, 2024

fix quotes in package.json to support all platforms #3154

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`Make test` errors on fresh checkout #3148

`Make test` errors on fresh checkout #3148

bnewbold commented Nov 30, 2024

zacuke commented Dec 1, 2024

OracPrime commented Dec 1, 2024

OracPrime commented Dec 1, 2024

OracPrime commented Dec 2, 2024

zacuke commented Dec 2, 2024

OracPrime commented Dec 2, 2024

zacuke commented Dec 2, 2024

OracPrime commented Dec 2, 2024

OracPrime commented Dec 5, 2024 •

edited

Loading

zacuke commented Dec 5, 2024 •

edited

Loading

Make test errors on fresh checkout #3148

Make test errors on fresh checkout #3148

Comments

bnewbold commented Nov 30, 2024

Discussed in #3040

zacuke commented Dec 1, 2024

OracPrime commented Dec 1, 2024

OracPrime commented Dec 1, 2024

OracPrime commented Dec 2, 2024

zacuke commented Dec 2, 2024

OracPrime commented Dec 2, 2024

zacuke commented Dec 2, 2024

OracPrime commented Dec 2, 2024

OracPrime commented Dec 5, 2024 • edited Loading

zacuke commented Dec 5, 2024 • edited Loading

`Make test` errors on fresh checkout #3148

`Make test` errors on fresh checkout #3148

OracPrime commented Dec 5, 2024 •

edited

Loading

zacuke commented Dec 5, 2024 •

edited

Loading