Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test load balancing (migrating changes from TypeScript-Demo-Lib) #394

Merged
merged 9 commits into from
Jul 12, 2022

Conversation

emmacasolin
Copy link
Contributor

@emmacasolin emmacasolin commented Jun 30, 2022

Description

Test load balancing allows us to run our tests on the CI/CD more efficiently. We can apply it to our macOS and Windows runners while continuing to use the existing test pipelines script for Linux tests (for easier debugging during development until we have better Junit reporting on Gitlab MatrixAI/TypeScript-Demo-Lib#66.

We need to migrate the changes made in TS-Demo-Lib, as well as determine the most efficient number of shards for macOS/Windows parallel test runners. This doesn't need to be calculated dynamically, just a static value from trial and error is fine for now (a proper algorithm can be implemented later in #393)

Issues Fixed

Tasks

  • 1. Fix rules for test pipelines (should only run on feature commits - not staging)
  • 2. Move build:linux/windows/masos jobs into a build-generate.sh script
  • 3. Stop build:linux/windows/masos jobs from uploading dists and move this to a separate job (to prevent multiple dists from being uploaded)
  • 4. Improve Junit reporting on Gitlab by utilising classNameTemplate, titleTemplate, and addFileAttribute options (plus ancestorSeparator and reportTestSuiteErrors)
  • 5. Update Jest and Junit to 28.* to utilise sharding for test load balancing
  • 6. Add coverage back to tests in CI/CD
  • [ ] 7. Determine the most efficient number of shards for test load balancing on masOS/Windows (static value) Leaving as 2 for now, we can optimise when doing Apply bin packing algorithms to the setup of our parallel CI/CD test runners #393

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@emmacasolin emmacasolin self-assigned this Jun 30, 2022
@emmacasolin
Copy link
Contributor Author

Some additional Junit options that I've now added on top of the ones brought in from TypeScript-Demo-Lib:

  • ancestorSeparator: ' > ' - adds a > separator between the names of nested describes in the suite names, so status status with global agent becomes status > status with global agent
  • reportTestSuiteErrors: 'true' - ensures that test suites that fail to run entirely (for example due to a type error) still show up in the Junit output. Without this, these failures weren't showing up on gitlab so you wouldn't know they occurred unless you looked at the pipeline output

@emmacasolin
Copy link
Contributor Author

Updating jest to 28.* seems to mess with our usage of mockImplementation for the prompts module.

We use it in a few places (tests/bin/sessions.test.ts, tests/bin/agent/lockall.test.ts, tests/bin/agent/lock.test.ts, and tests/bin/utils.retryAuthentication.test.ts) like

mockedPrompts.mockImplementation(async (_opts: any) => {
  return { password };
});

Which throws a type error:

Argument of type '(_opts: any) => Promise<{ password: any; }>' is not assignable to parameter of type 'typeof prompts'.
  Type '(_opts: any) => Promise<{ password: any; }>' is missing the following properties from type 'typeof prompts': inject, override, prompt, prompts ts(2345)

@emmacasolin
Copy link
Contributor Author

Updating jest to 28.* seems to mess with our usage of mockImplementation for the prompts module.

We use it in a few places (tests/bin/sessions.test.ts, tests/bin/agent/lockall.test.ts, tests/bin/agent/lock.test.ts, and tests/bin/utils.retryAuthentication.test.ts) like

mockedPrompts.mockImplementation(async (_opts: any) => {
  return { password };
});

Which throws a type error:

Argument of type '(_opts: any) => Promise<{ password: any; }>' is not assignable to parameter of type 'typeof prompts'.
  Type '(_opts: any) => Promise<{ password: any; }>' is missing the following properties from type 'typeof prompts': inject, override, prompt, prompts ts(2345)

Changing the mock from prompts to prompts.prompt looks to be the solution

jest.mock('prompts');
const mockedPrompts = mocked(prompts.prompt);

@emmacasolin
Copy link
Contributor Author

I'm running into a problem with choosing a shard count for the mac/windows runners. The main issue is that the windows runners start to fail if I set the shard count too high with the error

$ choco install nodejs --version=16.14.2 -y
Chocolatey v0.10.15
Installing the following packages:
nodejs
By installing you accept licenses for the packages.
nodejs not installed. An error occurred during installation:
 The remote server returned an error: ([42](https://gitlab.com/MatrixAI/open-source/js-polykey/-/jobs/2660029073#L42)9) Too Many Requests. Too Many Requests
nodejs package files install completed. Performing other installation steps.
The install of nodejs was NOT successful.
nodejs not installed. An error occurred during installation:
 The remote server returned an error: (429) Too Many Requests. Too Many Requests
Chocolatey installed 0/1 packages. 1 packages failed.
 See the log for details (C:\ProgramData\chocolatey\logs\chocolatey.log).
Failures
 - nodejs (exited 1) - nodejs not installed. An error occurred during installation:
 The remote server returned an error: (429) Too Many Requests. Too Many Requests

With 12 shards they all failed with this error, and with 20 shards 8 of them failed.

The problem is that, even with 12 shards, a lot of the tests are timing out (on the mac runners, not sure about windows since they won't run). But I can't increase the shard count due to the issue with the windows runners.

There's the possibility that bringing in a very simple algorithm to try to optimise the shards (e.g. https://github.com/kamilkisiela/split-tests) would allow us to reduce the shard count without the tests timing out, since I ran some of the shards locally and there are a few minutes' difference between shard run times even in the few that I ran.

@CMCDragonkai
Copy link
Member

429 means there's a rate limit. Either in gitlab or on chocolate's side. So you know which side?

For now stick with just shard count of 2.

Just because a shared count is low, doesn't mean tests should timeout. Timeout clock only starts when the test function starts execution. There's only 2 cores on the windows runners. If they are timing out, we need to fix the bug. We may have introduced this but with beforeAll using global agent locking. But you need to provide more diagnostic information in this PR about which tests are failing and timing out.

Try with Mac first, it's closer to Unix. This uses homebrew, which I think doesn't have as much rate limiting going on as chocolatey.

@CMCDragonkai
Copy link
Member

@emmacasolin if you connect a PR to issues on zenhub, they are all moved together. This PR only addresses #392, so the other issues are related issues, but are not connected issues.

@emmacasolin
Copy link
Contributor Author

I have a job running on the CI/CD now with just two shards for the mac runner to see how it goes, but during the meantime I ran the two shards locally as well. All of the failures are timeouts or timeout-related and fall into the general categories of nodes/notifications/vaults/NAT tests.

To summarise:

Shard 1 Shard 2
Total run time 485s 605s
No. test suites total 68 68
No. test suites failed 6 7
No. tests total 319 388
No. tests failed 27 26

Full failure output:

Shard 1/2

A worker process has failed to exit gracefully and has been force exited. This is likely caused by tests leaking due to improper teardown. Try running with --detectOpenHandles to find leaks. Active timers can also cause this, ensure that .unref() was called on them.
Summary of all failing tests
 FAIL  tests/notifications/NotificationsManager.test.ts (60.582 s)
  ● NotificationsManager › notifications manager readiness

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      60 |
      61 |   let receiver: PolykeyAgent;
    > 62 |   beforeAll(async () => {
         |   ^
      63 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      64 |     mockedGenerateKeyPair = jest
      65 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/notifications/NotificationsManager.test.ts:62:3
      at Object.<anonymous> (tests/notifications/NotificationsManager.test.ts:28:1)

  ● NotificationsManager › can send notifications with permission

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      60 |
      61 |   let receiver: PolykeyAgent;
    > 62 |   beforeAll(async () => {
         |   ^
      63 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      64 |     mockedGenerateKeyPair = jest
      65 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/notifications/NotificationsManager.test.ts:62:3
      at Object.<anonymous> (tests/notifications/NotificationsManager.test.ts:28:1)

  ● NotificationsManager › cannot send notifications without permission

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      60 |
      61 |   let receiver: PolykeyAgent;
    > 62 |   beforeAll(async () => {
         |   ^
      63 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      64 |     mockedGenerateKeyPair = jest
      65 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/notifications/NotificationsManager.test.ts:62:3
      at Object.<anonymous> (tests/notifications/NotificationsManager.test.ts:28:1)

  ● NotificationsManager › can receive notifications from senders with permission

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      60 |
      61 |   let receiver: PolykeyAgent;
    > 62 |   beforeAll(async () => {
         |   ^
      63 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      64 |     mockedGenerateKeyPair = jest
      65 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/notifications/NotificationsManager.test.ts:62:3
      at Object.<anonymous> (tests/notifications/NotificationsManager.test.ts:28:1)

  ● NotificationsManager › cannot receive notifications from senders without permission

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      60 |
      61 |   let receiver: PolykeyAgent;
    > 62 |   beforeAll(async () => {
         |   ^
      63 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      64 |     mockedGenerateKeyPair = jest
      65 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/notifications/NotificationsManager.test.ts:62:3
      at Object.<anonymous> (tests/notifications/NotificationsManager.test.ts:28:1)

  ● NotificationsManager › marks notifications as read

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      60 |
      61 |   let receiver: PolykeyAgent;
    > 62 |   beforeAll(async () => {
         |   ^
      63 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      64 |     mockedGenerateKeyPair = jest
      65 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/notifications/NotificationsManager.test.ts:62:3
      at Object.<anonymous> (tests/notifications/NotificationsManager.test.ts:28:1)

  ● NotificationsManager › all notifications are read oldest to newest by default

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      60 |
      61 |   let receiver: PolykeyAgent;
    > 62 |   beforeAll(async () => {
         |   ^
      63 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      64 |     mockedGenerateKeyPair = jest
      65 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/notifications/NotificationsManager.test.ts:62:3
      at Object.<anonymous> (tests/notifications/NotificationsManager.test.ts:28:1)

  ● NotificationsManager › can read only unread notifications

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      60 |
      61 |   let receiver: PolykeyAgent;
    > 62 |   beforeAll(async () => {
         |   ^
      63 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      64 |     mockedGenerateKeyPair = jest
      65 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/notifications/NotificationsManager.test.ts:62:3
      at Object.<anonymous> (tests/notifications/NotificationsManager.test.ts:28:1)

  ● NotificationsManager › can read a single notification

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      60 |
      61 |   let receiver: PolykeyAgent;
    > 62 |   beforeAll(async () => {
         |   ^
      63 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      64 |     mockedGenerateKeyPair = jest
      65 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/notifications/NotificationsManager.test.ts:62:3
      at Object.<anonymous> (tests/notifications/NotificationsManager.test.ts:28:1)

  ● NotificationsManager › can read notifications in reverse order

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      60 |
      61 |   let receiver: PolykeyAgent;
    > 62 |   beforeAll(async () => {
         |   ^
      63 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      64 |     mockedGenerateKeyPair = jest
      65 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/notifications/NotificationsManager.test.ts:62:3
      at Object.<anonymous> (tests/notifications/NotificationsManager.test.ts:28:1)

  ● NotificationsManager › notifications can be capped and oldest notifications deleted

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      60 |
      61 |   let receiver: PolykeyAgent;
    > 62 |   beforeAll(async () => {
         |   ^
      63 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      64 |     mockedGenerateKeyPair = jest
      65 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/notifications/NotificationsManager.test.ts:62:3
      at Object.<anonymous> (tests/notifications/NotificationsManager.test.ts:28:1)

  ● NotificationsManager › can find a gestalt invite notification

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      60 |
      61 |   let receiver: PolykeyAgent;
    > 62 |   beforeAll(async () => {
         |   ^
      63 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      64 |     mockedGenerateKeyPair = jest
      65 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/notifications/NotificationsManager.test.ts:62:3
      at Object.<anonymous> (tests/notifications/NotificationsManager.test.ts:28:1)

  ● NotificationsManager › clears notifications

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      60 |
      61 |   let receiver: PolykeyAgent;
    > 62 |   beforeAll(async () => {
         |   ^
      63 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      64 |     mockedGenerateKeyPair = jest
      65 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/notifications/NotificationsManager.test.ts:62:3
      at Object.<anonymous> (tests/notifications/NotificationsManager.test.ts:28:1)

  ● NotificationsManager › notifications are persistent across restarts

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      60 |
      61 |   let receiver: PolykeyAgent;
    > 62 |   beforeAll(async () => {
         |   ^
      63 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      64 |     mockedGenerateKeyPair = jest
      65 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/notifications/NotificationsManager.test.ts:62:3
      at Object.<anonymous> (tests/notifications/NotificationsManager.test.ts:28:1)

  ● NotificationsManager › creating fresh notifications manager

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      60 |
      61 |   let receiver: PolykeyAgent;
    > 62 |   beforeAll(async () => {
         |   ^
      63 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      64 |     mockedGenerateKeyPair = jest
      65 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/notifications/NotificationsManager.test.ts:62:3
      at Object.<anonymous> (tests/notifications/NotificationsManager.test.ts:28:1)


  ● Test suite failed to run

    TypeError: Cannot read properties of undefined (reading 'stop')

      153 |   }, global.defaultTimeout);
      154 |   afterAll(async () => {
    > 155 |     await receiver.stop();
          |                    ^
      156 |     await queue.stop();
      157 |     await nodeConnectionManager.stop();
      158 |     await nodeManager.stop();

      at Object.<anonymous> (tests/notifications/NotificationsManager.test.ts:155:20)

 FAIL  tests/bin/secrets/secrets.test.ts (165.263 s)
  ● CLI secrets › commandListSecrets › should list secrets

    thrown: "Exceeded timeout of 20000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      122 |   });
      123 |   describe('commandListSecrets', () => {
    > 124 |     test('should list secrets', async () => {
          |     ^
      125 |       const vaultName = 'Vault4' as VaultName;
      126 |       const vaultId = await polykeyAgent.vaultManager.createVault(vaultName);
      127 |

      at tests/bin/secrets/secrets.test.ts:124:5
      at tests/bin/secrets/secrets.test.ts:123:3
      at Object.<anonymous> (tests/bin/secrets/secrets.test.ts:11:1)

 FAIL  tests/nat/endpointDependentNAT.test.ts (185.764 s)
  ● endpoint dependent NAT traversal › node1 behind EDM NAT cannot connect to node2 behind EDM NAT

    thrown: "Exceeded timeout of 40000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      153 |       global.defaultTimeout * 2,
      154 |     );
    > 155 |     test(
          |     ^
      156 |       'node1 behind EDM NAT cannot connect to node2 behind EDM NAT',
      157 |       async () => {
      158 |         const {

      at tests/nat/endpointDependentNAT.test.ts:155:5
      at describeIf (tests/utils.ts:203:5)
      at Object.<anonymous> (tests/nat/endpointDependentNAT.test.ts:10:11)

  ● endpoint dependent NAT traversal › node1 behind EDM NAT cannot connect to node2 behind EIM NAT

    thrown: "Exceeded timeout of 40000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      207 |       global.defaultTimeout * 2,
      208 |     );
    > 209 |     test(
          |     ^
      210 |       'node1 behind EDM NAT cannot connect to node2 behind EIM NAT',
      211 |       async () => {
      212 |         const {

      at tests/nat/endpointDependentNAT.test.ts:209:5
      at describeIf (tests/utils.ts:203:5)
      at Object.<anonymous> (tests/nat/endpointDependentNAT.test.ts:10:11)

 FAIL  tests/nat/endpointIndependentNAT.test.ts (234.325 s)
  ● endpoint independent NAT traversal › node1 behind EIM NAT connects to node2 behind EIM NAT

    thrown: "Exceeded timeout of 40000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      188 |       global.defaultTimeout * 2,
      189 |     );
    > 190 |     test(
          |     ^
      191 |       'node1 behind EIM NAT connects to node2 behind EIM NAT',
      192 |       async () => {
      193 |         const {

      at tests/nat/endpointIndependentNAT.test.ts:190:5
      at describeIf (tests/utils.ts:203:5)
      at Object.<anonymous> (tests/nat/endpointIndependentNAT.test.ts:10:11)

  ● endpoint independent NAT traversal › node1 behind EIM NAT connects to node2 behind EIM NAT via seed node

    thrown: "Exceeded timeout of 40000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      294 |       global.defaultTimeout * 2,
      295 |     );
    > 296 |     test(
          |     ^
      297 |       'node1 behind EIM NAT connects to node2 behind EIM NAT via seed node',
      298 |       async () => {
      299 |         const {

      at tests/nat/endpointIndependentNAT.test.ts:296:5
      at describeIf (tests/utils.ts:203:5)
      at Object.<anonymous> (tests/nat/endpointIndependentNAT.test.ts:10:11)

  ● endpoint independent NAT traversal › node1 behind EIM NAT cannot connect to node2 behind EDM NAT

    thrown: "Exceeded timeout of 40000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      346 |       global.defaultTimeout * 2,
      347 |     );
    > 348 |     test(
          |     ^
      349 |       'node1 behind EIM NAT cannot connect to node2 behind EDM NAT',
      350 |       async () => {
      351 |         const {

      at tests/nat/endpointIndependentNAT.test.ts:348:5
      at describeIf (tests/utils.ts:203:5)
      at Object.<anonymous> (tests/nat/endpointIndependentNAT.test.ts:10:11)

 FAIL  tests/vaults/VaultInternal.test.ts (456.826 s)
  ● VaultInternal › can change commits and preserve the log with no intermediate vault mutation

    thrown: "Exceeded timeout of 20000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      176 |     expect(file).toBe('testdata');
      177 |   });
    > 178 |   test('can change commits and preserve the log with no intermediate vault mutation', async () => {
          |   ^
      179 |     const initCommit = (await vault.log(undefined, 1))[0].commitId;
      180 |     await vault.writeF(async (efs) => {
      181 |       await efs.writeFile('test1', 'testdata1');

      at tests/vaults/VaultInternal.test.ts:178:3
      at Object.<anonymous> (tests/vaults/VaultInternal.test.ts:26:1)

  ● VaultInternal › does not allow changing to an unrecognised commit

    thrown: "Exceeded timeout of 20000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      199 |     expect(files).toEqual(['test1', 'test2', 'test3']);
      200 |   });
    > 201 |   test('does not allow changing to an unrecognised commit', async () => {
          |   ^
      202 |     await expect(() => vault.version('unrecognisedcommit')).rejects.toThrow(
      203 |       vaultsErrors.ErrorVaultReferenceInvalid,
      204 |     );

      at tests/vaults/VaultInternal.test.ts:201:3
      at Object.<anonymous> (tests/vaults/VaultInternal.test.ts:26:1)

  ● VaultInternal › can change to the latest commit

    thrown: "Exceeded timeout of 20000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      224 |     );
      225 |   });
    > 226 |   test('can change to the latest commit', async () => {
          |   ^
      227 |     const initCommit = (await vault.log(undefined, 1))[0].commitId;
      228 |     await vault.writeF(async (efs) => {
      229 |       await efs.writeFile('test1', 'testdata1');

      at tests/vaults/VaultInternal.test.ts:226:3
      at Object.<anonymous> (tests/vaults/VaultInternal.test.ts:26:1)

  ● VaultInternal › adjusts HEAD after vault mutation, discarding forward and preserving backwards history

    thrown: "Exceeded timeout of 20000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      248 |     expect(files).toEqual(['test1', 'test2', 'test3']);
      249 |   });
    > 250 |   test('adjusts HEAD after vault mutation, discarding forward and preserving backwards history', async () => {
          |   ^
      251 |     const initCommit = (await vault.log(undefined, 1))[0].commitId;
      252 |     await vault.writeF(async (efs) => {
      253 |       await efs.writeFile('test1', 'testdata1');

      at tests/vaults/VaultInternal.test.ts:250:3
      at Object.<anonymous> (tests/vaults/VaultInternal.test.ts:26:1)

  ● VaultInternal › concurrent write operations prevented

    thrown: "Exceeded timeout of 20000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      290 |     });
      291 |   });
    > 292 |   test('concurrent write operations prevented', async () => {
          |   ^
      293 |     await Promise.all([
      294 |       vault.writeF(async (efs) => {
      295 |         await efs.writeFile('secret-1', 'secret-content-1');

      at tests/vaults/VaultInternal.test.ts:292:3
      at Object.<anonymous> (tests/vaults/VaultInternal.test.ts:26:1)

 FAIL  tests/vaults/VaultManager.test.ts (483.949 s)
  ● VaultManager › able to read and load existing metadata

    thrown: "Exceeded timeout of 40000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      279 |     }
      280 |   });
    > 281 |   test(
          |   ^
      282 |     'able to read and load existing metadata',
      283 |     async () => {
      284 |       const vaultManager = await VaultManager.createVaultManager({

      at tests/vaults/VaultManager.test.ts:281:3
      at Object.<anonymous> (tests/vaults/VaultManager.test.ts:44:1)


Test Suites: 6 failed, 62 passed, 68 total
Tests:       27 failed, 1 todo, 291 passed, 319 total
Snapshots:   0 total
Time:        485.153 s, estimated 488 s
Ran all test suites.
GLOBAL TEARDOWN
Destroying Global Data Dir: /tmp/polykey-test-global-ADmYah

Shard 2/2

A worker process has failed to exit gracefully and has been force exited. This is likely caused by tests leaking due to improper teardown. Try running with --detectOpenHandles to find leaks. Active timers can also cause this, ensure that .unref() was called on them.
Summary of all failing tests
 FAIL  tests/agent/service/nodesCrossSignClaim.test.ts (27.894 s)
  ● nodesCrossSignClaim › successfully cross signs a claim

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      34 |   let mockedGenerateKeyPair: jest.SpyInstance;
      35 |   let mockedGenerateDeterministicKeyPair: jest.SpyInstance;
    > 36 |   beforeAll(async () => {
         |   ^
      37 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      38 |     mockedGenerateKeyPair = jest
      39 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/agent/service/nodesCrossSignClaim.test.ts:36:3
      at Object.<anonymous> (tests/agent/service/nodesCrossSignClaim.test.ts:21:1)
          at runMicrotasks (<anonymous>)

  ● nodesCrossSignClaim › fails after receiving undefined singly signed claim

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      34 |   let mockedGenerateKeyPair: jest.SpyInstance;
      35 |   let mockedGenerateDeterministicKeyPair: jest.SpyInstance;
    > 36 |   beforeAll(async () => {
         |   ^
      37 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      38 |     mockedGenerateKeyPair = jest
      39 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/agent/service/nodesCrossSignClaim.test.ts:36:3
      at Object.<anonymous> (tests/agent/service/nodesCrossSignClaim.test.ts:21:1)
          at runMicrotasks (<anonymous>)

  ● nodesCrossSignClaim › fails after receiving singly signed claim with no signature

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      34 |   let mockedGenerateKeyPair: jest.SpyInstance;
      35 |   let mockedGenerateDeterministicKeyPair: jest.SpyInstance;
    > 36 |   beforeAll(async () => {
         |   ^
      37 |     const globalKeyPair = await testUtils.setupGlobalKeypair();
      38 |     mockedGenerateKeyPair = jest
      39 |       .spyOn(keysUtils, 'generateKeyPair')

      at tests/agent/service/nodesCrossSignClaim.test.ts:36:3
      at Object.<anonymous> (tests/agent/service/nodesCrossSignClaim.test.ts:21:1)
          at runMicrotasks (<anonymous>)


  ● Test suite failed to run

    TypeError: Cannot read properties of undefined (reading 'destroy')

       97 |   }, global.defaultTimeout);
       98 |   afterAll(async () => {
    >  99 |     await grpcClient.destroy();
          |                      ^
      100 |     await grpcServer.stop();
      101 |     await pkAgent.stop();
      102 |     await pkAgent.destroy();

      at Object.<anonymous> (tests/agent/service/nodesCrossSignClaim.test.ts:99:22)

 FAIL  tests/git/utils.test.ts (22.19 s)
  ● Git utils › read index › of a packfile

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      23 |   let dbKey: Buffer;
      24 |
    > 25 |   beforeAll(async () => {
         |   ^
      26 |     dataDir = await fs.promises.mkdtemp(
      27 |       path.join(os.tmpdir(), 'polykey-test-'),
      28 |     );

      at tests/git/utils.test.ts:25:3
      at Object.<anonymous> (tests/git/utils.test.ts:14:1)
          at runMicrotasks (<anonymous>)

  ● Git utils › list refs › on master

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      23 |   let dbKey: Buffer;
      24 |
    > 25 |   beforeAll(async () => {
         |   ^
      26 |     dataDir = await fs.promises.mkdtemp(
      27 |       path.join(os.tmpdir(), 'polykey-test-'),
      28 |     );

      at tests/git/utils.test.ts:25:3
      at Object.<anonymous> (tests/git/utils.test.ts:14:1)
          at runMicrotasks (<anonymous>)

  ● Git utils › encoding › a string

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      23 |   let dbKey: Buffer;
      24 |
    > 25 |   beforeAll(async () => {
         |   ^
      26 |     dataDir = await fs.promises.mkdtemp(
      27 |       path.join(os.tmpdir(), 'polykey-test-'),
      28 |     );

      at tests/git/utils.test.ts:25:3
      at Object.<anonymous> (tests/git/utils.test.ts:14:1)
          at runMicrotasks (<anonymous>)

  ● Git utils › encoding › an empty string

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      23 |   let dbKey: Buffer;
      24 |
    > 25 |   beforeAll(async () => {
         |   ^
      26 |     dataDir = await fs.promises.mkdtemp(
      27 |       path.join(os.tmpdir(), 'polykey-test-'),
      28 |     );

      at tests/git/utils.test.ts:25:3
      at Object.<anonymous> (tests/git/utils.test.ts:14:1)
          at runMicrotasks (<anonymous>)

  ● Git utils › encoding › an upload pack

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      23 |   let dbKey: Buffer;
      24 |
    > 25 |   beforeAll(async () => {
         |   ^
      26 |     dataDir = await fs.promises.mkdtemp(
      27 |       path.join(os.tmpdir(), 'polykey-test-'),
      28 |     );

      at tests/git/utils.test.ts:25:3
      at Object.<anonymous> (tests/git/utils.test.ts:14:1)
          at runMicrotasks (<anonymous>)

  ● Git utils › resolve refs › to a commit oid

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      23 |   let dbKey: Buffer;
      24 |
    > 25 |   beforeAll(async () => {
         |   ^
      26 |     dataDir = await fs.promises.mkdtemp(
      27 |       path.join(os.tmpdir(), 'polykey-test-'),
      28 |     );

      at tests/git/utils.test.ts:25:3
      at Object.<anonymous> (tests/git/utils.test.ts:14:1)
          at runMicrotasks (<anonymous>)

  ● Git utils › resolve refs › to HEAD

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      23 |   let dbKey: Buffer;
      24 |
    > 25 |   beforeAll(async () => {
         |   ^
      26 |     dataDir = await fs.promises.mkdtemp(
      27 |       path.join(os.tmpdir(), 'polykey-test-'),
      28 |     );

      at tests/git/utils.test.ts:25:3
      at Object.<anonymous> (tests/git/utils.test.ts:14:1)
          at runMicrotasks (<anonymous>)

  ● Git utils › resolve refs › to HEAD including depth

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      23 |   let dbKey: Buffer;
      24 |
    > 25 |   beforeAll(async () => {
         |   ^
      26 |     dataDir = await fs.promises.mkdtemp(
      27 |       path.join(os.tmpdir(), 'polykey-test-'),
      28 |     );

      at tests/git/utils.test.ts:25:3
      at Object.<anonymous> (tests/git/utils.test.ts:14:1)
          at runMicrotasks (<anonymous>)

  ● Git utils › resolve refs › to non-existant refs

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      23 |   let dbKey: Buffer;
      24 |
    > 25 |   beforeAll(async () => {
         |   ^
      26 |     dataDir = await fs.promises.mkdtemp(
      27 |       path.join(os.tmpdir(), 'polykey-test-'),
      28 |     );

      at tests/git/utils.test.ts:25:3
      at Object.<anonymous> (tests/git/utils.test.ts:14:1)
          at runMicrotasks (<anonymous>)

  ● Git utils › read an object › missing

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      23 |   let dbKey: Buffer;
      24 |
    > 25 |   beforeAll(async () => {
         |   ^
      26 |     dataDir = await fs.promises.mkdtemp(
      27 |       path.join(os.tmpdir(), 'polykey-test-'),
      28 |     );

      at tests/git/utils.test.ts:25:3
      at Object.<anonymous> (tests/git/utils.test.ts:14:1)
          at runMicrotasks (<anonymous>)

  ● Git utils › read an object › parsed

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      23 |   let dbKey: Buffer;
      24 |
    > 25 |   beforeAll(async () => {
         |   ^
      26 |     dataDir = await fs.promises.mkdtemp(
      27 |       path.join(os.tmpdir(), 'polykey-test-'),
      28 |     );

      at tests/git/utils.test.ts:25:3
      at Object.<anonymous> (tests/git/utils.test.ts:14:1)
          at runMicrotasks (<anonymous>)

  ● Git utils › read an object › content

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      23 |   let dbKey: Buffer;
      24 |
    > 25 |   beforeAll(async () => {
         |   ^
      26 |     dataDir = await fs.promises.mkdtemp(
      27 |       path.join(os.tmpdir(), 'polykey-test-'),
      28 |     );

      at tests/git/utils.test.ts:25:3
      at Object.<anonymous> (tests/git/utils.test.ts:14:1)
          at runMicrotasks (<anonymous>)

  ● Git utils › read an object › wrapped

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      23 |   let dbKey: Buffer;
      24 |
    > 25 |   beforeAll(async () => {
         |   ^
      26 |     dataDir = await fs.promises.mkdtemp(
      27 |       path.join(os.tmpdir(), 'polykey-test-'),
      28 |     );

      at tests/git/utils.test.ts:25:3
      at Object.<anonymous> (tests/git/utils.test.ts:14:1)
          at runMicrotasks (<anonymous>)

  ● Git utils › read an object › deflated

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      23 |   let dbKey: Buffer;
      24 |
    > 25 |   beforeAll(async () => {
         |   ^
      26 |     dataDir = await fs.promises.mkdtemp(
      27 |       path.join(os.tmpdir(), 'polykey-test-'),
      28 |     );

      at tests/git/utils.test.ts:25:3
      at Object.<anonymous> (tests/git/utils.test.ts:14:1)
          at runMicrotasks (<anonymous>)

  ● Git utils › read an object › from packfile

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      23 |   let dbKey: Buffer;
      24 |
    > 25 |   beforeAll(async () => {
         |   ^
      26 |     dataDir = await fs.promises.mkdtemp(
      27 |       path.join(os.tmpdir(), 'polykey-test-'),
      28 |     );

      at tests/git/utils.test.ts:25:3
      at Object.<anonymous> (tests/git/utils.test.ts:14:1)
          at runMicrotasks (<anonymous>)


  ● Test suite failed to run

    ErrorEncryptedFSNotRunning:

      at node_modules/pify/index.js:32:6
      at FileSystem.<anonymous> (node_modules/pify/index.js:6:9)
      at FileSystem.ret [as _writeFile] (node_modules/pify/index.js:56:76)
      at FileSystem.write (node_modules/isomorphic-git/index.cjs:4285:18)
          at runMicrotasks (<anonymous>)
      at writeObjectLoose (node_modules/isomorphic-git/index.cjs:4429:37)
      at _writeObject (node_modules/isomorphic-git/index.cjs:4480:5)
      at node_modules/isomorphic-git/index.cjs:4684:17

 FAIL  tests/bin/vaults/vaults.test.ts (300.797 s)
  ● CLI vaults › commandVaultLog › Should get all writeFs

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      633 |     let writeF3Oid: string;
      634 |
    > 635 |     beforeEach(async () => {
          |     ^
      636 |       vaultId = await polykeyAgent.vaultManager.createVault(vaultName);
      637 |
      638 |       await polykeyAgent.vaultManager.withVaults([vaultId], async (vault) => {

      at tests/bin/vaults/vaults.test.ts:635:5
      at tests/bin/vaults/vaults.test.ts:626:3
      at Object.<anonymous> (tests/bin/vaults/vaults.test.ts:35:1)

  ● CLI vaults › commandVaultLog › should get a specific writeF

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      633 |     let writeF3Oid: string;
      634 |
    > 635 |     beforeEach(async () => {
          |     ^
      636 |       vaultId = await polykeyAgent.vaultManager.createVault(vaultName);
      637 |
      638 |       await polykeyAgent.vaultManager.withVaults([vaultId], async (vault) => {

      at tests/bin/vaults/vaults.test.ts:635:5
      at tests/bin/vaults/vaults.test.ts:626:3
      at Object.<anonymous> (tests/bin/vaults/vaults.test.ts:35:1)

  ● CLI vaults › commandVaultLog › should get a specific writeF

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      653 |       });
      654 |     });
    > 655 |     afterEach(async () => {
          |     ^
      656 |       await polykeyAgent.vaultManager.destroyVault(vaultId);
      657 |     });
      658 |

      at tests/bin/vaults/vaults.test.ts:655:5
      at tests/bin/vaults/vaults.test.ts:626:3
      at Object.<anonymous> (tests/bin/vaults/vaults.test.ts:35:1)

 FAIL  tests/nat/DMZ.test.ts (94.223 s)
  ● DMZ › agents in different namespaces can ping each other via seed node

    thrown: "Exceeded timeout of 40000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      216 |       global.defaultTimeout * 2,
      217 |     );
    > 218 |     test(
          |     ^
      219 |       'agents in different namespaces can ping each other via seed node',
      220 |       async () => {
      221 |         const {

      at tests/nat/DMZ.test.ts:218:5
      at describeIf (tests/utils.ts:203:5)
      at Object.<anonymous> (tests/nat/DMZ.test.ts:14:11)
          at runMicrotasks (<anonymous>)

 FAIL  tests/bin/nodes/ping.test.ts (55.41 s)
  ● ping › succeed when pinging a live node

    expect(received).toBe(expected) // Object.is equality

    Expected: 0
    Received: 1

      169 |       dataDir,
      170 |     );
    > 171 |     expect(exitCode).toBe(0);
          |                      ^
      172 |     expect(JSON.parse(stdout)).toEqual({
      173 |       success: true,
      174 |       message: 'Node is Active.',

      at Object.<anonymous> (tests/bin/nodes/ping.test.ts:171:22)
          at runMicrotasks (<anonymous>)

 FAIL  tests/nodes/NodeConnection.test.ts (492.153 s)
  ● NodeConnection test › should call `killSelf and throw if the server kill's during testStreamFail

    thrown: "Exceeded timeout of 40000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      787 |     global.defaultTimeout * 2,
      788 |   );
    > 789 |   test.each(options)(
          |                     ^
      790 |     "should call `killSelf and throw if the server %s's during testStreamFail",
      791 |     async (option) => {
      792 |       let nodeConnection:

      at node_modules/jest-each/build/bind.js:45:11
          at Array.forEach (<anonymous>)
      at tests/nodes/NodeConnection.test.ts:789:21
      at Object.<anonymous> (tests/nodes/NodeConnection.test.ts:66:1)

 FAIL  tests/vaults/VaultOps.test.ts (533.706 s)
  ● VaultOps › adding and committing a secret 10 times

    thrown: "Exceeded timeout of 80000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      132 |     });
      133 |   });
    > 134 |   test(
          |   ^
      135 |     'adding and committing a secret 10 times',
      136 |     async () => {
      137 |       const content = 'secret-content';

      at tests/vaults/VaultOps.test.ts:134:3
      at Object.<anonymous> (tests/vaults/VaultOps.test.ts:19:1)

  ● VaultOps › adding and committing a secret 10 times

    thrown: "Exceeded timeout of 20000 ms for a hook.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      79 |   });
      80 |
    > 81 |   afterEach(async () => {
         |   ^
      82 |     await vaultInternal.stop();
      83 |     await vaultInternal.destroy();
      84 |     await db.stop();

      at tests/vaults/VaultOps.test.ts:81:3
      at Object.<anonymous> (tests/vaults/VaultOps.test.ts:19:1)

  ● VaultOps › updating secret content

    ErrorEncryptedFSNotRunning:

      at node_modules/pify/index.js:32:6
      at FileSystem.<anonymous> (node_modules/pify/index.js:6:9)
      at FileSystem.ret [as _writeFile] (node_modules/pify/index.js:56:76)
      at FileSystem.write (node_modules/isomorphic-git/index.cjs:4285:18)
          at runMicrotasks (<anonymous>)
      at writeObjectLoose (node_modules/isomorphic-git/index.cjs:4429:37)
      at _writeObject (node_modules/isomorphic-git/index.cjs:4480:5)
      at node_modules/isomorphic-git/index.cjs:4570:19

  ● VaultOps › updating secret content

    thrown: "Exceeded timeout of 80000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      150 |     global.defaultTimeout * 4,
      151 |   );
    > 152 |   test(
          |   ^
      153 |     'updating secret content',
      154 |     async () => {
      155 |       await vaultOps.addSecret(vault, 'secret-1', 'secret-content');

      at tests/vaults/VaultOps.test.ts:152:3
      at Object.<anonymous> (tests/vaults/VaultOps.test.ts:19:1)

  ● VaultOps › updating secret content

    ErrorDBNotRunning:

      329 |   public async destroy(tran?: DBTransaction): Promise<void> {
      330 |     if (tran == null) {
    > 331 |       return await this.db.withTransactionF(async (tran) =>
          |                            ^
      332 |         this.destroy_(tran),
      333 |       );
      334 |     }

      at constructor_.withTransactionF (node_modules/@matrixai/db/src/DB.ts:191:24)
      at constructor_.destroy (src/vaults/VaultInternal.ts:331:28)
      at node_modules/@matrixai/async-init/src/CreateDestroyStartStop.ts:87:36
      at node_modules/@matrixai/async-locks/src/RWLockWriter.ts:223:53
      at withF (node_modules/@matrixai/resources/src/utils.ts:24:18)
      at Object.<anonymous> (tests/vaults/VaultOps.test.ts:83:5)

  ● VaultOps › updating a secret 10 times

    thrown: "Exceeded timeout of 40000 ms for a test.
    Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."

      181 |     ).toStrictEqual('secret-content-change');
      182 |   });
    > 183 |   test(
          |   ^
      184 |     'updating a secret 10 times',
      185 |     async () => {
      186 |       await vaultOps.addSecret(vault, 'secret-1', 'secret-content');

      at tests/vaults/VaultOps.test.ts:183:3
      at Object.<anonymous> (tests/vaults/VaultOps.test.ts:19:1)


Test Suites: 7 failed, 61 passed, 68 total
Tests:       26 failed, 3 todo, 359 passed, 388 total
Snapshots:   0 total
Time:        604.628 s
Ran all test suites.
GLOBAL TEARDOWN
Destroying Global Data Dir: /tmp/polykey-test-global-aotD4w

The pipeline just finished so here are the results of that as well: https://gitlab.com/MatrixAI/open-source/js-polykey/-/pipelines/577509931/test_report. More failures on mac than locally and the tests took significantly longer, but that might be including setup time.

@emmacasolin
Copy link
Contributor Author

This is one of the tests that sometimes times out. While most of the KeyManager tests mock the key generation, this one uses the actual implementation. However, does this test really need to use the real generateDeterministicKeyPair?

test(
  'creates a recovery code and can recover from the same code',
  async () => {
    // Use the real generateDeterministicKeyPair
    mockedGenerateDeterministicKeyPair.mockRestore();
    const keysPath = `${dataDir}/keys`;
    // Minimum key pair size is 1024
    // Key pair generation can take 4 to 15 seconds
    const keyManager = await KeyManager.createKeyManager({
      password,
      keysPath,
      rootKeyPairBits: 1024,
      logger,
    });
    const nodeId = keyManager.getNodeId();
    // Acquire the recovery code
    const recoveryCode = keyManager.getRecoveryCode()!;
    expect(recoveryCode).toBeDefined();
    await keyManager.stop();
    // Oops forgot the password
    // Use the recovery code to recover and set the new password
    await keyManager.start({
      password: 'newpassword',
      recoveryCode,
    });
    expect(await keyManager.checkPassword('newpassword')).toBe(true);
    expect(keyManager.getNodeId()).toStrictEqual(nodeId);
    await keyManager.stop();
  },
  global.defaultTimeout * 2,
);

@CMCDragonkai
Copy link
Member

The npm run build isn't really necessary for test jobs, and for ts demo lib, it shouldn't be necessary either for the build jobs. A separate build:dist job can be done to do npm run build to acquire the dist where it is needed. This is already done in TS demo lib. So you should remove the npm run build from our testing and build jobs, that can also speed up our CICD times.

For native addons, an additional work must be done for caching the build. I'm going to have that applied to TS demo lib native, and js-db, since js-polykey doesn't have native addons in this repository.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jul 1, 2022

For windows and macos runners, we have to solve the caching problem of their dependencies before we can do sharding.

For windows see: #394 (comment)

For macos remember that mac machines on CICD is x86 and on Mac in office is different. So we can debug how to get homebrew to cache properly on our mac in office, must make it portable to the CICD. You can try first by trying to cache both the cache and the formula directory. Check if brew --cache is pointing to the same location for both CICD and on matrix-mac-1 in office. In matrix-mac-1, it is ~/Library/Caches/Homebrew, but in CICD, this should be overwritten with HOMEBREW_CACHE variable. Also: https://docs.brew.sh/Tips-N'-Tricks

There's a few other env variables to set to prevent further updates... but I hadn't investigated further how homebrew works.

@CMCDragonkai
Copy link
Member

For shard timeout debugging, make sure to use perf_hooks, iotop, iftop and htop at the same time. Use the max workers to try and replicate a test timeout failure for just 2 workers.

@emmacasolin
Copy link
Contributor Author

The --maxWorkers option is supposed to specify the maximum number of workers that will be spawned for running tests, and while I can see that only two test files are running at a time when I set --maxWorkers=2, htop shows way more jest child processes than 2 running at a time - it's closer to 10 on average. When setting --runInBand these child processes don't appear, so they're definitely child processes from the worker pool since --runInBand doesn't create a worker pool.

But if jest is creating more than two workers even when running only two test files, this might be why we get timeouts. If just a single test file spreads across multiple cores then having too many at the same time will start to block each other. Even when using only one worker (or running just one test file) there are 11 of these in htop

image

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jul 1, 2022 via email

@CMCDragonkai
Copy link
Member

I'm going to add an additional task here relating to chocolatey package security. The caching of chocolatey package will be essential for CI/CD usage and shards for the windows runner. A new issue will be linked to this PR.

@CMCDragonkai
Copy link
Member

Added #397 to address chocolatey usage.

@emmacasolin
Copy link
Contributor Author

I think the issues we're seeing with timeouts when we run multiple tests are just a problem with Jest consuming resources, as this and many other articles point out: https://medium.com/stuart-engineering/a-practical-guide-to-removing-frustration-with-jest-and-jenkins-cc73dd332152

Jest will use all of the resources available, and most people seem to recommend setting --maxWorkers to 50% for this reason. I don't think the timeouts are a problem with our tests, I think we just need to work out the most ideal jest settings for each of our testing environments (both locally and on each platform on the CI/CD).

@CMCDragonkai
Copy link
Member

Would that help on the cicd? We only have 1 core on the gitlab Linux runners.

Perhaps it will help for the windows and Mac runners.

@CMCDragonkai
Copy link
Member

I read the post, why don't you try with 50% and see if it helps free up some CPU to do CPU intensive work and reduce timeouts? Do it on your local system to check first.

@emmacasolin
Copy link
Contributor Author

Yeah setting it to 50% locally reduces the timeouts significantly, and 40% removes them entirely. I'm running the mac CI/CD tests right now with 50% and they still have timeouts, but I'll need to wait to see the test output at the end to know if the number of timeouts was reduced. As for Linux, we don't really see all that many timeouts, but we also aren't using shards there yet.

@emmacasolin
Copy link
Contributor Author

Setting --maxWorkers=50% for the mac runners seems to have increased the test success rate from 84% to 92%, and has also decreased the total test time by about 1000s (this value is not "real time", it's the combined time for all tests ignoring tests running in parallel).

@emmacasolin
Copy link
Contributor Author

emmacasolin commented Jul 4, 2022

For windows and macos runners, we have to solve the caching problem of their dependencies before we can do sharding.

For windows see: #394 (comment)

For macos remember that mac machines on CICD is x86 and on Mac in office is different. So we can debug how to get homebrew to cache properly on our mac in office, must make it portable to the CICD. You can try first by trying to cache both the cache and the formula directory. Check if brew --cache is pointing to the same location for both CICD and on matrix-mac-1 in office. In matrix-mac-1, it is ~/Library/Caches/Homebrew, but in CICD, this should be overwritten with HOMEBREW_CACHE variable. Also: https://docs.brew.sh/Tips-N'-Tricks

There's a few other env variables to set to prevent further updates... but I hadn't investigated further how homebrew works.

I think the cache is working correctly.

It gets downloaded at the beginning with no issues:

Checking cache for feature-test-load-balancing-protected...
Runtime platform                                    arch=amd64 os=darwin pid=1215 revision=febb2a09 version=15.0.0
Downloading cache.zip from https://storage.googleapis.com/gitlab-com-macos-runner-cache/project/19919[26](https://gitlab.com/MatrixAI/open-source/js-polykey/-/jobs/2672865825#L26)8/feature-test-load-balancing-protected 
Downloading cache 146.[27](https://gitlab.com/MatrixAI/open-source/js-polykey/-/jobs/2672865825#L27) MB/146.27 MB (33.8 MB/s)                
Successfully extracted cache

And while downloading dependencies it won't redownload all of them, e.g.:

==> Downloading https://ghcr.io/v2/homebrew/core/gnupg/blobs/sha256:a80cd7d926b0b82a15fb956f1e2fc137fb024f318807d666dd64133538b2f977
Already downloaded: /Users/gitlab/builds/MatrixAI/open-source/js-polykey/tmp/Homebrew/downloads/fdb6445c6f44d6a2bfa01e836b0f910d267aa990d01b0f37849097a43112a21f--gnupg--2.3.6.big_sur.bottle.tar.gz

I noticed this message after node gets installed though, which made me realise that there was some updating being done, which is why the setup was taking so long:

==> Summary
🍺  /usr/local/Cellar/node@16/16.15.1: 1,850 files, 47.6MB
==> Upgrading 5 dependents of upgraded formulae:
Disable this behaviour by setting HOMEBREW_NO_INSTALLED_DEPENDENTS_CHECK.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).
curl 7.74.0 -> 7.84.0, gnutls 3.6.15 -> 3.7.6, gnupg 2.2.25 -> 2.3.6, nghttp2 1.42.0_1 -> 1.48.0, unbound 1.13.0 -> 1.16.0

Setting HOMEBREW_NO_INSTALLED_DEPENDENTS_CHECK now stops all of these updates from happening and speeds up the setup time by about half:

==> Summary
🍺  /usr/local/Cellar/node@16/16.15.1: 1,850 files, 47.6MB
Warning: HOMEBREW_NO_INSTALLED_DEPENDENTS_CHECK is set: not checking for outdated
dependents or dependents with broken linkage!

There are actually still these dependencies that still need to be installed each time, even with the above env variable set

==> Installing dependencies for node@16: c-ares, icu4c, libnghttp2, libuv, ca-certificates and [email protected]

There are quite a few options/variables you can set that affect brew install so it may be worth looking into these https://docs.brew.sh/Manpage#install-options-formulacask-

@emmacasolin
Copy link
Contributor Author

There are actually still these dependencies that still need to be installed each time, even with the above env variable set

==> Installing dependencies for node@16: c-ares, icu4c, libnghttp2, libuv, ca-certificates and [email protected]

There are quite a few options/variables you can set that affect brew install so it may be worth looking into these https://docs.brew.sh/Manpage#install-options-formulacask-

This option might be useful here:

--ignore-dependencies: An unsupported Homebrew development flag to skip installing any dependencies of any kind. If the dependencies are not already present, the formula will have issues. If you’re not developing Homebrew, consider adjusting your PATH rather than using this flag.

@emmacasolin
Copy link
Contributor Author

Even when setting --runInBand, so we're only running one test at a time, the tests still take significantly longer on the mac runners compared to locally. I have three theories for why this may be:

  1. The Jest cache isn't working correctly for the mac runners - this would also explain why the order of tests on the mac runners is always the same and is not in runtime order (i.e. Jest's default sorting for the test order, which relies on cached data from previous runs, has to fall back on using filesize to determine test order, which it looks like it's doing)
  2. Even with only one test running at a time Jest is using too much RAM - if the mac runners are running out of ram then that will cause the tests to run slower, however, you would expect the tests to get slower the longer the job runs if this was what was happening and this isn't the case
  3. The mac runners are just slower than my local machine

I think the first theory is most likely, and it has the most evidence pointing towards it. The question is why would the cache not be working?

@emmacasolin
Copy link
Contributor Author

Just a note that the files build:platforms-generate.sh and check:test-generate.sh can't be cloned onto matrix-win-1 because : is a forbidden character on windows. I don't think it causes any problems on the CI/CD so just something to be aware of.

@emmacasolin
Copy link
Contributor Author

Interestingly, the Jest cache (on CI/CD runs) seems to be working fine on Windows - it's just Mac that isn't using the results of past runs to optimise future ones. I'll have to see if this is the case on matrix-mac-1 or if it's just on the CI/CD.

@emmacasolin
Copy link
Contributor Author

There seem to be fewer timeouts on Windows compared to Mac, however there are quite a lot of test failures. A lot of them seem to be simple things like expecting a path to be broken up with / instead of \, so for those I can just add a check for the platform we're on and change the expected format based on that (or maybe it would be possible to use path?). Some of the other ones may be more complex to resolve. I'll work on fixing the tests on matrix-win-1 since it's faster than waiting for the CI/CD.

@CMCDragonkai
Copy link
Member

This will need to rebase on top of #396 when that merges. I'm on the final leg to merge it.

@CMCDragonkai
Copy link
Member

Needs rebase on staging.

In particular take note of the new default.

default:
  interruptible: true
  before_script:
    # Replace this in windows runners that use powershell
    # with `mkdir -Force "$CI_PROJECT_DIR/tmp"`
    - mkdir -p "$CI_PROJECT_DIR/tmp"

It would be good to have those in your child pipelines too.

Once the above is addressed, we merge into staging for further testing.

I'm guessing alot of tests will need to be fixed before merging into master. At the very least, linux tests should be passing.

- Includes test suites that failed to run
- Adds separator between names of nested describes
Introduced sharding for macOS and Windows runners and test directory pipelines for Linux runner
Stopped Mac CI jobs from updating dependencies since this was increasing setup time
Optimised Windows CI/CD setup by internalising and caching chocolatey packages

Fixes #397
.gitlab-ci.yml Outdated Show resolved Hide resolved
@CMCDragonkai
Copy link
Member

Ok if all are resolved, proceed to merge. (Tick off everything).

@emmacasolin emmacasolin merged commit ceaebe6 into staging Jul 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants