Bundling and pkg fixes #26

Merged
merged 12 commits into from
Oct 17, 2023

tegefaulkes
Contributor

@tegefaulkes tegefaulkes commented Oct 6, 2023

Description

This PR focuses on 3 things.

  1. Using esbuild to bundle the polykey CLI.
  2. Getting pkg to work with the bundled code.
  3. Fixing everything so that polykey CLI is in a workable state.

Issues Fixed

Tasks

  • 1. Using esbuild to bundle the Polykey CLI.
  • 2. Getting pkg to work with the bundled code.
  • 3. Fixing everything so that Polykey CLI is in a workable state.
  • 4. Replace node2nix with buildNpmPackage
  • 5. Replace vercel/pkg with @yao-pkg/pkg
  • 6. Make sure application, docker, pkg builds are building
  • 7. Ensure that CI is producing and uploading to ECR - will need to confirm everything on staging
  • [ ] 8. Client verification of server certificate needs to be more robust - to be resolved here: Encapsulate WebSocketClient in PolykeyClient Polykey#582 (comment)

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@tegefaulkes tegefaulkes self-assigned this Oct 6, 2023

@tegefaulkes
Contributor Author

Ok, I have modified the build script. The typescript compile stage is now in the prebuild script and compiles to the ./tmp/build directory. The build script uses esbuild and the code in ./tmp/build and outputs a single polykey.js file to the ./dist directory.

esbuild can't resolve any dynamic imports, so it has a fun time with the native modules. In the case of DB, it's loading the .node modules from the directory ../../ which remains unchanged when bundled. The current fix for this is to specify these modules as external so that esbuild doesn't try to include these and change the path. That just means we'll need to tell pkg to include these as assets wholesale.
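
A rough sketch of what the externals approach could look like as build options. This is not the project's actual build script: the entry file name and the package name in the usage line are assumptions for illustration. The field names (entryPoints, bundle, platform, outdir, external) are standard esbuild build options, and the object is kept import-free here; it would be handed to esbuild's build call.

```typescript
// Plain options object in the shape esbuild's build API accepts.
interface BundleOptions {
  entryPoints: string[];
  bundle: boolean;
  platform: 'node';
  outdir: string;
  external: string[];
}

function makeBundleOptions(nativePackages: string[]): BundleOptions {
  return {
    // Assumption: tsc has already emitted JS into ./tmp/build (the prebuild step).
    entryPoints: ['./tmp/build/polykey.js'],
    bundle: true,
    platform: 'node',
    outdir: './dist',
    // Packages listed here are left out of the bundle, so their own relative
    // lookups for .node files are untouched. pkg then has to ship them as assets.
    external: [...nativePackages],
  };
}

// 'some-native-package' is a hypothetical name, not a real dependency.
const options = makeBundleOptions(['some-native-package']);
```

The trade-off is the one noted above: anything marked external has to be present at runtime, so it must be included wholesale in the pkg assets.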

We can get a little more fancy with this. I think esbuild includes some ways to custom resolve paths, but I'm not sure it even recognises these as paths. Worst case we can be more specific with excluding and monkey patch some code and copy the files we need over.

Next steps.

  1. Make sure the worker script is included and working.
  2. Fix up pkg; may have to revert to node 18 or 19 for this.

@tegefaulkes
Contributor Author

Fixed the worker script problem. I just added a new polykeyWorker.ts that reexports the worker script from polykey. I then added this as a target for esbuild. This is then used as the worker script.

It's a little hacky, since the worker is loaded with new Worker('./polykeyWorker') in polykey. esbuild doesn't recognise it as an import, so the path is bundled verbatim. It will then attempt to load the script from ./ when running the bundled code. Since we added our own, it just loads that.

We can do similar things for the .node files. For example the db loads them from the ../../ directory. This is not ideal since after bundling that points outside of the root directory. But if we can monkey patch that string in the bundled code and copy over the required .node files then that should work.
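
A minimal sketch of the monkey-patch idea, assuming the lookup directory appears in the bundle as a quoted string literal. The function name, the sample line, and the './native/' target are all hypothetical. It returns the number of replacements so a build step can fail if the literal is missing or matches more places than expected.

```typescript
// Replace every occurrence of one exact string in the bundled source and
// report how many were replaced.
function patchNativeLookup(
  bundle: string,
  from: string,
  to: string,
): { patched: string; count: number } {
  const parts = bundle.split(from);
  return { patched: parts.join(to), count: parts.length - 1 };
}

// Hypothetical bundled line; the real bundle may look different.
const sample = "const dir = path.join(__dirname, '../../');";
const result = patchNativeLookup(sample, "'../../'", "'./native/'");
// result.count is 1 here; a build script would abort on any other count.
```

Matching on the quoted literal rather than the bare path keeps unrelated '../../' fragments inside longer strings from being rewritten.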

@tegefaulkes
Contributor Author

We'll worry about the .node files later.

Moving on to getting pkg working.

@tegefaulkes
Contributor Author

After reverting to node 18, running pkg is working.

To move on from here I need to do a few things.

  1. Might have to downgrade the Polykey repo node version to v18, and possibly other dependencies need this applied as well.
  2. I need a released version of Polykey, there is some nuance with how the node_modules are structured when linking locally.
  3. I need to test the packaged file on some platforms to confirm that it works.

I think at this point I can start working on fixing Polykey generally.

@CMCDragonkai
Member

What about node 19?

@tegefaulkes
Contributor Author

It was easiest to revert the changes back to 18 for now. We can try version 19 but I think that's low priority.

@CMCDragonkai
Member

Staging fixes should be done here too.

@CMCDragonkai
Member

If we are using the esbuild output in the docker image build, our docker image no longer requires all of the npm packages that were structured before, since all the relevant JS code has been embedded into the single polykey.js output file.

The only thing that is required separately would be the worker script, and the .node native shared objects.

This should mean the image becomes MUCH smaller.

Especially if minification can work.

If minification doesn't work atm, just create an issue for it, we can revisit after we have deployed testnet 6.

@CMCDragonkai
Member

A few things to consider here.

  1. The pkg tool is only used as part of nix build scripts, and the expectation is that this becomes a single executable.
  2. We have just introduced esbuild as an intermediate step to bundle up the JS code before going to pkg. This means esbuild execution should be part of the release.nix derivations right before pkg.
  3. However esbuild output can also be useful for the docker image, as it can reduce the size of the docker image as well.
  4. But the docker image right now uses node2nix and there are some derivations in utils.nix that set that up. And this of course relies on npm run build.
  5. In that case, if you incorporate esbuild into npm run build stage, that means the dist is no longer a regular npm package, and you'd want to move most dependencies to dev dependencies.

I think to ensure consistency - you'd want to incorporate esbuild into the nix scripts, not as part of npm run build.

@CMCDragonkai
Member

Currently the docker image doesn't really need the esbuild output. It could continue using the regular dist.

But only pkg needs the esbuild output. I think we can incorporate it directly into the release.nix.

Will need to think about how to incorporate the steps separately for docker later.

@CMCDragonkai
Member

I can have a look at this tomorrow. But please post any build failure logs showing why the docker image build fails. If we can keep esbuild to the pkg build process, then docker image builds should be the same as before.

@tegefaulkes
Contributor Author

Build is failing with the following.

> polykey-cli@0.0.1 prebuild
> shx rm -rf ./tmp/build && tsc -p ./tsconfig.build.json

src/utils/parsers.ts:46:7 - error TS2742: The inferred type of 'parseNodeId' cannot be named without a reference to 'polykey-cli/node_modules/polykey/dist/ids/types'. This is likely not portable. A type annotation is necessary.

46 const parseNodeId = validateParserToArgParser(validationUtils.parseNodeId);
         ~~~~~~~~~~~

src/utils/parsers.ts:47:7 - error TS2742: The inferred type of 'parseGestaltId' cannot be named without a reference to 'polykey-cli/node_modules/polykey/dist/ids/types'. This is likely not portable. A type annotation is necessary.

47 const parseGestaltId = validateParserToArgParser(
         ~~~~~~~~~~~~~~

src/utils/parsers.ts:53:7 - error TS2742: The inferred type of 'parseHost' cannot be named without a reference to 'polykey-cli/node_modules/polykey/dist/network/types'. This is likely not portable. A type annotation is necessary.

53 const parseHost = validateParserToArgParser(validationUtils.parseHost);
         ~~~~~~~~~

src/utils/parsers.ts:54:7 - error TS2742: The inferred type of 'parseHostname' cannot be named without a reference to 'polykey-cli/node_modules/polykey/dist/network/types'. This is likely not portable. A type annotation is necessary.

54 const parseHostname = validateParserToArgParser(validationUtils.parseHostname);
         ~~~~~~~~~~~~~

src/utils/parsers.ts:55:7 - error TS2742: The inferred type of 'parseHostOrHostname' cannot be named without a reference to 'polykey-cli/node_modules/polykey/dist/network/types'. This is likely not portable. A type annotation is necessary.

55 const parseHostOrHostname = validateParserToArgParser(
         ~~~~~~~~~~~~~~~~~~~

src/utils/parsers.ts:58:7 - error TS2742: The inferred type of 'parsePort' cannot be named without a reference to 'polykey-cli/node_modules/polykey/dist/network/types'. This is likely not portable. A type annotation is necessary.

58 const parsePort = validateParserToArgParser(validationUtils.parsePort);
         ~~~~~~~~~

src/utils/parsers.ts:59:7 - error TS2742: The inferred type of 'parseNetwork' cannot be named without a reference to 'polykey-cli/node_modules/polykey/dist/nodes/types'. This is likely not portable. A type annotation is necessary.

59 const parseNetwork = validateParserToArgParser(validationUtils.parseNetwork);
         ~~~~~~~~~~~~

src/utils/parsers.ts:60:7 - error TS2742: The inferred type of 'parseSeedNodes' cannot be named without a reference to 'polykey-cli/node_modules/polykey/dist/nodes/types'. This is likely not portable. A type annotation is necessary.

60 const parseSeedNodes = validateParserToArgParser(
         ~~~~~~~~~~~~~~

src/utils/parsers.ts:63:7 - error TS2742: The inferred type of 'parseProviderId' cannot be named without a reference to 'polykey-cli/node_modules/polykey/dist/ids/types'. This is likely not portable. A type annotation is necessary.

63 const parseProviderId = validateParserToArgParser(
         ~~~~~~~~~~~~~~~

src/utils/parsers.ts:66:7 - error TS2742: The inferred type of 'parseIdentityId' cannot be named without a reference to 'polykey-cli/node_modules/polykey/dist/ids/types'. This is likely not portable. A type annotation is necessary.

66 const parseIdentityId = validateParserToArgParser(
         ~~~~~~~~~~~~~~~

src/utils/parsers.ts:70:7 - error TS2742: The inferred type of 'parseProviderIdList' cannot be named without a reference to 'polykey-cli/node_modules/polykey/dist/ids/types'. This is likely not portable. A type annotation is necessary.

70 const parseProviderIdList = validateParserToArgListParser(
         ~~~~~~~~~~~~~~~~~~~


Found 11 errors in the same file, starting at: src/utils/parsers.ts:46

/nix/store/5s1yg5l36wzgy1dj0vv1ibarc4g7vrdr-stdenv-linux/setup: line 136: pop_var_context: head of shell_variables not a function context
error: builder for '/nix/store/l55ly6bxr44cq3s7k6a9bmbw7pgk81nd-polykey-cli-0.0.1.drv' failed with exit code 1;
       last 10 log lines:
       >
       > src/utils/parsers.ts:70:7 - error TS2742: The inferred type of 'parseProviderIdList' cannot be named without a reference to 'polykey-cli/node_modules/polykey/dist/ids/types'. This is likely not portable. A type annotation is necessary.
       >
       > 70 const parseProviderIdList = validateParserToArgListParser(
       >          ~~~~~~~~~~~~~~~~~~~
       >
       >
       > Found 11 errors in the same file, starting at: src/utils/parsers.ts:46
       >
       > /nix/store/5s1yg5l36wzgy1dj0vv1ibarc4g7vrdr-stdenv-linux/setup: line 136: pop_var_context: head of shell_variables not a function context
       For full logs, run 'nix log /nix/store/l55ly6bxr44cq3s7k6a9bmbw7pgk81nd-polykey-cli-0.0.1.drv'.
error: 1 dependencies of derivation '/nix/store/ylr6j3l8wnncdyzsby2bzn499zbv6i1i-polykey-cli-0.0.1.drv' failed to build

Doing prebuild normally runs fine. Not sure why it's an issue in nix-build.

Simplest fix would be to add the type annotations in but maybe not the best fix.
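
For reference, the annotation fix would look roughly like this. It is a self-contained sketch: NodeId, validateParserToArgParser and parseNodeIdRaw below are simplified stand-ins, not the real polykey definitions. In the CLI the type would presumably be imported from the module the error names (polykey/dist/ids/types) instead of being declared locally.

```typescript
// Stand-in for the branded id type that lives in the polykey package.
type NodeId = string & { readonly __brand: 'NodeId' };

// Stand-in for the CLI helper: adapts a validation parser to an argument parser.
function validateParserToArgParser<T>(
  parse: (data: unknown) => T,
): (data: string) => T {
  return (data) => parse(data);
}

// Stand-in for validationUtils.parseNodeId.
function parseNodeIdRaw(data: unknown): NodeId {
  if (typeof data !== 'string' || data.length === 0) {
    throw new TypeError('expected a non-empty string');
  }
  return data as NodeId;
}

// The explicit annotation is the point: declaration emit can use the named
// type as written instead of inferring one that it has to spell out through
// a node_modules path, which is what TS2742 complains about.
const parseNodeId: (data: string) => NodeId =
  validateParserToArgParser(parseNodeIdRaw);
```

This only silences the symptom per constant; it does not explain why the path resolves differently under nix-build.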

@tegefaulkes
Contributor Author

Right now fixing up the tests and getting the build working can be done in parallel. It's possible to build and do a Polykey agent start just fine so that's the minimum needed for testing building and pkg.

What's broken in tests right now is how errors are serialised and deserialised. It's something I'm working on in js-rpc. As part of this I'm making toError responsible for wrapping the errors in an ErrorXRemote error.

@CMCDragonkai
Member

Build is failing with the following. [...]

Simplest fix would be to add the type annotations in but maybe not the best fix.

It's possible solving MatrixAI/Polykey#532 will have a side-effect of solving this problem.

@CMCDragonkai
Member

So you have 2 problems right now:

  1. You're fixing tests for PK CLI - you need to use 1.2.1-alpha.3 that I just released with the new client domain. Or you can link it to the current staging right now.
  2. The docker build should not be using the esbuild step. Move the esbuild step to the pkg outputs in release.nix. We can figure out the esbuild docker output integration later.

I can look at 2. now.

Comment on lines +22 to +23
const { WebSocketClient } = await import('@matrixai/ws');
const clientUtils = await import('polykey/dist/client/utils');
Member

Seems problematic...

Surely it's possible to expose the necessary abstractions directly from PK?

Comment on lines 53 to 66
webSocketClient = await WebSocketClient.createWebSocketClient({
  expectedNodeIds: [clientOptions.nodeId],
  config: {
    verifyPeer: true,
    verifyCallback: async (certs) => {
      await clientUtils.verifyServerCertificateChain(
        [clientOptions.nodeId],
        certs,
      );
    },
  },
  host: clientOptions.clientHost,
  port: clientOptions.clientPort,
  logger: this.logger.getChild(WebSocketClient.name),
});
Member

Shouldn't PolykeyClient encapsulate the WS client setup? I don't like having every command set up its own websocket transport. That seems very incorrect.

Contributor Author

I think we wanted PolykeyClient to be agnostic to transport like the RPC?

Member

No of course not!

Member

How can PK client be agnostic, and yet PK agent be not agnostic. Doesn't make sense.

Member

@amydevs can you help on this problem?

Member

Ok once WebSocketClient is moved into the PolykeyClient. That should mean @matrixai/ws should be removed from the package.json.

Comment on lines -59 to +68
- streamFactory: (ctx) => webSocketClient.startConnection(ctx),
+ streamFactory: () => webSocketClient.connection.newStream(),
Member

It does not make sense. PolykeyClient should be doing this all automatically.

@CMCDragonkai
Member

To deal with docker build I need to try solve MatrixAI/Polykey#532, then you can try and see if that fixes it.

@tegefaulkes
Contributor Author

I had to disable 2 tests for ping.test.ts, these are testing failing ping. It's throwing an error when it tries to refresh the timeout timer after it has timed out. I'll need to fix the RPC to ignore that since it's fully allowed to keep running after a timeout.

@CMCDragonkai
Member

I had to disable 2 tests for ping.test.ts, these are testing failing ping. It's throwing an error when it tries to refresh the timeout timer after it has timed out. I'll need to fix the RPC to ignore that since it's fully allowed to keep running after a timeout.

What does this have to do with RPC?

@tegefaulkes
Contributor Author

The timer is being refreshed inside the RPC logic. It's being refreshed after timing out, and that throws an error.
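
To illustrate the behaviour I'm after, here is a small sketch of a deadline whose refresh is a no-op once it has expired, instead of throwing. This is not the js-rpc timer; the class and method names are made up for the example, and time is passed in explicitly so the logic is easy to follow.

```typescript
// A refreshable deadline that tolerates refresh calls after expiry.
class RefreshableDeadline {
  private deadline: number;
  private expired = false;
  private delayMs: number;

  constructor(delayMs: number, now: number) {
    this.delayMs = delayMs;
    this.deadline = now + delayMs;
  }

  // Returns true once the deadline has passed; expiry is permanent.
  check(now: number): boolean {
    if (!this.expired && now >= this.deadline) {
      this.expired = true;
    }
    return this.expired;
  }

  // Pushes the deadline out again. After expiry it does nothing and returns
  // false, so a handler that keeps running past its timeout is not
  // interrupted by an exception.
  refresh(now: number): boolean {
    if (this.check(now)) return false;
    this.deadline = now + this.delayMs;
    return true;
  }
}
```

The caller can still look at the return value of refresh if it wants to react to a late message, but nothing is thrown.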

@tegefaulkes
Contributor Author

I'm disabling 'concurrent with bootstrap results in 1 success' in start.test.ts. Both the start and bootstrap are succeeding here. It's not worth looking into right now.

@CMCDragonkai
Member

I'm disabling 'concurrent with bootstrap results in 1 success' in start.test.ts. Both the start and bootstrap are succeeding here. It's not worth looking into right now.

Why is this failing?

@tegefaulkes
Contributor Author

Ok, all tests are passing now except for those 3 tests I disabled.

I need to write a test and fix for js-rpc dealing with messages sent and received after the handler has timed out.

@tegefaulkes
Contributor Author

I'm disabling 'concurrent with bootstrap results in 1 success' in start.test.ts. Both the start and bootstrap are succeeding here. It's not worth looking into right now.

Why is this failing?

The test starts a polykey agent and runs a bootstrap concurrently. It expects that one of these will fail but both are succeeding. If I had to guess, it's bootstrapping and then starting without the two conflicting, which is a race condition. Essentially a failure of the test itself.

@tegefaulkes
Contributor Author

I pushed some fixes to Polykey and am doing another pre-release. The next version should be 1.2.1-alpha.4.

@tegefaulkes
Contributor Author

tegefaulkes commented Oct 12, 2023

Ok, after doing a prerelease on Polykey and using that version for testing, we are getting a failure. It seems that Polykey is failing to start now. Here is the output.

[faulkes@faulkeswork:~/matrixcode/polykey/Polykey-CLI]$ npm run polykey -- agent start --node-path ./tmp/sdfghdfasd -v

> polykey-cli@0.0.1 polykey
> ts-node src/polykey.ts agent start --node-path ./tmp/sdfghdfasd -v

✔ Enter new password … ********
✔ Confirm new password … ********
INFO:polykey.PolykeyAgent:Creating PolykeyAgent
INFO:polykey.PolykeyAgent:Setting umask to 077
INFO:polykey.PolykeyAgent:Setting node path to ./tmp/sdfghdfasd
INFO:polykey.PolykeyAgent.Status:Starting Status
INFO:polykey.PolykeyAgent.Status:Writing Status to tmp/sdfghdfasd/status.json
INFO:polykey.PolykeyAgent.Status:Status is STARTING
INFO:polykey.PolykeyAgent.Schema:Creating Schema
INFO:polykey.PolykeyAgent.Schema:Starting Schema
INFO:polykey.PolykeyAgent.Schema:Setting state path to tmp/sdfghdfasd/state
INFO:polykey.PolykeyAgent.Schema:Started Schema
INFO:polykey.PolykeyAgent.Schema:Created Schema
INFO:polykey.PolykeyAgent.KeyRing:Creating KeyRing
INFO:polykey.PolykeyAgent.KeyRing:Setting keys path to tmp/sdfghdfasd/state/keys
INFO:polykey.PolykeyAgent.KeyRing:Starting KeyRing
INFO:polykey.PolykeyAgent.KeyRing:Checking tmp/sdfghdfasd/state/keys/private.jwk
INFO:polykey.PolykeyAgent.KeyRing:Generating root key pair and recovery code
INFO:polykey.PolykeyAgent.KeyRing:Checking tmp/sdfghdfasd/state/keys/db.jwk
INFO:polykey.PolykeyAgent.KeyRing:Generating db key
INFO:polykey.PolykeyAgent.KeyRing:Started KeyRing
INFO:polykey.PolykeyAgent.KeyRing:Created KeyRing
INFO:polykey.PolykeyAgent.DB:Creating DB
INFO:polykey.PolykeyAgent.DB:Starting DB
INFO:polykey.PolykeyAgent.DB:Setting DB path to tmp/sdfghdfasd/state/db
INFO:polykey.PolykeyAgent.DB:Started DB
INFO:polykey.PolykeyAgent.DB:Created DB
INFO:polykey.PolykeyAgent:Creating TaskManager
INFO:polykey.PolykeyAgent:Starting TaskManager in Lazy Mode
INFO:polykey.PolykeyAgent:Begin Tasks Repair
INFO:polykey.PolykeyAgent:Finish Tasks Repair
INFO:polykey.PolykeyAgent:Started TaskManager
INFO:polykey.PolykeyAgent:Created TaskManager
INFO:polykey.PolykeyAgent.CertManager:Creating CertManager
INFO:polykey.PolykeyAgent.CertManager:Starting CertManager
INFO:polykey.PolykeyAgent.CertManager:Begin current certificate setup
INFO:polykey.PolykeyAgent.CertManager:Generating new current certificate
WARN:polykey.PolykeyAgent:Failed Creating PolykeyAgent
INFO:polykey.PolykeyAgent:Stopping TaskManager
INFO:polykey.PolykeyAgent:Stopped TaskManager
INFO:polykey.PolykeyAgent.DB:Stopping DB
INFO:polykey.PolykeyAgent.DB:Stopped DB
INFO:polykey.PolykeyAgent.KeyRing:Stopping KeyRing
INFO:polykey.PolykeyAgent.KeyRing:Stopped KeyRing
INFO:polykey.PolykeyAgent.Schema:Stopping Schema
INFO:polykey.PolykeyAgent.Schema:Stopped Schema
INFO:polykey.PolykeyAgent.Status:Stopping Status
INFO:polykey.PolykeyAgent.Status:Writing Status to tmp/sdfghdfasd/status.json
INFO:polykey.PolykeyAgent.Status:Status is DEAD
Error: Cannot initialize GeneralNames. Incorrect incoming arguments

The last line before the problem happens is INFO:polykey.PolykeyAgent.CertManager:Generating new current certificate, so I'm guessing the certificate generation is failing.

I recently removed the webcrypto polyfill monkey patch since it wasn't required in node20, but since we're running in node18 here that might be a problem? The only other difference that could be the issue here is that the node_modules dependencies could be different without linking polykey.

Also note that if I switch back to linking the local polykey code, it works fine, even though it's the same version as the prerelease.

@CMCDragonkai
Member

Task 3 will depend on fixing the test case.

If @addievo can get the encapsulation fully working, we could incorporate here, but I think this will probably merge first, then another PR to integrate that change.

@felschr

felschr commented Oct 16, 2023

It is necessary to ensure that the lock file is properly generated, that means sometimes deleting the node_modules and package-lock.json and regenerating from scratch. - this will need to go into README.md

npm issues that track the problem of missing integrity & resolved fields:
npm/cli#4263, npm/cli#4460, npm/cli#6301

@CMCDragonkai
Member

CMCDragonkai commented Oct 17, 2023

Docker build works fine right now.

[nix-shell:~/Projects/Polykey-CLI]$ docker run -it -v "$(pwd)/tmp:/srv" polykey-cli-0.0.1:lp7f5j5wmcxjn6fvhry63sk4fr4jra06 agent start -np /srv
✔ Enter new password … ******
✔ Confirm new password … ******
pid	1
nodeId	"v7poq4ut3in64l9r81tqe4bicfire2quhhtl7ga6fg218gp7e1c80"
clientHost	"127.0.0.1"
clientPort	34707
agentHost	"::"
agentPort	39376
recoveryCode	"kitchen wool romance mirror buyer old welcome arctic pistol horse shrug hundred general gain wage where milk resist pepper absorb bean avoid foster bicycle"
^C

@CMCDragonkai
Member

So the esbuilt bundle seems to behave differently in relation to signal handling - in particular cleanup.

[nix-shell:~/Projects/Polykey-CLI]$ node ./build/polykey.js agent start
✔ Please enter the password … ******
pid	1161525
nodeId	"vj4tcqd7fom79fo3q418g1616739260ngdibpteguvaid49kcakjg"
clientHost	"::1"
clientPort	36029
agentHost	"::"
agentPort	47935
[nix-shell:~/Projects/Polykey-CLI]$ dist/polykey.js agent start
✔ Please enter the password … ******
pid	1161652
nodeId	"vj4tcqd7fom79fo3q418g1616739260ngdibpteguvaid49kcakjg"
clientHost	"::1"
clientPort	37555
agentHost	"::"
agentPort	56440

The issue is that Ctrl+C is immediate for dist/ while Ctrl+C takes some time for build/.


This may be related to the occasional exception I get when running polykey agent against an existing node state.

ERROR:polykey.PolykeyAgent.scheduler:Failed scheduling loop TypeError: The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received an instance of Object
ErrorTaskManagerScheduler: TaskManager scheduling loop encountered an unrecoverable error
  cause: TypeError: The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received an instance of Object

Also I had seen this earlier: #26 (comment) but without the earlier Failed scheduling loop error message.

@CMCDragonkai
Member

CMCDragonkai commented Oct 17, 2023

@tegefaulkes you should update your VBox platform, the docker problem is an old problem we already know from a while ago. The solution is to match our other platform config. NixOS/nixpkgs#170279

@tegefaulkes
Contributor Author

Ok, so we have found two small issues with the CLI that I think are inter-related.

  1. With the esbuild output, running pk agent start and then closing the program with Ctrl+C doesn't do a graceful shutdown of the PolykeyAgent, whereas the normal build output does this properly. I can see that the clean up is being called; it's just not waiting for it to do anything before the process exits.
  2. When running the agent after shutting it down, we sometimes get an error coming from the TaskManager scheduler. I'll have a snippet of the error below.
pid     4167964
nodeId  "vh75uomlvc1fm710e1nepu71kfj1fs1lag99or0htq9n9f9gcsqpg"
clientHost      "::1"
clientPort      40871
agentHost       "::"
agentPort       40755
TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received an instance of Object
    at new NodeError (node:internal/errors:405:5)
    at Function.from (node:buffer:333:9)
    at decrypt (/home/faulkes/matrixcode/polykey/Polykey-CLI/dist/polykeyWorker.js:34861:59)
    at /home/faulkes/matrixcode/polykey/Polykey-CLI/dist/polykeyWorker.js:478:24
    at Generator.next (<anonymous>)
    at /home/faulkes/matrixcode/polykey/Polykey-CLI/dist/polykeyWorker.js:384:71
    at new Promise (<anonymous>)
    at __awaiter3 (/home/faulkes/matrixcode/polykey/Polykey-CLI/dist/polykeyWorker.js:366:14)
    at runFunction (/home/faulkes/matrixcode/polykey/Polykey-CLI/dist/polykeyWorker.js:475:14)
    at /home/faulkes/matrixcode/polykey/Polykey-CLI/dist/polykeyWorker.js:521:13
ERROR:polykey.PolykeyAgent.scheduler:Failed scheduling loop TypeError: The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received an instance of Object
ErrorTaskManagerScheduler: TaskManager scheduling loop encountered an unrecoverable error
  cause: TypeError: The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received an instance of Object

I'm looking into problem 1. right now. I think 2. is caused by 1.
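
For problem 1, the behaviour we want is roughly the following: the signal handler starts the cleanup and only exits once the cleanup promise has settled. This is a sketch, not the CLI's actual exit handling; ProcessLike and installShutdown are names invented for the example, and the process object is passed in so the logic can be exercised without sending real signals.

```typescript
// The small subset of the process API that the handler relies on.
interface ProcessLike {
  once(signal: 'SIGINT' | 'SIGTERM', handler: () => void): unknown;
  exit(code: number): void;
}

function installShutdown(
  proc: ProcessLike,
  cleanup: () => Promise<void>,
): void {
  let stopping = false;
  const handler = (): void => {
    // Ignore repeated signals while a shutdown is already in progress.
    if (stopping) return;
    stopping = true;
    // Exit only after cleanup has settled, not straight away.
    cleanup().then(
      () => proc.exit(0),
      () => proc.exit(1),
    );
  };
  proc.once('SIGINT', handler);
  proc.once('SIGTERM', handler);
}

// In a real CLI this would be wired up along the lines of
//   installShutdown(process, () => agent.stop());
// where agent.stop() is a hypothetical async cleanup.
```

If the bundled build calls exit somewhere else before the cleanup promise resolves, that would match the symptom of the clean up being called but not awaited.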

@CMCDragonkai
Member

Ok this is updated with alpha.13 which uses @matrixai/quic at 1.0.0. And the docker works.

However, we still need to fix the reported problems that @tegefaulkes is working on and the existing test problems here.

Now I can try to get the CI ready.

@tegefaulkes
Contributor Author

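Progress update.

On problem 2 with the TaskManager: the error is coming out of the worker script. In the esbuild build, the arguments reach the worker in the wrong shape: each one is wrapped in an object with `send` and `transferables` fields instead of being a bare ArrayBuffer.

The worker function being called: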

decrypt(key, cipherText) {
    console.log('key', key);
    console.log('cipherText', cipherText);
    const plainText = keysUtils.decryptWithKey(Buffer.from(key), Buffer.from(cipherText));
    if (plainText != null) {
        return (0, worker_1.Transfer)(plainText.buffer);
    }
    else {
        return;
    }
}

Data in the normal build:

key ArrayBuffer {
  [Uint8Contents]: <02 50 56 20 16 0f 49 2f ed c2 29 c5 d6 5a 0b 5f 7d e5 0b b5 37 5f 20 47 b9 4e d4 77 52 f1 a2 e0>,
  byteLength: 32
}
cipherText ArrayBuffer {
  [Uint8Contents]: <e5 24 f0 37 b5 37 90 9e 86 3e d6 37 f1 02 1c 01 bb 5d f5 eb bf fe 58 60 04 94 39 88 07 c9 9f b3 be af 48 6c 1b 2e 1b ac 4b 95 2b 37 64 87 2e ae a3 91 f8 ef 1b f5 c9 3c 02 73 b4 f6 73 48 1d 8a 3c bf e4 69 28 dc 78 bb ec 2a 30 e7 dc 2d 40 47 35 00 ac 77 ae ce 7e ec 6e 66 6e c7 41 64 a4 79 b5 17 c7 6f ... 198 more bytes>,
  byteLength: 298
}

Data in the esbuild build:

key {
  send: ArrayBuffer {
    [Uint8Contents]: <02 50 56 20 16 0f 49 2f ed c2 29 c5 d6 5a 0b 5f 7d e5 0b b5 37 5f 20 47 b9 4e d4 77 52 f1 a2 e0>,
    byteLength: 32
  },
  transferables: [
    ArrayBuffer {
      [Uint8Contents]: <02 50 56 20 16 0f 49 2f ed c2 29 c5 d6 5a 0b 5f 7d e5 0b b5 37 5f 20 47 b9 4e d4 77 52 f1 a2 e0>,
      byteLength: 32
    }
  ]
}
cipherText {
  send: ArrayBuffer {
    [Uint8Contents]: <63 9b 45 61 5d 83 bc b9 49 e3 16 c6 5b 79 a0 64 be b6 16 ce cb bd 83 a6 26 cc ae 2d 8c 83 c6 63 a0 7b 12 68 bd 8c 16 95 b3 b8 5c 2d bb cf ec e7 dd 6c 8f a8 44 c5 57 62 56 53 90 06 03 59 f6 8e 32 d9 c9 0e 0a 7b 4b fa f6 d3 35 57 16 f5 34 54 fe 45 20 83 82 bc d3 ca 27 a3 97 2f 67 f4 47 a0 37 e5 a8 23 ... 204 more bytes>,
    byteLength: 304
  },
  transferables: [
    ArrayBuffer {
      [Uint8Contents]: <63 9b 45 61 5d 83 bc b9 49 e3 16 c6 5b 79 a0 64 be b6 16 ce cb bd 83 a6 26 cc ae 2d 8c 83 c6 63 a0 7b 12 68 bd 8c 16 95 b3 b8 5c 2d bb cf ec e7 dd 6c 8f a8 44 c5 57 62 56 53 90 06 03 59 f6 8e 32 d9 c9 0e 0a 7b 4b fa f6 d3 35 57 16 f5 34 54 fe 45 20 83 82 bc d3 ca 27 a3 97 2f 67 f4 47 a0 37 e5 a8 23 ... 204 more bytes>,
      byteLength: 304
    }
  ]
}
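To make the failure concrete, here is a minimal sketch that reproduces the `ERR_INVALID_ARG_TYPE` from the stack trace using only Node's `Buffer`. The 4-byte key and the `toArrayBuffer` helper are hypothetical and exist only for illustration; the helper is not part of `threads` or Polykey, and it is not the fix adopted in this PR.

```javascript
// A short stand-in for the 32-byte key shown in the logs (assumption: 4 bytes is enough to illustrate).
const key = new Uint8Array([0x02, 0x50, 0x56, 0x20]).buffer;

// Normal build: the worker receives a bare ArrayBuffer, which Buffer.from accepts.
const fromPlain = Buffer.from(key);

// esbuild build: the worker receives a wrapper object shaped like the logged data.
const wrapped = { send: key, transferables: [key] };

// Buffer.from rejects a plain object that is neither array-like nor a Buffer JSON form.
let errorCode;
try {
  Buffer.from(wrapped);
} catch (e) {
  errorCode = e.code;
}

// Hypothetical helper: accept either shape and return the underlying ArrayBuffer.
function toArrayBuffer(value) {
  if (value instanceof ArrayBuffer) return value;
  if (value != null && value.send instanceof ArrayBuffer) return value.send;
  throw new TypeError('Expected an ArrayBuffer or a { send, transferables } wrapper');
}

const fromWrapped = Buffer.from(toArrayBuffer(wrapped));
```

Unwrapping like this would only mask the symptom; the wrapper reaching the worker at all suggests that the transfer handling in `threads` is not being applied in the bundled output.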

@CMCDragonkai
Member

So the threads package doesn't work nicely with esbuild output. We just need to make sure to keep it external for now. However, we should be replacing it later during the ESM migration, and it is important to keep this problem in mind.
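As a rough sketch of what keeping `threads` external could look like with esbuild's JavaScript API: the entry point paths and the `bundle` function name below are assumptions for illustration and are not taken from this repository's build script.

```javascript
// Hypothetical build options; the entry point paths are assumptions.
const buildOptions = {
  entryPoints: ['./tmp/build/polykey.js', './tmp/build/polykeyWorker.js'],
  bundle: true,
  platform: 'node',
  outdir: './dist',
  // Leave `threads` as a runtime require() instead of inlining it into the bundle.
  external: ['threads'],
};

// Runs the build; esbuild is required lazily so the options can be inspected without it.
async function bundle() {
  const esbuild = require('esbuild');
  return esbuild.build(buildOptions);
}
```

With `threads` marked external, the package has to be resolvable from `node_modules` at runtime, so any packaging step that follows needs to ship it alongside the bundle.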

@CMCDragonkai
Member

Ok, all tests are passing now.

But there's a weird problem with a worker process not exiting gracefully:

A worker process has failed to exit gracefully and has been force exited. This is likely caused by tests leaking due to improper teardown. Try running with --detectOpenHandles to find leaks. Active timers can also cause this, ensure that .unref() was called on them.
Test Suites: 38 passed, 38 total
Tests:       21 skipped, 1 todo, 120 passed, 142 total
Snapshots:   0 total
Time:        56.186 s
Ran all test suites.
GLOBAL TEARDOWN
Destroying Global Data Dir: /tmp/polykey-test-global-kyq3VE

Possibly related to MatrixAI/js-timer#15, not a problem atm.
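For reference, the `.unref()` behaviour that the Jest warning mentions can be sketched with a plain Node.js timer. This is only an illustration of the mechanism, not a diagnosis of the leak here; the 60 second delay is an arbitrary assumption.

```javascript
// A long-lived timer keeps the event loop (and therefore the process) alive by default.
const timer = setTimeout(() => {}, 60_000);
const refBefore = timer.hasRef();

// After unref(), the timer no longer prevents the process from exiting.
timer.unref();
const refAfter = timer.hasRef();

// Clearing the timer during teardown avoids leaving it behind at all.
clearTimeout(timer);
```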

@CMCDragonkai
Member

@tegefaulkes can you start squashing this so it can be merged? We can merge in the replacement of WebSocketClient afterwards.

Also make sure you're squashing any of my wip commits.

tegefaulkes and others added 11 commits October 17, 2023 17:59
* fixed parser imports to take it from the domain utilities, also exported the validate functions
* all using `binParsers` now
* `esbuild` bundle is used for application build
* changed to using `buildNpmPackage` instead of `node2nix`
* changed to using `@yao-pkg/pkg` instead of `pkg` to use v18.15 nodejs
This makes sure that `esbuild` treats `threads` as an external dependency. There is some monkey patching going on that doesn't work if it's bundled.
It's a race condition with propagating keys changes. We needed to wait and then attempt calls with the new nodeId.

I think there's a separate issue when verifying with the old nodeId. That needs to be checked inside `Polykey` though.
@CMCDragonkai
Member

Ok, this is squashed. Merging to staging to confirm that the final CI steps are working, which should also auto-build and push to the ECR.

@CMCDragonkai
Member

However, the major issue still pending is MatrixAI/Polykey#582. That's the next priority for PK CLI.

@CMCDragonkai CMCDragonkai merged commit 68f9c7d into staging Oct 17, 2023
Development

Successfully merging this pull request may close these issues.

Bundling and fixing pkg Update node2nix with the upstream fixes to building bin links