Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Agent migration stage 2 #535

Merged
merged 27 commits into from
Aug 18, 2023
Merged

Agent migration stage 2 #535

merged 27 commits into from
Aug 18, 2023

Conversation

tegefaulkes
Copy link
Contributor

@tegefaulkes tegefaulkes commented Jul 20, 2023

Description

The following tests were disabled in #534 and need to be fixed and re-enabled here.

  • validation/index.test.ts:372 'manipulate this when using function expressions'
  • vaults/VaultInternal.test.ts:187 'can change commits and preserve the log with no intermediate vault mutation'
  • vaults/VaultInternal.test.ts:236 'can change to the latest commit'
  • vaults/VaultInternal.test.ts:427 'concurrent read operations allowed'
  • vaults/VaultManager.test.ts:473 'with remote agents'

Related issues

Issues Fixed

Tasks

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@tegefaulkes tegefaulkes self-assigned this Jul 20, 2023
@ghost
Copy link

ghost commented Jul 20, 2023

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

This was referenced Jul 20, 2023
@tegefaulkes
Copy link
Contributor Author

#525 (comment) needs to be addressed here.

@tegefaulkes
Copy link
Contributor Author

ICE still needs to be re-implemented here. I've already marked out in the code where it needs to be.

@CMCDragonkai
Copy link
Member

What happened to the previous ICE?

@tegefaulkes
Copy link
Contributor Author

There was some refactoring of the connection code. Mostly moving multi connection logic inside of the NodeConnectioManager. I just didn't implement it yet and left it for stage 2 PR.

@tegefaulkes
Copy link
Contributor Author

I'm back onto this now. First step is to address #527. Then to remove uWebsockets in #540.

@tegefaulkes tegefaulkes force-pushed the feature-agent_migration_stage2 branch from 55d7116 to bd7822f Compare August 7, 2023 08:11
@tegefaulkes
Copy link
Contributor Author

#527 is done.

Looking at either removing uws or fixing vaults cloning/pulling next. I think removing uws may be more critical to getting CLI and testnet up and running.

@CMCDragonkai
Copy link
Member

I suggest when working on ws to factor out to js-ws.

@CMCDragonkai
Copy link
Member

Just wanted to mention this. There's no issue tracking this atm, because MatrixAI/js-rpc#2 is actually a much larger issue addressing all forms of binary usage including error management.

Instead, the only thing that needs to be done here is ensuring that there is leading message on both ends even when dropping to raw streams.

This should be done as part of fixing up the vault/git RPC calls.

I don't think it's necessary to create an issue for this... since it's already part of fixing up vaults work.

@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented Aug 14, 2023

The path

  1. vault RPC error management and streaming, Adding leading response metadata to RPC streams. and cleaning handlers
  2. FAF signalling. Don't wait for signalling to complete before ending RPC call. Also upgrade signalling to coalesce signalling attempts. Also have the relaying node fill in all connection information.
  3. Local discovery with MDNS and node graph expansion to handle extra metadata.
  4. Getting CLI building. Minimally need the docker image working, then PKG working, @CMCDragonkai is working on this.
  5. initial testnet deployment, deploying the test seednodes and stability testing
  6. Running NAT punch through tests, and general network integration testing.

@CMCDragonkai
Copy link
Member

I'm putting ESM on the backburner for now due to problems with workers. It's quite likely we will need to entirely replace the threadsjs with our own implementation. So for now there's quite a few libraries that are fully ESM native, you can't really upgrade to them atm (actually I'm not entirely sure what happens if a CJS project tries to import it, the goal was to be fully ESM).

Instead I'm going to focus on CLI review. The priorities will be local traversal, solidifying the design of the signalling, CLI CI, testnet testing.

We can come back to the ESM stuff soon, and hopefully get a better build.

@tegefaulkes
Copy link
Contributor Author

Added a reverse response message to RPC raw streams. This is provided as part of the return for the raw handler. with the return type as [JSONValue, ReadableStream<Uint8Array>]. The JSONValue provided is the leading response message for that stream. The handler can now also throw, in that case the error will be converted to an RPC error message which will be thrown on the caller side.

@CMCDragonkai
Copy link
Member

Can you abstract around the QUICConfig type? MatrixAI/Polykey-CLI#4 (comment)

PK CLI shouldn't be depending on that directly.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Aug 15, 2023

PK should be exporting any necessary functions for creating NodeIds: MatrixAI/Polykey-CLI#4 (comment)

Currently the PK CLI does this:

const invalidNodeId = IdInternal.fromString<NodeId>('INVALIDID');

function generateRandomNodeId(): NodeId {
  const random = keysUtils.getRandomBytes(16).toString('hex');
  return IdInternal.fromString<NodeId>(random);
}

This sort of code seems like something that PK can provide. PK CLI then doesn't need @matrixai/id directly.

The generation of NodeID is used in these tests:

»» ~/Projects/Polykey-CLI/tests
 ♖ ag 'generateRandomNodeId'                   ⚡(staging) pts/5 16:45:35
vaults/vaults.test.ts
26:  const nodeId1 = testUtils.generateRandomNodeId();
27:  const nodeId2 = testUtils.generateRandomNodeId();
28:  const nodeId3 = testUtils.generateRandomNodeId();
411:          const targetNodeId = testUtils.generateRandomNodeId();
457:        const targetNodeId = testUtils.generateRandomNodeId();
536:        const targetNodeId = testUtils.generateRandomNodeId();

utils.test.ts
94:      const nodeId = testUtils.generateRandomNodeId();

nodes/add.test.ts
17:  const validNodeId = testUtils.generateRandomNodeId();

I suggest that...

  1. PK should expose a function that creates a NodeId.
  2. It should be able to take in string data.
  3. Maybe toNodeId? Is there an existing parser for this? See the validation standard https://github.com/MatrixAI/MatrixAI-Graph/issues/44
  4. It should be available in nodes/utils, so I can use it in the tests in PK CLI.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Aug 15, 2023

The commander package is no longer used in PK. It can be removed too.

The @types/nexpect is no longer used in PK, it can be removed.

@CMCDragonkai
Copy link
Member

I'll make some changes to nodes/utils to get the right function for @matrixai/id.

And see what I can do for the QUIC type... and you can rebase on top.

But you should remove commander since it will change the package-lock.json.

@CMCDragonkai
Copy link
Member

Functions that is in PK that produce NodeId:

  • parseNodeId - ids/index.ts - calls decodeNodeId
  • decodeNodeId - validation/utils.ts - this decodes a multibase base32 hex encoding public key
  • publicKeyToNodeId - keys/utils/asymmetric.ts - this takes a Ed25519 Public Key buffer to the NodeId

Whereas... the function used in PK CLI tests is basically just:

IdInternal.fromString<NodeId>('...somestring...');

This string is expected to be the random binary string. The opposite of String.fromCharCode. The binary version of some buffer.

It looks like this:

> x = new Uint8Array([0, 0, 0])
Uint8Array(3) [ 0, 0, 0 ]
> String.fromCharCode(...x)
'\x00\x00\x00'

If the string is logged out, it is not necessarily printable characters.


I could try changing the tests so that instead of doing that, it would use the decodeNodeId which then expects a multibase base 32 hex encoded public key.

However I think it is more efficient to provide a function that simply takes a random 32 byte string and produces a NodeId from that.

We can easily do something like:

  • toNodeIdString
  • fromNodeIdString

To address #532, these 2 functions should exist in ids/index.ts. In fact ids/index.ts already has createXIdGenerator variants, which can take over exactly what we expect to our tests.

@CMCDragonkai
Copy link
Member

I'm going to do encodeNodeIdString and decodeNodeIdString, because we actually do have a type for this called NodeIdString.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Aug 15, 2023

Ok regarding network configuration, I'm going to need to do on this branch.

I'm proposing something like this:

    networkConfig: {
      /**
       * Agent host defaults to `::` dual stack.
       * This is because the agent service is supposed to be public.
       */
      agentHost: '::',
      agentPort: 0,
      /**
       * Client host defaults to `localhost`.
       * This will depend on the OS configuration.
       * Usually it will be IPv4 `127.0.0.1` or IPv6 `::1`.
       * This is because the client service is private most of the time.
       */
      clientHost: 'localhost',
      clientPort: 0,
      /**
       * If using dual stack `::`, then this forces only IPv6 bindings.
       */
      ipv6Only: false,

      /**
       * Agent service transport keep alive interval time.
       * This the maxmum time between keep alive messages.
       * See the transport layer for further details.
       */
      agentKeepAliveIntervalTime: 10_000, // 10 seconds

      /**
       * Agent service transport max idle timeout.
       * This is the maximum time that a connection can be idle.
       * See the transport layer for further details.
       */
      agentMaxIdleTimeout: 60_000, // 1 minute

      /**
       * Client service transport parameters.
       */
      clientMaxReadableStreamBytes: 1_000_000_000, // About 1 GB
      clientMaxIdleTimeout: 120, // 2 minutes
      clientPingIntervalTime: 1_000, // 1 second
      clientPingTimeoutTimeTime: 10_000, // 10 seconds

      /**
       * Controls the stream parser buffer limit.
       * This is the maximum number of bytes that the stream parser
       * will buffer before rejecting the RPC call.
       */
      clientParserBufferByteLimit: 1_000_000, // About 1MB
      clientHandlerTimeoutTime: 60_000, // 1 minute
      clientHandlerTimeoutGraceTime: 2_000, // 2 seconds
    },

The client parameters aren't entirely properly namespaced, and I want to remove the clientMaxReadableStreamBytes since MatrixAI/js-ws#3 is also about introducing backpressure to our websocket system.

Right now until js-ws has been factored out, and since we're just using ws, it's fine to enable it. Once browser WebSocket is needed for 7th testnet deployment, then we will figure out how that works in the browser.

@CMCDragonkai CMCDragonkai force-pushed the feature-agent_migration_stage2 branch from 9c26b16 to 854f4ba Compare August 15, 2023 10:04
@CMCDragonkai
Copy link
Member

Rebased on top. Remember to sync up @tegefaulkes tomorrow.

@CMCDragonkai
Copy link
Member

There are 2 types in nodes/types.ts that is QuicConfig and QUICClientConfig.

The QUICClientConfig isn't used at all, and I'm removing it right now.

The QuicConfig type is being used... but it should not be.

Now that NodeConnectionManager is encapsulating the QUIC transport layer (which I wasn't really sure if that was the best idea), the NodeConnectionManager must present its own configuration self-standing, it cannot be propagating implementation details. Encapsulating QUIC means that QUIC is now an implementation detail.

@CMCDragonkai
Copy link
Member

I've pushed WIP commits for the above removal of configuration. It needs further work in NodeConnectionManager.

The way I think about it, if the transport layer of QUIC is inside nodes domain then the same thing has to happen to the WS for the client service. But if there isn't an equivalent, it sounds like this is something that should be maintained separately and have a separate lifecycle.

Not sure, need to examine the rest of this.

tegefaulkes and others added 25 commits August 18, 2023 17:07
* Related #527

[ci skip]
* Related #527

[ci skip]
* Related #527

[ci skip]
* Related #540

[ci skip]
…` and related env variables that are all migrated to Polykey-CLI
@tegefaulkes tegefaulkes force-pushed the feature-agent_migration_stage2 branch from 8ac935e to c0fad60 Compare August 18, 2023 07:10
@tegefaulkes
Copy link
Contributor Author

Pending CI but ready to merge.

@tegefaulkes tegefaulkes marked this pull request as ready for review August 18, 2023 07:27
@tegefaulkes tegefaulkes merged commit e2df9c5 into staging Aug 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants