RPCServer specification #500

Closed
7 tasks done
tegefaulkes opened this issue Jan 16, 2023 · 16 comments · Fixed by #498

@tegefaulkes
Contributor

tegefaulkes commented Jan 16, 2023

Specification

The RPCServer takes incoming streams and handles them using a provided handler. There are 6 points of focus regarding this.

  1. Stream Handlers
  2. JSON Transformer
  3. Metadata
  4. Authentication
  5. Error Handling
  6. Types

Stream handlers

The stream handlers are core to how the RPC server works. When a stream is created by the QUIC system it is provided to the RPCServer to be handled. There is one point of interaction here: stream creation event -> handle stream. When a stream creation event happens it should call rpcServer.handleStream(streamPair: ReadableWritablePair<Uint8Array, Uint8Array>, connectionInfo: ConnectionInfo). This method should check the first message of the stream for the method and metadata, hand off the metadata to the metadata interceptor, get the required handler from the registered handlers, and call the handler with the input stream and data. It should then consume the handler output and provide the output data to the output stream.
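For illustration, a minimal sketch of wiring the stream creation event to handleStream; the event name and payload shape here are hypothetical, since the QUIC system's event API is specified elsewhere:

quicConnection.addEventListener('stream', (evt: any) => {
  // Hand the new stream pair straight to the RPC server
  void rpcServer.handleStream(evt.streamPair, evt.connectionInfo);
});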

The most generic form of these handlers will be a DuplexStreamHandler<I extends JSONValue, O extends JSONValue>. Since we will be handling ReadableWritablePair<Uint8Array, Uint8Array>, this can be consumed 1:1 by a duplex stream handler. This handler will take the form of...

type Handler<I, O> = (  
  input: I,  
  container: POJO,  
  connectionInfo: ConnectionInfo,  
  ctx: ContextCancellable,  
) => O;  
type DuplexStreamHandler<I extends JSONValue, O extends JSONValue> = Handler<  
  AsyncGenerator<I>,  
  AsyncGenerator<O>  
>;

This can be implemented as ...

const duplexHandler: DuplexStreamHandler<JSONValue, JSONValue> =  
  async function* (input, _container, _connectionInfo, _ctx) {  
    yield* input;
  };

Ultimately there are 4 kinds of stream handlers: DuplexStream, ClientStream, ServerStream and Unary. The latter 3 can be implemented as a DuplexStreamHandler. Their types are as follows.

// Receives 1 value and returns many
type ServerStreamHandler<I extends JSONValue, O extends JSONValue> = Handler<  
  I,  
  AsyncGenerator<O>  
>;  
// Receives many values and returns 1
type ClientStreamHandler<I extends JSONValue, O extends JSONValue> = Handler<  
  AsyncGenerator<I>,  
  Promise<O>  
>;  
// Receives 1 value and returns 1.
type UnaryHandler<I extends JSONValue, O extends JSONValue> = Handler<  
  I,  
  Promise<O>  
>;

Implementing these 3 as a DuplexStreamHandler is done by wrapping each respective function within a DuplexStreamHandler. For example, a ServerStreamHandler can be wrapped like below; the others can be implemented as variations of this (a unary variation is sketched after the example).

const wrapperDuplex: DuplexStreamHandler<I, O> = async function* (
  input,
  container,
  connectionInfo,
  ctx,
) {
  // Consume only the first input value and delegate to the wrapped
  // ServerStreamHandler<I, O>, here called `handler`
  for await (const inputVal of input) {
    yield* handler(inputVal, container, connectionInfo, ctx);
    break;
  }
};
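A UnaryHandler can be wrapped with a similar sketch, where unaryHandler stands for the registered UnaryHandler<I, O>:

const wrapperUnary: DuplexStreamHandler<I, O> = async function* (
  input,
  container,
  connectionInfo,
  ctx,
) {
  // Take the single expected input value, await the unary result, yield it once
  for await (const inputVal of input) {
    yield await unaryHandler(inputVal, container, connectionInfo, ctx);
    break;
  }
};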

The key features of this are:

  1. All handlers are ultimately handled as a DuplexStreamHandler; the client, server and unary handlers are internally converted to a duplex handler when registered.
  2. The RPC server interfaces with incoming streams through a single call to handleStream(). This should be called during a stream creation event.

JSON Transformer

The stream pair we're working with will be pretty low level. It is a stream of Uint8Array. This means we need to convert from something akin to a byte stream to parsed JSON RPC messages. This can happen in stages.

  1. Message separation: Uint8Array -> separated JSON RPC messages in Uint8Array form.
  2. Message parsing: Uint8Array -> parsed JsonRPCMessages, where each JSON RPC message is parsed into a JSON object and validated to match the specified JSON RPC message structure.
  3. The reverse needs to be done where JsonRPCMessage -> Uint8Array.

Each stage of transformation makes use of the web streams TransformStream. The first stage takes the raw Uint8Array and emits separated JSON RPC messages in JSON object form. This is done by making use of the JSON stream parser from https://www.npmjs.com/package/@streamparser/json. We can feed the input data to the stream parser and it should emit any JSON that it detects. The parser will output the top level JSON objects if we provide the options { separator: '', paths: ['$'] }. The separator of '' allows the parser to process back-to-back JSON objects of the form {JsonRpcMessage}{JsonRpcMessage}. The paths parameter of ['$'] will have the parser only output the top level JSON object. Ultimately the stream conversion should be {jso|nMessa|ge}{JsonMess|age} in Uint8Array form -> {JsonRpcMessage}|{JsonRpcMessage} in JSON object form.
The 2nd stage focuses on validating the messages. This is also a transformation stream that takes each 'chunk' of the JSON object and uses the generic parsing function to validate the message structure. If any message fails validation it will result in an error the next time the stream is read. A sketch of the first stage follows.
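As a rough sketch of the first stage (the exact @streamparser/json callback signature varies between versions, so treat the onValue shape here as an assumption):

import { JSONParser } from '@streamparser/json';

// Stage 1: raw bytes -> separated top-level JSON values
function binaryToJsonStream(): TransformStream<Uint8Array, unknown> {
  const parser = new JSONParser({ separator: '', paths: ['$'] });
  return new TransformStream<Uint8Array, unknown>({
    start(controller) {
      // Called once per complete top-level JSON value detected in the byte stream
      parser.onValue = ({ value }) => controller.enqueue(value);
    },
    transform(chunk, controller) {
      try {
        parser.write(chunk);
      } catch (e) {
        // Malformed data surfaces as a stream error on the next read
        controller.error(e);
      }
    },
  });
}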

Metadata

Metadata needs to be sent alongside the normal data. This is for communicating information relating to the communication that isn't directly parameters for the RPC call. The metadata must be included in the JSON RPC message params parameter.

The JSON message params parameter can contain any JSONValue. We should reserve the metadata key within this structure for metadata use. The metadata should be structured as a POJO.

There is a general order to the metadata messages.

  1. Server sends leading metadata
  2. Client responds with leading metadata
  3. Server and client can send arbitrary metadata at any time
  4. Client sends trailing metadata
  5. Server responds with trailing metadata
  6. Communication ends

When a call is made the handler side responds with server side leading metadata. The structure of this is not set in stone, but it can include information about handler side limitations and data for constructing an authentication response. The client can then respond with its leading metadata. This will include the authentication information required by the server side if needed.

Metadata can be sent at any time by both the client and server. This will likely be done by just including the metadata within the message parameters.

While it is simple enough to provide the metadata for each call or call handling, there are common metadata communications that need to be done across all calls and ideally should only be implemented in one place. For this we need to support metadata interceptors. These will be functions that we can register to the server that will be called on every stream.

There are 3 kinds of interceptors.

  1. An initial interceptor for generating the initial metadata to be sent before any data communication.
  2. An initial responding interceptor, called with the first message; it is provided the initial metadata if it was received.
  3. A general interceptor called on any received metadata after the initial 2 interceptors. Bit unsure about this one: it happens in reaction to any received metadata, but should the response be its own metadata message or attached to the next data message? If it's attached I can't guarantee there will be another message. Maybe if the messages end we just send a metadata-only message before fully ending?

There can be multiple registered interceptors, with each one adding to the metadata. An interceptor should take metadata and return its own metadata: (metadata: POJO) => Promise<POJO>. Each stage will add to the outgoing metadata with a spread operation, const newMetadata = {...oldMetadata, ...interceptorMetadata}, such that the interceptor can overwrite the metadata if needed. This should overwrite any metadata that was part of the message to begin with. A composition sketch follows.
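A minimal sketch of this composition, assuming a POJO type of string-keyed values:

type POJO = { [key: string]: any };
type MetadataInterceptor = (metadata: POJO) => Promise<POJO>;

// Fold each interceptor's result over the outgoing metadata; later
// interceptors (and interceptors generally) overwrite pre-existing keys
async function composeMetadata(
  messageMetadata: POJO,
  interceptors: Array<MetadataInterceptor>,
): Promise<POJO> {
  let metadata = { ...messageMetadata };
  for (const interceptor of interceptors) {
    metadata = { ...metadata, ...(await interceptor(metadata)) };
  }
  return metadata;
}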

Authentication

Authentication is generally left to the user of the RPCServer to set up. It can be handled by the interceptor system and the metadata. For example, when doing basic bearer token authentication we can use the type 2 interceptor on the client to add the bearer token to the metadata of the first message. The server can use the type 2 interceptor to authenticate this token when the first message is received. A sketch of this follows.
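A minimal sketch, where clientToken, verifyToken and ErrorRPCUnauthenticated are hypothetical names for illustration:

type POJO = { [key: string]: any };
declare const clientToken: string;
declare function verifyToken(token: string): boolean;
class ErrorRPCUnauthenticated extends Error {}

// Client-side type 2 interceptor: attach the bearer token to the first message's metadata
const clientAuthInterceptor = async (_metadata: POJO): Promise<POJO> => {
  return { authorization: `Bearer ${clientToken}` };
};

// Server-side type 2 interceptor: verify the token from the first message's metadata
const serverAuthInterceptor = async (metadata: POJO): Promise<POJO> => {
  const auth = metadata.authorization;
  if (typeof auth !== 'string' || !verifyToken(auth.replace('Bearer ', ''))) {
    throw new ErrorRPCUnauthenticated('Bearer token failed verification');
  }
  return {};
};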

Authenticating via the TLS connection certificate should be handled by the QUIC system. We may want to check the nodeId of a handled connection. I don't see this being useful for every connection, but it's needed for checking the ACL for certain RPC calls, so authenticating based on nodeId should be done within the handler.

A slightly more advanced method of token authentication can be done with a combination of the type 1 and 2 interceptors. The server side can use the type 1 interceptor to send metadata containing a secret for generating an authentication token. The client can use the type 2 interceptor to generate the expected token from that metadata. The server can then use the type 2 interceptor to authenticate the token. This should allow for a more secure version of token authentication.

Error handling

There are several sources of errors.

  1. Errors from the input stream. These usually indicate a failure of communication and should not be thrown through the output stream.
  2. Parsing errors. These come from the input stream parsing stage. They should clearly be parsing errors and are intended to be converted to error messages thrown through the output stream.
  3. Handler internal errors. These come out of the handler itself. They should be converted to error messages and thrown through the output stream.
  4. Abort signal. The handler is provided an abort signal. If it chooses to throw the reason then it should be thrown through the output. These errors are either timeouts or the RPC stopping.

There is a clear flow to any error that can happen. An error can come through the input stream, get thrown within the handler, be caught by the stream handler, converted to a JSON RPC response error message and sent through the output stream. Most errors will be caught, converted and sent through the output stream. If an error can't be handled or isn't intended for the client caller then it should be emitted via the error event system.

When an error is intended for the client caller, it must be caught and converted to a JSON RPC error message. The error itself should be stringified and added to the message, along with the relevant fields filled in. A sketch of this conversion follows.
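A minimal sketch of the conversion, using the JSON RPC 2.0 error response shape (the code value and stringified data payload here are illustrative choices, not a fixed scheme):

type JsonRpcResponseError = {
  jsonrpc: '2.0';
  error: {
    code: number;
    message: string;
    data?: string;
  };
  id: string | number | null;
};

// Convert a caught error into a JSON RPC error message for the output stream
function fromError(id: string | number | null, e: Error): JsonRpcResponseError {
  return {
    jsonrpc: '2.0',
    error: {
      code: -32000, // within JSON RPC's reserved server error range
      message: e.message,
      data: JSON.stringify({ name: e.name, stack: e.stack }),
    },
    id,
  };
}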

The error event system should be an EventTarget. Anyone using the RPCServer can register an error event handler for processing these errors. Generally this will just be for logging out the error that couldn't be handled. No errors should bubble up outside the RPCServer unless there is a critical failure of the RPCServer itself.

The EventTarget could be a property of the RPCServer, or the RPCServer can extend EventTarget to streamline usage. A sketch of the latter follows.
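A sketch of the extension approach; the 'error' event name and the CustomEvent wrapper are assumptions:

// RPCServer extending EventTarget so callers can listen for unhandled errors
class RPCServer extends EventTarget {
  protected dispatchError(error: Error): void {
    this.dispatchEvent(new CustomEvent('error', { detail: error }));
  }
}

const rpcServer = new RPCServer();
rpcServer.addEventListener('error', (evt) => {
  // Generally just logging out the error that couldn't be handled
  console.error((evt as CustomEvent<Error>).detail);
});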

Types

Strict typing is very important for this system and as such the types must be enforced as much as possible.

The JSON RPC messages have a strict structure. There are 4 kinds of messages: request, notification, response result and response error. These are defined within the JSON RPC spec.

There are 4 kinds of handlers. When you register a handler you provide it in one of the following forms.

type Handler<I, O> = (  
  input: I,  
  container: POJO,  
  connectionInfo: ConnectionInfo,  
  ctx: ContextCancellable,  
) => O;  
type DuplexStreamHandler<I extends JSONValue, O extends JSONValue> = Handler<  
  AsyncGenerator<I>,  
  AsyncGenerator<O>  
>;  
type ServerStreamHandler<I extends JSONValue, O extends JSONValue> = Handler<  
  I,  
  AsyncGenerator<O>  
>;  
type ClientStreamHandler<I extends JSONValue, O extends JSONValue> = Handler<  
  AsyncGenerator<I>,  
  Promise<O>  
>;  
type UnaryHandler<I extends JSONValue, O extends JSONValue> = Handler<  
  I,  
  Promise<O>  
>;

The types above will enforce the types within the handlers, but the stream parsing can't check whether the message data matches these types. As a result, the data within the handler will be typed as expected but could potentially be anything. It is up to the implementation of the handler to validate the data or trust it on blind faith. A validation sketch follows.
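A minimal sketch of handler-side validation; EchoInput and parseEchoInput are hypothetical, and DuplexStreamHandler and JSONValue are the types defined above:

type EchoInput = { message: string };

// Hypothetical validator; real handlers would use the generic parsing utilities
function parseEchoInput(data: unknown): EchoInput {
  if (
    typeof data !== 'object' ||
    data == null ||
    typeof (data as { message?: unknown }).message !== 'string'
  ) {
    throw new Error('Invalid input for echo handler');
  }
  return data as EchoInput;
}

const echoHandler: DuplexStreamHandler<JSONValue, JSONValue> =
  async function* (input, _container, _connectionInfo, _ctx) {
    for await (const data of input) {
      // Statically typed as JSONValue, but the runtime shape must be checked
      const { message } = parseEchoInput(data);
      yield { message };
    }
  };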

Additional context

Tasks

  • 1. Stream handling functionality
  • 2. Stream transformation
  • 3. Metadata and interceptors
  • 4. Implementing authentication
  • 5. Error handling
  • 6. Types
  • 7. Raw handlers
@tegefaulkes tegefaulkes added the development Standard development label Jan 16, 2023
@tegefaulkes tegefaulkes self-assigned this Jan 16, 2023
@tegefaulkes
Contributor Author

Just the start, there is plenty more to add.

@tegefaulkes
Contributor Author

Some of the metadata stuff still needs to be worked out. In my head it will work in a similar way to the GRPC metadata interceptors. There will be 3-ish kinds:

  1. Initial metadata generator. Called when communication starts and sent before any data. Useful for sending any data the other side needs before responding, such as communication constraints on data or a seed secret for creating an authentication token.
  2. Initial responding metadata. Called with the first message; it will be provided the initial metadata if it was received.
  3. General interceptor called on any received metadata after the initial 2 interceptors. Bit unsure about this one: it happens in reaction to any received metadata, but should the response be its own metadata message or attached to the next data message? If it's attached I can't guarantee there will be another message. Maybe if the messages end we just send a metadata-only message before fully ending?

Do we want or need trailing metadata messages?

@tegefaulkes
Contributor Author

There may be an issue with our design of the handler and how the metadata will work. To process the metadata the handler will have to consume the incoming messages. As it stands the handler has the freedom to just not consume the input messages. Would that mean that the input metadata is never read? If someone designed a handler like that, would it break authentication set up via the interceptors?

If the handler never responds with a message do we end up with the same issue? Do we need to enforce an empty message just to allow for metadata to be sent?

@CMCDragonkai
Member

I don't think we need trailing metadata. We never actually used it in any of our current GRPC usage.

We also don't have initial metadata.

We just have metadata available for every single request message being sent.

It's optional of course.

Like for a unary request, the request message can have metadata, that's the initial metadata.

For the unary response, the response message can have metadata, that's the trailing metadata.

If we are talking about streams, then every message in the stream can have metadata. We take each one and process them if they exist. It's possible that during a stream, only the first message has metadata, and subsequent messages don't have any metadata. Same for the response, it's possible that only the last message has metadata.

Essentially with our new RPC system our metadata is far more flexible. No more initial/trailing, every single message at any position could have metadata.

And our “interceptors”, which should be renamed “middleware”, can process any of the metadata at any point in time.

You might want to do this in the middle of the stream, perhaps another transform stream? But in this case, it's just a transparent stream, in the sense that it reads off any metadata and does something with them.

@CMCDragonkai
Member

Of course to process metadata, you have to consume a message. Imagine a stream pipe: it consumes a message and yields the message. It's the same as a “transparent proxy”; it does nothing but simply inspect the message for metadata, perform a side effect and proceed as if nothing happened, and the end consumer is none the wiser.

There's a possibility for this proxy stream to inspect the metadata, and then transform it into a different sort of data structure passed downstream. But I don't think that is necessary.

I've worked with HTTP based middleware for many years, and what you do here is you just need a "handler wrapper". A HOF handler that wraps the lower order handler. It can do some operations like checking authentication prior to passing the rest of the logic down. Every HTTP based middleware system works like this.

Because we support duplex streaming, our HOF handler would essentially need to be an async generator that does a yield* to the lower order generator.

The lower order generator can still inspect the metadata if it wants to, there's no mutation on the data to the stream, but the HOF generator is able to decide to shortcut the lower order generator and just respond with like a 401 Unauthenticated.

Or in our case, throw some exception into the stream.

@CMCDragonkai
Member

CMCDragonkai commented Jan 17, 2023

  1. Initial metadata generator. Called when communication starts and is sent before any data. useful for sending any data the other side needs before responding. Such as communication constraints on data or a seed secret for creating an authentication token.

Let's try to avoid having such complex protocols. The initial message is the same message that contains the initial metadata. If the authentication fails, you don't even bother reading the other parts of the message. Think of it like HTTP. There's no such thing as an “initial get request” before you send the actual get request. The headers and the body of the HTTP message are one unit of message (they are not 2 messages). We are doing the exact same thing here, just that our transport format is JSON instead of \r\n... etc.

This applies to requests and responses at the same time.

@CMCDragonkai
Member

So as a consequence:

In my head it will work a similar way to the GRPC metadata interceptors

Is not what we want. We do not want to do it in the same way as GRPC. GRPC's way of doing things is not right for what we want. We want to leave grpc behind.

@CMCDragonkai
Member

CMCDragonkai commented Jan 17, 2023

Some pseudo code of the HOF handler:

async function* authHandleWrapper(input) {
  // Consume the first message so its metadata can be checked
  const { value: data } = await input.next();
  checkMeta(data.meta);

  // If we have passed, run the actual handler (but this time you have to "stuff back" the input)
  // To do this, we have to create a new stream based off the old stream
  const input_ = (async function* () {
    // There's probably a more elegant way to do this
    yield data;
    yield* input;
  })();
  yield* actualHandler(input_);
}

Or to be a proper HOF:

function wrapHandler(handler) {
  return async function* (input) {
    const { value: data } = await input.next();
    checkMeta(data.meta);
    const input_ = (async function* () {
      yield data;
      yield* input;
    })();
    yield* handler(input_);
  };
}

rpcServer.register(wrapHandler(handler));

@CMCDragonkai
Member

Some background context regarding middleware design:

Remember that it is HTTP req/res based. But we can generalise it to handle streams of req/res. The same principle applies; only the input/output types are different.

@tegefaulkes
Contributor Author

ETA estimates:

  1. Stream handling functionality: mostly done, small changes may be needed
  2. Stream transformation: mostly done, small changes may be needed
  3. Metadata and interceptors: some work needed, 1-2 days
  4. Implementing authentication: mostly working out usage, the core of it is provided by 3. 0.5 days
  5. Error handling: mostly done, edge cases need to be worked out. 1 day
  6. Types: mostly done, small changes to types need to be made and metadata worked out alongside 3. 1 day

@tegefaulkes
Contributor Author

Server side: What kind of errors can we expect?

  1. Errors coming from the input stream
  2. Errors due to failing to parse the input stream: ErrorParse, ErrorMessageTooLarge
  3. Handler not found
  4. Errors internal to the handler: handler message parsing, handler errors
  5. Abort errors for the handler: RPCServer stopping, handler timed out

@CMCDragonkai
Member

Server side: What kind of errors can we expect?

  1. Errors coming from the input stream
  2. Errors due to failing to parse the input stream: ErrorParse, ErrorMessageTooLarge
  3. Handler not found
  4. Errors internal to the handler: handler message parsing, handler errors
  5. Abort errors for the handler: RPCServer stopping, handler timed out

These need to be turned into actual exception classes you keep inside rpc/errors.ts.
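A sketch of what rpc/errors.ts could contain; these class names are assumptions derived from the list above, not the final names:

class ErrorRPC extends Error {}
// 1. Input stream failure
class ErrorRPCStreamEnded extends ErrorRPC {}
// 2. Failing to parse the input stream
class ErrorRPCParse extends ErrorRPC {}
class ErrorRPCMessageTooLarge extends ErrorRPCParse {}
// 3. Handler not found
class ErrorRPCHandlerNotFound extends ErrorRPC {}
// 4. Errors internal to the handler
class ErrorRPCHandlerFailed extends ErrorRPC {}
// 5. Abort reasons
class ErrorRPCStopping extends ErrorRPC {}
class ErrorRPCTimedOut extends ErrorRPC {}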

@CMCDragonkai
Member

Remember to incorporate it into the spec above.

tegefaulkes added a commit that referenced this issue Jan 19, 2023
There is now a reasonably enforced hierarchy of `message` => `request` | `response` => `requestMessage` | `requestNotification` | `responseResult` | `responseError`.

Related #500
Related #501

[ci skip]
tegefaulkes added a commit that referenced this issue Jan 19, 2023
@tegefaulkes
Contributor Author

I've added the error events in the RPCServer. Currently most kinds of errors are converted to an error message and passed along. The only errors we don't want to pass along are ones that are emitted as a result of stream failure. I've created a placeholder error for now so I can test it, but it's up to the QUIC system what that error is exactly.

tegefaulkes added a commit that referenced this issue Jan 23, 2023
tegefaulkes added a commit that referenced this issue Jan 23, 2023
Mostly a type change, `Buffer` just extended `Uint8Array`.

Related #500

[ci skip]
@tegefaulkes
Contributor Author

I've just done some prototyping on the middleware. I don't think we need to enforce any specific params structure for the messages. Since the middleware works on the whole message directly and is pretty versatile from the user perspective, I think it's up to the user to create their params structure the way they like.

tegefaulkes added a commit that referenced this issue Jan 25, 2023
tegefaulkes added a commit that referenced this issue Jan 25, 2023
tegefaulkes added a commit that referenced this issue Jan 27, 2023
Related #500
Related #501

[ci skip]
tegefaulkes added a commit that referenced this issue Jan 27, 2023
Related #500
Related #502

[ci skip]
tegefaulkes added a commit that referenced this issue Jan 30, 2023
Related #500
Related #502

[ci skip]
tegefaulkes added a commit that referenced this issue Jan 30, 2023
Related #500
Related #501
Related #502

[ci skip]
tegefaulkes added a commit that referenced this issue Jan 30, 2023
Related #500
Related #501
Related #502

[ci skip]
tegefaulkes added a commit that referenced this issue Jan 31, 2023
@tegefaulkes
Contributor Author

The server side implementation is pretty much done. The only main thing to do beside review and fixes is the raw handlers.

I think that, pretty much, instead of all handlers being duplex handlers, all handlers become raw duplex handlers. The normal stream handling functions more or less the same way, except when we extract the method name from the first message we then downgrade the stream back to raw and pass that along to the raw handlers. The current duplex handlers are wrapped the same way the other handlers are, except the wrapper converts the raw streams to the expected streams with the middleware.

There are some fuzzy bits to work out, such as how to ensure the first message is forwarded along if the handler is not a raw handler, and how to release the transformation without losing data between the first message and the raw data.

tegefaulkes added a commit that referenced this issue Feb 2, 2023
Looks like it works but the first message is skipping the middleware right now.

- Related #500

[ci skip]
tegefaulkes added a commit that referenced this issue Feb 2, 2023
- Related #500

[ci skip]
tegefaulkes added a commit that referenced this issue Feb 3, 2023
Related #500
Related #501

[ci skip]
tegefaulkes added a commit that referenced this issue Feb 7, 2023
tegefaulkes added a commit that referenced this issue Feb 7, 2023
tegefaulkes added a commit that referenced this issue Feb 9, 2023
tegefaulkes added a commit that referenced this issue Feb 9, 2023
- Related #500
- Related #501

[ci skip]
tegefaulkes added a commit that referenced this issue Feb 10, 2023
tegefaulkes added a commit that referenced this issue Feb 10, 2023
tegefaulkes added a commit that referenced this issue Feb 10, 2023
Related #500
Related #501
Related #502

[ci skip]
tegefaulkes added a commit that referenced this issue Feb 14, 2023
There is now a reasonably enforced hierarchy of `message` => `request` | `response` => `requestMessage` | `requestNotification` | `responseResult` | `responseError`.

Related #500
Related #501

[ci skip]
tegefaulkes added a commit that referenced this issue Feb 14, 2023
tegefaulkes added a commit that referenced this issue Feb 14, 2023
tegefaulkes added a commit that referenced this issue Feb 14, 2023
Mostly a type change, `Buffer` just extended `Uint8Array`.

Related #500

[ci skip]
tegefaulkes added a commit that referenced this issue Feb 14, 2023
Related #502
Related #500
Related #502

[ci skip]
tegefaulkes added a commit that referenced this issue Feb 14, 2023
Related #500
Related #501

[ci skip]
tegefaulkes added a commit that referenced this issue Feb 14, 2023
Related #500
Related #502

[ci skip]
tegefaulkes added a commit that referenced this issue Feb 14, 2023
Related #500
Related #501
Related #502

[ci skip]
tegefaulkes added a commit that referenced this issue Feb 14, 2023
- Related #500
- Related #501
- Related #502

[ci skip]
tegefaulkes added a commit that referenced this issue Feb 14, 2023
- Related #500
- Related #501
tegefaulkes added a commit that referenced this issue Feb 14, 2023
tegefaulkes added a commit that referenced this issue Feb 14, 2023
tegefaulkes added a commit that referenced this issue Feb 14, 2023
tegefaulkes added a commit that referenced this issue Feb 14, 2023
tegefaulkes added a commit that referenced this issue Feb 14, 2023
tegefaulkes added a commit that referenced this issue Feb 14, 2023
Related #500
Related #501
Related #502

[ci skip]
@CMCDragonkai CMCDragonkai added r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy labels Jul 9, 2023