-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
General Data Validation - Boundary IO locations should be doing data validation and marshalling (and the decommissioning of GenericIdTypes for all IDs) #321
Conversation
All these places will need to be updated with boundary validation logic:
|
Validation errors are going to go into their own domain as they may be shared between:
This problem has come up too often. Will be coming up a way to share validation errors, and work with the existing domains as well. |
@tegefaulkes don't use |
Actually the |
Here's an idea. We are going to take inspiration from We can have these functions:
The The
Here this would mean that First we have to understand If we want to accumulate exceptions across the data structure. An exception on the lower part of the tree may be accumulated with an exception at a higher part of the tree. So if we have a data structure like: const data = {
a: 'b',
c: {
d: 'e'
}
}; Then if There are some changes required to make this possible. The Furthermore unlike Instead of accumulating exceptions, we can also fail fast, and throw on the first exception rather than continuing to walk the tree. So different variants can be made for this:
The result is therefore: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/AggregateError. One additional issue is matching validation errors to the location in the data. This is the choice of the If no exceptions are thrown by the This should enable us to use This can be put into our I like the word |
Existence checking can be done by having the The return type of |
At the boundaries of the program we would then do something like: const nodeIdEncoded = call.request.getNodeId();
// assuming synchronous parse
const nodeId = parse(
(keyPath, value) => {
value = nodesUtils.decodeNodeId(value);
if (value == null) {
throw new nodesErrors.ErrorNodeIdInvalid();
}
return value;
},
data
); The location of exceptions is still domain specific, as they should be close to where the utility functions that decode them are, even if those functions aren't themselves throwing those exceptions. This generalises to more sophisticated data structures as well. |
For alot of boundaries, full data input parsing isn't really done, it seems done in piecemeal one at a time. And so validation exceptions end up being thrown in a number of places that later interrupt the call. So in the future, this |
The lack of frontloading, and general validation errors has meant that we end up with small exceptions like This smells bad, right now there's an Which should tell us that in the case of The This should avoid us having to create specific exceptions for every little validation error that could occur. This means exceptions like |
Using The node2nix has finally updated to support v15 and v16, however nixpkgs is stil using the old node2nix: NixOS/nixpkgs#146440. So for now I just have to use a custom I was also thinking that given that parse((keyPath, value) => {
throw new RuleError(keyPath);
}, data); Alternatively we don't pass in Finally the result should be a |
A note the Yes arrow functions cannot have |
First attempt: async function parse(
rules: (keyPath: Array<string>, value: any) => Promise<any>,
data: any,
options: { mode: 'greedy' | 'lazy' } = { mode: 'lazy' }
): Promise<any> {
const errors: Array<ErrorRule> = [];
const parse_ = async (keyPath: Array<string>, value: any, context: any) => {
if (typeof value === 'object' && value != null) {
for (const key in value) {
value[key] = await parse_([...keyPath, key], value[key], value);
}
}
try {
value = await rules.bind(context)(keyPath, value);
} catch (e) {
if (e instanceof ErrorRule) {
e.keyPath = keyPath;
e.value = value;
errors.push(e);
} else {
throw e;
}
}
return value;
};
// The root context is an object containing the root data keyed by empty string
// This matches JSON.parse behaviour
data = await parse_([], data, {'': data});
return data;
} |
Full example: import { CustomError } from 'ts-custom-error';
class ErrorParse extends CustomError {
public readonly errors: Array<ErrorRule>;
constructor(errors: Array<ErrorRule>) {
const message = errors.map((e) => e.message).join('; ');
super(message);
this.errors = errors;
}
}
class ErrorRule extends CustomError {
public keyPath: Array<string>;
public value: any;
public context: object;
constructor(message: string | undefined) {
super(message);
}
}
async function parse(
rules: (keyPath: Array<string>, value: any) => Promise<any>,
data: any,
options: { mode: 'greedy' | 'lazy' } = { mode: 'lazy' }
): Promise<any> {
const errors: Array<ErrorRule> = [];
const parse_ = async (keyPath: Array<string>, value: any, context: object) => {
if (typeof value === 'object' && value != null) {
for (const key in value) {
value[key] = await parse_([...keyPath, key], value[key], value);
}
}
try {
value = await rules.bind(context)(keyPath, value);
} catch (e) {
if (e instanceof ErrorRule) {
e.keyPath = keyPath;
e.value = value;
e.context = context;
errors.push(e);
// If lazy mode, short circuit evaluation
// And throw the error up
if (options.mode === 'lazy') {
throw e;
}
} else {
throw e;
}
}
return value;
};
try {
// The root context is an object containing the root data but keyed with undefined
data = await parse_([], data, { undefined: data });
} catch (e) {
if (e instanceof ErrorRule) {
throw new ErrorParse(errors);
} else {
throw e;
}
}
if (errors.length > 0) {
throw new ErrorParse(errors);
}
return data;
}
async function main () {
try {
const output = await parse(
function (keyPath, value) {
switch (keyPath.join('.')) {
case 'name':
if (typeof value !== 'string' || value.length < 3) {
throw new ErrorRule('name must be 3 characters or longer');
}
break;
case 'age':
if (typeof value !== 'number') {
throw new ErrorRule('age must be a number');
}
break;
}
return value;
},
{
name: 'bc',
age: 'a',
},
{ mode: 'greedy' }
);
console.log(output);
} catch (e) {
console.log(e);
}
}
main(); |
Usage at a function nodesAdd({
nodeManager,
authenticate,
}: {
nodeManager: NodeManager;
authenticate: Authenticate;
}) {
return async (
call: grpc.ServerUnaryCall<nodesPB.NodeAddress, utilsPB.EmptyMessage>,
callback: grpc.sendUnaryData<utilsPB.EmptyMessage>,
): Promise<void> => {
const response = new utilsPB.EmptyMessage();
try {
const metadata = await authenticate(call.metadata);
call.sendMetadata(metadata);
const data: { nodeId: NodeId, host: Host | Hostname, port: Port } = await parse(
(keyPath, value) => {
switch (keyPath.join('.')) {
case 'nodeId':
value = nodesUtils.decodeNodeId(value);
if (value == null) {
throw new utilsErrors.ErrorUtilsRule('nodeId is invalid');
}
break;
case 'host':
if (networkUtils.isValidHost(value)) {
throw new utilsErrors.ErrorUtilsRule('host is invalid');
}
break;
case 'port':
if (value == null) {
throw new utilsErrors.ErrorUtilsRule('port is invalid');
}
break;
}
return value;
},
{
nodeId: call.request.getNodeId(),
host: call.request.getAddress()?.getHost(),
port: call.request.getAddress()?.getPort()
}
);
await nodeManager.setNode(
data.nodeId,
{
host: data.host,
port: data.port,
} as NodeAddress
);
callback(null, response);
return;
} catch (err) {
callback(grpcUtils.fromError(err), null);
return;
}
};
} |
decodeNodeId
should be returning undefined
and ErrorNodeIdInvalid
usage at the boundaries
Exceptions like:
And potentially more can all be removed. The @emmacasolin since you've been doing test splitting, can you comment on any other one-off exceptions and data validation exceptions that we can replace with the generic |
Not from test splitting, but the only thing I can think of that hasn't been mentioned yet is the errors from JSON schema validation, which is done for notifications and claims. /**
* Exceptions raised when validating a Notification against a JSON schema
*/
class ErrorSchemaValidate extends ErrorNotifications {}
class ErrorNotificationsInvalidType extends ErrorSchemaValidate {}
class ErrorNotificationsGeneralInvalid extends ErrorSchemaValidate {}
class ErrorNotificationsGestaltInviteInvalid extends ErrorSchemaValidate {}
class ErrorNotificationsVaultShareInvalid extends ErrorSchemaValidate {}
class ErrorNotificationsValidationFailed extends ErrorSchemaValidate {} /**
* Exceptions arising during schema validation
*/
class ErrorSchemaValidate extends ErrorClaims {}
class ErrorClaimValidationFailed extends ErrorSchemaValidate {}
class ErrorNodesClaimType extends ErrorSchemaValidate {}
class ErrorIdentitiesClaimType extends ErrorSchemaValidate {}
class ErrorSinglySignedClaimNumSignatures extends ErrorSchemaValidate {}
class ErrorDoublySignedClaimNumSignatures extends ErrorSchemaValidate {}
class ErrorSinglySignedClaimValidationFailed extends ErrorSchemaValidate {}
class ErrorDoublySignedClaimValidationFailed extends ErrorSchemaValidate {} |
Schema validation is fine, that won't change. We're more interested in these one off data validation not caught by the CLI arg parsing or from protobuf when going through client service. CLI arg parsing is like a thin layer on top before reaching the general validation system. But there is overlap in duties. |
https://stackoverflow.com/q/42819408/582917 Turns out one can also just extend the Json schema system with custom validators. Then we can just reuse the Json schema validator for general data validation. Even Async validator works. https://ajv.js.org/guide/async-validation.html |
This is a good reference for nested object reference in the future: https://stackoverflow.com/questions/6393943/convert-a-javascript-string-in-dot-notation-into-an-object-reference. Ok so if json schema can do this as well, we might as well extend our existing json schema to apply in these situations. It looks like the only thing we need to do is add custom keywords to the const ajv = new Ajv()
ajv.addKeyword({
keyword: "idExists",
async: true,
type: "number",
validate: checkIdExists,
}) This means such keywords are shared across the same AJV instance. One question is how much sharing do we expect. It does seem verbose that every time we are doing validation we will need to do this in our client handlers:
There's just a lot of boilerplate to setup. This is why in our domains that have json schemas, we have done all the setup prior, and then just export the validation function. However doing this for every possible location where we want to do data validation seems poor form, since there can be lots of one-off validations that is only relevant to the specific API end-point. So therefore to save on boilerplate, what we could do is share an ajv instance with all the keywords already loaded. Then just expect service handlers to do:
Now modifying/sanitising data is also possible with the custom keywords: https://ajv.js.org/guide/modifying-data.html |
Some of the format rules that jsonschema exposes has to be used carefully to prevent Denial of Service attacks: https://ajv.js.org/security.html Mostly it's about setting the max length for any string validators, and also failing on the first exception. |
Bringing in the library https://github.com/ajv-validator/ajv-formats so we can make use of the standard formats that is defined in JSON schema: https://github.com/ajv-validator/ajv-formats |
Current schemas:
@emmacasolin @joshuakarp I believe these should be renamed to have a the |
I just tried to use JSON schema to do these things. It's actually too complicated. Adding custom keywords requires all sorts of configuration, and AJV is very complicated. I think we stick with how Let's say Protobuf and JsonSchema is used as the "first-line" of validation. Doing purely syntactic validation. Logical validation can occur on a second line, using a more flexible |
@joshuakarp @tegefaulkes to look over the test failures and fix them up. Mostly in the bin tests right now:
Other timeout errors may be fixed if those initial things get fixed. Please organise among yourselves how to fix the rest. |
There's a few places where I put a useless
Removing these. |
@tegefaulkes @joshuakarp once you push up test fixes as WIP commits. I can lint, rebase and squash those back into the 8 commits I have spreading them into the correct commits. |
Running all domains:
agent
identities
keys
nodes
notifications
secrets
vaults
standalone tests not in directory
|
There's a lot of domains in |
The status test was due to I've made |
the |
The |
The The |
Ok so remaining test failures to be covered in the 3 subsequent PRs:
|
…nt service handlers and CLI bin parsers
…rder to encode the raw `NodeId` before `toJSON` method is called on `IdInternal`
ae86bfd
to
cf54df9
Compare
I think there's a bug in The problem is that |
Also I didn't really add any new tests involving validation errors at the GRPC level. This will need to be allocated a new PR or integrated into #275. |
A solution for this that appears to do the trick was to move the logic of deconstructing the array of function deconstructErrorParse(errors: Array<validationErrors.ErrorParse>) {
const message = errors.map((e) => e.message).join('; ');
const data = {
errors: errors.map((e) => ({
message: e.message,
keyPath: e.keyPath,
value: e.value.valueOf(),
})),
};
return { message, data };
} And then instead of this line inside throw new validationErrors.ErrorValidation(errors); We have: const errorData = deconstructErrorParse(errors);
throw new validationErrors.ErrorValidation(errorData.message, errorData.data); |
With this change, this is the error that gets thrown by
Where this is the value for the
So it looks like it's performing as expected. Note: I've also now added a field for |
Suggest something like this:
|
Get a new PR so you can describe this issue and merge into master. It's still a hotfix until we get a proper error chaining system. |
New PR here: #331 Will start implementing something along these lines: #321 (comment) |
Description
Boundary validation seems like a general problem that people are hitting over and over. It's best to fix this once and for all across all domains to ensure that we always have a consistent way of doing boundary validation.
#318 was merged with problems involving general data validation.
It's come to attention that data validation is not consistently applied across all of the IO boundaries that PK does.
Data validation is a very crucial task that we've missed, I originally thought that the combination of GRPC protobuf and the domain types would be enough. But it appears it is not, and our IO boundaries require a general validation system.
I've used general validation systems/libraries in the past for many different applications, mostly web SaaS, and I've never liked any of them.
After working with
JSON.parse
, I got some inspiration that combines the ideas fromJSON.parse
'sreviver
parameter and the usage of recursion schemes used in functional languages, and the boundary issue mentioned in the Yesod framework https://www.yesodweb.com/book/introduction to create aparse
function that do triple duty:The lack of tuple structures, tuple-based pattern matching in JS means the composition of graph-operating function fragments is less syntactically nice, however the general principle of recursion schemes is followed.
Example prototype:
The
rules
parameter is a generic function that operates over every element of the data structure graph "bottom-up" using a post-fix depth first search traversal order, this allows one to operate any point in the graph, and the rules are meant to be composed of pattern-matched fragments doing a flexible mix of validation, sanitisation and marshalling.The exact implementation may change as I encounter problems.
Tasks
decodeNodeId
should be returningundefined
and the exception for invalid node id should be thrown on the boundaries, not inside thedecodeNodeId
function. The decoding function needs to match the behaviour that other utilities work with.[ ]ErrorNodeIdInvalid
overErrorInvalidNodeId
to keep it consistent with the rest of the code as well as changing the error message to remove.
.validation
domain withvalidate
andvalidateSync
and the usage ofErrorParse
for generically throwing validation exceptionsisValidHost
-isHost
isValidHostname
-isHostname
isPort
isGestaltAction
isVaultAction
nodes/utils.ts
-encodeNodeId
,decodeNodeId
,vaults/utils.ts
-encodeVaultId
,decodeVaultId
. Any Id that requires an encoded form and a decoded form should do this.NotificationId
andClaimid
andPermid
?NotificationId
andClaimId
are internal onlyClaimId
is exposed externally asClaimIdString
, this will becomeClaimIdEncoded
, and encoded asbase32hex
instead ofbase58btc
to preserve lexicographic orderNotificationId
andPermissionId
may have a "binary string" typeNotificationIdString
andPermissionIdString
, this is different from any encoded form which would beNotifcationIdEncoded
andPermissionIdEncoded
, and there's no encoding/decoding functions, the string from happens during POJO keys usually.PermissionId
needsPermissionIdString
for now. In the futureNotificationId
can haveNotificationIdString
if is being used.IdType | undefined
, they are not supposed to throw exceptions, and for domains where the id is being extracted from a "trusted source of truth", then the decoding is expected to succeed, then using!
can be usedNodeManager
- from the node graphGestaltGraph
- from the graph dbNotificationManager
- from the notifications dbvalidate
orvalidateSync
fromvalidation
domain and validate their input data. Remembervalidate
andvalidateSync
represent the 2nd line of defense. The first line for API is protobuf decoding. And for CLI, that would be commander's arg parsing logic.ErrorValidation
that is thrown byvalidate
orvalidateSync
can be serialised over GRPC and deserialised on the client side to see all parse errors. The list of errors during validation should be available under thedata
property asdata.errors = [ { message, keyPath, value }]
ErrorValidation
can be pretty printed to the STDERR by our root exception handlers, we want to be able to show a multitude of errors, however most uses ofvalidate
andvalidateSync
will belazy
and notgreedy
in order to avoid Denial of Service.validation/utils.ts
likeparseGestaltId
, this was changed to actually return theGestaltId
. Future parsing utilities to be shared between service handlers and CLI parsers should be delegated tovalidation/utils.ts
, the main thing is to catch theErrorParse
and rethrow as a commander error.Status
, it uses thedecodeNodeId
directly because it already has a general error for failure to parse which isErrorStatusParse
, so it doesn't use the validation utilityErrorValidation
.Final checklist