Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Custom keyword validation #394

Closed
wants to merge 2 commits into from
Closed

feat: Custom keyword validation #394

wants to merge 2 commits into from

Conversation

bnjt
Copy link

@bnjt bnjt commented Nov 2, 2022

Add support for user defined custom keyword validation. The user provides and registers custom validator functions when configuring a JSONSchema.

Custom keyword validators may be used when the user wants to enforce constraints that can't, or can't easily, be expressed in JSON schema.

This PR addresses at least some of the features requested in #379 by @tamasfe as well as my own. The same pros and cons described in #379 apply. The PR also follows the structure of #354, but of course uses validation functions as opposed to sub-schema for validation.

We could collapse the two user implementable validation functions into a single trait, mirroring the internal Validate trait. Implementations would have to sit behind an Arc.

Example usage

use jsonschema::{CustomKeywordDefinition, ErrorIterator, JSONSchema, paths::JSONPointer};
use serde_json::{json, Value};
use std::sync::Arc;

fn validate(
    instance: &Value,
    instance_path: JSONPointer,
    schema: Arc<Value>,
    schema_path: JSONPointer,
) -> ErrorIterator {
    // ... validate instance ... 
    Box::new(None.into_iter())
}
fn is_valid(instance: &Value, schema: &Value) -> bool {
    // ... determine if instance is valid ... 
    return true;
}
let definition = CustomKeywordDefinition::Validator { validate, is_valid };
assert!(JSONSchema::options()
    .with_custom_keyword("my-type", definition)
    .compile(&json!({ "my-type": "my-schema"}))
    .expect("A valid schema")
    .is_valid(&json!({ "a": "b"})));

Add support for user defined custom keyword validation. The user
provides and registers custom validator functions when configuring a
JSONSchema.

Custom keyword validators may be used when the user wants to enforce
constraints that can't, or can't easily, be expressed in JSON schema.
@codecov
Copy link

codecov bot commented Nov 10, 2022

Codecov Report

Merging #394 (0fa1689) into master (002edce) will decrease coverage by 0.05%.
The diff coverage is 72.72%.

❗ Current head 0fa1689 differs from pull request most recent head d67eb67. Consider uploading reports for the commit d67eb67 to get more accurate results

@@            Coverage Diff             @@
##           master     #394      +/-   ##
==========================================
- Coverage   79.52%   79.47%   -0.06%     
==========================================
  Files          52       53       +1     
  Lines        4455     4488      +33     
==========================================
+ Hits         3543     3567      +24     
- Misses        912      921       +9     
Impacted Files Coverage Δ
jsonschema/src/compilation/mod.rs 78.94% <57.14%> (-1.74%) ⬇️
jsonschema/src/compilation/options.rs 85.58% <75.00%> (-0.40%) ⬇️
jsonschema/src/keywords/custom_keyword.rs 77.77% <77.77%> (ø)
jsonschema/src/error.rs 57.71% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@bnjt
Copy link
Author

bnjt commented Nov 28, 2022

Hi @Stranger6667, Do you think this feature PR is aligned with the libraries goals? I can appreciate it might not add value to enough library users to make the addition and maintenance of the extra code worth it. However, if it is a use case you're interested supporting, I'd be happy to work on any improvements necessary to make it a better fit.

@Stranger6667
Copy link
Owner

Hey! Thank you for pinging me! I’ll take a look this week

@Stranger6667
Copy link
Owner

Thank you so much and sorry for my late reply!

I like the simplicity of providing two functions & the API, however, having a trait allows the end user to carry some extra data within a custom struct that implements such a trait. It might make this feature more flexible and cover more use cases. What do you think?

@Stranger6667
Copy link
Owner

@bnjt

I like the implementation very much! I have a few bikeshedding ideas regarding naming, but I'd like to discuss them later on. The main two points I'd like to discuss are:

  1. Using trait instead of two functions to allow the users to carry some extra data in a struct. What user-facing API could it have? would it be convenient to use?
  2. Extending the CustomKeywordDefinition enum in the future. What use cases could we think of in order to avoid possible backward-incompatible changes? I guess it is an alternative to having a trait

I don't have many specific thoughts about these topics yet, but I'd like to hear your opinion :)

cc @tamasfe As you created #379 I'd love to hear your thoughts as well :)

Tagging @antouhou as the author of #354 and @hadrien-toma & @Arcahub as you reacted on that PR. It would be amazing if you could share your use cases & whether this PR would cover them.

@Stranger6667
Copy link
Owner

Also cc-in @ryderjgillen as the author of #342 (and @manuschillerdev as you reacted there). Your input is welcome too! :)

@tamasfe
Copy link
Contributor

tamasfe commented Dec 4, 2022

As you created #379 I'd love to hear your thoughts as well :)

Thanks, this is along the lines of what I described more or less, if it was up to me I'd take it a step further and not try to restrict it to custom keywords, I'd just have a factory exposed to users that looks something like Fn(schema, path) -> Option<Box<dyn Validator>> that hooks into the compilation process and works for every schema regardless of keywords.

And Validator is pretty much the trait what already exists internally, one more thing I'd add is a catch-all error type for user-provided validator errors.

@bnjt
Copy link
Author

bnjt commented Dec 14, 2022

Hi @Stranger6667, thank you for the response. Apologies for being late on my side.

Re point 1 to use a trait

Using trait instead of two functions to allow the users to carry some extra data in a struct.

I agree. I don't recall, but I think I might have gone that route and then backed out due to needing a lifetime. But a trait is a cleaner approach and I'm sure whatever issue I ran into is solvable.

What user-facing API could it have?

Much like @tamasfe I was thinking a public version of Validate but narrowed down to just

  • validate(instance, path) -> ErrorIterrator and
  • is_valid(instance) -> bool

would it be convenient to use?

IMHO that's a friendly API. We could also provide a trait default implementation for is_valid().

Simple implementations with immutable state are simple. Implementers who want mutable state, would have expend some effort to provide their own interior mutability. But that seems reasonable to me.

Re. point 2 Extending the CustomKeywordDefinition

Extending the CustomKeywordDefinition enum in the future. What use cases could we think of in order to avoid possible backward-incompatible changes? I guess it is an alternative to having a trait

I don't have any immediate suggestions, but will give this some thought along with @tamasfe's suggestion to generalize.

/// Validate an instance returning any and all detected validation errors
validate: CustomValidateFn,
/// Determine if an instance is valid
is_valid: CustomIsValidFn,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the distinction between validate and is_valid only exists to improve the performance of is_valid, we could allow the user to omit this by making this an Option<CustomIsValidFn>, and have the is_valid() implementation of the Validate trait have a default case if this is missing that calls validate and returns if any errors are present. This would allow the user to make the tradeoff of losing out on is_valid() perf a bit but avoiding duplication of the validation logic.

Copy link
Contributor

@samgqroberts samgqroberts Aug 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see @bnjt noted this in his comment just above, wrt the trait default implementation

/// .expect("A valid schema")
/// .is_valid(&json!({ "a": "b"})));
/// ```
pub fn with_custom_keyword<T>(mut self, keyword: T, definition: CustomKeywordDefinition) -> Self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this does not need to consume self, so maybe make it take in and return &mut self just like with_format() above? This would allow it to chain method calls more easily.

@samgqroberts
Copy link
Contributor

Hi @Stranger6667 and @bnjt. Thank you for all your work here! I took a look myself and wanted to make sure this feature would work for my use case, which I describe over here #379 (comment).

It was 90% of the way there, so I tried to take it the remaining 10% with a PR against this PR: https://github.com/bnjt/jsonschema-rs/pull/1

Please let me know what you think!

@Stranger6667 Stranger6667 force-pushed the master branch 2 times, most recently from c8ee78a to 8eacf2d Compare March 3, 2024 19:32
@bnjt bnjt closed this by deleting the head repository Mar 5, 2024
@platoscave
Copy link

Is this feature available to the public yet?
Which version?

@eirnym
Copy link

eirnym commented Apr 10, 2024

Hi, @bnjt @platoscave @samgqroberts and others

I'm working on a model generator from JSON Schema.

Could you please provide some real world application of such custom fields so I could think how to provide ways to handle such situations in my tool?

@samgqroberts
Copy link
Contributor

@platoscave as far as I know this was never merged to the project. I made a fork a while ago that merged @bnjt's work plus an additional piece I needed for my use case I mention in an earlier comment. The last 3 commits of the fork's main branch show that work https://github.com/samgqroberts/jsonschema-rs/commits/master/. For what it's worth I've been using that version of jsonschema-rs in my production workflow for the past 9 months. I'd be more than happy to re-create @bnjt's PR here based off of my fork – @Stranger6667 would you want that?

@samgqroberts
Copy link
Contributor

samgqroberts commented Apr 10, 2024

Hi, @bnjt @platoscave @samgqroberts and others

I'm working on a model generator from JSON Schema.

Could you please provide some real world application of such custom fields so I could think how to provide ways to handle such situations in my tool?

@eirnym In my use case I use a custom string format called "currency", defined as 1 or more digit characters followed by a period followed by 2 digits. This is used as an encoding for currency values to ensure no floating point precision losses, and in lieu of a fixed point data type in JSON. The way that I use the custom keyword validation requires the feature that I added on my fork that allows overriding existing keyworld validation with custom logic, and I do that to override the traditionally numeric keywords minimum, exclusiveMinimum, maximum, exclusiveMaximum, and multipleOf to apply also to strings if the format is "currency".

@platoscave
Copy link

@platoscave as far as I know this was never merged to the project. I made a fork a while ago that merged @bnjt's work plus an additional piece I needed for my use case I mention in an earlier comment. The last 3 commits of the fork's main branch show that work https://github.com/samgqroberts/jsonschema-rs/commits/master/. For what it's worth I've been using that version of jsonschema-rs in my production workflow for the past 9 months. I'd be more than happy to re-create @bnjt's PR here based off of my fork – @Stranger6667 would you want that?

In my use case objects can have an association with another object represented by an identifier (foreign-key).
I wish to add a custom type to my schema's such that I can verify the existence of identifiers through a table lookup.
So yes, If your PR does what I think it does, I would very much appreciate a new attempt.

@Stranger6667
Copy link
Owner

@platoscave as far as I know this was never merged to the project. I made a fork a while ago that merged @bnjt's work plus an additional piece I needed for my use case I mention in an earlier comment. The last 3 commits of the fork's main branch show that work https://github.com/samgqroberts/jsonschema-rs/commits/master/. For what it's worth I've been using that version of jsonschema-rs in my production workflow for the past 9 months. I'd be more than happy to re-create @bnjt's PR here based off of my fork – @Stranger6667 would you want that?

Yep, it would be amazing to have! :) Thank you

FWIW, I will get back to working on this crate rewrite somewhere mid-April and will incorporate all the new changes in the new version :)

@samgqroberts
Copy link
Contributor

@Stranger6667 I've put a copy of this PR + my additional commit up here #473, happy to work with you to get it in

@eirnym
Copy link

eirnym commented Apr 10, 2024

@samgqroberts it's a nice application. If I'd made it myself using only standard schema, I'd use integers (fixed decimal point emulation) for that role or created a string type with a pattern (as floats won't transfer values exactly for all languages) with extra validation in code. I worked in few various projects where the same problem has ben solved in these 2 ways. The only downside of fixed point values is lack of extensibility if you'd want to add more digits than 2 at the start after the decimal dot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants