-
-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Custom keyword validation #473
feat: Custom keyword validation #473
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! The main point I'd like to discuss is whether there should be a way to store extra, user-defined state inside custom validators. What do you think?
jsonschema/src/compilation/mod.rs
Outdated
} | ||
true | ||
} | ||
let definition = CustomKeywordDefinition::Validator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking about whether it would give more flexibility if there was a trait instead? This way it would be possible to have some inner state inside
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a good idea, but a few clarifying questions: do you mean creating a trait like CustomValidator
with the validate
and is_valid
functions as methods on it, and then the user would define a struct that implements CustomValidator
and provide it to CompilationOptions (in like a Box I think)? If so perhaps it should be combined somehow with the Validator
trait?
Also while instinctually this feels like the right move, I'm struggling a bit to come up with a concrete use case example, like for a test case. There's this #394 (comment), but I wonder if that doesn't need state internal to the validator, the validate
/ is_valid
functions could potentially just reference the lookup table. Something like that's probably worth a test case anyway also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean creating a trait like CustomValidator with the validate and is_valid functions as methods on it, and then the user would define a struct that implements CustomValidator and provide it to CompilationOptions (in like a Box I think)? If so perhaps it should be combined somehow with the Validator trait?
Yep, or, maybe exposing the Validator
trait?
Also while instinctually this feels like the right move, I'm struggling a bit to come up with a concrete use case example, like for a test case
An example that I have in mind is a counter. Even though it might sound trivial, it unblocks interesting use cases like collecting usage statistics from different keywords (i.e. if the user "wraps" the existing keyword implementation with such a counter), which in turn is useful for building coverage maps for test cases sent to API endpoints (documented with Open API), which in turn is useful for coverage-guided fuzzing of such endpoints.
use crate::compilation::context::CompilationContext; | ||
use crate::compilation::options::CustomKeywordDefinition; | ||
use crate::keywords::CompilationResult; | ||
use crate::paths::{InstancePath, JSONPointer, PathChunk}; | ||
use crate::validator::Validate; | ||
use crate::ErrorIterator; | ||
use serde_json::{json, Value}; | ||
use std::fmt::{Display, Formatter}; | ||
use std::sync::Arc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got to set rustfmt.toml
to make sure everything is consistently formatted, but generally the imports are grouped in this crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, I merged #474 so, rustfmt
now checks for such cases
|
||
// An Arc<Value> is used so the borrow checker doesn't need explicit lifetime parameters. | ||
// This would pollute dependents with lifetime parameters. | ||
pub(crate) type CustomValidateFn = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that the CustomIsValidFn
function uses regular references, do we need Arc
here? I.e. how would the signature look like otherwise as you mention extra lifetimes
jsonschema/src/compilation/mod.rs
Outdated
// define compilation options that include the custom format and the overridden keyword | ||
let validator = Arc::new(Mutex::new(Box::new(CountingValidator { count: 0 }))); | ||
let mut options = JSONSchema::options(); | ||
let options = options.with_custom_keyword("countme", validator.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Stranger6667 unfortunately I think I've hit the end of my expertise with trait objects, Arc, Mutex, Box, etc. Everything else compiles, however I can't figure out how to maintain the type of validator
here so that I can access the count
member later on in assertions.
error[E0308]: mismatched types
--> src/compilation/mod.rs:623:62
|
623 | let options = options.with_custom_keyword("countme", validator.clone());
| ------------------- ^^^^^^^^^^^^^^^^^ expected `Arc<Mutex<Box<...>>>`, found `Arc<Mutex<Box<CountingValidator>>>`
| |
| arguments to this method are incorrect
|
= note: expected struct `Arc<std::sync::Mutex<Box<(dyn for<'instance, 'schema> CustomKeywordValidator<'instance, 'schema> + 'static)>>>`
found struct `Arc<std::sync::Mutex<Box<CountingValidator>>>`
= help: `CountingValidator` implements `CustomKeywordValidator` so you could box the found value and coerce it to the trait object `Box<dyn CustomKeywordValidator>`, you will have to change the expected type as well
Also, I'm not thrilled with how the with_custom_keyword
interface requires an Arc of a Mutex of a Box of the thing... for non inner state use cases that's unnecessary (see the other custom keyword tests above), but I'm not sure how to support both. Any ideas for either problem? (including suggestions like I've modeled this completely wrong lol)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at it right now. I'll push a few commits to your branch which will this work and then we can continue discussing the interface :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are a few thoughts:
Given the inner state, I think that custom_keywords
should store initialization functions (boxed this time, unlike formats
), so such validators won't be shared across multiple locations in the schema. This will remove the need for Arc<Mutex<T>>
, so no unconditional locking when any custom validator is used - stateless validators will be more performant. Alternatively, we can store something that implements Validator + Default
, which is likely a better option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, so, I think again my inexperience is showing. My attempts with Validator + Default fail because that cannot be a trait object, and if we try to store a fn in CompilationOptions we can't capture variables when in the closure and that wouldn't be able to clone the count
in the inner state test, and if we try to store a Fn then that's either not thread safe or not cloneable. Any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I haven't thought about the trait object thing. Let me play with the code a bit
P.S. Offtopic - I've put together some ideas about this library's 1.0
API and would appreciate any feedback (see #475)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in a240bd5! However, there are a few issues:
- Initialization functions require
&'static
so, it is problematic to pass a value there (like a counter) and then read from it without leaking. - The API does not spark joy :( However, given that there could be multiple occurrences of the same keyword, every one of them should have its separate state, so it is probably the only way to provide this kind of capability. I.e. - each validator is unique, so we need to give a way to initialize each of them. With stateless validators, things could be simpler.
Not 100% sure if it is worth the trouble and if the use case I mentioned above could be implemented differently
jsonschema/src/compilation/mod.rs
Outdated
true | ||
} | ||
} | ||
|
||
// define compilation options that include the custom format and the overridden keyword | ||
let validator = Arc::new(Mutex::new(Box::new(CountingValidator { count: 0 }))); | ||
let count = Arc::new(Mutex::new(0)); | ||
let boxed: Box<dyn CustomKeywordValidator> = Box::new(CountingValidator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's interesting that Rust can't preserve that this is a CountingValidator
while passing it to a function that expects a dyn CustomKeywordValidator
. That's what I was getting hung up on. The inner state being behind another Arc<Mutex<T>>
seems like a good alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, Rust preserves this information, but behind an indirection. Dynamic dispatch "knows" at runtime what kind of object is there, so it can do a vtable lookup and find the relevant implementation of methods from CustomKeywordValidator
. It is also possible to downcast dyn CustomKeywordValidator
to CountingValidator
via downcast_ref
, though it is not that convenient to use.
Without this kind of indirection, it would not be clear how much space to allocate on the stack for that type + as we store it in a hashmap, we can only use boxed trait objects.
Let me know if there is anything I can do to help with moving this forward :) |
@Stranger6667 sounds good! day job has taken over this week but I'll dig back into this PR this weekend |
I am thinking about moving the internal state of So, instead of closure without arguments, it could be an equivalent of However, at this point, the |
I have an idea how what the API could look like: #[derive(Debug)]
struct AsciiKeyword {
size: usize
}
impl jsonschema::CustomKeyword for AsciiKeyword {
fn is_valid(&self, instance: &Value) -> bool {
todo!()
}
}
fn ascii_keyword_factory(schema: &Value) -> Arc<dyn jsonschema::CustomKeyword> {
Arc::new(AsciiKeyword { size: 42 })
}
let validator = JSONSchema::options()
.with_keyword(
"ascii",
|schema| -> Arc<dyn jsonschema::CustomKeyword> {
Arc::new(AsciiKeyword { size: 42 })
}
)
.with_keyword("also-ascii", ascii_keyword_factory)
.compile(&schema)
.unwrap(); I've implemented it for the new
With some adjustments (mainly for other arguments like path, etc) it should work P.S. I haven't solved how to use such trait objects just yet, but I think it should be possible |
I'll push some changes to make things work with the API I described above, though there is still enough room to adjust compilation / is_valid / validate signatures so that your use case is still possible |
@Stranger6667 Thanks for running with this. I'll be ready to take another look after your updates. |
I'll add some more changes soon! Here are a few observations:
Otherwise, I think that the path forward is more or less clear |
Update: There are not that many things left:
|
fc4faca
to
0edd619
Compare
Add support for user defined custom keyword validation. The user provides and registers custom validator functions when configuring a JSONSchema. Custom keyword validators may be used when the user wants to enforce constraints that can't, or can't easily, be expressed in JSON schema.
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
0edd619
to
d72b5ed
Compare
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #473 +/- ##
==========================================
+ Coverage 89.70% 90.02% +0.32%
==========================================
Files 57 58 +1
Lines 9643 9923 +280
==========================================
+ Hits 8650 8933 +283
+ Misses 993 990 -3 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
I think it is ready to merge! From what I see it should be possible to implement all the cases mentioned in different issues:
Please, let me know if there is anything missing to cover your use cases. I'd be glad to work on them here, or in a follow-up PR. Thank you all for opening those issues and providing a lot of details and context, I appreciate it a lot! Sorry for the delay, but I hope to get this feature released this week. cc @eirnym @platoscave as you participated in #394, feel free to leave your comments here :) |
👏 Thank you @Stranger6667 – the final draft looks great for my use case |
Thank you @samgqroberts ! Let me know if you need anything else from my side :) I’d be happy to chat |
This was originally put up by @bnjt here #394. That repository was deleted which deleted the PR, so this PR re-creates that PR.
Additionally, this PR adds an improvement to the original PR whereby existing keywords can also have their behavior be overridden by custom logic.
Resolves #379
Resolves #354