validate does not raise errors in a predictable manner #953

ssbarnea · 2022-05-20T16:04:54Z

Based on #481 I was able to build a simple reproducer for this bug:

from jsonschema import validate

schema = {
  "$schema": "http://json-schema.org/draft-07/schema#",
  "properties": {
    "environment": {
      "type": "object",
      "additionalProperties": {
        "type": "string"
      }
    }
  }
}

instance = {
  "environment": {
    "a": False,
    "b": False,
    "c": True,
    "d": "foo"
  }
}

for i in range(1, 20):
    try:
        validate(instance=instance, schema=schema)
    except Exception as exc:
        print(exc)

Desired behavior

Subsequent runs with the same inputs (schema + instance) should produce the same exceptions, not randomly returning different results.

The text was updated successfully, but these errors were encountered:

Julian · 2022-05-20T16:11:31Z

The desired behavior isn't a guarantee validate makes today, and likely not one it can/will make -- if you want the same exception you can use best_match or sort the errors yourself, but I can have a look to see if there's any room for improvement.

ssbarnea · 2022-05-20T16:20:00Z

To be honest, I wanted to use best_match but I was not able to find a simple way to use it similar to validate. As you probably seen calling validate is strait simple: giving only the schema and instance as inputs. Still the examples I seen with best_match required me to already know which schema draft version was, instantiate it,... Maybe I failed to read the right example?

I wonder if it would be possible to adapt validate to attach the list of errors to the exception, so consumer could dig into it.

Now I realise why one of my collegues implemented his own validator at https://github.com/ansible/ansible-navigator/blob/main/src/ansible_navigator/utils/json_schema.py#L72-L118 as that one returns all errors -- I wonder if we could contribute something similar to the library, so it would be easier to get more consistent errors?

Update: apparently his implementation does produce the same result each time, probably due to the use of sorted. I could try to move his implementation into a shared library that we used across our projects, but ideally we should be able to get this library upgraded. I bet others would be interested about the same kind of behavior. WDYT?

Julian · 2022-05-20T16:42:41Z

Docs definitely could be better (as usual :( ), sorry. There's a distinct lack of examples unfortunately.

(n.b. now that I ran your code, here at least, it always gives the same exception)

I'll try to come back in a bit with more guidance, but:

required me to already know which schema draft version was

This is jsonschema.validators.validator_for, which e.g. for your schema gives:

>>> import jsonschema.validators
>>> jsonschema.validators.validator_for(schema)
<class 'jsonschema.validators.Draft7Validator'>

so if you have an arbitrary schema, you'd use e.g.:

validator = jsonschema.validators.validator_for(schema)(schema)

and then when you validate, e.g. jsonschema.exceptions.best_match(validator.iter_errors(instance)) or

sorted(validator.iter_errors(instance), key=jsonschema.exceptions.relevance)

where relevance will sort using the same heuristics as best_match.

I'll look at the code you linked in a bit, but I suspect it basically needs #728 and #698 implemented to give clearer errors, neither of which should be too terrible.

ssbarnea · 2022-05-20T16:57:34Z

What is your feeling about me making a pull request to add a new method called validate_iter that receives the same a parameters as current validate() but returns a sorted list of errors using the logic you mentioned?

Julian · 2022-05-20T17:01:34Z

Not really for it honestly -- I think the current world (where there's jsonschema.validate for simplistic use cases, and the draft-specific validators for normal or performance critical ones) already confuses users who don't realize they shouldn't call validate in a loop with the same schema over and over again.

Open to be convinced, but it's 3 lines to do with the existing API, or even one as just sorted(validator_for(schema)(schema).iter_errors(instance), key=jsonschema.exceptions.by_relevance), so it'd have to be a pretty good argument.

Julian · 2022-05-20T17:03:42Z

I'm slightly more amenable to adding a relevant_errors() method to validator classes which returns the sorted list and would chop a line off, turning it into validator_for(schema)(schema).relevant_errors()

ssbarnea · 2022-05-20T20:23:00Z

You might be surprised to find that even the exceptions.relevance sort function produce random results with the same I gave in this report.

Before doing any optimizations we should probably look into ensuring that the library behavior is reproducible / predictable. Nobody will like a tool that returns different results, even between two identical calls as that will make it unreliable.

Julian · 2022-05-20T21:28:02Z

With all due respect the library is already quite popular, so saying things like "nobody will like a tool that returns different results" isn't really a welcome way to provide feedback.

As I say, I'm happy to look at specific examples and improvements. You're however again expecting a guarantee that isn't provided today, and as I said, I don't yet view as essential -- providing exceptions in a fixed order across runs isn't a current goal. It can undoubtedly be done.

Julian · 2022-05-24T13:46:44Z

Seems like you've addressed this downstream. As I say, as of today, it's not really a goal of the library to guarantee it produces the same exceptions in the same order across program runs, so users are indeed encouraged to call sorted() if that's what you want. We can revisit in the future if that's something a significant number of people have a need for.

cognifloyd · 2022-05-27T14:03:38Z

I think keeping the order of output consistent across runs would be very beneficial.

I wonder what the source of the varied output is. Are the errors getting emitted from a set?

Julian · 2022-05-27T14:10:20Z

No, they are iterated over normally -- PRs to address are certainly welcome, but I'd probably suspect it's due to hash randomization of the dictionary that contains which validators to validate (i.e. the metaschema)?

If you fix a hash seed, does that help?

But in short it just doesn't seem feasible to guarantee -- don't forget this library is not just an implementation of the spec, it's also a way to create extensions to the spec, so one has to deal with a possibly infinite list of additional behavior, and allowing those to configure their ordering.

cognifloyd · 2022-05-27T14:26:42Z

Weird. I thought iterating over a dictionary with .items() always yielded the same order.

But, oh! I see what you're saying. The order of a dict is consistent within a run, not across runs. Hmm.

ssbarnea · 2023-04-30T13:47:41Z

IMHO, this bug should be reopened. The only place where random behaviour is desirable in programming, is random function generators, and even these do often have a configurable seed that makes the behaviour reproducible.

The fact that current implementation has unpredictable behaviour which may prove tricky to fix should not be the reason for fixing it.

I also suspect that testing coverage of the code may prove to be partial as that would be one of the first places where this would be uncovered.

ssbarnea mentioned this issue May 23, 2022

Add module to work with schemas ansible/ansible-compat#143

Merged

Julian closed this as completed May 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

validate does not raise errors in a predictable manner #953

validate does not raise errors in a predictable manner #953

ssbarnea commented May 20, 2022

Julian commented May 20, 2022

ssbarnea commented May 20, 2022 •

edited

Loading

Julian commented May 20, 2022

ssbarnea commented May 20, 2022

Julian commented May 20, 2022 •

edited

Loading

Julian commented May 20, 2022 •

edited

Loading

ssbarnea commented May 20, 2022

Julian commented May 20, 2022

Julian commented May 24, 2022

cognifloyd commented May 27, 2022

Julian commented May 27, 2022

cognifloyd commented May 27, 2022

ssbarnea commented Apr 30, 2023

validate does not raise errors in a predictable manner #953

validate does not raise errors in a predictable manner #953

Comments

ssbarnea commented May 20, 2022

Desired behavior

Julian commented May 20, 2022

ssbarnea commented May 20, 2022 • edited Loading

Julian commented May 20, 2022

ssbarnea commented May 20, 2022

Julian commented May 20, 2022 • edited Loading

Julian commented May 20, 2022 • edited Loading

ssbarnea commented May 20, 2022

Julian commented May 20, 2022

Julian commented May 24, 2022

cognifloyd commented May 27, 2022

Julian commented May 27, 2022

cognifloyd commented May 27, 2022

ssbarnea commented Apr 30, 2023

ssbarnea commented May 20, 2022 •

edited

Loading

Julian commented May 20, 2022 •

edited

Loading

Julian commented May 20, 2022 •

edited

Loading