Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for referencing subschemas by id. #371

Closed
wants to merge 1 commit into from

Conversation

dfinlay
Copy link

@dfinlay dfinlay commented Nov 17, 2017

Add support for $ref elements using a subschema's declared id value. This is supported by a an additional store in the RefResolver class that stores all id declared subschemas.

…ue. This is supported by a an additional store in the `RefResolver` class that stores all `id` declared subschemas.
@Julian
Copy link
Member

Julian commented Nov 18, 2017

Hey @dfinlay thanks so much!

Would you mind elaborating a bit on what this is doing? Is this functionality you think is in the spec that's missing from jsonschema, or functionality you wish existed, or a convenience helper for functionality that exists already?

@dfinlay
Copy link
Author

dfinlay commented Nov 19, 2017

Sorry, I should have added some more commentary. This pull request adds support accessing subschemas by their id/$id keyword. This specifically addresses #341 and #313 as well.

A note on the implementation, I added the subschema_store to RefResolver, because I didn't want to be too intrusive in the changes. I think it would be better to make RefResolver.store a dict of dicts, and use # to identify the root schema. Thoughts?

I'd like to add some more tests, but I was a bit lost in the the test structure. Guidance on where / how to add them would be appreciated.

@Julian
Copy link
Member

Julian commented Nov 19, 2017

Aha, awesome!

I have to definitely read this carefully -- the one general thing I'd say though is that I definitely want to restrict public changes to the $ref implementation (to RefResolver), because I have strong suspicion that it needs replacing entirely ( python-jsonschema/referencing#3 ). So definitely if this can be done in a way that doesn't even add to the public interface, we should do that I think (so, e.g., is there actually a need to add the additional subschema_store parameter, or can it just be created by not overridable externally?)

I think it would be better to make RefResolver.store a dict of dicts, and use # to identify the root schema.

This sounds likely to be backwards incompatible with what the current thing asks for. But I probably have to read the changes here carefully to understand exactly.

I'd like to add some more tests, but I was a bit lost in the the test structure.

Yeah, sorry bout that :/. The tests for this seem very likely like they'd go in test_validators, alongside the 2 I see you modified, but I can follow up with a specific recommendation after I read the code.

@Julian
Copy link
Member

Julian commented Dec 10, 2017

Hey, haven't forgotten this, just still having some trouble figuring out how much of this is a bug fix and how much is new :/.

It's likely my own fuzzy knowledge of $ref that I never properly sat down to think about.

Still definitely hoping to find some time to think about this PR though. Sorry for the wait.

@elipapa
Copy link

elipapa commented Feb 15, 2018

any news on this?

@dfinlay
Copy link
Author

dfinlay commented Feb 17, 2018

I should probably have been more proactive with this. I needed to add more tests! Also, looking back at this without rose-tinted glasses, there is as a problem with the implementation. You cannot pass an object for validation against a defined subschema. Instead, you can only validate subschema elements if they are children of root schema instance. This should be resolved before this change is merged.

Ideally, one should be able to pass a string id to the validator with an instance and have it validated against the schema w/ that ID, regardless of whether it is a subschema and root.

Here is an example of how you have to validate instances against subschemas now (Ignore the versions here, just an example).

schema = {
  "$schema": "http://json-schema.org/draft-06/schema#",
  "$id": "awesome.json",
  "title": "Awesome",
  "type": "object",
  "properties": {
    "name": {
      "title": "Awesome Name",
      "type": "string",
      "maxLength": 256,
      "default": ""
    },
    "sauce": {
      "$ref": "#awesome-sauce"
    }
  },
  "required": [
    "name",
    "sauce"
  ],
  "definitions": {
    "awesome-sauce": {
      "$id": "#awesome-sauce",
      "title": "Awesome Sauce",
      "description": "Sauce for your food that is quite awesome.",
      "enum": [
        "ranch",
        "french-onion",
        "au-jus",
        "misc",
        "recycled"
      ]
    }
  }
}

validator = jsonschema.Draft4Validator(schema)

data = dict(name="Dave's Homemade Ranch", sauce="ranch")

sauce = dict(sauce="ranch")

validator.validate(data)
> True

validator.validate(sauce)
> False

validator.validate(sauce, schema['definitions']['awesome-sauce'])
> True

@taichi-jp
Copy link

Any updates on this?
I really need to refer to other json files!

@Julian
Copy link
Member

Julian commented Jan 10, 2019

So I thought I'd left this comment somewhere but apparently I never did, so apologies @dfinlay :/ (And of course again thanks so much for even sending the PR)

Two "approach"-y comments though for you or anyone else who's finding this and trying to take it the last mile:

First to fix the $ref issue I'd want to see at least one test in json-schema-org/JSON-Schema-Test-Suite being fixed by the patch. Right now there is one specific bug which leads to ~10 tests being skipped from there in jsonschema.tests.test_jsonschema_test_suite across all drafts. If that's the bug this fixes, this should unskip those and show that they're passing. If it's not, that means there's a test missing upstream, which I'd want to add one for there (though of course I can help with doing that if we isolate the specific case).

And second, unless it is physically impossible because of the way the behavior needs to work, I don't want to change the current approach of ref resolution, whereby no pre-resolution of references happens until they're encountered. That I'm pretty sure this approach changes, so yeah, if that's necessary due to the bug, I'd like to discuss that specifically (and understand why that's the case), but otherwise would want to see the scope fix happen lazily.

@handrews
Copy link

@Julian I think at most you can only defer a resolution pass until the first plain-name fragment reference. At that point, you need to walk potentially all of the schema as you cannot predict where the plain name was defined. And ideally, if someone re-defined it, throw an error (although that's not required IIRC, the spec just says such behavior is undefined).

If you try to avoid walking the whole schema at that point, you either have to remember which parts you have not walked, or start over from the beginning when you find the next reference to a plain name fragment that has not already been resolved.

And actually, it's more complicated than this- if the reference resolves to something where the non-fragment part of the URI is not the same as the root schema's $id, then you have to go find the schema that declares that base URI.

I can see avoiding it up front as a useful optimization since many schemas never do either of these things, but once you need to calculate stuff, it seems easier to just do it all in one pass rather than try to minimize each lookup.

@Julian
Copy link
Member

Julian commented Jan 10, 2019

@handrews thanks, probably eventually I'll understand how this needs to look :)

So, I guess at a basic level you're telling me that it's not necessarily the case that resolution at a given level involves only levels you've already traversed? That's kind of hard for me to picture, any chance you could give a simple example? When you say "plain name" you mean e.g. $ref: "foo"? Is the right mental model not that you then just apply the base URI that you've seen along the way to finding that schema?

@handrews
Copy link

@Julian by plain name I mean the plain name fragment syntax rather than JSON Pointer fragments. So:

{
    "definitions": {
        "foo": {
            "$id": "#foo",
            "type": "string"
        }
    },
    "type": "object",
    "properties": {
        "fooByName": {"$ref": "#foo"},
        "fooByPointer": {"$ref": "#/definitions/foo"}
    }
}

Both fooByName and fooByPointer resolve to the same thing. If you haven't look at definitions yet, you won't know that the URI fragment "#foo" appears there. And you might have a more complicated situation like definitions nested in a schema under definitions. Or someone could use $id to define a plain name fragment in a schema under properties or items or allOf. I would not do that- I would put all re-usable schemas under definitions- but it is legal.

See https://tools.ietf.org/html/draft-handrews-json-schema-01#section-8.2.4 for a fairly comprehensive list of examples of how $id can be used to create various identifiers that could be referenced with $ref.

The way I handled it the one time I implemented it (sadly that code was left behind at a past job), I basically did a tree walk, keeping a stack of relevant base URIs (because as you see in the spec examples, multiple can be usable at the same time), and computed all possible full URI identifiers for each schema. I kept a map of full URIs to schemas so I never had to do this again once a schema was loaded- I think I internally replaced all $refs that had relative URI references with fully resolved URIs, because otherwise you have to keep track of $id again while validating, and that's really annoying.

@Julian
Copy link
Member

Julian commented Jan 11, 2019

Will have a look at the spec section you linked but man that seems really unfortunate to me :/

Would seem way cleaner to just allow one section where you define all your "variables" in, not litter every part of the document with a possible name. Maybe I'll come around as I see more examples on why that's a good idea :(

@dfinlay
Copy link
Author

dfinlay commented Jan 13, 2019

Sorry I ignored following up on the additional tests for this PR, got pulled into other directions and it languished. If there is anything I can do to help close this out and get this part of the spec support I'm happy to. If the design approach I took is suboptimal, I'm not attached to it in anyway. I went for what I saw as the minimally invasive approach.

@Julian
Copy link
Member

Julian commented Jan 14, 2019

@dfinlay deeeeeeefinitely not your fault, apologies for this taking so long with all the other things going on.

If there is anything I can do to help close this out and get this part of the spec support I'm happy to. If the design approach I took is suboptimal, I'm not attached to it in anyway. I went for what I saw as the minimally invasive approach.

Awesome!

From what @handrews said, I might be out of luck on what I want, so I might have to deal with this approach as a general strategy :/

I'm still triple checking whether there's no other way to make this happen, but if we do go this way, basically the main thing here would be to just make everything private (so no additional public API methods introduced, we do the schema traversal in some private method, store off all the mappings between simple names and schemas, and then reference them).

The two other minor things is that there's some small things to update here that now exist now that draft 6/7 support is merged (like for example the key argument you had to subschemas is now a thing that is passed in on jsonschema.validators.create, so you should be able to reuse that rather than taking the argument a second time).

And the last one that yeah I'd love to see the currently-skipped tests in JSON-Schema-Test-Suite be unskipped and confirmed to pass after this change.

If you make progress on any of those do lemme know, otherwise thanks again!

@alvarogzp
Copy link

alvarogzp commented Jun 26, 2019

Hello, what is the status of this?

It has been more than one year since the PR was created and jsonschema still fails parsing a schema with $refs pointing to custom $ids in the same schema (issue #341). It's sad that I'll need to modify my schema to be less expresive in order to use it on Python (is there any other Python implementation with proper $ref support?).

Sorry for being hard, but it's really sad to see such a basic feature stalled for years, it makes me feel the project is dead! It is even referenced in the docs that the Python implementation don't support this: https://json-schema.org/understanding-json-schema/structuring.html#using-id-with-ref.
Imagine how rare it is that this feature isn't supported to be mentioned on the official json schema guide!

@Julian
Copy link
Member

Julian commented Jun 27, 2019

@alvarogzp please feel free to read the comments on this ticket -- notably, the need for an upstream test case to be failing and fixed by this (if the existing ones do so, great).

It's sad that I'll need to modify my schema to be less expresive in order to use it on Python

Considering how composable jsonschema is, you absolutely do not, since you've got about 5 other options to choose from, not least of which being transforming your schema just before handing it to jsonschema, but please do save the guilt trip. It's a bit ridiculous to say such a thing, or create FUD around the project being dead which clearly can be seen not to be the case by just looking at the commit history.

jsonschema is an open source project.

@alvarogzp
Copy link

Sorry about my previous post and language, it was totally inappropriate.
I'll try to be more proactive instead of just complaining.

I've been looking in the JSON-Schema-Test-Suite for any test fixed by this (as you suggested), and certainly there isn't. However, I found json-schema-org/JSON-Schema-Test-Suite#225 which adds tests for it (the technical name is "Location-independent identifiers", and they are defined in https://tools.ietf.org/html/draft-handrews-json-schema-01#section-8.2.3). I'll try to help with that PR too.

Also, I have been testing this branch and found some issues regarding to the handling of the 'id' property (also mentioned on the comments above). I have fixed it, merged with master (there was a merge conflict) and added a small test to TestRefResolver (you can check the new test would fail against master and pass against this branch). The result is here, and the diff of my changes (without the merge commit) can be found here. Feel free to sync this branch with mine, or if you want I can create a new PR.

I tested the branch with the following schema and it validates properly (it doesn't on master):

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "title": "Test",
  "properties": {
    "test": { "$ref": "#test" }
  },
  "definitions": {
    "test": {
      "$id": "#test",
      "title": "A test string",
      "type": "string"
    }
  }
}

If there is more I can do, just tell me.

Thanks for the project, and sorry again!

@Julian
Copy link
Member

Julian commented Jun 28, 2019

Thanks, no hard feelings, and definitely thanks for the offer to help it is very much appreciated.

I think based on what you found upstream, yeah let's get that merged in there then first. I can review that upstream. Can you maybe have a look at that PR in the test suite, leave any comments there that you see, and then I can merge it (and possibly double check with @handrews to make sure he agrees, if he's available)?

Once that's done, we can come back here and have a look at the fix you put in as well, and maybe get this merged!

@jjmurre
Copy link

jjmurre commented Jan 13, 2020

Any chance that this PR will be merged one day?

@Julian
Copy link
Member

Julian commented Jan 13, 2020

Hey @jjmurre -- do you think you'd be willing to help address the issues remaining above?

@jjmurre
Copy link

jjmurre commented Jan 14, 2020

I'll have a look, see if I can find my way around the codebase.

Julian added a commit that referenced this pull request Jun 9, 2020
817b724 add perl implementation and test suite to the user list
ca14e01 Merge branch 'pull/382'
3dabf55 move non-format tests out of the format directory, while keeping all ECMA regex tests together
4121aa5 move format tests to their own directory
4bae8aa Add more idn-hostname tests to draft-2019-09 and draft-07
6d91158 [325] Add some more hostname tests
e593057 Merge pull request #389 from ssilverman/readme-drafts
fb3766d README: Improve issue/PR links
79bef22 README: Update language around drafts
ade47e4 README: Add Snow to the list of supporting Java validators
fc0c14e README: Update simple JSON example
1167669 README: Update structure, consistency, spacing, and wrapping
9514122 Merge pull request #388 from json-schema-org/ether/maxProperties=0
7646490 test that maxProperties = 0 means the object is empty
c3f4319 Merge pull request #384 from ChALkeR/chalker/unique
7766f48 Improve uniqueItems validation
7555d41 Add unnormalized $id tests
11f70eb [300] Add tests for valid use of empty fragments in "$id"
b106ff0 [299] Add tests for invalid use of fragments in "$id"
4a74d45 Fix "tilde" spelling
3eca41b Merge pull request #379 from json-schema-org/ether/remove-wrapped-refs
d61bae8 remove wrapped refs
536ec07 [359] Add unevaluatedProperties/unevaluatedItems cousin tests
ac63eb7 Small README update that introduces the concept of directories
697944e Merge pull request #374 from karenetheridge/ether/allOf-anyOf-oneOf
33f8549 test all the *Of keywords together
44b99ed Merge pull request #373 from karenetheridge/ether/items-and-contains
4a2b52f some tests of items + contains
7f00cc8 add test that was present for other drafts but not 2019-09
a3f9e2e Merge pull request #371 from karenetheridge/ether/if-then-else-boolean
aeeaecc some tests with if/then/else and boolean schemas
b8a083c Merge pull request #372 from nomnoms12/unique-false-and-zero
85728f1 Add tests for uniqueness [1] and [true]
fd01a60 Add tests for uniqueness [1] and [true]
0a8823c Merge pull request #370 from karenetheridge/ether/nul-char
fa6f4dd add tests for the NUL character in strings
8bf2f7f Merge pull request #369 from ssilverman/data-desc
2ba7a76 Add data description
4f66078 Merge pull request #367 from karenetheridge/ether/additionalItems
283da7c some more tests for additionalItems
7ba95f3 add tests for keywords maxContains, minContains
2f2e7cf Merge pull request #365 from karenetheridge/ether/move-ecma-regex
8388f27 move ECMA regex tests under format/

git-subtree-dir: json
git-subtree-split: 817b724b7a64d7c18a8232aa32b5f1cc1d6dd153
@Julian Julian closed this Oct 20, 2020
@Julian Julian deleted the branch python-jsonschema:master October 20, 2020 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants