Validate schema with relative paths and within-document references #343

Closed
tillahoffmann opened this issue Jul 4, 2017 · 20 comments

@tillahoffmann

tillahoffmann commented Jul 4, 2017

I would like to use a schema base.json that defines basic properties together with a schema derived.json which adds further validation to the base schema. The derived schema needs to reference the base schema using a relative path as discussed in #98. However, I cannot get both the relative reference and within-document references to work at the same time. Here's a minimum example to illustrate the problem.

base.json

{
    "definitions": {
        "string_alias": {
            "type": "string"
        }
    },
    "properties": {
        "my_string": {
            "$ref": "#/definitions/string_alias"
        }
    }
}

derived.json

{
    "allOf": [
        {
            "$ref": "base.json"
        },
        {
            "properties": {
                "my_number": {
                    "type": "number"
                }
            }
        }
    ]
}

data.json

{
    "my_string": "Hello world!",
    "my_number": 42
}

validate.py

import json
import os
import sys
import jsonschema


if len(sys.argv) > 1:
    resolver = jsonschema.RefResolver('file://%s/' % os.path.abspath(os.path.dirname(__file__)), None)
else:
    resolver = None


with open('base.json') as fp:
    base = json.load(fp)

with open('derived.json') as fp:
    derived = json.load(fp)

with open('data.json') as fp:
    data = json.load(fp)


try:
    jsonschema.validate(data, base, resolver=resolver)
    print("Passed base schema.")
except Exception as ex:
    print("Failed base schema: %s" % ex)

try:
    jsonschema.validate(data, derived, resolver=resolver)
    print("Passed derived schema.")
except Exception as ex:
    print("Failed derived schema: %s" % ex)

If I use the resolver, the relative path gets resolved correctly but the within-document reference fails. If I don't use the resolver, the within-document reference succeeds but the relative path does not get resolved.

# With resolver
python validate.py resolver
Failed base schema: Unresolvable JSON pointer: 'definitions/string_alias'
Passed derived schema.

# Without resolver
python validate.py         
Passed base schema.
Failed derived schema: unknown url type: 'base.json'

Any suggestions on how to get both to work?

Edit: Looks like this may be related to #306.
Edit: A temporary workaround is to have base.json reference itself explicitly, i.e. use base.json#/definitions/string_alias instead of #/definitions/string_alias.
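Working out which URI each `$ref` ends up pointing at explains both failures. The sketch below uses only `urllib.parse`; `DIR_URI` and `target` are hypothetical names, and the comments reflect my reading of the old `RefResolver`: join the ref against the current base URI, split off the fragment, then look up the document, which for the base URI itself is the `referrer` argument (`None` in validate.py). It is a model, not jsonschema's actual code:

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical directory URI, like the one validate.py builds for RefResolver.
DIR_URI = "file:///tmp/schemas/"


def target(base_uri, ref):
    """Return the (document URI, JSON pointer) pair that `ref` points to from `base_uri`."""
    document_uri, pointer = urldefrag(urljoin(base_uri, ref))
    return document_uri, pointer


# With the resolver, base.json is validated with the *directory* as its base URI,
# so its local ref points into whatever document is stored under the directory URI.
print(target(DIR_URI, "#/definitions/string_alias"))
# -> ('file:///tmp/schemas/', '/definitions/string_alias')

# The relative ref in derived.json lands on the right file...
print(target(DIR_URI, "base.json"))
# -> ('file:///tmp/schemas/base.json', '')

# ...and once inside base.json, its local ref is resolved against base.json itself.
print(target("file:///tmp/schemas/base.json", "#/definitions/string_alias"))
# -> ('file:///tmp/schemas/base.json', '/definitions/string_alias')

# Without the resolver there is no base URI, so "base.json" stays a bare relative
# reference, which urlopen() rejects with "unknown url type".
print(target("", "base.json"))
# -> ('base.json', '')
```

Under that reading, `None` has no `definitions` key, which would match the `Unresolvable JSON pointer: 'definitions/string_alias'` error above, while the explicit `base.json#/definitions/string_alias` form names the file and therefore resolves correctly from the directory URI.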

@smittysmee
Copy link

smittysmee commented Aug 1, 2017

I agree; this seems to be an issue when the base URI does not use the HTTP or HTTPS scheme.
With any other scheme, the within-document reference is lost.

The workaround is to make the self-reference explicit: instead of #/definitions/variable_name, use <current_id>#/definitions/variable_name.
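If there are many schemas, this workaround can be applied mechanically when loading them. A minimal sketch, assuming plain JSON-decoded dicts and lists; `absolutize_local_refs` is a hypothetical helper, not part of jsonschema:

```python
def absolutize_local_refs(schema, document_uri):
    """Return a copy of `schema` in which every fragment-only "$ref"
    (e.g. "#/definitions/x") is prefixed with `document_uri`."""

    def walk(node):
        if isinstance(node, dict):
            return {
                key: (
                    document_uri + value
                    if key == "$ref" and isinstance(value, str) and value.startswith("#")
                    else walk(value)
                )
                for key, value in node.items()
            }
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node  # strings, numbers, booleans and None are left untouched

    return walk(schema)


base = {
    "definitions": {"string_alias": {"type": "string"}},
    "properties": {"my_string": {"$ref": "#/definitions/string_alias"}},
}
fixed = absolutize_local_refs(base, "base.json")
print(fixed["properties"]["my_string"])  # {'$ref': 'base.json#/definitions/string_alias'}
```

The walk is blind to schema semantics, so it would also rewrite a `"$ref"` string that appears inside data such as an `enum` or `const` value.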

@Lordnibbler

+1 on this

@chimeno

chimeno commented Oct 4, 2017

+1

@joecabezas

+1

@gaoxinyang

+1

@SamuelePilleri

Any news?

@handrews

handrews commented Nov 26, 2018

@Julian looking through this and the related #98 it seems that the problem is uncertainty over how to set the initial base URI in the absence of a root $id (or id for older drafts).

The way I've handled this using other implementations is to give all schemas a root $id, which is the RECOMMENDED approach in the spec (more on that at the end). But in the Python implementation I worked on for a bit (before realizing you were still active and dropping it :-), I interpreted RFC 3986 §5.1.3's rules around establishing a base URI from the "retrieval" URI as meaning that any document loaded from a local filesystem should have a base URI that is the file:// URI of the local file.

  1. Does this seem like a reasonable interpretation to you?
  2. Should I clarify this in the spec?
  3. As an implementor, do you see any problems with implementing that interpretation?

This only affects resolving URIs, and imposes no requirements on whether the implementation can automatically load the files or not.

My assumption would be that your command-line script would construct the file:// URI and pass it to the library, although I don't care how it's actually implemented. However, I do not think that an implementation is responsible for determining a retrieval base URI when it is instantiated as a library and just handed a data structure. The library has no way of knowing what file:// URI to construct, or if there even is a filesystem involved. In that case, I believe that RFC 3986 §5.1.4 applies, which basically says "eh, whatever".

More specifically, it reads in part:

A sender of a representation containing relative references is responsible for ensuring that a base URI for those references can be established.

The proper way to do this is to set $id (or id for draft-04). The most recent work I did was with a JavaScript implementation, and I just set $id with https:// URIs for every schema document, pre-loaded all of them as documented by the library, and references worked even though no HTTP requests were ever made.
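Modelled in Python, the approach described here (an https:// `$id` on every document, everything pre-loaded, no requests made) looks roughly like this. `STORE`, `lookup` and `resolve_pointer` are hypothetical stand-ins for what a library does internally, and the example.com URIs are placeholders; with jsonschema itself you would hand the pre-loaded documents to the library instead (a `RefResolver` `store` in older releases, a `referencing.Registry` from 4.18 on):

```python
from urllib.parse import unquote, urldefrag, urljoin

# Hypothetical schema documents; each has an https:// $id, but nothing is ever fetched.
SCHEMAS = [
    {
        "$id": "https://example.com/schemas/base.json",
        "definitions": {"string_alias": {"type": "string"}},
        "properties": {"my_string": {"$ref": "#/definitions/string_alias"}},
    },
    {
        "$id": "https://example.com/schemas/derived.json",
        "allOf": [
            {"$ref": "base.json"},
            {"properties": {"my_number": {"type": "number"}}},
        ],
    },
]

# Pre-load: index every document by its own $id.
STORE = {schema["$id"]: schema for schema in SCHEMAS}


def resolve_pointer(document, pointer):
    """Follow a JSON Pointer (RFC 6901) such as "/definitions/string_alias"."""
    tokens = pointer.split("/")[1:] if pointer else []
    for token in tokens:
        # Percent-decode the URI fragment first, then undo JSON Pointer escaping.
        token = unquote(token).replace("~1", "/").replace("~0", "~")
        document = document[int(token)] if isinstance(document, list) else document[token]
    return document


def lookup(base_uri, ref):
    """Resolve `ref` against `base_uri`; return (the target's base URI, the target)."""
    document_uri, fragment = urldefrag(urljoin(base_uri, ref))
    return document_uri, resolve_pointer(STORE[document_uri], fragment)


# Follow derived.json -> base.json -> #/definitions/string_alias.
base_uri, base_schema = lookup("https://example.com/schemas/derived.json", "base.json")
print(base_uri)  # https://example.com/schemas/base.json
print(lookup(base_uri, "#/definitions/string_alias")[1])  # {'type': 'string'}
```

Following `base.json` moves the base URI to base.json's own `$id`, which is why its `#/definitions/...` ref then resolves without any network access.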

@tillahoffmann, @SamuelePilleri, etc: JSON Schema documents are intended to have $schema and $id (or id) set in the root schema objects specifically to avoid this problem.

I think we should ensure that the spec makes clear whether or how to set a base URI when a file path is known, but the proper solution is to set your own base URI.

@Julian
Member

Julian commented Nov 29, 2018

@handrews have been meaning to respond in some more detail but haven't had the moment to sit down and do it, so might as well throw a short response up in the meanwhile :)

Does this seem like a reasonable interpretation to you?

Yes!

Should I clarify this in the spec?

Maybe, though I wouldn't expect file URIs / the filesystem to be handled differently from any other URI, yeah? So if there's anything to clarify [I don't remember the current language], maybe it's just saying "if users forget to specify an $id and you loaded the schema from somewhere, assume one using the URI you used to fetch the document".

As an implementor, do you see any problems with implementing that interpretation?

Nope, should be reasonable!

@handrews

@Julian

"if users forget to specify an $id and you loaded the schema from somewhere, assume one using the URI you used to fetch the document".

Yeah that's basically what I'm going for. Somewhere there's a section on establishing a base URI, so I'll look at how that's worded. Perhaps including a note that a local filesystem is one such "somewhere" rather than calling out the file:// scheme.
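The rule agreed on here can be written down in a few lines. This is a sketch under stated assumptions: `establish_base_uri` is a hypothetical helper, it only looks at `$id` (draft-04 spells it `id`), and it takes the retrieval URI of a local file to be its `file://` URI, as suggested earlier in the thread:

```python
from pathlib import Path
from urllib.parse import urljoin


def establish_base_uri(schema, retrieval_uri):
    """Return the schema's root $id (resolved against the retrieval URI, as $id may
    itself be relative), falling back to the URI the document was loaded from."""
    declared = schema.get("$id") if isinstance(schema, dict) else None
    return urljoin(retrieval_uri, declared) if declared else retrieval_uri


# A document read from the local filesystem: its retrieval URI is its file:// URI.
retrieval_uri = (Path.cwd() / "derived.json").as_uri()
derived = {"allOf": [{"$ref": "base.json"}]}

base_uri = establish_base_uri(derived, retrieval_uri)
print(base_uri == retrieval_uri)  # True: no $id, so the file:// URI is used
print(urljoin(base_uri, "base.json"))  # the file:// URI of base.json in the same directory

# With a root $id, the declared URI wins regardless of where the file came from.
print(establish_base_uri({"$id": "https://example.com/schemas/derived.json"}, retrieval_uri))
# https://example.com/schemas/derived.json
```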

@topher515

I had a hard time figuring this out myself, and kept landing on this GitHub issue while googling. (The solution is pretty much what @clenk says above.)

Here's a SO Q&A which covers it: https://stackoverflow.com/questions/53968770/how-to-set-up-local-file-references-in-python-jsonschema-document

Hopefully this helps someone in the future.

@joecabezas

joecabezas commented Jan 1, 2019 via email

@handrews

handrews commented Jan 2, 2019

@joecabezas I did add json-schema-org/json-schema-spec#686 for the next draft which should nudge people a bit more in the right direction. The problem is that there are endless ways to store and find "external" documents. The spec cannot cover every single filesystem, IoT device, network protocol, etc. etc. We have to rely on people understanding URIs (and how they don't necessarily directly reflect storage) or implementing a loading mechanism that works for their environment.
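One such environment-specific loading mechanism, sketched under stated assumptions: the prefix `https://example.com/schemas/` and the function names are hypothetical, and the idea is simply that URIs stay identifiers while the bytes come from a local directory. In jsonschema 4.18+ the corresponding hook is, as far as I know, the `retrieve` callable accepted by `referencing.Registry`, which would wrap the loaded contents in a `Resource`:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical convention: schemas identified under this prefix are stored locally.
PREFIX = "https://example.com/schemas/"


def make_loader(directory):
    """Return a function mapping a schema URI (without fragment) to a local JSON file."""
    root = Path(directory)

    def load(uri):
        if not uri.startswith(PREFIX):
            raise LookupError(f"no local copy configured for {uri!r}")
        relative = Path(uri[len(PREFIX):])
        # Refuse URIs that would escape the configured directory.
        if relative.is_absolute() or ".." in relative.parts:
            raise LookupError(f"refusing to leave {root}: {uri!r}")
        return json.loads((root / relative).read_text(encoding="utf-8"))

    return load


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "base.json").write_text(
        json.dumps({"$id": PREFIX + "base.json", "type": "object"}), encoding="utf-8"
    )
    load = make_loader(tmp)
    loaded = load(PREFIX + "base.json")

print(loaded["type"])  # object
```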

In the last draft we did add a section on Loading Reference Schemas and Dereferencing in an effort to make this more clear, but that was after the initial "draft-07" version so I'm not sure how many implementations have made use of it.

@joecabezas

joecabezas commented Jan 3, 2019 via email

@mellertson

mellertson commented Sep 30, 2019

Great info, thanks! I just wanted to add the $ref section from the Swagger 3.0 docs:

# $ref Syntax
According to RFC3986, the $ref string value (JSON Reference) should contain a URI,
which identifies the location of the JSON value you are referencing to. If the string value
does not conform to URI syntax rules, it causes an error during the resolving. Any members
other than $ref in a JSON Reference are ignored. Check this list for example values
of a JSON reference in specific cases:

## Local Reference
* $ref: '#/definitions/myElement' means go to the root of the current document and find elements definitions and myElement one after one.

## Remote Reference
* $ref: 'document.json' uses the whole document located on the same server and in the same location.
* $ref: 'document.json#/myElement' is the element of the document located on the same server.
* $ref: '../document.json#/myElement' is the element of the document located in the parent folder.
* $ref: '../another-folder/document.json#/myElement' is the element of the document located in another folder.

## URL Reference
* $ref: 'http://path/to/your/resource' uses the whole document located on a different server.
* $ref: 'http://path/to/your/resource.json#myElement' is the specific element of the document stored on a different server.
* $ref: '//anotherserver.com/files/example.json' is the document on a different server, which uses the same protocol (for example, HTTP or HTTPS).

**Note:** When using local references such as #/components/schemas/User in
YAML, enclose the value in quotes: '#/components/schemas/User'. Otherwise it
will be treated as a comment.
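All of these cases are ordinary RFC 3986 reference resolution against the base URI of the referring document, which Python's `urllib.parse.urljoin` follows closely. Assuming a hypothetical base URI for the referring document:

```python
from urllib.parse import urljoin

# Hypothetical base URI of the document that contains the $refs.
BASE = "https://example.com/api/schemas/main.json"

EXAMPLES = [
    "#/definitions/myElement",
    "document.json",
    "document.json#/myElement",
    "../document.json#/myElement",
    "../another-folder/document.json#/myElement",
    "http://path/to/your/resource.json#myElement",
    "//anotherserver.com/files/example.json",
]

for ref in EXAMPLES:
    # Each ref is resolved relative to BASE, exactly as a browser resolves a link.
    print(f"{ref:45} -> {urljoin(BASE, ref)}")
```

For example, `../document.json#/myElement` becomes `https://example.com/api/document.json#/myElement`, and the scheme-relative `//anotherserver.com/...` form picks up `https` from the base.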

@handrews

handrews commented Oct 1, 2019

@mellertson OpenAPI (formerly known as Swagger, but seriously, they changed the name in 2015; it's been four years already) uses a subset of $ref functionality, as it does not support $id. It is not advisable to apply its rules to JSON Schema, as they omit several important cases.
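To illustrate one case that a model without `$id` misses: in JSON Schema a nested `$id` changes the base URI for every `$ref` beneath it, so the same relative ref can point to different documents. A simplified sketch; `ref_targets` is hypothetical, it ignores draft-specific details (such as draft-07 ignoring keywords next to `$ref`), and it also walks into non-schema values like `enum`:

```python
from urllib.parse import urljoin


def ref_targets(schema, base_uri):
    """Yield (ref, absolute URI) for every $ref, letting nested $id change the base URI."""
    if isinstance(schema, dict):
        if isinstance(schema.get("$id"), str):
            base_uri = urljoin(base_uri, schema["$id"])
        if isinstance(schema.get("$ref"), str):
            yield schema["$ref"], urljoin(base_uri, schema["$ref"])
        for key, value in schema.items():
            if key not in ("$id", "$ref"):
                yield from ref_targets(value, base_uri)
    elif isinstance(schema, list):
        for item in schema:
            yield from ref_targets(item, base_uri)


schema = {
    "$id": "https://example.com/schemas/root.json",
    "properties": {
        "a": {"$ref": "item.json"},
        "b": {"$id": "https://example.com/other/", "items": {"$ref": "item.json"}},
    },
}
for ref, target in ref_targets(schema, ""):
    print(ref, "->", target)
# item.json -> https://example.com/schemas/item.json
# item.json -> https://example.com/other/item.json
```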


@Julian
Member

Julian commented Feb 23, 2023

Hello there!

This, along with many many other $ref-related issues, is now finally being handled in #1049 with the introduction of a new referencing library which is fully compliant and has APIs which I hope are a lot easier to understand and customize.

The next release of jsonschema (v4.18.0) will contain a merged version of that PR, and should be released shortly in beta, and followed quickly by a regular release, assuming no critical issues are reported.

It looks from my testing like indeed this specific example works there! If you still care to, I'd love it if you tried out the beta once it is released, or certainly it'd be hugely helpful to immediately install the branch containing this work (https://github.com/python-jsonschema/jsonschema/tree/referencing) and confirm. You can in the interim find documentation for the change in a preview page here.

I'm going to close this given it indeed seems like it is addressed by #1049, but feel free to follow up with any comments. Sorry for the delay in getting to these, but hopefully this new release will bring lots of benefit!

Here's a modified example of your code which seems to work, in case it helps show how to use the new release:

import json
import jsonschema
from referencing import Registry
from referencing.jsonschema import DRAFT7


with open('base.json') as fp:
    base = DRAFT7.create_resource(json.load(fp))

with open('derived.json') as fp:
    derived = DRAFT7.create_resource(json.load(fp))

registry = Registry().with_resources(
    [("base.json", base), ("derived.json", derived)],
)

with open('data.json') as fp:
    data = json.load(fp)


jsonschema.validate(data, base, registry=registry)
jsonschema.validate(data, derived, registry=registry)

Julian closed this as completed Feb 23, 2023
@krystof-k

I can't make this work. The code above raises TypeError: argument of type 'Resource' is not iterable.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[37], line 21
     17 with open('documents.json', 'r') as file:
     18     data = json.loads(file.read())
---> 21 jsonschema.validate(data, base, registry=registry)
     22 jsonschema.validate(data, derived, registry=registry)

File ~/.pyenv/versions/3.10.13/envs/papask/lib/python3.10/site-packages/jsonschema/validators.py:1302, in validate(instance, schema, cls, *args, **kwargs)
   1243 """
   1244 Validate an instance under the given schema.
   1245 
   (...)
   1299     `jsonschema.validators.validates`
   1300 """
   1301 if cls is None:
-> 1302     cls = validator_for(schema)
   1304 cls.check_schema(schema)
   1305 validator = cls(schema, *args, **kwargs)

File ~/.pyenv/versions/3.10.13/envs/papask/lib/python3.10/site-packages/jsonschema/validators.py:1371, in validator_for(schema, default)
   1312 """
   1313 Retrieve the validator class appropriate for validating the given schema.
   1314 
   (...)
   1367 
   1368 """
   1369 DefaultValidator = _LATEST_VERSION if default is _UNSET else default
-> 1371 if schema is True or schema is False or "$schema" not in schema:
   1372     return DefaultValidator
   1373 if schema["$schema"] not in _META_SCHEMAS and default is _UNSET:

TypeError: argument of type 'Resource' is not iterable

@Julian
Member

Julian commented Dec 20, 2023

Please open a discussion with your code that reproduces (it sounds like you're passing a Resource where you should be passing a schema).
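The traceback comes from a plain membership test: `validator_for()` checks `"$schema" not in schema`, which works on a dict but not on an object that merely wraps one. A minimal reproduction with a hypothetical stand-in class (not the real `referencing.Resource`):

```python
from dataclasses import dataclass


@dataclass
class WrappedSchema:
    """Hypothetical stand-in for a wrapper such as referencing.Resource:
    it holds a schema but is not itself a mapping."""

    contents: dict


schema = {"$schema": "http://json-schema.org/draft-07/schema#", "type": "string"}
wrapped = WrappedSchema(schema)

print("$schema" in schema)  # True: a dict supports membership tests

caught = None
try:
    "$schema" in wrapped  # the same kind of check validator_for() runs in the traceback
except TypeError as error:
    caught = error
print(type(caught).__name__)  # TypeError
```

So the plain dict is what goes to `jsonschema.validate`; only the registry needs the wrapped form.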

@krystof-k

Thanks, got it! Here's your example from above, fixed to pass the raw schemas (not the Resources) to validate:

import json
import jsonschema
from referencing import Registry
from referencing.jsonschema import DRAFT7


with open('base.json') as fp:
    base = json.load(fp)
    base_resource = DRAFT7.create_resource(base)

with open('derived.json') as fp:
    derived = json.load(fp)
    derived_resource = DRAFT7.create_resource(derived)

registry = Registry().with_resources(
    [("base.json", base_resource), ("derived.json", derived_resource)],
)

with open('data.json') as fp:
    data = json.load(fp)


jsonschema.validate(data, base, registry=registry)
jsonschema.validate(data, derived, registry=registry)
