Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep object key order when resolving a schema #481

Closed
marmeladema opened this issue Oct 24, 2018 · 14 comments
Closed

Keep object key order when resolving a schema #481

marmeladema opened this issue Oct 24, 2018 · 14 comments
Labels
Enhancement Some new desired functionality

Comments

@marmeladema
Copy link

As explained in the title, when resolving a schema on Python version (< 3.6) where dict implementation does not keep key insertion ordering, the object key order can be lost. This can be especially problematic for the "properties" object that is usually found in schemas.

I would be glad to provide a PR to fix this. My approach would be to provide a json_kwargs as RefResolver.init argument and forward it to json.loads in resolve_remote, so that users that need this functionality (at least me) could pass the object_pairs_hook argument with an OrderedDict value.

Would you be ok with that ?

@Julian
Copy link
Member

Julian commented Oct 24, 2018

Hey -- RefResolver is the only place that really deals with deserialization yeah -- this is probably related to python-jsonschema/referencing#4, since I imagine a solution for the first would solve this too, though obviously that ticket is more general. I haven't yet fully thought through what a decent interface looks like there, but maybe have a look, since I suspect there's one decent solution that covers both use cases.

Some random brainstorming:

What I'd initially imagine is an argument for a deserializer, and that deserializer being injectable with your version of JSON loading.

But, even better I suspect would be to have <HTTP layer> wrapped by <HTTP to Python Object Layer>, and then only the latter gets passed to RefResolver.

There are after all at least 3 distinct tasks here:

  • Take URL, find handler that knows how to retrieve it
  • Take handler (e.g. HTTP), make request, return response (this is unfortunately right now either requests or the urllib2 module)
  • Take (e.g. HTTP) response, decide how to deserialize it based on configuration + context

And yeah how many objects that is, and whether any or all of them is RefResolver is what I still haven't fully thought through, so comments definitely welcome :)

@marmeladema
Copy link
Author

The other issue is way more general than the problem i am facing right now. Also i am using the resolver locally with file:// scheme, no HTTP server involved.
But as a first step, we could add a deserializer argument to RefResolver and call it in resolve_remote. Default value would still be json.loads.
This way, you could solve the yaml problem and this issue at the same time.
The .json() method from the requests module will not be used anymore though, as all deserialization will use the same code path (which i think if safer anyway).

@Julian
Copy link
Member

Julian commented Oct 24, 2018 via email

@Julian
Copy link
Member

Julian commented Oct 24, 2018 via email

@marmeladema
Copy link
Author

It is, yes, but I don't want 2 (which undoubtedly eventually becomes 3 and
4 and ..) when a decently designed interface solves both issues and seems
likely to solve future ones.

Do you think a PR as a "work-in-progress" base of discussion could help here ? Not meant to be merged of course. Just so that people can comment, and we can go toward finding a solution.

And by the way, requests' json method cannot just be replaced with json.loads unfortunately, which is almost the whole reason we use requests. It implements what the json spec says is a correct unicode decode dance, which json.loads does not.

We can still use the guess_json_utf function from the requests module to handle this dance properly.

@Julian
Copy link
Member

Julian commented Oct 25, 2018 via email

@tobywf
Copy link

tobywf commented Nov 13, 2018

I had a similar use-case; I wanted to add a timeout to the request. Based on RefResolver.resolve_remote, I replaced all calls to Draft7Validator with a factory method that created a validator + RefResolver with custom handlers (they take precedence):

def make_validator(schema, ..., timeout=TIMEOUT_IN_SECONDS):
    # ..., figure out base_uri etc

    def get_with_timeout(uri):
        return requests.get(uri, timeout=timeout).json()

    resolver = RefResolver(
        base_uri=base_uri,
        referrer=schema,
        handlers={"http": get_with_timeout, "https": get_with_timeout},
    )
    return Draft7Validator(schema, resolver=resolver)

This seems to work fine, although if the API settles down, I would just inherit from RefResolver and override resolve_remote to avoid weird fall-through cases.

Your problem is similar. Luckily, on recent requests version (don't know how recent), you can specify extra keyword arguments for .json() and still get all the sanity-preserving encoding handling:

from __future__ import absolute_import, print_function, unicode_literals
import requests
from collections import OrderedDict

url = "https://api.github.com/users/octocat/repos"
requests.get(url).json(object_pairs_hook=OrderedDict)

So you could use also use a function and custom handlers, e.g.:

def get_with_order(uri):
    return requests.get(uri).json(object_pairs_hook=OrderedDict)

Sure, it isn't super clean or optimal, and personally, I'd dump legacy Python and move to Python 3, but maybe this unblocks you?

(Also, if there's a reason not use do this, I'd like to know)

For the broader discussion, being able to mess around with RefResolver would be super useful because it does a lot, but I can understand the reluctance of having to support old designs.

@Julian
Copy link
Member

Julian commented Nov 13, 2018

(Also, if there's a reason not use do this, I'd like to know)
This seems to work fine, although if the API settles down, I would just inherit from RefResolver and override resolve_remote to avoid weird fall-through cases.

Subclassing isn't a supported public API for any classes in jsonschema :), but otherwise this looks reasonable.

For the broader discussion, being able to mess around with RefResolver would be super useful because it does a lot, but I can understand the reluctance of having to support old designs.

Definitely keen to hear if you have any use cases that wouldn't be solved by what's in python-jsonschema/referencing#4!

@marmeladema
Copy link
Author

It would not work for me as i don't not want to provide a handler for a specific protocol, i need ordering whatever the protocol. I can not even fake the handlers argument because it is transformed as dict in the constructor.

@tobywf
Copy link

tobywf commented Nov 13, 2018

Fair enough, was hoping one of those approaches helps, but guess it's a niche use-case. My understanding is JSON specifies objects are unordered.

Anyway, I'll add my thoughts to python-jsonschema/referencing#4 , didn't mean to hijack the issue!

@Julian Julian added the Enhancement Some new desired functionality label Feb 16, 2019
Julian added a commit that referenced this issue Apr 28, 2021
09fd353f Merge pull request #481 from kylef/kylef/time
0ed2e79b Fix negative time test to only fail on a single rule
2edc74b1 Add valid time with different second fractions
7bde0bf7 Add valid time with leap second including offset
ee83f464 Stricter time format constraints
5732904a Merge pull request #480 from json-schema-org/ether/better-test-names
c2994271 better test names for schema-items + additionalItems
6bc53e60 Merge pull request #479 from json-schema-org/fix-non-id-in-enum-for-drafts-6-and-7
3f783d9c fixing draft 6 & 7 non-id tests
5768c68d Merge pull request #476 from json-schema-org/ether/readme-updates
0c8bfc06 add mention of JSON::Schema::Tiny
e4c10c6b fix markdown for underscores in package names
eeb4db18 mention draft2020-12 in readme
dff69dcb Merge pull request #474 from marksparkza/unevaluatedItems-depends-on-contains
51b4977c Merge pull request #478 from sorinsarca/patch-1
dfcd4a19 fix bad comma
4cb100a5 Merge pull request #465 from json-schema-org/ether/more-naive-ref
31dc86bc add another test of naive $ref replacement
f858c613 Merge pull request #477 from json-schema-org/ether/more-items-tests
4e266c34 test that array-items/prefixItems adjusts the starting position for schema-items/additionalItems
b7fced33 Merge pull request #473 from json-schema-org/ether/more-default-tests
eadb9be7 test that a missing property is not populated by the default in the actual instance data
839b95d8 Added opis/json-schema
7cf78800 Add missing comma
3390c871 Update tests/draft2020-12/unevaluatedItems.json
d3b88001 Update tests/draft2020-12/unevaluatedItems.json
84e1d5a9 Add another test case for unevaluatedItems-contains interaction
f400802c Add tests for unevaluatedItems interaction with contains

git-subtree-dir: json
git-subtree-split: 09fd353fc44ab22e7e8998d096b3d6d83287e5e6
@ssbarnea

This comment was marked as off-topic.

@Julian

This comment was marked as off-topic.

@ssbarnea

This comment was marked as off-topic.

@Julian
Copy link
Member

Julian commented Nov 1, 2022

We (the library) no longer support <3.6 so I think this is closeable. If that's not the case and there's still what to address here feel free to follow up.

@Julian Julian closed this as completed Nov 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Some new desired functionality
Projects
None yet
Development

No branches or pull requests

4 participants