Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jsonschema 4.10.0 has broken ASDF #981

Closed
WilliamJamieson opened this issue Aug 16, 2022 · 12 comments
Closed

jsonschema 4.10.0 has broken ASDF #981

WilliamJamieson opened this issue Aug 16, 2022 · 12 comments

Comments

@WilliamJamieson
Copy link

The release today of jsonschema 4.10.0 seems to have broken ASDF in quite badly, see asdf-format/asdf#1172. Everything appears to function correctly with jsonschema 4.9.1, and none of the changes between the versions stand out to me as a likely cause of the issue.

There are at least two (possibly related errors) that I am seeing:

  1. A brand new deprecation warning about a meta schema not being found originating here:
    warn(
    (
    "The metaschema specified by $schema was not found. "
    "Using the latest draft to validate, but this will raise "
    "an error in the future."
    ),
    DeprecationWarning,
    stacklevel=2,
    )
    ASDF makes extensive use of a metaschema and it seems that jsonschema is no longer able to find the metaschema ASDF is using.
  2. More seriously, we get the error:
AttributeError: 'list' object has no attribute 'get'

originating here:

def _id_of(schema):
"""
Return the ID of a schema for recent JSON Schema drafts.
"""
if schema is True or schema is False:
return ""
return schema.get("$id", "")
Which again indicates to me that something has gone wrong with how ASDF is communicating schema information with jsonschema.

I suspect there is a common cause of the issues ASDF is experiencing. I appreciate any help that can be offered.

@WilliamJamieson
Copy link
Author

Looking more deeply at the new warning issue I think the change that is breaking ASDF is the change from [attr.evolve](https://github.com/python-attrs/attrs/blob/9f745505190973f4e3ebc8464aee0d1eedbb11cb/src/attr/_funcs.py#L344-L370) to this slightly different evolve method:

def evolve(self, **changes):
schema = changes.setdefault("schema", self.schema)
NewValidator = validator_for(schema, default=Validator)
# Essentially reproduces attr.evolve, but may involve instantiating
# a different class than this one.
for field in attr.fields(Validator):
if not field.init:
continue
attr_name = field.name # To deal with private attributes.
init_name = attr_name if attr_name[0] != "_" else attr_name[1:]
if init_name not in changes:
changes[init_name] = getattr(self, attr_name)
return NewValidator(**changes)

In particular, I suspect the issue is that by using the validator_for:

def validator_for(schema, default=_LATEST_VERSION):
"""
Retrieve the validator class appropriate for validating the given schema.
Uses the :kw:`$schema` keyword that should be present in the given
schema to look up the appropriate validator class.
Arguments:
schema (collections.abc.Mapping or bool):
the schema to look at
default:
the default to return if the appropriate validator class
cannot be determined.
If unprovided, the default is to return the latest supported
draft.
"""
if schema is True or schema is False or "$schema" not in schema:
return default
if schema["$schema"] not in _META_SCHEMAS:
warn(
(
"The metaschema specified by $schema was not found. "
"Using the latest draft to validate, but this will raise "
"an error in the future."
),
DeprecationWarning,
stacklevel=2,
)
return _META_SCHEMAS.get(schema["$schema"], _LATEST_VERSION)
method to get a new class instance, ASDF is getting a Draft202012Validator rather than a Draft4Validator (hence the warning). Currently, ASDF is only compatible with the Draft4 specification.

Are there any suggestions on how I can fix this issue?

@Julian
Copy link
Member

Julian commented Aug 16, 2022

Hi. Please include some information about what reproduces the issue being reported. I'm not familiar with asdf, so ideally it should just be some Python code using this library, in which case I'd be happy to help diagnose.

@WilliamJamieson
Copy link
Author

WilliamJamieson commented Aug 16, 2022

Hi. Please include some information about what reproduces the issue being reported. I'm not familiar with asdf, so ideally it should just be some Python code using this library, in which case I'd be happy to help diagnose.

To reproduce the warning:

from asdf.tests import helpers
import warnings
import asdf

snippit = """
example: !core/complex-1.0.0 1-1j
""".strip()

buff = helpers.yaml_to_asdf(snippit)
af = asdf.AsdfFile(ignore_unrecognized_tag=False, ignore_version_mismatch=False)
with warnings.catch_warnings():
    warnings.simplefilter("error")
    af._open_impl(af, buff, mode="rw")

To reproduce the error:

import asdf

af = asdf.AsdfFile({'a': 3 + 2j})
af.write_to('test.asdf')

ASDF can be installed with

pip install asdf

@Julian
Copy link
Member

Julian commented Aug 17, 2022

Thanks -- I was hoping you'd include code specifically for this library, which makes it easier than having to diagnose asdf itself, which as I mentioned I don't really have any familiarity with. You may also want to have a look at #982 -- are you perhaps subclassing validator classes for some reason?

@Julian
Copy link
Member

Julian commented Aug 17, 2022

OK I got a minute to look, and indeed that seems to be the issue: https://github.com/asdf-format/asdf/blob/e0a6081e82bf69b9ce5f8dfa5b3dee5e85000e26/asdf/schema.py#L261

That's honestly not something that's really ever been supported, though of course without an error saying so it's unsurprising some folks did it :/ (here and #982).

I believe there should be a minor fix that will probably make it work again for you, but I do intend to put the warning in finally and eventually make it error to subclass these -- as I say it's not something I've ever really wanted to support and isn't something I check carefully when making changes. Whatever is being done there should be possible without subclassing, though I'm certainly happy to help figure out how if need be, feel free to ping me. Perhaps I'll close this and we can centralize in #982 if that's OK with you, as I say, I hope I can make what you have work regardless.

@WilliamJamieson
Copy link
Author

Copying comment from #982 (comment)

Unfortunately, even with the latest changes (a8c3b14), ASDF is still broken by jsonschema (CI report: asdf-format/asdf/runs/7884285535?check_suite_focus=true). I think the main error ASDF is getting is:

self = ASDFValidator(schema={}, format_checker=None, _context=<asdf.schema._ValidationContext object at 0x7f48f96dfee0>, ctx=...df.AsdfFile object at 0x7f48f96add30>, serialization_context=<asdf.asdf.SerializationContext object at 0x7f48f96df640>)
changes = {'context': <asdf.schema._ValidationContext object at 0x7f48f96dfee0>, 'ctx': <asdf.asdf.AsdfFile object at 0x7f48f96add30>, 'format_checker': None, 'resolver': <jsonschema.validators.RefResolver object at 0x7f48f96df310>, ...}
cls = <class 'asdf.schema._create_validator.<locals>.ASDFValidator'>
schema = {'$schema': 'http://stsci.edu/schemas/yaml-schema/draft-01[](http://stsci.edu/schemas/yaml-schema/draft-01)', 'additionalProperties': True, 'definitions': {'history-1....ray'}}, 'type': 'object'}}, 'description': 'This schema contains the top-level attributes for every ASDF file.\n', ...}
NewValidator = <class 'jsonschema.validators.Draft202012Validator'>
field = Attribute(name='serialization_context', default=None, validator=None, repr=True, eq=True, eq_key=None, order=True, ord...None, init=True, metadata=mappingproxy({}), type=None, converter=None, kw_only=False, inherited=False, on_setattr=None)
attr_name = 'serialization_context', init_name = 'serialization_context'

    def evolve(self, **changes):
        # Essentially reproduces attr.evolve, but may involve instantiating
        # a different class than this one.
        cls = self.__class__
    
        schema = changes.setdefault("schema", self.schema)
        NewValidator = validator_for(schema, default=cls)
    
        for field in attr.fields(cls):
            if not field.init:
                continue
            attr_name = field.name  # To deal with private attributes.
            init_name = attr_name if attr_name[0] != "_" else attr_name[1:]
            if init_name not in changes:
                changes[init_name] = getattr(self, attr_name)
    
>       return NewValidator(**changes)
E       TypeError: __init__() got an unexpected keyword argument 'context'

as the last item in the traceback. Still with the emitted warning:

DeprecationWarning: The metaschema specified by $schema was not found. Using the latest draft to validate, but this will raise an error in the future.

I am happy to discuss this in a separate issue and/or discuss how to migrate ASDF away from using the unintended API. I

@robherring
Copy link
Contributor

I think I'm seeing similar issues as well. Even if I rework things to not subclass the validator, it seems evolve() looses my custom resolver. That's about as far as I've gotten debugging it.

@robherring
Copy link
Contributor

I think the issue is related to validator_for() not finding the right validator if $schema URI doesn't quite match. Either a trailing '#' or http vs. https or both causes a mismatch. For dtschema, we're on draft2019-09 and now draft2020-12 validator gets selected at some point. I guess now that the validator is selected for each schema file instead of being fixed by the caller, a mismatched URI causes problems.

@Julian
Copy link
Member

Julian commented Aug 18, 2022

Do you have anything I can run to reproduce? The default for validator_for is to reuse the same validator class, so even with broken schemas the behavior should be the same, but clearly there's something happening.

Julian added a commit that referenced this issue Aug 18, 2022
Previously it was only used when $schema was not present, but clearly
it should be used even when it is but the URI is not recognized.

Combined with #981 this *hopefully* now handles the remaining downstream
users with subclasses.
@Julian
Copy link
Member

Julian commented Aug 18, 2022

OK never mind I see the second issue -- which has to do with the subclasses also not using jsonschema.validators.validates to decorate the class (which registers the $schema value). Without it jsonschema doesn't know that that URI is handled by your validator class. Normally this is handled automatically if you don't subclass, but if you do, you'd need to use it.

But, even without that, there was a second bug ("unrelated") which was validator_for not using a default validator when $schema was present in an instance -- in other words combined with the above, this meant that the latest version was being used because jsonschema didn't know what validator was responsible for your $schema.

I've fixed the latter, though yeah you should still stop inheriting from validators :)

@Julian Julian closed this as completed Aug 18, 2022
@WilliamJamieson
Copy link
Author

WilliamJamieson commented Aug 18, 2022

Your fix latest fix seems to fix most of asdf. Though it still breaks a feature (setting default values for objects in schemas) that has been depreciated for awhile. It was slated for removal in the next major asdf version anyways.

The warnings that are now emitted by jsonschema when ASDF is using it will break the CI testing for some of ASDF's important downstream packages. So it looks I need to at least refactor this part of asdf: https://github.com/asdf-format/asdf/blob/16a29afbd9a4044eab0beae5f21e60a669e4dd51/asdf/schema.py#L245 to not inherit from the validators. I'll likely ping you you if I have further questions once I start that work. If you have any suggestions on how to approach this please let me know.

Thank you for your quick assistance.

@Julian
Copy link
Member

Julian commented Aug 19, 2022

Cool. The first step should I think be to decouple your class from Validator -- i.e. if you want a method called iter_errors just start with your own class that looks more like:

@attr.s
class ASDFValidator:
    _validator = attr.ib(factory=lambda: jsonschema.Draft4Validator(whatever))

    def iter_errors(self, *args, **kwargs):
        do_stuff()
        result = self._validator.iter_errors(*args, **kwargs)
        do_more_stuff()
        return asdfresult

but I'd have to look closer to give more specific advice. I may have some time next week, so if you get stuck do ping me (here or on your repo).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants