[Feature] Allow oneOf in JSON schemas (with limited support) #982

Merged
hudson-ai merged 8 commits into guidance-ai:main on Sep 3, 2024

hudson-ai
Collaborator

  • oneOf is allowed when only a single schema is provided in the oneOf list
    • should be uncontroversial
  • Fall back to anyOf if multiple schemas are provided, raising a warning to the user.
    • @riedgar-ms has already voiced direct opposition to this, but I think it warrants more discussion. In particular, the user may be using a JSON schema found "in the wild" and it may be difficult to reach down inside of them and "fix them" according to our limitations. Furthermore, I suspect that treating oneOf as anyOf will not cause any problems in a majority of use-cases. @Harsha-Nori do you have any opinions here?
  • Add id and discriminator to ignored keys, expanding the schemas we support.
    • I believe this is without consequence, but I am not 100% sure for the discriminator case. Take a look at the pydantic documentation for discriminated unions. Note that the model in the example provided on this page uses both oneOf and discriminator when converted to a JSON schema.
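The first two bullets can be sketched as follows. This is a minimal illustration, not the code in this PR: `rewrite_one_of` is a hypothetical helper, and only the warning prefix "oneOf not fully supported" is taken from the test in this thread. It relies on the fact that a oneOf with a single option accepts exactly what an anyOf with that option accepts, so no warning is needed in that case.

```python
import warnings


def rewrite_one_of(schema: dict) -> dict:
    """Return a copy of `schema` with `oneOf` replaced by `anyOf`.

    Hypothetical helper for illustration only. A warning is raised when
    more than one option is present, because anyOf is then more permissive
    than oneOf (it also accepts values matching several options).
    """
    if "oneOf" not in schema:
        return dict(schema)
    if "anyOf" in schema:
        # Assumption: merging an existing anyOf is out of scope for this sketch.
        raise ValueError("schema has both oneOf and anyOf")

    options = schema["oneOf"]
    if len(options) > 1:
        warnings.warn(
            "oneOf not fully supported, falling back to anyOf. "
            "Values matching more than one option will be accepted."
        )

    rewritten = {key: value for key, value in schema.items() if key != "oneOf"}
    rewritten["anyOf"] = list(options)
    return rewritten
```

With a single option, `rewrite_one_of({"oneOf": [{"type": "integer"}]})` returns `{"anyOf": [{"type": "integer"}]}` silently; with two or more options the same rewrite happens but a warning is emitted.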

@codecov-commenter
codecov-commenter commented Aug 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.23%. Comparing base (8d63c79) to head (026a885).
Report is 4 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #982       +/-   ##
===========================================
+ Coverage   44.74%   61.23%   +16.49%     
===========================================
  Files          62       62               
  Lines        4392     4401        +9     
===========================================
+ Hits         1965     2695      +730     
+ Misses       2427     1706      -721     

@riedgar-ms
Collaborator

As a side note, we can at least have our tests check that the correct warning was issued, but doing this does still make me a bit queasy.

with pytest.warns() as record:
    generate_and_check(target_obj, schema_obj)
assert len(record) == 1
assert record[0].message.args[0].startswith("oneOf not fully supported")
Collaborator

Thanks for including this check (wrote my 'main' comment before seeing the code).

Collaborator

@riedgar-ms riedgar-ms left a comment

I don't think that id should be ignored completely. I think that

{
    "type": "object",
    "properties": {
        "id" : {"type": "integer"}
    }
}

should be a perfectly valid schema.

@@ -55,12 +57,14 @@ class Keyword(str, Enum):
IGNORED_KEYS = {
    "$schema",
    "$id",
    "id",
Collaborator

I believe that our problem with id (in the FHIR schema) is only at top level. I think that id is a perfectly fine name for a property on an object.

Collaborator Author

validate_json_node_keys is only being called on dictionaries that represent full (sub)schemas, which doesn't include the dictionary specified by the properties key of a (sub)schema. Therefore the set of ignored keys should have no impact on what property names are valid -- worth checking if you have doubts :)
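To make the distinction concrete, here is a rough sketch of a key check that only inspects (sub)schema dictionaries. It is not the library's `validate_json_node_keys`: the function name `check_keys` and both key sets are hypothetical and heavily trimmed, and subschemas are assumed to be dictionaries (boolean schemas are not handled). The point is that the dictionary under `properties` is iterated for its values only, so its keys are never compared against the keyword lists.

```python
# Hypothetical, trimmed-down key sets; the real lists are longer.
IGNORED_KEYS = {"$schema", "$id", "id", "title", "description", "discriminator"}
SUPPORTED_KEYS = {"type", "properties", "items", "anyOf", "oneOf"}


def check_keys(schema: dict, path: str = "#") -> None:
    """Raise ValueError if a (sub)schema uses a keyword outside both sets."""
    unknown = set(schema) - IGNORED_KEYS - SUPPORTED_KEYS
    if unknown:
        raise ValueError(f"Unsupported keywords at {path}: {sorted(unknown)}")

    # Keys of `properties` are property names, not keywords:
    # recurse into the values and never validate the names themselves.
    for name, subschema in schema.get("properties", {}).items():
        check_keys(subschema, f"{path}/properties/{name}")

    if isinstance(schema.get("items"), dict):
        check_keys(schema["items"], f"{path}/items")

    for keyword in ("anyOf", "oneOf"):
        for index, subschema in enumerate(schema.get(keyword, [])):
            check_keys(subschema, f"{path}/{keyword}/{index}")
```

Under these assumptions, a schema with a property named `id` (or any other name) passes, while an unknown keyword inside a property's own subschema is still rejected.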

Collaborator

I'd suggest adding a test with an object with an id property.

Collaborator Author

Happy to do so :)

    "$comment",
    "title",
    "description",
    "default",
    "examples",
    "required",  # TODO: implement and remove from ignored list
    "discriminator",  # TODO: alternatively we could implement this in a stateful way
Collaborator

Might want to add a link to:
https://json-schema.org/blog/posts/validating-openapi-and-json-schema
which mentions how discriminator is part of the OpenAPI dialect of JSON-Schema.

Collaborator Author

Done. Additionally added a longer comment about how disambiguating the grammar on this field could possibly improve performance or quality.

Collaborator

If you have a discriminator, oneOf is equivalent to anyOf (since options are exclusive).

Collaborator Author

Nice observation, and it lets us warn the user more selectively. We might have to check that the discriminator is actually unique, but otoh maybe it's just up to the user to give us a valid schema...
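One way the uniqueness check could look, as a sketch only. `discriminator_is_exclusive` is a hypothetical helper, not something in this PR. It assumes the OpenAPI-style form `{"propertyName": ...}` for `discriminator`, and it assumes the oneOf options are already inlined (no `$ref` resolution), each pinning the tag property with `const` or `enum` and listing it under `required`.

```python
def discriminator_is_exclusive(schema: dict) -> bool:
    """Return True if every oneOf option is pinned to distinct tag values.

    Hypothetical helper. When this returns True, no instance can match two
    options, so treating oneOf as anyOf loses nothing and no warning is needed.
    """
    discriminator = schema.get("discriminator")
    options = schema.get("oneOf")
    if not isinstance(discriminator, dict) or not options:
        return False
    tag = discriminator.get("propertyName")
    if tag is None:
        return False

    seen = []  # a list, since const/enum values need not be hashable
    for option in options:
        tag_schema = option.get("properties", {}).get(tag, {})
        if "const" in tag_schema:
            values = [tag_schema["const"]]
        elif "enum" in tag_schema:
            values = list(tag_schema["enum"])
        else:
            return False  # tag is unconstrained in this option
        if tag not in option.get("required", []):
            return False  # tag may be omitted, so options could overlap
        for value in values:
            if value in seen:
                return False  # two options share a tag value
            seen.append(value)
    return True
```

A caller could skip the oneOf warning when this returns True and fall back to the warning path otherwise, which leaves invalid or unusual schemas on the conservative side.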

@@ -1981,3 +1981,13 @@ def test_no_additionalProperties(self, compact):
            maybe_whitespace=True,
            compact=compact,
        )

def test_ignored_keys_allowed_as_properties():
Collaborator Author

@riedgar-ms is this test acceptable to you here? It doesn't explicitly test id; rather it checks against all ignored keys. Happy to specialize it to id if you like that better.

Collaborator

Even better to check them all

@hudson-ai hudson-ai merged commit 958145c into guidance-ai:main Sep 3, 2024
100 checks passed
4 participants