Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Support required properties in JSON schemas #1009

Merged
merged 11 commits into from
Sep 6, 2024

Conversation

hudson-ai
Copy link
Collaborator

@hudson-ai hudson-ai commented Sep 5, 2024

We currently treat all properties as required, whether or not they are actually in the required list.

Implementing a grammar that allows an arbitrary list of grammars to be comma-joined where any subset of the grammars are allowed to be omitted based on a provided sequence of booleans wasn't very trivial... leading and trailing commas were hard to avoid without a recursive definition that's O(2^N) in the worst case (where all properties are optional). This is obviously a bad naive behavior.

Caching brings the O(2^N) to something that I think is closer to O(N) (need to verify).

Need to add some extensive tests too.

@codecov-commenter
Copy link

codecov-commenter commented Sep 5, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 96.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 61.25%. Comparing base (5eaf240) to head (db01450).

Files with missing lines Patch % Lines
guidance/library/_json.py 96.00% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1009      +/-   ##
==========================================
- Coverage   70.21%   61.25%   -8.97%     
==========================================
  Files          62       62              
  Lines        4422     4421       -1     
==========================================
- Hits         3105     2708     -397     
- Misses       1317     1713     +396     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hudson-ai
Copy link
Collaborator Author

Definitely feel like I'm overthinking here. The "worst case" I outlined above should be representable with O(N) terms, e.g.

a(,b)?(,c)?
b(,c)?
c?

) + "}"

@guidance(stateless=True, cache=True)
def _gen_list(lm, *, elements: tuple[GrammarFunction, ...], required: tuple[bool, ...], prefixed: bool = False):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason for these being Tuples and not Lists?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, one of those little should-never-happen-but.... checks, if I'm reading this correctly, elements and required must be the same length?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're specifically tuples because tuples are immutable and therefore hashable iff their elements are. My recursive solution needs hashable args in order to support caching, which is in turn necessary to prevent the O(2^N) behavior. I suspect this can all be sidestepped with more of a direct dynamic programming approach.

Copy link
Collaborator

@riedgar-ms riedgar-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure about the test changes.

Also, would it be worth adding a test which should drive the naive O(2^n) 'powerset' solution which you're aiming to avoid. Could that grab the actual grammar (rather than the generation), and check that the size hasn't blown up, but has kept to something more reasonable? And then for extra 'fun' work through the powerset of possible combinations?

) + "}"

@guidance(stateless=True, cache=True)
def _gen_list(lm, *, elements: tuple[GrammarFunction, ...], required: tuple[bool, ...], prefixed: bool = False):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, one of those little should-never-happen-but.... checks, if I'm reading this correctly, elements and required must be the same length?

@@ -1545,7 +1545,8 @@ class TestAdditionalProperties:
},
"additionalProperties": {
"type": "integer"
}
},
"required": ["mystr"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this addition mean that our previous generations weren't quite correct? If required is not required but was always implicitly there, then shouldn't this test have two variants (possibly via an include_required pytest parameter) with and with out this? Same for the other test below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe (need to check json schema docs) that not including a required array implicitly says "nothing is required". Our previous behavior was misaligned with this, instead treating all properties as required no matter what was passed (including passing nothing). This isn't technically wrong, as doing so will always produce json that is valid with respect to the schema.

In any case, some of our tests were misspecifications if we actually want to respect the required argument, namely the negative tests that assert that some properties exist without passing those properties as required.

I only "fixed" the tests that were failing, and I probably missed a few...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely want to explicitly test required though (e.g. via parameterizing a required array). Haven't gotten there yet.

@hudson-ai
Copy link
Collaborator Author

Not quite sure about the test changes.

Also, would it be worth adding a test which should drive the naive O(2^n) 'powerset' solution which you're aiming to avoid. Could that grab the actual grammar (rather than the generation), and check that the size hasn't blown up, but has kept to something more reasonable? And then for extra 'fun' work through the powerset of possible combinations?

Definitely :)

@hudson-ai
Copy link
Collaborator Author

hudson-ai commented Sep 5, 2024

@riedgar-ms @Harsha-Nori Note that OpenAI's structured output API requires that all properties be... required. It's probably really difficult to avoid the 2^N blowup with an indexed regular grammar (if their approach looks anything like outlines).

See here:
https://platform.openai.com/docs/guides/structured-outputs/all-fields-must-be-required

Note that for the big FHIR schema, there are many non-required properties, and the test object you've been working with doesn't have all of the optional fields. So we need something like this PR to make generating that object a valid test.

@hudson-ai
Copy link
Collaborator Author

hudson-ai commented Sep 5, 2024

My last commit just made the scaling far nicer, even in the uncached case. Inspired by my

a(,b)?(,c)?
b(,c)?
c?

comment, notice that for each optional object, we only have to add a branch to the grammar IF it is the first non-empty element. Otherwise we can just wrap it in an optional. In other words, this diagram only gets a new row for each optional element that can possibly "go first". Things are improved further with required keys, e.g. if the first one is required, we only need a single row.

I hope that made sense. In any case, new behavior should be to reproduce the "diagram" above if all elements are optional.

Caching furthermore reduces the size of this example from N+(N-1)+...+1 = N(N-1)/2 = O(N^2) nodes (already a big improvement on O(2^N)) down to N + (N-1) = 2N-1 = O(N) nodes (N nodes in the first row, one additional cache-miss node in each of the subsequent N-1 rows). Somebody want to sanity check me on this?

@hudson-ai
Copy link
Collaborator Author

Most of my above discussion can be ignored -- really just notes to myself while working this out. I am happy with the current scaling behavior (@riedgar-ms I'll add a test that counts the number of unique nodes in the worst-case grammar and asserts that it's "small enough").

Let's focus on behavior now. Will write some tests.

@hudson-ai
Copy link
Collaborator Author

@riedgar-ms added some tests -- would love your feedback. I simplified the "bad" tests because writing them out would otherwise be extremely tedious, although I admit that asserting failures without asserting something more precise about WHAT is failing is bad form...

"additionalProperties": True
}
ALL_REQUIRED = ["a", "b", "c"]
SOME_REQUIRED_SUBSETS = [[], ["a"], ["b"], ["c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor thing, but this looks like an empty list counts as 'some' ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. Looks like the "some" positive tests make the other positive tests redundant, yeah?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd agree that SOME_REQUIRED_SUBSETS looks a lot like the full powerset, so probably does cover everything. I don't feel strongly, but it might be worth having the 'all' and 'none' cases as separate tests, to make those most basic cases easier to find. As I said, not a very strong feeling.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that explicit is better than implicit, and these tests are pretty quick. I'll leave things as they are. Thanks!

schema_obj = {**self.schema_obj, "required": self.NONE_REQUIRED}
generate_and_check({**target_obj, **extra_items}, schema_obj)

class TestRequiredPropertiesScaling:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertions about N*(N-1)/2 and 2N-1 come from some back of the envelope theoretical calculations I made in the above thread. I seem to be off by a small constant factor in each. Tests are framed as <= expected just in case someone wants to step in and improve things. These tests may be a bit fragile and overly dependent on private details, but they definitely gave me some peace of mind. Thoughts @riedgar-ms ?

@hudson-ai hudson-ai changed the title [WIP] support required properties in JSON schemas [Feature] Support required properties in JSON schemas Sep 6, 2024
@hudson-ai hudson-ai merged commit ed7e8a7 into guidance-ai:main Sep 6, 2024
100 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants