-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Support required
properties in JSON schemas
#1009
Conversation
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #1009 +/- ##
==========================================
- Coverage 70.21% 61.25% -8.97%
==========================================
Files 62 62
Lines 4422 4421 -1
==========================================
- Hits 3105 2708 -397
- Misses 1317 1713 +396 ☔ View full report in Codecov by Sentry. |
Definitely feel like I'm overthinking here. The "worst case" I outlined above should be representable with O(N) terms, e.g. a(,b)?(,c)? |
) + "}" | ||
|
||
@guidance(stateless=True, cache=True) | ||
def _gen_list(lm, *, elements: tuple[GrammarFunction, ...], required: tuple[bool, ...], prefixed: bool = False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason for these being Tuples and not Lists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, one of those little should-never-happen-but.... checks, if I'm reading this correctly, elements
and required
must be the same length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're specifically tuples because tuples are immutable and therefore hashable iff their elements are. My recursive solution needs hashable args in order to support caching, which is in turn necessary to prevent the O(2^N) behavior. I suspect this can all be sidestepped with more of a direct dynamic programming approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure about the test changes.
Also, would it be worth adding a test which should drive the naive O(2^n) 'powerset' solution which you're aiming to avoid. Could that grab the actual grammar (rather than the generation), and check that the size hasn't blown up, but has kept to something more reasonable? And then for extra 'fun' work through the powerset of possible combinations?
) + "}" | ||
|
||
@guidance(stateless=True, cache=True) | ||
def _gen_list(lm, *, elements: tuple[GrammarFunction, ...], required: tuple[bool, ...], prefixed: bool = False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, one of those little should-never-happen-but.... checks, if I'm reading this correctly, elements
and required
must be the same length?
@@ -1545,7 +1545,8 @@ class TestAdditionalProperties: | |||
}, | |||
"additionalProperties": { | |||
"type": "integer" | |||
} | |||
}, | |||
"required": ["mystr"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this addition mean that our previous generations weren't quite correct? If required
is not required but was always implicitly there, then shouldn't this test have two variants (possibly via an include_required
pytest parameter) with and with out this? Same for the other test below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe (need to check json schema docs) that not including a required
array implicitly says "nothing is required". Our previous behavior was misaligned with this, instead treating all properties as required no matter what was passed (including passing nothing). This isn't technically wrong, as doing so will always produce json that is valid with respect to the schema.
In any case, some of our tests were misspecifications if we actually want to respect the required
argument, namely the negative tests that assert that some properties exist without passing those properties as required.
I only "fixed" the tests that were failing, and I probably missed a few...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely want to explicitly test required
though (e.g. via parameterizing a required
array). Haven't gotten there yet.
Definitely :) |
@riedgar-ms @Harsha-Nori Note that OpenAI's structured output API requires that all properties be... required. It's probably really difficult to avoid the 2^N blowup with an indexed regular grammar (if their approach looks anything like See here: Note that for the big FHIR schema, there are many non-required properties, and the test object you've been working with doesn't have all of the optional fields. So we need something like this PR to make generating that object a valid test. |
My last commit just made the scaling far nicer, even in the uncached case. Inspired by my
comment, notice that for each optional object, we only have to add a branch to the grammar IF it is the first non-empty element. Otherwise we can just wrap it in an optional. In other words, this diagram only gets a new row for each optional element that can possibly "go first". Things are improved further with required keys, e.g. if the first one is required, we only need a single row. I hope that made sense. In any case, new behavior should be to reproduce the "diagram" above if all elements are optional. Caching furthermore reduces the size of this example from |
Most of my above discussion can be ignored -- really just notes to myself while working this out. I am happy with the current scaling behavior (@riedgar-ms I'll add a test that counts the number of unique nodes in the worst-case grammar and asserts that it's "small enough"). Let's focus on behavior now. Will write some tests. |
@riedgar-ms added some tests -- would love your feedback. I simplified the "bad" tests because writing them out would otherwise be extremely tedious, although I admit that asserting failures without asserting something more precise about WHAT is failing is bad form... |
"additionalProperties": True | ||
} | ||
ALL_REQUIRED = ["a", "b", "c"] | ||
SOME_REQUIRED_SUBSETS = [[], ["a"], ["b"], ["c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor thing, but this looks like an empty list counts as 'some' ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. Looks like the "some" positive tests make the other positive tests redundant, yeah?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd agree that SOME_REQUIRED_SUBSETS
looks a lot like the full powerset, so probably does cover everything. I don't feel strongly, but it might be worth having the 'all' and 'none' cases as separate tests, to make those most basic cases easier to find. As I said, not a very strong feeling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that explicit is better than implicit, and these tests are pretty quick. I'll leave things as they are. Thanks!
schema_obj = {**self.schema_obj, "required": self.NONE_REQUIRED} | ||
generate_and_check({**target_obj, **extra_items}, schema_obj) | ||
|
||
class TestRequiredPropertiesScaling: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertions about N*(N-1)/2 and 2N-1 come from some back of the envelope theoretical calculations I made in the above thread. I seem to be off by a small constant factor in each. Tests are framed as <=
expected just in case someone wants to step in and improve things. These tests may be a bit fragile and overly dependent on private details, but they definitely gave me some peace of mind. Thoughts @riedgar-ms ?
required
properties in JSON schemasrequired
properties in JSON schemas
We currently treat all properties as required, whether or not they are actually in the
required
list.Implementing a grammar that allows an arbitrary list of grammars to be comma-joined where any subset of the grammars are allowed to be omitted based on a provided sequence of booleans wasn't very trivial... leading and trailing commas were hard to avoid without a recursive definition that's O(2^N) in the worst case (where all properties are optional). This is obviously a bad naive behavior.
Caching brings the O(2^N) to something that I think is closer to O(N) (need to verify).
Need to add some extensive tests too.