Losslessly represent omitted fields #344
Comments
Thanks for opening this; this seems like a well-thought-out feature, and it shouldn't be too hard to implement. I think the only open question here would be the naming. I don't like using the name

For typing, we could do something like the following to make spelling these fields easier:

```python
# This type annotation helper would be in `msgspec`:
# MaybeUnset = Union[T, UnsetType]
from msgspec import Struct, MaybeUnset, UNSET

class SearchFilter(Struct):
    title: MaybeUnset[str] = UNSET
    assignee: MaybeUnset[str | None] = UNSET
```

I don't love the name

Alternatively, we could use

Thoughts?
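A minimal stdlib-only sketch of how a `MaybeUnset` helper like the one above could be spelled as a generic type alias. `UnsetType`, `UNSET`, and `MaybeUnset` are defined locally here as hypothetical stand-ins; this is not `msgspec`'s implementation.

```python
from typing import Optional, TypeVar, Union, get_args


class UnsetType:
    """Hypothetical sentinel type: only one instance is ever created."""

    _instance = None

    def __new__(cls):
        # Reuse the single instance so `value is UNSET` checks are reliable.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self) -> str:
        return "UNSET"

    def __bool__(self) -> bool:
        # Falsy, like None.
        return False


UNSET = UnsetType()

T = TypeVar("T")
# Generic alias: MaybeUnset[X] expands to Union[X, UnsetType].
MaybeUnset = Union[T, UnsetType]

# Union flattens nested unions, so Optional[str] contributes both str and None.
print(get_args(MaybeUnset[str]))
print(get_args(MaybeUnset[Optional[str]]))
```

Because `typing.Union` flattens nested unions, `MaybeUnset[Optional[str]]` covers all three states (`str`, `None`, and the sentinel) in a single annotation.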
The more I think about this, the more I like using "undefined" here. This would also let us mirror

I'm still not sure about the singleton/type naming/casing. Thoughts here would be very welcome.

Possible singleton names:

Possible type names:

```python
# Just playing around with spellings here
class Example(msgspec.Struct):
    field_1: str | msgspec.UndefinedType = msgspec.UNDEFINED
    field_2: str | msgspec.Undefined = msgspec.undefined
    field_3: str | msgspec.UndefinedType = msgspec.Undefined
```
Thanks for your quick response!
👍 Agreed with all of this, if
I worry that calling it "undefined" could lure library users and contributors into pursuing JavaScript semantics, where those semantics aren't necessarily appropriate for a strict and fast serialization and validation library in Python. JSON isn't JavaScript, basically. For instance, doing
Another case in point: this sounds confusing to me (at least, that's my knee-jerk reaction). If I wanted
Despite all of the above, I'd be totally happy to try out any naming scheme. Like you said, the library's still not at v1.0, so names can change later if practical experience shows that our initial choices were confusing. Thanks again for being responsive to this!
I've pushed up #350 to fix this. The semantics are pretty much what you describe above, with the singleton named
Description
I sometimes have JSON objects where the presence or absence of a field is semantically distinct from whether that field's value is `null`.

In keeping with `msgspec`'s principles of strictness and correctness, I'd like the library to be able to losslessly and reversibly encode and decode these objects.

Example use case
Suppose you have a database of software bugs. Each bug has a `title` and, optionally, a single `assignee`.

You want to use JSON to represent search filters.
This filter would mean "find all issues titled 'App crashes' that are assigned to @JohnDoe":

And this would mean "find all issues titled 'App crashes' that don't have an assignee":

And this would mean "find all issues titled 'App crashes' regardless of assignee":
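For concreteness, here is a stdlib-only sketch of the three filters as JSON payloads (the exact payloads are an assumption based on the `title` and `assignee` fields above), together with a decoder that fills a missing `assignee` with `None`. The helper `decode_filter_with_none_default` is hypothetical; it shows why that common approach loses information.

```python
import json

# Hypothetical payloads for the three filters described above.
ASSIGNED = '{"title": "App crashes", "assignee": "@JohnDoe"}'
UNASSIGNED = '{"title": "App crashes", "assignee": null}'
ANY_ASSIGNEE = '{"title": "App crashes"}'


def decode_filter_with_none_default(raw: str) -> dict:
    """Decode a filter, filling a missing assignee with None (the lossy approach)."""
    data = json.loads(raw)
    return {"title": data["title"], "assignee": data.get("assignee")}


unassigned = decode_filter_with_none_default(UNASSIGNED)
any_assignee = decode_filter_with_none_default(ANY_ASSIGNEE)

# "No assignee" and "any assignee" decode to the same value...
print(unassigned == any_assignee)
# ...so re-encoding cannot recover the original ANY_ASSIGNEE payload.
print(json.dumps(any_assignee))
```

Once the two cases collapse into the same in-memory value, no encoder can tell them apart again, which is what makes the round trip lossy.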
In JSON Schema, I would represent this like this:
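One schema that expresses these rules, written as a Python dict so it can be dumped with `json.dumps`, is sketched below. It assumes `title` is a required string and that `assignee` may be absent, a string, or `null`; the original schema may differ. The `matches` helper is hypothetical and checks only the keywords this schema uses; it is not a general JSON Schema validator.

```python
import json

# A possible schema for the search filters (an assumption, not the original).
SEARCH_FILTER_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        # When present, assignee may be a string or null...
        "assignee": {"type": ["string", "null"]},
    },
    # ...and leaving it out of "required" allows the key to be absent.
    "required": ["title"],
}

# JSON Schema type names mapped to the Python types that json.loads produces.
_PY_TYPES = {"string": str, "null": type(None)}


def matches(instance: dict, schema: dict = SEARCH_FILTER_SCHEMA) -> bool:
    """Check only `required` and per-property `type`; not a general validator."""
    if any(key not in instance for key in schema["required"]):
        return False
    for key, value in instance.items():
        prop = schema["properties"].get(key)
        if prop is None:
            continue  # this sketch ignores keys the schema does not describe
        allowed = prop["type"] if isinstance(prop["type"], list) else [prop["type"]]
        if not isinstance(value, tuple(_PY_TYPES[t] for t in allowed)):
            return False
    return True


for raw in (
    '{"title": "App crashes", "assignee": "@JohnDoe"}',
    '{"title": "App crashes", "assignee": null}',
    '{"title": "App crashes"}',
):
    print(raw, matches(json.loads(raw)))
```

The distinction lives in two separate keywords: `"required"` governs whether the key may be absent, while the property's `"type"` governs whether a present value may be `null`.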
Proposed API design
Using `msgspec`, I would want to implement the example above something like this:

New names in `msgspec`: `OMITTED` and `OMITTED_TYPE`.

`OMITTED` is a unique sentinel value, distinct from `None`.

`OMITTED_TYPE` is the type of `OMITTED`, probably an alias of `typing.Literal[OMITTED]`.

`OMITTED` evaluates as falsy, like `None` does.

When `encode()` encounters a value of `OMITTED`, it omits the entire key-value pair from the output.

`decode()` behavior is unchanged. When it decodes a message that has a certain field missing, it returns that field's default value. In this case, that value can happen to be the sentinel value `OMITTED`.

Prior art in other libraries
PEP 655

PEP 655 discusses the same problem for `TypedDict`s.

They solve it in a different way. Instead of having a special sentinel value like I'm proposing, they introduce the wrapping types `typing.Required[T]` and `typing.NotRequired[T]`. (These appear to be pass-throughs to `T` at run time. I guess you're supposed to use the `in` operator to gate any potentially unsafe accesses, but mypy doesn't enforce this today.)

They also explicitly reject the name "omittable."
It feels to me like a lot of their arguments don't make sense when applied to `msgspec.Struct`s, as opposed to `dict`s. But I haven't spent the time to try these things out, or to read the mailing lists and dig into their thinking. Maybe they're right.

Pydantic
Pydantic has historically conflated these concepts badly. Planned changes for v2.0 look like they'll bring Pydantic to parity with `msgspec` as it exists today, but they won't address the problem I'm describing here.
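The semantics proposed in this issue (a falsy `OMITTED` sentinel distinct from `None`, dropped by `encode()`, and restored as a field default by `decode()`) can be approximated with the standard library. This is a sketch under stated assumptions: `OMITTED`, `SearchFilter`, `encode`, and `decode` here are hypothetical local definitions built on `dataclasses` and `json`, not `msgspec`'s API.

```python
import dataclasses
import json
from typing import Optional, Union


class _OmittedType:
    """Hypothetical sentinel type for fields absent from the wire format."""

    def __repr__(self) -> str:
        return "OMITTED"

    def __bool__(self) -> bool:
        return False  # falsy, like None, per the proposal


OMITTED = _OmittedType()


@dataclasses.dataclass
class SearchFilter:
    title: str
    # Three states: a string, None (JSON null), or OMITTED (key absent).
    assignee: Union[Optional[str], _OmittedType] = OMITTED


def encode(obj) -> str:
    # Drop every field whose value is the OMITTED sentinel.
    fields = {
        f.name: getattr(obj, f.name)
        for f in dataclasses.fields(obj)
        if getattr(obj, f.name) is not OMITTED
    }
    return json.dumps(fields)


def decode(raw: str, cls=SearchFilter):
    # Missing keys fall back to the dataclass defaults, which may be OMITTED.
    return cls(**json.loads(raw))


for raw in (
    '{"title": "App crashes", "assignee": "@JohnDoe"}',
    '{"title": "App crashes", "assignee": null}',
    '{"title": "App crashes"}',
):
    print(decode(raw), encode(decode(raw)) == raw)
```

Round-tripping each of the three filter payloads through `decode` and then `encode` reproduces the original string, which is the lossless, reversible property the issue asks for.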