Investigate constrained mapping keys, JSON Schema patternProperties #576
Adding a schema for constraining the keys makes sense to me! I wonder if we should use
No worries at all about any delays, and indeed, thanks for the quick turn on the fix! Again, this PR is just investigating how close I can get to some existing schema patterns, where
Perhaps in addition to: while the This work against

from typing import Annotated, Any
import msgspec
from msgspec import Meta

StringishContent = Annotated[str | list[str], Meta(description="mimetype output (e.g. text/plain), represented as either an array of strings or a string.")]
StringishMimeBundle = dict[str, StringishContent]
msgspec.json.schema(StringishMimeBundle)
{'type': 'object',
 'additionalProperties': {'description': 'mimetype output (e.g. text/plain), represented as either an array of strings or a string.',
  'anyOf': [{'type': 'string'},
   {'type': 'array', 'items': {'type': 'string'}}]}}

In this PR, this works:

JsonishMimeType = Annotated[str, Meta(pattern="^application/(.*\\+)?json$")]
JsonishContent = Annotated[dict[str, Any], Meta(description="Mimetypes with JSON output, can be any type")]
JsonishMimeBundle = dict[JsonishMimeType, JsonishContent]
msgspec.json.schema(JsonishMimeBundle)
{'type': 'object',
 'patternProperties': {'^application/(.*\\+)?json$': {'description': 'Mimetypes with JSON output, can be any type',
   'type': 'object'}}}

But trying the union of the two:

MimeBundle = StringishMimeBundle | JsonishMimeBundle
msgspec.json.schema(MimeBundle)
# ...
File ~/projects/msgspec_/msgspec/msgspec/inspect.py:725, in _Translator.run(self)
    721 def run(self):
    722     # First construct a decoder to validate the types are valid
    723     from ._core import MsgpackDecoder
--> 725     MsgpackDecoder(Tuple[self.types])
    726     return tuple(self.translate(t) for t in self.types)
TypeError: Type unions may not contain more than one dict type - type `dict[str, typing.Annotated[str | list[str], msgspec.Meta(description='mimetype output (e.g. text/plain), represented as either an array of strings or a string.')]] | dict[typing.Annotated[str, msgspec.Meta(pattern='^application/(.*\\+)?json$')], typing.Annotated[dict[str, typing.Any], msgspec.Meta(description='Mimetypes with JSON output, can be any type')]]` is not supported
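To make the target concrete, here is a minimal stdlib-only sketch of the mixed bundle this comment is after: keys matching the JSON-ish pattern must hold objects (the `patternProperties` part), and every other key must hold a string or a list of strings (the `additionalProperties` part). The names `JSON_MIME`, `is_stringish`, and `check_mime_bundle` are hypothetical and are not part of msgspec; this only illustrates the intended semantics, not how msgspec would implement them.

```python
import re
from typing import Any

# Same pattern as JsonishMimeType above (hypothetical module-level name).
JSON_MIME = re.compile(r"^application/(.*\+)?json$")


def is_stringish(value: Any) -> bool:
    """True for a str, or a list whose items are all str."""
    return isinstance(value, str) or (
        isinstance(value, list) and all(isinstance(v, str) for v in value)
    )


def check_mime_bundle(bundle: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the bundle is acceptable."""
    problems = []
    for key, value in bundle.items():
        if JSON_MIME.search(key):
            # "patternProperties" branch: JSON-ish mimetypes hold objects.
            if not isinstance(value, dict):
                problems.append(f"{key!r}: expected an object")
        elif not is_stringish(value):
            # "additionalProperties" branch: everything else is stringish.
            problems.append(f"{key!r}: expected a string or a list of strings")
    return problems


mixed = {"application/json": {"a": 1}, "text/plain": ["x", "y"]}
bad = {"application/vnd.foo+json": "not an object", "text/plain": 3}

print(check_mime_bundle(mixed))  # -> []
print(check_mime_bundle(bad))    # two problems, one per key
```

A single mapping that mixes both kinds of keys passes here, which is exactly what the `StringishMimeBundle | JsonishMimeBundle` union cannot express.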
I've added support for forwarding string key constraints as
I see. The pattern you're asking for isn't really something that can be spelled with standard Python type annotations, since a union of two dict types means "this dict type OR the other dict type", not a union of their key/value pairs. Using your example above:

MimeBundle = StringishMimeBundle | JsonishMimeBundle
# This would accept a dict of stringish mime types, OR a dict of jsonish mimetypes, but NOT one that mixes them
valid1 = {"application/json": jsonish_content1, "application/foo+json": jsonish_content2}
valid2 = {"text/plain": stringish_content1, "other": stringish_content2}
invalid = {"application/json": jsonish_content1, "text/plain": stringish_content1}  # IIUC you want to support this

The best way to support this in msgspec today would be to define a custom type representing the MimeType container, then encode/decode the contents using extension hooks. Here's a hacked together complete example:

import re
from typing import ClassVar, Any
from collections.abc import MutableMapping, Iterator
import msgspec
class MimeBundle(MutableMapping):
    patterns: ClassVar[list[tuple[str, Any]]] = [
        ("^application/(.*\\+)?json$", dict[str, Any]),
        ("^.*$", str | list[str]),
    ]

    def __init__(self, data: dict[str, Any] | None = None, **kwargs: Any):
        self._data = dict(data) if data else {}
        self._data.update(kwargs)

    def __repr__(self):
        return f"MimeBundle({self._data})"

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._data[key] = value

    def __delitem__(self, key: str) -> None:
        del self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def __msgspec_encode__(self):
        return self._data

    @classmethod
    def __msgspec_decode__(cls, obj: Any):
        if not isinstance(obj, dict):
            raise ValueError("Expected an object")
        res = cls()
        for key, value in obj.items():
            for pattern, schema in cls.patterns:
                if re.search(pattern, key):
                    try:
                        res[key] = msgspec.convert(value, schema)
                    except msgspec.ValidationError as exc:
                        raise ValueError(f"Invalid value for {key!r}: {exc}")
                    break
            else:
                raise ValueError(f"{key} is not a valid mimetype")
        return res

    @classmethod
    def __msgspec_json_schema__(cls):
        return {
            "type": "object",
            "patternProperties": {
                pattern: msgspec.json.schema(schema)
                for pattern, schema in cls.patterns
            }
        }
# Define some hooks to support the custom type.
#
# The implementation of these hooks is completely up to you - here I've opted
# to have them dispatch to methods on the type to keep the implementations
# local to the `MimeBundle` type, but you could just as easily inline the
# implementations here. Currently msgspec doesn't look for any custom methods
# on the types themselves, so the method names used below are arbitrary and
# unique to this example.
def enc_hook(value):
    try:
        return value.__msgspec_encode__()
    except AttributeError:
        raise NotImplementedError

def dec_hook(type, obj):
    try:
        return type.__msgspec_decode__(obj)
    except AttributeError:
        raise NotImplementedError

def schema_hook(type):
    try:
        return type.__msgspec_json_schema__()
    except AttributeError:
        raise NotImplementedError
# --------------------------------------------
# A demo using the functionality defined above
# --------------------------------------------
valid = """
{
    "application/json": {"fizz": "buzz"},
    "application/foo+json": {"hello": "world"},
    "text/plain": "some text",
    "other": ["some", "more", "text"]
}
"""
invalid = """
{
    "text/plain": "some text",
    "other": ["a string", "another string", 123]
}
"""
# Decode into a MimeBundle type
bundle = msgspec.json.decode(valid, type=MimeBundle, dec_hook=dec_hook)
print(bundle)
#> MimeBundle(
#>     {
#>         'application/json': {'fizz': 'buzz'},
#>         'application/foo+json': {'hello': 'world'},
#>         'text/plain': 'some text',
#>         'other': ['some', 'more', 'text']
#>     }
#> )

# Raise a nice error on an invalid MimeBundle
try:
    msgspec.json.decode(invalid, type=MimeBundle, dec_hook=dec_hook)
except Exception as exc:
    print(repr(exc))
#> ValidationError("Invalid value for 'other': Expected `str`, got `int` - at `$[2]`")

# Encode a MimeBundle type
encoded = msgspec.json.encode(bundle, enc_hook=enc_hook)
print(encoded)
#> b'{"application/json":{"fizz":"buzz"},"application/foo+json":{"hello":"world"},"text/plain":"some text","other":["some","more","text"]}'

# Generate the JSON Schema for the MimeBundle type
schema = msgspec.json.schema(MimeBundle, schema_hook=schema_hook)
print(schema)
#> {
#>     'type': 'object',
#>     'patternProperties': {
#>         '^application/(.*\\+)?json$': {'type': 'object'},
#>         '^.*$': {'anyOf': [{'type': 'string'}, {'type': 'array', 'items': {'type': 'string'}}]}
#>     }
#> }

Hopefully that's enough to get you going. If you have further questions on how to implement patterns like this in msgspec, please don't hesitate to open an issue and ask. For now though I'm going to close this PR.
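One caveat that may be worth checking if the generated schema is fed to a JSON Schema validator: the decoder above stops at the first matching pattern, whereas `patternProperties` applies every pattern that matches a key. Since `^.*$` also matches `application/json`, a validator would require that value to satisfy both subschemas. The stdlib-only sketch below shows the difference; `first_match`, `all_matches`, and `EXCLUSIVE_CATCH_ALL` are hypothetical names, and the negative-lookahead catch-all is just one possible way to make the two patterns disjoint.

```python
import re

# The two patterns from MimeBundle.patterns above.
PATTERNS = ["^application/(.*\\+)?json$", "^.*$"]


def first_match(key):
    """Pattern the example decoder uses: it stops at the first hit."""
    for pattern in PATTERNS:
        if re.search(pattern, key):
            return pattern
    return None


def all_matches(key):
    """Patterns a JSON Schema validator applies: every hit counts."""
    return [pattern for pattern in PATTERNS if re.search(pattern, key)]


print(first_match("application/json"))  # only the JSON-ish pattern
print(all_matches("application/json"))  # both patterns
print(all_matches("text/plain"))        # only the catch-all

# Assumption: a catch-all that excludes JSON-ish keys keeps the patterns disjoint.
EXCLUSIVE_CATCH_ALL = "^(?!application/(.*\\+)?json$).*$"
print(re.search(EXCLUSIVE_CATCH_ALL, "application/json"))        # None
print(re.search(EXCLUSIVE_CATCH_ALL, "text/plain") is not None)  # True
```

Swapping the second entry of `patterns` for a disjoint catch-all like this would keep the first-match decoder and the emitted schema in agreement, at the cost of a less readable regex.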
Elevator Pitch
This PR explores the current support for constraining the keys of dict-like objects, where the key enables some level of "uniqueness" across a single attribute, used as the key of a mapping.
Changes
- patternProperties to JSON Schema generation
- patternProperties
- patternProperties + additionalProperties?
- .json.decode already Just Worked 🎉
- .msgpack.decode decoder does not raise the validation warning
Motivation
This mostly revolves around the following kind of extension to the JSON schema example:
Which yields something like:
I've been looking into applying msgspec to e.g. the Jupyter notebook format. It makes use of a number of patterns, such as MIME types, etc. as the keys of mappings for constrained, but extensible metadata. I'm not entirely certain what this pattern (ha!) looks like yet, from an annotated-types (+/- msgspec.Meta) perspective.