Typed dicts with missing keys #2632

JukkaL · 2017-01-03T21:08:16Z

We don't have a complete story for typed dicts when some keys may be missing. Support for get (#2612) is necessary, but it may not be sufficient. For example, consider code like this:

A = TypedDict('A', {'x': int, 'y': str})
a: A = {'x': 1}

Should it be possible to make this type check without a cast?

JukkaL · 2017-01-04T13:01:24Z

There are many reasonable use cases where some typed dict keys are only defined some of the time:

If we have a JSON config file, it's reasonable to have some optional keys there, and it's also reasonable that later versions of a program add additional keys that may be missing if an earlier program version created the config file.
Similarly, if JSON is used in a network protocol, some keys may be optional to save some bandwidth. Also, older versions of a client may be missing some keys introduced in later clients.
If JSON is used for serialization, different versions of the serialized object may have different attributes, and we may want to omit some keys with default values to save space.
If we use TypedDict as an argument type, later API versions may support additional keys that are optional to retain backward compatibility.

A simple approach would be to have a way to annotate potentially missing keys. By default, a key would still be required. We can't use Optional[...] for this since it has another meaning. Some ideas below:

A = TypedDict('A', {'x': OptionalKey[int]})
A = TypedDict('A', {'x': MayExist[int]})
A = TypedDict('A', {'x': NotRequired[int]})

class A(TypedDict):
    a: OptionalKey[int]
    b: MayExist[int]

This doesn't work:

A = TypedDict('A', {'x?': int})  # doesn't work with the alternative syntax

If some key is optional, it must be accessed using get() or guarded with an in check:

if 'x' in d:
    d['x']   # okay if 'x' is optional
d.get('x')  # okay if 'x' is optional
d['x']  # failure if 'x' is optional

Maybe we also need to track setting optional keys in the conditional type binder:

if 'x' not in d:
    d['x'] = 1
d['x']  # okay even if 'x' is optional, since we set it above

JukkaL · 2017-01-04T13:06:26Z

When receiving JSON objects from an untrusted source, we'd want to check that it conforms to our schema before using it as a typed dict. Otherwise bad things could happen. If optional key constraints are available for introspection, we could implement a helper that checks whether a runtime dictionary object is compatible with a typed dict type. This wouldn't be part of mypy or typing, but it could be a useful separate module.
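
A minimal sketch of such a helper, for illustration only. It assumes the total=False class keyword that this thread later settles on, plus the `__required_keys__` and `__optional_keys__` attributes that TypedDict classes expose at runtime on Python 3.9+. The names `Config` and `conforms` are hypothetical, and the check is shallow: only plain classes such as `int` or `str` are verified, and any other value type is skipped.

```python
from typing import Any, Mapping, TypedDict, get_origin, get_type_hints


class _ConfigBase(TypedDict):
    name: str  # required key


class Config(_ConfigBase, total=False):
    retries: int  # key that may be missing


def conforms(data: Mapping[str, Any], schema: type) -> bool:
    """Hypothetical helper: shallow check of `data` against a TypedDict class."""
    hints = get_type_hints(schema)
    required = schema.__required_keys__  # Python 3.9+
    optional = schema.__optional_keys__  # Python 3.9+

    # Every required key must be present.
    if not all(key in data for key in required):
        return False

    for key, value in data.items():
        # Keys that the schema does not declare are rejected.
        if key not in required and key not in optional:
            return False
        expected = hints[key]
        # Assumption of this sketch: only plain classes are checked;
        # parameterized or special forms (Optional[...], List[...]) are skipped.
        is_plain_class = get_origin(expected) is None and isinstance(expected, type)
        if is_plain_class and not isinstance(value, expected):
            return False
    return True


print(conforms({'name': 'app'}, Config))
print(conforms({'retries': 3}, Config))
```

A real validator would also need to recurse into nested typed dicts and handle unions, which is one reason to leave this to a separate module or an existing schema library.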

rowillia · 2017-01-06T21:37:03Z

FWIW Hack conflates nullable fields with optional keys - https://docs.hhvm.com/hack/shapes/introduction#accessing-fields__nullable-fields-are-optional

I wouldn't worry too much about schema validation in Mypy. Marshmallow is already great at that (And the hypothetical plugin system would allow them to play nice together 😄)

JukkaL · 2017-01-10T14:18:32Z

@rowillia An approach similar to Hack sounds pretty reasonable for mypy as well. Here's how it could work:

d['x'] works no matter what type 'x' has.
d.get('x') is only allowed if 'x' has an optional type.
Keys with an optional type can be left out when constructing a typed dict instance.

This would clearly be unsafe, though. d['x'] could fail at runtime, in case 'x' is optional. Also, if some external library or service actually expects an explicit None value, it's easy to accidentally omit a required key and cause a runtime error.

Maybe mypy should be able to help catch these issues. Some ideas:

1. Have a flag for each typed dict specifying how optional types work. There could be three options (or maybe just two of these): 1) Optional keys can be left out, but do not enforce get (perhaps the default). 2) Optional keys can be left out, but enforce using get with optional keys. 3) Keys with optional types cannot be left out.
2. Have a config file and command line option for enabling strict get checks for all typed dicts. When this option is enabled, get must be used with optional keys. This is a pretty coarse-grained tool, but it might still work reasonably well.
3. Let users mark fields with optional types as required. The syntax could be {'x': Required[Optional[int]]}. Such a key must always be present, and using get is not allowed. By default we could fall back to the Hack-like approach. Alternative spelling: ValueRequired[Optional[...]].
4. Like (3), but enforce using get with optional keys (no d['x']).

I'm leaning towards (4), since it seems to provide sufficient flexibility with reasonable defaults, and it would also be safe. Also, I'd probably allow get to be used with arbitrary keys to make checking legacy code easier -- after all, the get call won't fail at runtime even if a key is required.

@davidfstr @gvanrossum What do you think?

davidfstr · 2017-01-11T02:52:38Z

A few quick thoughts:

Agreed that enabling external format checkers to check JSON blobs against TypedDict definitions would be useful.
Distinguishing between Optional items (that can be None) and items that can be omitted altogether feels valuable to me. I don't like the idea of conflating by default.

I don't have a lot of first-hand experience with JSON blobs that have omittable keys, so I don't have especially strong opinions on the semantics for working with them.

JukkaL · 2017-01-17T17:26:36Z

Okay, here's another proposal:

By default, keys must exist. Optional[...] as a value type is not special.
get can be used with arbitrary keys, even those that are declared to always exist. This is for convenience and it doesn't affect safety. The return type for the single-argument version of get is always an optional type, even if the key is supposed to always exist.
Use OptionalItem[t] for keys that may be missing. This is unrelated to Optional[...].
It's possible to use OptionalItem[Optional[t]] for an optional item that may be None.
OptionalItem[...] is only valid if it wraps a typed dict value type. It's an error to use it in any other context.

I chose the name OptionalItem as the name for a few reasons:

It seems better to reuse the term 'optional' instead of inventing a synonym. This is just a different flavor of optional.
This declares an optional item -- both the key and the corresponding value may be missing.

More rationale:

Being an optional item is conceptually orthogonal to accepting a None value, even though the dict.get method somewhat conflates these through the default None value for missing keys.
Having two independent concepts gives the most flexibility.
Having two similar concepts is going to be a little confusing. However, OptionalItem is a very searchable name and we can make sure all official documentation about it mentions how it's different from Optional.

Additional notes:

The join of typed dicts {'x': int} and {'x': int, 'y': str} probably should be {'x': int, 'y': OptionalItem[str]} for convenience. This is not safe, but I doubt that it matters in practice. I can create a separate issue for this with some rationale if we decide to move forward with the rest of this proposal.

gvanrossum · 2017-01-17T18:15:08Z

Hm, that's pretty verbose, and many people probably don't even know whether their code cares about explicit None values vs. missing keys. How about intentionally conflating the two? The rules would be:

Items marked up without Optional are mandatory; d[k] is okay and has the stated type; d.get(k) is okay and has an Optional type
Items marked up with Optional may be None or missing; d[k] is not okay for these; d.get(k) must be used and has an Optional type
If you really want a mandatory key with an optional type you can write it as Union[None, T]
For two-argument d.get(k, default) the return type is a union of None, the type of default, and the nominal type of d[k]

The one use case that this doesn't cover is when you want to allow the key to be missing, but when present you don't want the value to be None. Is that an important use case? If it is, I will withdraw this and then I am okay with the OptionalItem proposal.

JukkaL · 2017-01-17T18:23:33Z

My gut feeling is that a missing key with the value never being None is not rare. For example, some code I looked at today had code like d.get('x', {}).get('y', 1) which would fail if the return type of get would always include None.
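
The runtime behavior behind that idiom, as a small sketch (the dictionaries and the `nested_y` helper are made up for illustration): the chained call is safe when 'x' is merely absent, but it breaks as soon as 'x' is present with a None value. That is why a get return type that always includes None would force a checker to reject it.

```python
from typing import Any, Dict


def nested_y(d: Dict[str, Any]) -> int:
    # Falls back to {} when 'x' is missing, then to 1 when 'y' is missing.
    return d.get('x', {}).get('y', 1)


print(nested_y({}))               # 'x' is missing
print(nested_y({'x': {}}))        # 'x' is present, 'y' is missing
print(nested_y({'x': {'y': 5}}))  # both are present

try:
    nested_y({'x': None})         # 'x' is present but None
except AttributeError as exc:
    print('fails:', exc)
```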

JukkaL · 2017-01-17T18:26:39Z

Schema evolution or versioning is a typical case where it seems natural to have missing keys with non-Optional values. If we add a new item to a JSON dictionary, but we need to be prepared to accept older objects with a missing key (see above for a discussion about potential use cases), the most likely result seems to be a missing key with a non-optional type.

gvanrossum · 2017-01-17T18:38:24Z

OK, I withdraw my proposal. IIRC @rowillia also posted some example code that used d.get(k1, {}).get(k2) so apparently this is a common assumption that we should be able to express. And for that case we would just write OptionalItem[...] so that's quite fine. +1!

davidfstr · 2017-01-17T18:41:24Z

I like the OptionalItem proposal, mainly on the rationale that explicit is better than implicit.

JukkaL · 2017-06-06T14:11:09Z

@gvanrossum I think that this would be nice to have before we make TypedDict official. I looked at Dropbox internal codebases and calls like d.get('foo', ...) were very common (over 10k hits), and I'd expect that they often imply an optional TypedDict item.

ilevkivskyi · 2017-06-06T14:21:06Z

@JukkaL
Sorry for an off-topic question, but there is a very similar question for protocols, we decided to omit this for now, but I am just curious how frequent is something like if hasattr(...) in Dropbox internal codebases?

JukkaL · 2017-06-06T14:48:21Z

@ilevkivskyi We have around 1k instances of if ... hasattr, but some of them probably are unrelated to protocols.

ilevkivskyi · 2017-06-06T14:50:57Z

@JukkaL
Thanks! This is already much less than for TypedDicts, so it looks like the decision to postpone this for protocols is justified.

JukkaL · 2017-06-07T12:13:25Z

We had an offline discussion about the syntax and @ddfisher wasn't happy with OptionalItem[...] because it's so close to Optional[...]. I came up with another alternative, Checked[...]. The idea behind the name is that the user must check for the existence of the item before accessing (or do it indirectly through get). It would look like this:

A = TypedDict('A', {'x': Checked[int]})

class A(TypedDict):
    a: Checked[int]

Pros:

Arguably communicates that you have to check for the existence of the item instead of just using it.
Less likely to be confused with Optional[...]. It's easy to talk about checked vs. optional dictionary items as separate things in docs, error messages and such.
As an adjective it's consistent with Optional[...] (unlike OptionalItem[...] which is a noun phrase).

Cons:

Can potentially be confused with the dictionary object doing the checking, instead of the user of the dictionary. However, if the user understands that typed dicts are just normal dictionaries at runtime, this is probably not too likely.
Checked is a less commonly used term than Optional (though there are precedents, such as checked exceptions in Java). However, the use case is also less common/typical than union-with-None.

gvanrossum · 2017-06-07T15:17:09Z

Hm, Checked[int] looks weird to me. I wonder if we could just have a flag that makes all of a given TypedDict's items optional? The app should be allowed to write td[key] if they're certain that the key exists, or td.get(key [, default]) if they're not.

JukkaL · 2017-06-07T15:43:07Z

Having a per-TypedDict flag might be a reasonable compromise. (Note that in my current implementation td.get('key'[, default]) is always valid, but there's no way to require it to be always used.)

Here are possible semantics in more detail:

If the flag is false (this is the default), both td[key] and td.get(key[, default]) are always accepted.
If the flag is true, then td['key'] would always be rejected without an explicit 'key' in td check. td.get(key[, default]) is always valid.

Possible syntax:

T = TypedDict('T', {'x': int}, partial=True)

def f(t: T) -> None:
    t['x']  # Invalid
    t.get('x')  # Ok
    assert 'x' in t
    t['x']  # Ok

f({})  # Ok
f({'x': 2})  # Ok

Other ideas for the name of the flag:

allow_partial
allow_missing
allow_missing_keys
allow_missing_items
missing_keys
missing_items

Not sure what's the best way to support the class-based syntax. We could perhaps have __partial__ = True in the class body. Alternatively, we could use a different base class such as PartialTypedDict, or another base class such as Partial. Finally, we could use a class decorator such as @partial. My current favorite is @partial:

@partial
class T(TypedDict):
    x: int

ilevkivskyi · 2017-06-07T15:47:36Z

Since the class based syntax works only in Python 3.6 anyway, I could propose another option (more similar to the functional syntax):

class T(TypedDict, partial=True):
    x: int

gvanrossum · 2017-06-07T15:49:09Z

I'm not sure I like "partial" (possible confusion with functools.partial) but I like the class decorator. It even lets users specify a series of required and a series of optional fields, using subclassing (only one would use the class decorator).

JukkaL · 2017-06-07T15:56:36Z

What about 'incomplete' instead of 'partial'? Or total=False or complete=False instead of partial=True? Another option is 'checked', but it doesn't sound quite right in this context.

I kind of like the idea of using the class T(..., keyword=value) syntax -- I had forgotten about it.

gvanrossum · 2017-06-07T16:16:09Z

I like total=False, using the class keyword.

@rowillia

…#3501) Implement a general-purpose way of extending type inference of methods. Also special case TypedDict get and `int.__pow__`. Implement a new plugin system that can handle both module-level functions and methods. This is an alternative to #2620 by @rowillia. I borrowed some test cases from that PR. This PR has a few major differences:

Use the plugin system instead of full special casing.
Don't support `d.get('x', {})` as it's not type safe. Once we have #2632 we can add support for this idiom safely.
Code like `f = foo.get` loses the special casing for get.

Fixes #2612. Work towards #1240.

JukkaL · 2017-06-08T10:22:15Z

Ok I'll go with total=False as a class keyword for now. This is something we can still iterate on if we ultimately aren't happy with it.
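
For reference, a short sketch of what the agreed spelling looks like in both the class-based and the functional syntax. The `Settings` type and the `timeout_of` helper are invented for illustration. The example reuses the config-file use case from earlier in the thread: an older file may simply lack a key that a newer program version added.

```python
from typing import TypedDict


class Settings(TypedDict, total=False):
    host: str
    timeout: int  # added in a later version; older files lack it


# Functional syntax with the same meaning.
SettingsAlt = TypedDict('SettingsAlt', {'host': str, 'timeout': int}, total=False)


def timeout_of(settings: Settings) -> int:
    # get() with a default covers the missing-key case.
    return settings.get('timeout', 30)


old_file: Settings = {'host': 'example.org'}
new_file: Settings = {'host': 'example.org', 'timeout': 5}
print(timeout_of(old_file), timeout_of(new_file))
```

With total=False, every key declared in that class body may be missing; this form has no per-key marker.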

JukkaL · 2017-09-11T09:38:44Z

This was implemented in #3558.

ppo · 2021-01-28T12:29:38Z

So in the end no way to specify which keys can be missing? It's all required or any can be missing. 😞

This seems to work…

Point2D = Union[
    TypedDict('Point2D', {'x': int, 'y': int}),
    TypedDict('Point2D', {'x': int, 'y': int, 'label': str})
]

davidfstr · 2021-01-29T03:27:37Z

@ppo Right now the way to specify some keys as required and some as optional is to have a total=False TypedDict inherit from a total=True TypedDict:

from typing import TypedDict

class _Point2DBase(TypedDict):
    x: int
    y: int

class Point2D(_Point2DBase, total=False):
    label: str  # optional

On the typing-sig mailing list there is an active discussion right now about new syntax to mark individual keys as required keys and others as optional keys.

ladyrick · 2021-09-05T15:16:45Z

@davidfstr Thank you for your code. But there is still no error when I index an optional key directly. Any idea about this?

from typing import TypedDict

class _Point2DBase(TypedDict):
    x: int
    y: int

class Point2D(_Point2DBase, total=False):
    label: str  # optional

def func(point: Point2D):
    a = point["label"]  # expected an error here, but none is reported

davidfstr · 2021-09-09T00:07:48Z

@ladyrick Offhand I’m not sure why that access to an optional key is allowed.
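
For context: earlier in this thread JukkaL noted that the implementation always accepts td.get(...) but has no way to require it, and the request to reject direct indexing is tracked separately in #12094, listed further down this page. Whatever the checker reports, the unguarded lookup can still fail at runtime. A minimal sketch of the guarded alternatives, where `label_of`, `label_or_default`, and `unguarded` are hypothetical helpers:

```python
from typing import TypedDict


class _Point2DBase(TypedDict):
    x: int
    y: int


class Point2D(_Point2DBase, total=False):
    label: str  # may be missing


def label_of(point: Point2D) -> str:
    # Explicit membership check before indexing.
    if "label" in point:
        return point["label"]
    return "<unlabeled>"


def label_or_default(point: Point2D) -> str:
    # get() with a default never raises for a missing key.
    return point.get("label", "<unlabeled>")


def unguarded(point: Point2D) -> str:
    # Raises KeyError at runtime when 'label' is absent.
    return point["label"]


p: Point2D = {"x": 1, "y": 2}
print(label_of(p), label_or_default(p))
```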

Kamforka · 2021-11-16T18:21:30Z

Indeed, I tried @ladyrick's snippet and mypy doesn't raise an error for me either.
According to the thread this appears to be fixed, but in practice it still fails at the moment.

Environment:

ubuntu==20.04
python==3.9.2
mypy==0.910

JukkaL added the topic-typed-dict label Jan 3, 2017

JukkaL mentioned this issue Jan 4, 2017

Implement type-aware get for TypedDict #2620

Closed

JukkaL added needs discussion priority-1-normal labels Jan 16, 2017

JukkaL removed the needs discussion label Jan 18, 2017

JukkaL mentioned this issue Jan 19, 2017

Generate optional items when joining typed dicts #2713

Closed

JukkaL mentioned this issue Jun 6, 2017

Refactor plugin system and special case TypedDict get and int.__pow__ #3501

Merged

JukkaL added priority-0-high and removed priority-1-normal labels Jun 7, 2017

JukkaL self-assigned this Jun 7, 2017

This was referenced Jun 15, 2017

TypedDict syntax variant with keyword arguments #2492

Closed

Support TypedDicts with missing keys (total=False) #3558

Merged

JukkaL closed this as completed Sep 11, 2017

tuukkamustonen mentioned this issue Dec 31, 2020

Inline syntax for defining potentially missing fields in TypedDict #9867

Closed

villebro mentioned this issue Aug 9, 2021

feat(cross-filters): add support for temporal filters apache/superset#16139

Merged

bsmedberg-xometry mentioned this issue Jan 28, 2022

Disallow x["foo"] for NotRequired TypedDict access #12094

Open
