Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typed dicts with missing keys #2632

Closed
JukkaL opened this issue Jan 3, 2017 · 29 comments
Closed

Typed dicts with missing keys #2632

JukkaL opened this issue Jan 3, 2017 · 29 comments

Comments

@JukkaL
Copy link
Collaborator

JukkaL commented Jan 3, 2017

We don't have a complete story for typed dicts when some keys may be missing. Support for get (#2612) is necessary, but it may not be sufficient. For example, consider code like this:

A = TypedDict('A', {'x': int, 'y': str})
a: A = {'x': 1}

Should it be possible to make this type check without a cast?

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jan 4, 2017

There are many reasonable use cases where some typed dict keys are only defined some of the time:

  • If we have a JSON config file, it's reasonable to have some optional keys there, and it's also reasonable that later versions of a program add additional keys that may be missing if an earlier program version created the config file.
  • Similarly, if JSON is used in a network protocol, some keys may be optional to save some bandwidth. Also, older versions of a client may be missing some keys introduced in later clients.
  • If JSON is used for serialization, different versions of the serialized object may have different attributes, and we may want to omit some keys with default values to save space.
  • If we use TypedDict as an argument type, later API versions may support additional keys that are optional to retain backward compatibility.

A simple approach would be have a way to annotate potentially missing keys. By default, a key would still be required. We can't use Optional[...] for this since it has another meaning. Some ideas below:

A = TypedDict('A', {'x': OptionalKey[int]})
A = TypedDict('A', {'x': MayExist[int]})
A = TypedDict('A', {'x': NotRequired[int]})

class A(TypedDict):
    a: OptionalKey[int]
    b: MayExist[int]

This doesn't work:

A = TypedDict('A', {'x?': int})  # doesn't work with the alternative syntax

If some key is optional, it must be accessed using get() or guarded with an in check:

if 'x' in d:
    d['x']   # okay if 'x' is optional
d.get('x')  # okay if 'x' is optional
d['x']  # failure if 'x' is optional

Maybe we also need to track setting optional keys in the conditional type binder:

if 'x' not in d:
    d['x'] = 1
d['x']  # okay even if 'x' is optional, since we set it above

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jan 4, 2017

When receiving JSON objects from an untrusted source, we'd want to check that it conforms to our schema before using it as a typed dict. Otherwise bad things could happen. If optional key constraints are available for introspection, we could implement a helper that checks whether a runtime dictionary object is compatible with a typed dict type. This wouldn't be part of mypy or typing, but it could be a useful separate module.

@rowillia
Copy link
Contributor

rowillia commented Jan 6, 2017

FWIW hacklang conflates keys with Nullable with Optional Keys - https://docs.hhvm.com/hack/shapes/introduction#accessing-fields__nullable-fields-are-optional

I wouldn't worry too much about schema validation in Mypy. Marshmallow is already great at that (And the hypothetical plugin system would allow them to play nice together 😄)

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jan 10, 2017

@rowillia An approach similar to Hack sounds pretty reasonable for mypy as well. Here's how it could work:

  • d['x'] works no matter what type 'x' has.
  • d.get('x') is only allowed if 'x' has an optional type.
  • Keys with an optional type can be left out when constructing a typed dict instance.

This would clearly be unsafe, though. d['x'] could fail at runtime, in case 'x' is optional. Also, if some external library or service actually expects an explicit None value, it's easy to accidentally omit a required key and cause a runtime error.

Maybe mypy should be able to help catch these issues. Some ideas:

  1. Have a flag for each typed dict specifying how optional types work. There could be three options (or maybe just two of these): 1) Optional keys can be left out, but do not enforce get (perhaps the default). 2) Optional keys can be left out, but enforce using get with optional keys. 3) Keys with optional types cannot be left out.
  2. Have a config file and command line option for enabling strict get checks for all typed dicts. When this option is enabled, get must be used with optional keys. This is a pretty coarse-grained tool, but it might still work reasonably well.
  3. Let users mark fields with optional types as required. The syntax could be {'x': Required[Optional[int]]}. Such a key must always be present, and using get is not allowed. By default we could fall back to the Hack-like approach. Alternative spelling: ValueRequired[Optional[...]].
  4. Like (3), but enforce using get with optional keys (no d['x']).

I'm leaning towards (4), since it seems to provide sufficient flexibility with reasonable defaults, and it would also be safe. Also, I'd probably allow get to be used with arbitrary keys to make checking legacy code easier -- after all, the get call won't fail at runtime even if a key is required.

@davidfstr @gvanrossum What do you think?

@davidfstr
Copy link
Contributor

A few quick thoughts:

  • Agreed that enabling external format checkers to check JSON blobs against TypedDict definitions would be useful.
  • Distinguishing between Optional items (that can be None) and items that can be omitted altogether feels valuable to me. I don't like the idea of conflating by default.

I don't have a lot of first-hand experience with JSON blobs that have omittable keys, so I don't have especially strong opinions on the semantics for working with them.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jan 17, 2017

Okay, here's another proposal:

  • By default, keys must exist. Optional[...] as a value type is not special.
  • get can be used with arbitrary keys, even those that are declared to always exist. This is for convenience and it doesn't affect safety. The return type for the single-argument version of get is always an optional type, even if the key is supposed to always exist.
  • Use OptionalItem[t] for keys that may be missing. This is unrelated to Optional[...].
  • It's possible to use OptionalItem[Optional[t]] for an optional item that may be None.
  • OptionalItem[...] is only valid if it wraps a typed dict value type. It's an error to use it in any other context.

I chose the name OptionalItem as the name for a few reasons:

  • It seems better to reuse the term 'optional' instead of inventing a synonym. This is just a different flavor of optional.
  • This declares an optional item -- both the key and the corresponding value may be missing.

More rationale:

  • Being an optional item is conceptually orthogonal to accepting a None value, even though the dict.get method somewhat conflates these through the default None value for missing keys.
  • Having two independent concepts gives the most flexibility.
  • Having two similar concepts is going to be a little confusing. However, OptionalItem is a very searchable name and we can make sure all official documentation about it mentions how it's different from Optional.

Additional notes:

  • The join of typed dicts {'x': int} and {'x': int, 'y': str} probably should be {'x': int, 'y': OptionalItem[str]} for convenience. This is not safe, but I doubt that it matters in practice. I can create a separate issue for this with some rationale if we decide to move forward with the rest of this proposal.

@gvanrossum
Copy link
Member

Hm, that's pretty verbose, and many people probably don't even know whether their code cares about explicit None values vs. missing keys. How about intentionally conflating the two? The rules would be:

  • Items marked up without Optional are mandatory; d[k] is okay and has the stated type; d.get(k) is okay and has an Optional type
  • Items marked up with Optional may be None or missing; d[k] is not okay for these; d.get(k) must be used and has an Optional` type
  • If you really want a mandatory key with an optional type you can write it as Union[None, T]
  • For two-argument d.get(k, default) the return type is a union of None, the type of default, and the nominal type of d[k]

The one use case that this doesn't cover is when you want to allow the key to be missing, but when present you don't want the value to be None. Is that an important use case? If it is, I will withdraw this and then I am okay with the OptionalItem proposal.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jan 17, 2017

My gut feeling is that a missing key with the value never being None is not rare. For example, some code I looked at today had code like d.get('x', {}).get('y', 1) which would fail if the return type of get would always include None.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jan 17, 2017

Schema evolution or versioning is a typical case where it seems natural to have missing keys with non-Optional values. If we add a new item to a JSON dictionary, but we need to be prepared to accept older objects with a missing key (see above for a discussion about potential use cases), the most likely result seems to be a missing key with a non-optional type.

@gvanrossum
Copy link
Member

OK, I withdraw my proposal. IIRC @rowillia also posted some example code that used d.get(k1, {}).get(k2) so apparently this is a common assumption that we should be able to express. And for that case we would just write OptionalItem[...] so that's quite fine. +1!

@davidfstr
Copy link
Contributor

I like the OptionalItem proposal, mainly on the rationale that explicit is better than implicit.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jun 6, 2017

@gvanrossum I think that this would be nice to have before we make TypedDict official. I looked at Dropbox internal codebases and calls like d.get('foo', ...) were very common (over 10k hits), and I'd expect that they often imply an optional TypedDict item.

@ilevkivskyi
Copy link
Member

@JukkaL
Sorry for an off-topic question, but there is a very similar question for protocols, we decided to omit this for now, but I am just curious how frequent is something like if hasattr(...) in Dropbox internal codebases?

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jun 6, 2017

@ilevkivskyi We have around 1k instances of if ... hasattr, but some of them probably are unrelated to protocols.

@ilevkivskyi
Copy link
Member

@JukkaL
Thanks! This is already much less than for TypedDicts, so it looks like the decision to postpone this for protocols is justified.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jun 7, 2017

We had an offline discussion about the syntax and @ddfisher wasn't happy with Optionaltem[...] because it's so close to Optional[...]. I came up with another alternative, Checked[...]. The idea behind the name is that the user must check for the existence of the item before accessing (or do it indirectly through get). It would look like this:

A = TypedDict('A', {'x': Checked[int]})

class A(TypedDict):
    a: Checked[int]

Pros:

  • Arguably communicates that you have to check for the existence of the item instead of just using it.
  • Less likely to be confused with Optional[...]. It's easy to talk about checked vs. optional dictionary items as separate things in docs, error messages and such.
  • As an adjective it's consistent with Optional[...] (unlike OptionalItem[...] which is a noun phrase).

Cons:

  • Can potentially be confused with the dictionary object doing the checking, instead of the user of the dictionary. However, if the user understands that typed dicts are just normal dictionaries at runtime, this is probably not too likely.
  • Checked is a less commonly used term than Optional (though there are precedents, such as checked exceptions in Java). However, the use case is also less common/typical than union-with-None.

@JukkaL JukkaL self-assigned this Jun 7, 2017
@gvanrossum
Copy link
Member

gvanrossum commented Jun 7, 2017 via email

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jun 7, 2017

Having a per-TypedDict flag might be a reasonable compromise. (Note that in my current implementation td.get('key'[, default]) is always valid, but there's no way to require it to be always used.)

Here are possible semantics in more detail:

  • If the flag is false (this is the default), both td[key] and td.get(key[, default]) are always accepted.

  • If the flag is true, then td['key'] would always be rejected without an explicit 'key' in td check. td.get(key[, default]) is always valid.

Possible syntax:

T = TypedDict('T', {'x': int}, partial=True)

def f(t: T) -> None:
    t['x']  # Invalid
    t.get('x')  # Ok
    assert 'x' in t
    t['x']  # Ok

f({})  # Ok
f({'x': 2})  # Ok

Other ideas for the name of the flag:

  • allow_partial
  • allow_missing
  • allow_missing_keys
  • allow_missing_items
  • missing_keys
  • missing_items

Not sure what's the best way to support the class-based syntax. We could perhaps have __partial__ = True in the class body. Alternatively, we could use a different base class such as PartialTypedDict, or another base class such as Partial. Finally, we could use a class decorator such as @partial. My current favorite is @partial:

@partial
class T(TypedDict):
    x: int

@ilevkivskyi
Copy link
Member

Since the class based syntax works only in Python 3.6 anyway, I could propose another option (more similar to the functional syntax):

class T(TypedDict, partial=True):
    x: int

@gvanrossum
Copy link
Member

gvanrossum commented Jun 7, 2017 via email

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jun 7, 2017

What about 'incomplete' instead of 'partial'? Or total=False or complete=False instead of partial=True? Another option is 'checked', but it doesn't sound quite right in this context.

I kind of like the idea of using the class T(..., keyword=value) syntax -- I had forgotten about it.

@gvanrossum
Copy link
Member

gvanrossum commented Jun 7, 2017 via email

JukkaL added a commit that referenced this issue Jun 7, 2017
…#3501)

Implement a general-purpose way of extending type inference of
methods. Also special case TypedDict get and `int.__pow__`.
Implement a new plugin system that can handle both module-level 
functions and methods.

This an alternative to #2620 by @rowillia. I borrowed some test
cases from that PR. This PR has a few major differences:

* Use the plugin system instead of full special casing.
* Don't support `d.get('x', {})` as it's not type safe. Once we
  have #2632 we can add support for this idiom safely.
* Code like `f = foo.get` loses the special casing for get.

Fixes #2612. Work towards #1240.
@JukkaL
Copy link
Collaborator Author

JukkaL commented Jun 8, 2017

Ok I'll go with total=False as a class keyword for now. This is something we can still iterate on if we ultimately aren't happy with it.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Sep 11, 2017

This was implemented in #3558.

@ppo
Copy link

ppo commented Jan 28, 2021

So in the end no way to specify which keys can be missing? It's all required or any can be missing. 😞

This seems to work…

Point2D = Union[
    TypedDict('Point2D', {'x': int, 'y': int}),
    TypedDict('Point2D', {'x': int, 'y': int, 'label': str})
]

@davidfstr
Copy link
Contributor

@ppo Right now the way to specify some keys as required and some as optional is to have a total=False TypedDict inherit from a total=True TypedDict:

class _Point2DBase(TypedDict):
    x: int
    y: int

class Point2D(_Point2DBase, total=False):
    label: str  # optional

On the typing-sig mailing list there is an active discussion right now about new syntax to mark individual keys as required keys and others as optional keys.

@ladyrick
Copy link

ladyrick commented Sep 5, 2021

@davidfstr Thank you for your code. But there are still no error when I try to get an optional key. Any idea about this?

class _Point2DBase(TypedDict):
    x: int
    y: int

class Point2D(_Point2DBase, total=False):
    label: str  # optional

def func(point: Point2D):
    a = point["label"] # suppose to get an error but not

@davidfstr
Copy link
Contributor

@ladyrick Offhand I’m not sure why that access to an optional key is allowed.

@Kamforka
Copy link

Indeed I tried @ladyrick 's snippet and mypy doesn't raise an error to me either.
According to the thread it seems like to be fixed, however in practice it is failing at the moment.

Environment:

ubuntu==20.04
python==3.9.2
mypy==0.910

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants