Type for heterogeneous dictionaries with string keys #28
Comments
Even though something like this might be useful, after discussing this with several people it seems that everybody agrees that this should be left out from the PEP, but this is a potential feature to add in a later Python release (after experimenting with an actual prototype implementation, etc.). |
We (@JukkaL, @vlasovskikh and I) agreed that this is best left until 3.6, or dropped altogether. |
Riffing on the right syntax, I wonder if the way to spell such types shouldn't look more like NamedTuple than like Dict. How about:
Using PEP 526 we could write it like this:
but that syntax remains a pipe dream for most of us (since it only works in Python 3.6). Also I propose not to bother with initial values and not to express in the type whether a given key is mandatory or can be left out. (Note that Optional means it can be None, not that it can be left out.) |
I was going to mention a NamedTuple-style syntax myself, since the original semantics I had in mind were nearly equivalent to NamedTuple:
These semantics are more restrictive than those originally proposed by @JukkaL.
In case it's a useful data point, my current Django codebase wouldn't make use of such subtypes. |
Consider:
Sometimes code defines the actual dict value using
Edit: Added secondary syntax that uses |
The same argument could be made for NamedTuple or TypeVar or NewType. But
when you print them (e.g. when introspecting) the name is handy.
|
That's true. If the type object were to remain available at runtime (as NamedTuple, TypeVar, and NewType do) that would make perfect sense. (In a previous iteration of my thinking, the So I'm onboard with providing the type name as a constructor argument. |
OK, then it's time to play around with the mypy code to see if you can
implement it!
|
Yes indeed. I've spent a good chunk of today on reverse-engineering how mypy works in general and how it handles NamedTuple. Once I have a more detailed implementation proposal, I intend to bring it up in python/mypy#985 since this would be a mypy implementation proposal rather than a typing proposal per se. |
@JukkaL and I spent some time offline brainstorming about this. Jukka has a lot of additional ideas.
|
There is no need to choose. |
Agreed, but the pre-3.6 syntax needs to be reasonable since that's what almost everybody will see for the foreseeable future. |
By partially defined structs, I meant struct with some missing keys. We have at least these options:
|
Structural subtyping would likely also imply that Using this to support adding new keys and automatically inferring a new type is problematic:
The latter example might be beyond the scope of PEP 484, but it's perhaps worth at least considering here. Maybe struct 'extension' as in
|
In addition to the proposed syntax, I think something even more similar to
Foo = Struct('Foo', [('x', Optional[List[str]]), ('y', int)])
Although it is a bit verbose, new features could be easily added, e.g. default values:
Foo = Struct('Foo', [('x', Optional[List[str]]), ('y', int, 42)])
class Foo(Struct):
    x: Optional[List[str]]
    y: int = 42
Then also specifying a default value to something special (ellipsis |
Do you propose that in addition to or instead of the Struct('A', x=int)
syntax?
|
I am not sure. On one hand the |
How is the NamedTuple-derived syntax more flexible?
|
It is more flexible in the sense that we can accept 3-tuples (for example with default values) in the list of fields. Such an option could be added later as a minimal change, whereas I don't see how this could be done with the dict-like syntax. |
I guess it could be done using an extra wrapper, e.g. Struct('A',
x=Required[int], y=WithDefault[str, '']).
Although default values seem hard to map to the dict implementation -- you
have to explicitly pass the default to get() if it's not None. A "required"
flag might be useful; it would decide whether a dict with some keys missing
is acceptable or not. (I guess the checker would also have to flag d['x']
as invalid unless you've already checked whether 'x' in d; d.get('x') would
always be valid.)
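
A minimal runtime sketch of the access rules described above, using a plain dict. The helper names read_y and bump_y are hypothetical and exist only for illustration. A checker with a "required" flag would presumably accept both helpers and flag an unguarded d['y'].

```python
from typing import Dict, Optional


def read_y(d: Dict[str, int]) -> Optional[int]:
    # d.get() is safe whether or not the key is present.
    return d.get('y')


def bump_y(d: Dict[str, int]) -> None:
    # The membership test guards the subscript, so no KeyError is possible.
    if 'y' in d:
        d['y'] += 1


d = {'x': 1}
bump_y(d)            # 'y' is absent, so nothing changes
missing = read_y(d)  # None, because 'y' is absent

try:
    d['y']           # unguarded access on a missing key
except KeyError:
    unguarded_failed = True
else:
    unguarded_failed = False
```

The unguarded subscript is the only one of the three that can fail at runtime, which is why it is the one a checker would need to reject.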
|
Default values don't seem to fit in seamlessly with our philosophy, as I don't see how they would be useful without a runtime effect, and elsewhere we try to avoid type annotations having a runtime effect. Removing typing-related stuff from a program generally shouldn't affect behavior. |
So the list-of-tuples form has little to recommend it.
…--Guido (mobile)
|
OK, then I agree that we should go with
A = Struct('A', x=int, y=Facultative[int])
d = {'x': 1} # type: A # OK
if 'y' in d:
    d['y'] += 1 # Also OK
|
That's a nice syntax with variable annotations. One slight difficulty is Here is a similar syntax that could be implemented with zero-cost semantics:
The preceding would be equivalent to
I could potentially see type-instances of TypedDict (ex: Point2D) being However with a runtime presence you lose the advantages of being able to |
From the have-your-cake-and-eat-it department: maybe it's possible to have
the class structure as suggested but make instantiations just return plain
dicts? That's almost zero runtime cost: if we inherit from dict and don't
define the constructor, the only runtime cost would be looking up the
constructor in the class hierarchy (a few dict lookups at most).
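
One way this could be sketched is shown below. It is not the exact mechanism suggested above: it uses a metaclass __call__ instead of relying on the inherited dict constructor, so that the result is a plain dict and not an instance of the subclass. The names PlainDictMeta and Struct are hypothetical.

```python
class PlainDictMeta(type):
    # Calling the class builds an ordinary dict, so no instance of the
    # declaring class is ever created.
    def __call__(cls, *args, **kwargs):
        return dict(*args, **kwargs)


class Struct(dict, metaclass=PlainDictMeta):
    pass


class Point2D(Struct):
    x: int
    y: int


p = Point2D(x=1, y=2)
```

Here type(p) is dict, while the field annotations stay available on Point2D for a checker or for introspection.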
|
I could see that. It occurs to me that my "Should TypedDict support subtyping?" question above was ambiguous. Originally I was only considering the question of structural subtyping. That is, whether or not the following would be accepted:
Again, prior discussion suggests yes. But here are a lot of other crazy ideas that also come to mind in the realm of "subtyping": (1) Extending with an "extends" keyword to TypedDict
(2) Extending with class-based syntax
Although the preceding syntaxes have the feel of nominal subtyping, they are just syntactic sugar for fully spelling out all fields. The type system would still use the more general structural subtyping when checking type compatibility. |
As Guido already mentioned, there are various kinds of manipulations with
class TDMeta(type):
    def __new__(cls, name, bases, ns, *, _root=False):
        if _root:
            return super().__new__(cls, name, bases, ns)
        return lambda x: x
class TypedDict(metaclass=TDMeta, _root=True): ...
class Point2D(TypedDict):
    x: int
    y: int
assert Point2D({'x': 1, 'y': 2}) == {'x': 1, 'y': 2}
However, one will not be able to subclass this (as one cannot subclass |
(Also see the next stage of the implementation, WIP here: python/mypy#2342) |
In regards to subclassing, I don't like the idea of an "extends" keyword or a class decorator, but I think we could rig a metaclass so that at runtime, after
class Point1D(TypedDict):
    x: int
class Point2D(Point1D):
    y: int
both
class Point2D(TypedDict):
    x: int
    y: int
The type checker should reject isinstance(x, cls) calls using a TypedDict subclass, since at runtime those would all be equivalent to isinstance(x, dict)... |
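
For reference, a short sketch of how the subclassing idea above behaves with typing.TypedDict as it later shipped in the standard library (Python 3.8+). This postdates the thread and is shown only as an illustration, not as the design being proposed here.

```python
from typing import TypedDict  # available from Python 3.8


class Point1D(TypedDict):
    x: int


class Point2D(Point1D):
    y: int


# Instantiation produces a plain dict at runtime.
p = Point2D(x=1, y=2)

# The subclass carries the inherited field as well as its own.
field_names = set(Point2D.__annotations__)

# isinstance() against a TypedDict class raises TypeError at runtime.
try:
    isinstance(p, Point2D)
except TypeError:
    isinstance_rejected = True
else:
    isinstance_rejected = False
```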
+1 |
Would a generic version of TypedDict be feasible? There are some strong use cases for this for scientific computing / data science applications:
In both cases, it is quite common to define ad-hoc "types" in applications analogous to |
@shoyer Can you give a few concrete code examples of where this could be useful? |
@shoyer Note that you can already have generic protocols. Together with literal types (that are proposed in another issue) you can just overload
class MyFrame(Protocol[T]):
    @overload
    def __getitem__(self, item: Literal['name']) -> str: ...
    @overload
    def __getitem__(self, item: Literal['value']) -> T: ...
or similar. In principle |
The use cases for To build off the example in mypy's docs, you would use As is the case for TypedDict, most of these use cases would also work fine with a dataclass or namedtuple (in this case, where the entries are 1-dimensional arrays), but there are advantages to standardizing on common types and APIs, and using types that can be defined dynamically when desired. In the PyData ecosystem,
@ilevkivskyi Yes, I suppose protocols with literals would work for some use cases, but that wouldn't be a very satisfying solution. There is a long list of methods beyond indexing that only take column names found in the DataFrame as valid entries, e.g., to group by a column, plot a column, set an index on a column, data, rename columns, etc.
I only have a vague idea what support for writing custom key-value types would look like, but perhaps it would pay dividends, because in some sense this is a generalized version of typing for
K = TypeVar('K')
V = TypeVar('V')
class TypedDict(Enumerated[K, V], Dict[str, Any]):
    def __getitem__(self, key: K) -> V: ...
class NamedTuple(Enumerated[K, V], namedtuple):
    def __getattr__(self, name: K) -> V: ...
(Feel free to declare this out of scope for now or push it to another issue -- I don't want to pull |
@shoyer Generalizing |
@shoyer I agree with Jukka here. This should be done via a plugin to mypy. Note that there is a PR python/mypy#4328 that extends the current plugin system (to allow special casing of decorators/base classes/metaclasses). With this new plugin system, a user will be able to write something like this (very approximately):
class MyTable(pandas.DataFrame):
    id: int
    name: str
table: MyTable
table.id # OK, inferred type is 'array[int]'
table['name'] # Also OK, inferred type is 'array[str]'
Currently, the author of the mentioned PR is also working on the plugin for |
Hi. Does anybody know what's happening with this( |
@DrPyser it has been supported by mypy for a year and a half, but it lives in |
Maybe we should close this issue now? We can create follow-up issues about remaining work that isn't covered by other issues. |
OK, let us close this. I think we have issues about missing features on mypy tracker and the pandas question is unrelated (btw @shoyer the plugin hook I mentioned was added to mypy and |
Hi @DrPyser, I was originally providing a lot of the organizational energy around the original TypedDict design and implementation but my life has gotten super busy over the past year or so. I've been out of the loop long enough that I don't know if anyone else has stepped up to lead the charge on polishing TypedDict to a state that's solid enough to standardize. If not, that would be a valuable role for someone to take on that cares and has time. |
Is there a plan for some required fields on TypedDict? Is there already a dedicated issue? |
We did give this some thought and decided to punt on it. Maybe a volunteer can implement this. The main problem is how to spell it -- we currently have the If you're interested in pursuing this idea I recommend filing a new issue. |
I've recently been reading Python code where heterogeneous dictionary objects are used a lot. I mean cases like this:
The value for key 'x' must be an integer and the value for key 'z' must be a string. Currently there is no precise way of specifying the type of the argument to foo in the above example. In general, we'd have to fall back to Dict[str, Any], Mapping[str, Union[int, str]] or similar. This loses a lot of information.
However, we could support such types. Here is a potential syntax:
Of course, we could also allow Dict[dict(x=int, y=str)] as an equivalent. I don't really love either syntax, though.
Alternatively, we could omit Dict[...] as redundant:
Using type aliases would often be preferred:
These types would use structural subtyping, and missing keys could plausibly be okay. So Dict[dict(x=int, y=str)] could be a subtype of Dict[dict(x=int)], and vice versa (!).
Maybe there should also be a way of deriving subtypes of heterogeneous dictionary types (similar to inheritance) to avoid repetition.
Maybe we'd also want to support Mapping[...] variants (for read-only access and covariance).
Some existing languages have types resembling these (at least Hack and TypeScript, I think).
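
To illustrate the information loss described above, here is a small sketch that contrasts Dict[str, Any] with a per-key type. It uses typing.TypedDict as later added to the standard library (Python 3.8+), not any of the syntaxes proposed in this issue. The names Item, foo_loose and foo are hypothetical.

```python
from typing import Any, Dict, TypedDict  # TypedDict needs Python 3.8+


class Item(TypedDict):
    x: int
    y: str


def foo_loose(d: Dict[str, Any]) -> str:
    # The annotation says nothing about which keys exist or what they hold.
    return d['y'] * d['x']


def foo(d: Item) -> str:
    # A checker can see that d['x'] is an int and d['y'] is a str.
    return d['y'] * d['x']


result = foo({'x': 2, 'y': 'ab'})
```

Both functions behave the same at runtime. The difference is only in what a static checker can verify about the argument.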