Support changing a variable's type #257
The current behavior is correct. You are declaring that the symbol has one type and then assigning it a value of an incompatible type. The correct way to code this is to choose a new symbol name. Even without type checking, this is good coding practice.
Closing old issue, please reopen if this is still an issue.
I have just hit this issue as well. I understand @erictraut's reasoning. It makes sense, and many languages like C# and TypeScript are designed with this reasoning in mind, where redeclaration of variables in the same scope is forbidden. On the other hand, languages like F# and Rust not only allow shadowing of local declarations but encourage this design (unsure about Rust, but totally sure about F#). Naming variables is hard, and coming up with new names in cases where different types are involved to represent the same concept can also be error prone. I have the following use-case in mind (it's more tricky in reality, I have simplified for brevity):
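A rough sketch of the kind of pattern I mean (the names and the int conversion are illustrative, not my actual code):

```python
def handle_request(raw_value: str) -> None:
    # raw_value arrives as untrusted text, e.g. from a web form
    raw_value = int(raw_value)  # reassigning the parsed value is what the checker flags
    print(raw_value + 1)        # downstream code wants the parsed form
```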
To be clear, if you provide no type declaration for a variable, pyright allows it to take on any value. For example, this is fine:

```python
def parse_int(x):
    x = int(x)
    return x
```

But once you specify a variable's type, it's the job of a type checker to verify that any assignment to that value matches that type.

```python
def parse_int(x: str):  # x is defined as str
    x = int(x)  # Error: not a str!
    return x
```

In the use case you described (validating untrusted input), you can rely on type narrowing:

```python
def process_input(input: object):  # Input can be anything
    if isinstance(input, str):
        reveal_type(input)  # Type is narrowed to str
```

You may also be interested in user-defined type guards, which are defined in draft PEP 647 and are already implemented in pylance/pyright.
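For reference, a user-defined type guard in the style of PEP 647 looks roughly like this (a minimal sketch; `is_str_list` is an illustrative name, and `TypeGuard` requires Python 3.10+ or `typing_extensions`):

```python
from typing import TypeGuard

def is_str_list(values: list[object]) -> TypeGuard[list[str]]:
    # returning True tells the checker to narrow values to list[str] in the caller
    return all(isinstance(v, str) for v in values)

def process(values: list[object]) -> None:
    if is_str_list(values):
        print(", ".join(values))  # values is narrowed to list[str] here
```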
I have this issue as well. I don't want the type checker to complain when I am explicitly reassigning a new value to the variable. As @vlaci mentioned above, both F# and Rust allow us to do this: rebind x, first as an int and later as a String. And the type checker is smart enough to treat x as an int before the rebinding and as a String after it.
If a variable or parameter does not have an explicit type annotation, pyright will allow you to assign (and reassign) any type to it. So this is allowed by pyright:

```python
x = 5
x = "Some String"
```

I'll note that mypy (another popular Python type checker) does not allow this. It always "fixes" the variable's type based on the first assignment. There's no good reason to be this restrictive (that is, it doesn't provide any additional type safety), so pyright allows more flexibility here. However, when a variable or parameter is explicitly annotated, it's the job of a type checker to enforce type consistency based on that annotation.

```python
x: int = 5
x = "Some String"  # Type consistency error
```
The problem with the first way (no annotations) is that when defining functions, not giving type annotations is equivalent to just not doing type checking at all (and pylance will complain about Unknown types). I still think that the default choice of giving type errors in these cases is fine. But there should be a way to disable that via a setting, or maybe by an explicit type annotation like this:

```python
x: int = 5
x = "Some string"       # Type consistency error
x: str = "Some string"  # Not a type consistency error because I have explicitly annotated
```

But the third line also gives an error saying that the previous declaration of x is obscured by a new declaration of the same name. This is reported under the diagnostic rule "reportGeneralTypeIssues", and disabling that rule disables almost all type checking, including things like passing a wrong parameter type to a function. Maybe this should be a separate category of diagnostic that we can disable in the settings.
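For what it's worth, a per-line suppression is possible today (a sketch; whether such comments are acceptable depends on the codebase, and pyright also understands its own `# pyright: ignore` comments):

```python
x: int = 5
# suppress the type-consistency diagnostic on the next line only
x = "Some string"  # type: ignore
```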
It doesn't make sense to declare a variable as one type and then declare it as a different type within the same scope. Code flow is not always linear. The presence of loops and conditional statements would make the meaning ambiguous. Consider the following:

```python
x: int = 3
if some_condition:
    x: str = "Some String"

# What type should be enforced at this point? Should this
# be considered legal or illegal? It's ambiguous.
x = 5

while some_other_condition:
    x: bytes = b""

# Same thing here. This is ambiguous.
x = "hi"
```

For these reasons, it doesn't make sense to allow a variable to be declared with different types within the same scope.
Oh, these are good cases that I didn't think about. Thanks. However, I just found that mypy has a flag for this: --allow-redefinition.
Yes, you're correct that most other languages have support for nested scopes. Python's scoping rules are very baroque and internally inconsistent (e.g. most loops don't introduce a new scope but list comprehensions do), but they are what they are.

The "--allow-redefinition" mode was added to mypy as a way to work around the fact that it normally "fixes" the variable's type after the first assignment. Pyright doesn't do that, so there's no reason to work around that issue.

As for allowing a variable to be annotated multiple times with conflicting types, we don't have any intention of supporting that in pyright. It would violate type safety, go against the rules of PEP 484, and undermine a bunch of other assumptions within pyright's core type checking engine.
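To make the scoping point concrete, a small self-contained example (the variable names are arbitrary):

```python
for i in range(3):
    pass
print(i)  # prints 2 -- the loop variable leaks into the enclosing scope

squares = [n * n for n in range(3)]
# print(n)  # NameError -- the comprehension body has its own scope
```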
This feature would be really helpful as Pyright is the best language server and type checker for interactive IDE use. Mypy is just too slow for that. I often come across code like the following:
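(a made-up but representative sketch; the names and the datetime conversion are illustrative)

```python
from datetime import datetime

def normalize(timestamp: str) -> datetime:
    # the parameter is re-used to hold the parsed value, which has a different type
    timestamp = datetime.fromisoformat(timestamp)
    return timestamp
```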
It'd be really helpful to be able to disable these warnings on a more granular level, so that one could use Pyright for their development without trying to convince colleagues to change their coding style (even when it's not compliant with PEP 484 and other PEPs).
@erictraut could you quote the relevant portion of PEP 484? I read through the text and tried a few searches but was unable to find any discussion of this subject.
PEP 484 makes it clear that the job of a type checker is to verify the consistency of types as indicated by type annotations associated with specific symbols.
If a symbol has a type annotation, all uses of that symbol must be consistent with the annotated type. It makes no sense for a symbol to have different, inconsistent types. If your intent is for a symbol to accommodate different types at different points in the code flow, PEP 484 allows for union types. Here are some examples that demonstrate why it would be nonsensical to allow for annotating a symbol with different, inconsistent types:

```python
g: int = 4

def func1():
    # What type is g here?
    global g
    g: str = "hi"
    # What type is g here?

def func2(cond: bool):
    x: int = 0
    if cond:
        x: str = "hi"
    # What type is x here?

def func3(val: int):
    x: int = 0
    for y in range(val):
        # What type is x here?
        x: str = "hi"
    # What type is x here?
```

PEP 484 also indicates that it is "strongly inspired by mypy". Mypy is a reference implementation of PEP 484, so in cases where the PEP is somewhat opaque, it's useful to look at what mypy does. As one would expect, mypy doesn't allow an annotated symbol to be redeclared elsewhere with a different, incompatible type annotation. This is true even when the "--allow-redefinition" option is used, because this option doesn't apply to explicit annotations.
All three of your examples are indeed nonsensical. However, I do not think they represent the spirit of the original request, or the common use cases in similar veins.

```python
def func1(val: str):
    # type(val) is obviously str
    val: bytes = val.encode()
    # type(val) is obviously and provably bytes, but we get a typecheck complaint.


from typing import Union

def func2(val: Union[str, bytes]):
    # It's impossible to deduce from the signature that val can only be str.
    # This should be reported as an error by the typechecker since bytes does not have an .encode() method.
    val = val.encode()


def func3(val):
    # type(val) is unknown
    val: bytes = val.encode()
    # type(val) is bytes, but we can't be sure this will work


def func4(val: str):
    # type(val) is str
    # have to find a new, potentially suboptimal name
    val_bytes: bytes = val.encode()
    # type(val_bytes) is bytes;
    # but val is still floating around polluting the namespace;
    # and we'll use extra memory for the indefinite remainder of the function,
    # rather than the time it takes to reassign the variable.
    make_api_call(val_bytes)
```

There are of course ways to work around these things; it's even possible to split the function into two parts, one doing the type conversion and the other doing the remaining work, but form should follow function rather than vice versa.
Pylance flags all four of your examples as errors, as it should. Mypy also flags them as errors (although the third one is skipped because mypy omits analysis of a function whose signature has no annotations; if you add an annotation to the signature, mypy flags it as well). The mental model you should use is this: a symbol is a named entity within a scope. If a symbol is annotated with a type, all uses of that symbol must conform to that type. If a symbol is given more than one annotation and they conflict, that is an error. If a symbol has no type annotation, its type is inferred from all assignments to it. If you want more details, please refer to this documentation. If you adopt this mental model, it all makes sense and is consistent. Other interpretations would lead to inconsistencies and false negatives in the type checker.
Mypy supports this using --allow-redefinition.
I would say

```python
def func1(val_str: str):
    val_bytes: bytes = val.encode()
    del val_str
    ...
```

This looks completely nonsensical to me.
Well, it is nonsensical with the errors you introduced, but if we fix the errors and remove redundant information, it's not nonsensical or difficult to understand at all.

```python
def func1(val: str):
    val_bytes = val.encode()
    del val
    ...
```

It's less nonsensical than changing the type of a variable, which forces people to wonder which type it is at different places in a potentially large function. Someone might wonder why the del is there, but that can be answered with a short comment:

```python
def func1(val: str):
    val_bytes = val.encode()
    del val  # val is expected to be large
    ...
```

Wondering why the type of a variable changes partway through a function is harder to answer that way.
You might be right there.
Would it be possible to re-consider this issue? Rationale:

- Changing the type of a variable is idiomatic Python.
- Not being able to re-declare a variable with a different type leads to ugly work-arounds resembling Hungarian notation.
- Type checking should not get in the way of idiomatic Python, and mypy fully supports this.

We have a large code base that is fully type annotated, and we deliberately use "allow redefinition" as a global setting in mypy. It is quite annoying that our IDEs don't support that and the code is full of false positive type errors in the IDE only.
I fully agree with @bluenote10!
Python cannot have the same shadowing pattern as Rust, because Rust creates new scopes for blocks such as loop and conditional bodies, while Python's loops and conditionals do not introduce new scopes.
This was already pointed out earlier in this thread. |
If you want the proposal to be reconsidered, you should respond to the problems that have been pointed out with this proposal. |
If a local variable has no declared type, pyright allows you to assign any value to it (unlike mypy, which treats the first assignment as an implied type declaration of sorts). However, once you declare a type for a local variable, pyright will enforce that declared type anywhere the variable is used. This is the fundamental job of a type checker, and this behavior is consistently enforced in pyright.

Type declarations do not (and should not) depend on code order. If re-declaration were allowed, determining the enforced type would require code flow analysis to work out which declaration was "in effect" at any point in the code flow. And even then it would be ambiguous at times, because multiple declarations could be reachable from a particular point in the code flow, such as in the example above.

The rules about type declaration enforcement are baked deeply into pyright's implementation and architecture, so it's not something we will be changing.
If this is impossible because of Pyright's internal architecture, I'm really confused how this works fine:

```python
test = str()
print(test)
test = int()
print(test)
```

Type info will correctly show that test is a str at the first print and an int at the second. But with explicit annotations:

```python
test: str = str()
print(test)
test: int = int()
print(test)
```

everything breaks, and apparently this is unavoidable. They're exactly the same, I just provided the type in the second case instead of relying on inference. Even this example that supposedly demonstrates how impossible this is doesn't make sense:

```python
a: int = 3
if random() < 0.5:
    a: str = "3"
fun(a)  # what type is a?
```

Take off the explicit annotations and Pyright handles it just fine -- it reports a as int | str at the call to fun.
@mrozekma, I recommend that you read this documentation. It might help clarify in your mind the difference between type declarations and type inference. When you declare the type of a symbol, it's the job of the type checker to enforce that declared type. Any attempt to assign an incompatible value is a type violation. That's the core purpose of static type checking. In your examples above, you have attempted to declare the type of the same symbol twice, with conflicting types, and that is what pyright is flagging.
@erictraut I second what @bluenote10 wrote. Your argumentation is a bit frustrating tbh. To me, it's very clear what the real world problem is that people want to see addressed, and I also cannot see how fixing this could have any negative effects. The only argument against solving this issue that I've read that I can sympathize with is that this would be complicated to implement given pyright's architecture, but that only leads me to think that pyright's architecture may need to change. An outline of a possible solution: treat a re-declaration of a name as introducing what is effectively a new symbol from that point in the code flow onward.
@niklasekstrom, I think we're going to need to agree to disagree here. The notion of treating the same symbol as two different symbols based on the "shape" of the code flow graph would be confusing and difficult to explain, as well as much slower for code analysis. We're unlikely to ever adopt a change like this in pyright.

Pyright is a tool that helps you write bug-free code that is easier to maintain. If you buy into those benefits, you should be willing to make some small adjustments to the way you write your code. One of those accommodations is that symbols with different declared types need to be distinct symbols with different names. That's generally good for readability and maintainability because two symbols with different types are, by definition, used for different purposes and should be named accordingly.
It's not only more work for the type checker, slowing it down. It's also more work for humans reading the code, having to look at more of the code to figure out what type a variable is.
Consider the following fragment from a Flask web app:
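(a hypothetical sketch; the route and names are illustrative, and it assumes Flask's default string URL converter)

```python
from flask import Flask

app = Flask(__name__)

@app.route("/accounts/<account_id>")
def get_account(account_id: str) -> str:
    # Flask passes account_id to the view as a string...
    account_id = int(account_id)  # ...but the handler wants the integer form;
                                  # this reassignment is what gets flagged
    return f"account {account_id}"
```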
The account_id argument is passed as a string value by the framework. Would you say that it would help a human to understand the code if the re-declaration of account_id had a different name? What would you suggest that the new (or old) variable name should be? I think this very quickly devolves into what @bluenote10 describes as "ugly work-arounds resembling Hungarian notation".
Yes, I agree. :)
Ok that's good to know. This makes me think that someone should start a new project that aims to do type checking while embracing idiomatic Python.
I certainly disagree with this; see the example at the top of this post for a counter-example. Furthermore, I agree with the comment by @bluenote10, "Changing the type of a variable is idiomatic Python."
If the account id needs to be an int, that means the string version is not an account id, and therefore should not be named account_id.
You are looking at this squarely from the type-system abstraction level. The point of having a dynamic language is to avoid this kind of rigid reasoning. If you raise the abstraction level, then you'll see that both the string "123456" and the integer 123456 are two representations of the same account id.
It's not a different level of abstraction. You're assuming too early that it's an account id. You're calling it an account id before you've even checked to see if all the characters are digits. Even if you had already checked that, I'm pretty sure that not every string of digits represents an account id. Assumptions like this lead to bugs and security holes. Good programming requires "this kind of rigid reasoning." It can't be avoided.
Note that
Mypy is highly inconsistent in how it handles type declarations for variables. By default, it doesn't allow redeclaration of variables, and it often treats the first assignment as an implied declaration! Pyright is consistent in that it enforces a type declaration if it is present but otherwise allows any value to be assigned to the variable. See this description for details. If your intent is to allow any value to be assigned to a local variable, omit a type declaration for that variable, and pyright will allow it, and will do so in a type-safe manner.

The only place where this approach breaks down is when dealing with input parameters. There's no syntax available to tell a type checker "enforce this type declaration for callers, but within the function's implementation do not enforce the type declaration". So you need to choose whether you want to omit the type declaration for the parameter (which has downsides) or avoid reusing the parameter name for a different (incompatible) type within the function body.
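A minimal sketch of the two patterns described above (the names are illustrative):

```python
def local_example() -> None:
    # no declared type: pyright lets the variable take on different types
    value = "123"
    value = int(value)
    print(value + 1)

def parameter_example(raw: str) -> int:
    # the parameter keeps its declared type; the converted value gets its own name
    parsed = int(raw)
    return parsed
```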
I just created a new issue #4260 and I think it's related to this one.
If I have a variable of known type and want that name to now reference an object of another type, pylance complains. E.g.:
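(an illustrative sketch, not the original snippet; the name and types are arbitrary)

```python
x: int = 5
x = "hello"  # pylance flags this: the declared type of x is int
```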
This gives a type error on the reassignment. I can see how this would sometimes be useful to catch, but I would personally prefer for this not to be seen as an issue. I did try to explicitly declare the type the second time, to show that I'm intentionally changing it, but that just gives a different error, about the earlier declaration being obscured by the new one.
I believe mypy supports an option to allow such type changes; could support for this be added to pylance?
(I'm not sure if this request should be on pylance or pyright; let me know if I'm in the wrong place or if there's already a configuration option I can change somewhere to achieve this)