Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support changing a variable's type #257

Closed
neighthan opened this issue Aug 18, 2020 · 36 comments
Closed

Support changing a variable's type #257

neighthan opened this issue Aug 18, 2020 · 36 comments
Labels
enhancement New feature or request

Comments

@neighthan
Copy link

If I have a variable of known type and want that name to now reference an object of another type, pylance complains. E.g.:

def parse_int(x: str):
    x = int(x)
    return x

gives

Expression of type "int" cannot be assigned to declared type "str"
  "int" is incompatible with "str" Pylance (reportGeneralTypeIssues)

I can see how this would sometimes be useful to catch, but I would personally prefer for this not to be seen as an issue. I did try to explicitly declare the type the second time, to show that I'm intentionally changing it, but that just gives a different error:

def parse_int(x: str):
    x: int = int(x)
    return x
Parameter declaration "x" is obscured by a declaration of the same name Pylance (reportGeneralTypeIssues)

I believe mypy supports an option to allow such type changes; could support for this be added to pylance?

(I'm not sure if this request should be on pylance or pyright; let me know if I'm in the wrong place or if there's already a configuration option I can change somewhere to achieve this)

@erictraut
Copy link
Contributor

The current behavior is correct. You are declaring that the symbol x is type str. Any attempt to assign it a value that is not compatible with this type must be considered a type violation.

The correct way to code this is to choose a new symbol name. Even without type checking, this good coding practice.

@judej judej added enhancement New feature or request needs investigation Could be an issue - needs investigation labels Aug 19, 2020
@github-actions github-actions bot removed the triage label Aug 19, 2020
@judej judej removed enhancement New feature or request needs investigation Could be an issue - needs investigation labels Jan 13, 2021
@judej judej added enhancement New feature or request and removed triage labels Jan 13, 2021
@judej
Copy link
Contributor

judej commented Jan 13, 2021

Closing old issue, please reopen if this is still an issue.

@judej judej closed this as completed Jan 13, 2021
@vlaci
Copy link

vlaci commented Feb 3, 2021

I have just hit this issue as well.

I understand @erictraut 's reasoning. It makes sense and also many languages like C# and Typescript is designed with this reasoning in mind where redeclaration of variables in the same scope is forbidden.

On the other hand, languages like F# and Rust not only allow shadowing local declarations but encourage this design (unsure about Rust, but totally sure about F#). Naming variables is hard, and coming up with new names in cases where different types are involved to represent the same concept can also be error prone.

I have the following use-case in mind (it's more tricky in reality, I have simplified for brevity):
I have types like X to represent untrusted input, and e.g. ProcessedX to represent values which passed validation, and transformed to a canonical representation, etc. I'd like to name such Xs x because they represent the same concept. Shadowing the previous declaration in this case removes the possibility to use the untrusted value after the validation occurred.

@erictraut
Copy link
Contributor

To be clear, if you provide no type declaration for a variable, pyright allows it to take on any value. For example, this is fine:

def parse_int(x):
    x = int(x)
    return x

But once you specify a variable's type, it's the job of a type checker to verify that any assignment to that value matches that type.

def parse_int(x: str): # x is defined as str
    x = int(x) # Error: not a str!
    return x

In the use case you described (validating untrusted input), you can rely on type narrowing:

def process_input(input: object): # Input can be anything
    if isinstance(input, str):
       reveal_type(input) # Type is narrowed to str

You may also be interested in user-defined type guards, which are defined in draft PEP 647 and are already implemented in pylance/pyright.

@saaketp
Copy link

saaketp commented Apr 15, 2021

I have this issue as well. I don't want the type checker to complain when I am explicitly reassigning a new value to the variable.
I want it to complain only if it happens implicitly like when passing arguments to functions.

As @vlaci mentioned above, both F# and Rust allow us to do this:

let x = 5;
let x = "Some string";

And the type checker is smart enough to treat x as int before but String later.
I guess one problem with python is that it doesn't have a "let" keyword to declare that we are explicitly reassigning, so by default it is a good choice to complain like what pyright/pylance does.
But it would also be good to have an setting to allow changing type on explicit reassignment but still check on implicit reassignment.

@erictraut
Copy link
Contributor

If a variable or parameter does not have an explicit type annotation, pyright will allow you to assign (and reassign) any type to it. So this is allowed by pyright:

x = 5
x = "Some String"

I'll note that mypy (another popular Python type checker) does not allow this. It always "fixes" the variable's type based on the first assignment. There's no good reason to be this restrictive (that is, it doesn't provide any additional type safety), so pyright allows more flexibility here.

However, when a variable or parameter is explicitly annotated, it's the job of a type checker to enforce type consistency based on that annotation.

x: int = 5
x = "Some String" # Type consistency error

@saaketp
Copy link

saaketp commented Apr 15, 2021

The problem with the first way (no annotations) is that when defining functions, not giving type annotations is equivalent to just not doing type checking at all (and pylance will complain about Unknown types).
It is good only for local variables that do not depend on the function parameters.

I still think that the default choice of giving type errors in these cases is fine. But there should be a way to disable that by a setting or maybe by explicit type annotation like this:

x: int = 5
x = "Some string"  # Type consistency error
x: str = "Some string"  # Not a type consistency error because I have explicitly annotated

But the third line also gives error saying that the previous declaration x is obscured by a new declaration of same name. This is reported as the diagnostic type "reportGeneralTypeIssues". Disabling that will disable almost all type checking like wrong parameter type passed to function. Maybe this should be a separate type of issue that we can disable in the settings.

@erictraut
Copy link
Contributor

It doesn't make sense to declare a variable as one type and then declare it as a different type within the same scope. Code flow is not always linear. The presence of loops and conditional statements would make the meaning ambiguous.

Consider the following:

x: int = 3

if some_condition:
   x: str = "Some String"

# What type should be enforced at this point? Should this
# be considered legal or illegal? It's ambiguous.
x = 5

while some_other_condition:
    x: bytes = b""

# Same thing here. This is ambiguous.
x = "hi"

For these reasons, it doesn't make sense to allow a variable to be declared with different types within the same scope.

@saaketp
Copy link

saaketp commented Apr 15, 2021

Oh these are good cases that I didn't think about. Thanks.
So the problem is that python conditionals and loops don't create a new scope.
This is a problem Rust and F# didn't have because they create a nested scope for conditionals and loops.

However I just found that mypy has a flag --allow-redefinition
python/mypy#6197
python/mypy#6871
It gets around the problem of loops and conditionals by allowing redefinition only at the same indentation level.
Plus it add extra restriction that the variable should be read once before redefining.
Do you think it would be good to add something like that in pylance?

@erictraut
Copy link
Contributor

Yes, you're correct that most other languages have support for nested scopes. Python's scoping rules are very baroque and internally inconsistent (e.g. most loops don't introduce a new scope but list comprehensions do), but they are what they are.

The "--allow-redefinition" mode was added to mypy as a way to work around the fact that it normally "fixes" the variable's type after the first assignment. Pyright doesn't do that, so there's no reason to work around that issue.

As for allowing a variable to be annotated multiple times with conflicting types, we don't have any intention of supporting that in pyright. It would violate type safety, go against the rules of PEP 484, and undermine a bunch of other assumptions within pyright's core type checking engine.

@Jaakkonen
Copy link

This feature would be really helpful as Pyright is the best language server and type checker for interactive IDE use. Mypy is just too slow for that. I often come across code like

def handle_data(data: bytes) -> List[MyItem]:
  data = data.decode('utf-8').strip()
  # do something with the data that is now a string
  return my_items

It'd be really helpful to be able to disable these warnings on more granular level so that one could use Pyright for their development without trying to convince colleagues to change coding style (when it's not compliant with PEP 484 and other PEPs).

@pydsigner
Copy link

As for allowing a variable to be annotated multiple times with conflicting types, we don't have any intention of supporting that in pyright. It would violate type safety, go against the rules of PEP 484, and undermine a bunch of other assumptions within pyright's core type checking engine.

@erictraut could you quote the relevant portion of PEP 484? I read through the text and tried a few searches but was unable to find any discussion of this subject.

@erictraut
Copy link
Contributor

PEP 484 makes it clear that the job of a type checker is to verify the consistency of types as indicated by type annotations associated with specific symbols.

A type checker is expected to check the body of a checked function for consistency with the given annotations.

If a symbol has a type annotation, all uses of that symbol must be consistent with the annotated type. It makes no sense for a symbol to have different, inconsistent types. If your intent is for a symbol to accommodate different types at different points in the code flow, PEP 484 allows for union types.

Here are some examples that demonstrate why it would be nonsensical to allow for annotating a symbol with different, inconsistent types:

g: int = 4

def func1():
    # What type is g here?
    global g
    g: str = "hi"

# What type is g here?
def func2(cond: bool):
    x: int = 0
    if cond:
        x: str = "hi"
    # What type is x here?
def func3(val: int):
    x: int = 0
    for y in range(val):
       # What type is x here?
        x: str = "hi"

    # What type is x here?

PEP 484 also indicates that it is "strongly inspired by mypy". Mypy is a reference implementation of PEP 484, so in cases where the PEP is somewhat opaque, it's useful to look at what mypy does. As one would expect, mypy doesn't allow an annotated symbol to be redeclared elsewhere with a different, incompatible type annotation. This is true even when the "--allow-redefinition" option is used because this option doesn't apply to explicit annotations.

@pydsigner
Copy link

All three of your examples are indeed nonsensical. However, I do not think they represent the spirit of the original request, or the common use cases in similar veins.

def func1(val: str):
    # type(val) is obviously str
    val: bytes = val.encode()
    # type(val) is obviously and provably bytes, but we get a typecheck complaint.
from typing import Union

def func2(val: Union[str, bytes]):
    # It's impossible to deduce from the signature that val can only be str
    # This should be reported as an error by the typechecker since bytes does not have an .encode() method
    val = val.encode()
def func3(val):
    # type(val) is unknown
    val: bytes = val.encode()
    # type(val) is bytes, but we can't be sure this will work
def func4(val: str):
    # type(val) is str
    # have to find a new, potentially suboptimal name
    val_bytes: bytes = val.encode()
    # type(val_bytes) is bytes;
    # but val is still floating around polluting the namespace;
    # and we'll use extra memory for the indefinite remainder of the function, 
    # rather than time it takes to reassign the variable.
    make_api_call(val_bytes)

There are of course ways to work around these things; it's even possible to split the function into two parts, one doing the type conversion and the other doing the remaining work, but form should follow function rather than vice versa.

@erictraut
Copy link
Contributor

Pylance flags all four of your examples as errors, as it should. Mypy also flags them as errors (although the third one is omitted because mypy skips analysis of a function if its signature is not annotated; if you add a -> None annotation to the end of the signature, it flags all four errors the same as pylance).

The mental model you should use is this... A symbol is a named entity within a scope. If a symbol is annotated with a type, all uses of that symbol must conform to that type. If a symbol is given more than one annotation and they conflict, that is an error. If a symbol has no type annotation, its type is inferred from all assignments to it.

If you want more details, please refer to this documentation.

If you adopt this mental model, it all makes sense and is consistent. Other interpretations would lead to inconsistencies and false negatives in the type checker.

@zroug
Copy link

zroug commented Dec 10, 2021

Mypy supports this using --allow-redefinition https://mypy.readthedocs.io/en/stable/command_line.html#cmdoption-mypy-allow-redefinition. (For common cases where it makes sense.)

@moi90
Copy link

moi90 commented Jan 21, 2022

I would say func1 is a perfectly reasonable pattern and it should be possible without error. The alternative is (given that the parameter val_str could be large and it is required to free memory as early as possible):

def func1(val_str: str):
    val_bytes: bytes = val.encode()
    del val_str
    ...

This looks completely nonsensical to me.

@beauxq
Copy link

beauxq commented May 7, 2022

def func1(val_str: str):
    val_bytes: bytes = val.encode()
    del val_str
    ...

This looks completely nonsensical to me.

Well, it is nonsensical with the errors you introduced, but it we fix the errors and remove redundant information, it's not nonsensical or difficult to understand at all.

def func1(val: str):
    val_bytes = val.encode()
    del val
    ...

It's less nonsensical than changing the type of a variable, so people have to wonder which type it is at different places in a potentially large function.

Someone might wonder why del is there. But for that, a comment would be appropriate.

def func1(val: str):
    val_bytes = val.encode()
    del val  # val is expected to be large
    ...

Wondering why the del is there is a lot better than wondering which type the variable currently is.

@moi90
Copy link

moi90 commented May 9, 2022

Wondering why the del is there is a lot better than wondering which type the variable currently is.

You might be right there.

@bluenote10
Copy link

bluenote10 commented Jul 18, 2022

Would it be possible to re-consider this issue?

Rationale: Changing the type of a variable is idiomatic Python. Not being able to re-declare a variable with a different type leads to ugly work-arounds resembling Hungarian notation foo_orig_type vs foo_other_type, which certainly isn't idiomatic Python. This makes it unnecessarily harder to type annotate existing Python code. Also it can lead to genuine bugs if you are forced to e.g. introduce foo_validated and you cannot shadow foo -- but you want to prevent access to the unvalidated foo. For these reasons many modern language (e.g. Rust) encourage this shadowing pattern and disallowing type re-declaration feels like an unnecessary remnant of old-school type systems.

Type checking should not get in the way with idiomatic Python, and mypy fully supports this. We have a large code base fully type annotated, and we deliberately use the "allow redefinition" as a global setting in mypy. It is quite annoying that our IDEs don't support that and the code is full of false positive type errors in the IDE only.

@moi90
Copy link

moi90 commented Jul 18, 2022

I fully agree with @bluenote10 !

@beauxq
Copy link

beauxq commented Jul 18, 2022

Python cannot have the same shadowing pattern as Rust, because Rust creates new scopes for if and loops.

a: int = 3
if random() < 0.5:
    a: str = "3"
fun(a)  # what type is a?

This was already pointed out earlier in this thread.

@beauxq
Copy link

beauxq commented Jul 18, 2022

If you want the proposal to be reconsidered, you should respond to the problems that have been pointed out with this proposal.
Don't just state a rationale without addressing the problems that have been pointed out.

@erictraut
Copy link
Contributor

If a local variable has no declared type, pyright allows you to assign any value to it (unlike mypy, which treats the first assignment as an implied type declaration of sorts). However, once you declare a type for a local variable, pyright will enforce that declared type anywhere the variable is used. This is the fundamental job of a type checker, and this behavior is consistently enforced in pyright.

Type declarations do not (and should not) depend on code order. If re-declaration were allowed, the enforced type would require code flow analysis to determine which declaration was "in effect" at any point in the code flow. And even then it would be ambiguous at times because multiple declarations would be reachable from a particular point in the code flow, such as in the example above.

The rules about type declaration enforcement are baked deeply into pyright's implementation and architecture, so it's not something we will be changing.

@mrozekma
Copy link

If this is impossible because of Pyright's internal architecture, I'm really confused how this works fine:

test = str()
print(test)
test = int()
print(test)

Type info will correctly show that test on line 2 is a str and on line 4 is an int, but if you do:

test: str = str()
print(test)
test: int = int()
print(test)

Everything breaks, and apparently this is unavoidable. They're exactly the same, I just provided the type in the second case instead of relying on inference. Even this example that supposedly demonstrates how impossible this is doesn't make sense:

a: int = 3
if random() < 0.5:
    a: str = "3"
fun(a)  # what type is a?

Take off the explicit annotations and Pyright handles it just fine -- a's type is int | str. Trying to code with Pyright enabled has shown me that I redefine variables all the time, but usually because they're being used in two unrelated blocks and are obviously independent; it's only Python's dumb scoping that causes them to theoretically collide. It's gotten to the point where I just try to live without annotations, and if that means the variable ends up Any because the inference can't figure it out, I guess it's going to stay Any.

@erictraut
Copy link
Contributor

@mrozekma, I recommend that you read this documentation. It might help clarify in your mind the difference between type declarations and type inference.

When you declare the type of a symbol, it's the job of the type checker to enforce that declared type. Any attempt to assign an incompatible value is a type violation. That's the core purpose a static type checking. In your examples above, you have attempted to declare the type of test (and a) to be two different contradictory types. From the perspective of static type checking, that doesn't make sense.

@niklasekstrom
Copy link

@erictraut I second what @bluenote10 wrote.

Your argumentation is a bit frustrating tbh. To me, it's very clear what the real world problem is that people want to see addressed, and I also cannot see how fixing this could have any negative effects. The only argument against solving this issue that I've read that I can sympathize with is that this would be complicated to implement given pyright's architecture, but that only leads me to think that pyright's architecture may need to change.

Outline of solution:

  • find points in the code that are "hourglass necks"; any point such that any execution through the code will reach this point, and it will only be reached once (no looping)
  • if a variable is re-declared at such a point then any reference to that variable below that point is for all intents and purposes to a new variable (you could create a new internal shadow variable with a different name)
  • any re-declaration that are done at points that are not hourglass necks points will be flagged as problems

@erictraut
Copy link
Contributor

erictraut commented Jan 20, 2023

@niklasekstrom, I think we're going to need to agree to disagree here. The notion of treating the same symbol as two different symbols based on the "shape" of the code flow graph would be confusing and difficult to explain, as well as much slower for code analysis. We're unlikely to ever adopt a change like this in pyright.

Pyright is a tool that helps you write bug-free code that is easier to maintain. If you buy into those benefits, you should be willing to make some small adjustments to the way you write your code. One of those accommodations is that symbols with different declared types need to be distinct symbols with different names. That's generally good for readability and maintainability because two symbols with different types are, by definition, used for different purposes and should be named accordingly.

@beauxq
Copy link

beauxq commented Jan 20, 2023

I also cannot see how fixing this could have any negative effects

It's not only more work for the type checker, slowing it down. It's also more work for humans reading the code, having to look at more of the code to figure out what type a variable is.

@niklasekstrom
Copy link

It's not only more work for the type checker, slowing it down. It's also more work for humans reading the code, having to look at more of the code to figure out what type a variable is.

Consider the following fragment from a Flask web app:

def get_users(account_id: str):
  if not account_id.isdigit():
    raise RuntimeError("account_id must be a number")
  account_id: int = int(account_id)
  ...

The account_id argument is passed as a string value by the framework. Would you say that it would help a human to understand the code if the re-declaration of account_id had a different name? What would you suggest that the new (or old) variable name should be? I think this very quickly devolves into what @bluenote10 describes as "ugly work-arounds resembling Hungarian notation".

@niklasekstrom, I think we're going to need to agree to disagree here.

Yes, I agree. :)

We're unlikely to ever adopt a change like this in pyright.

Ok that's good to know. This makes me think that someone should start a new project that aims to do type checking while embracing idiomatic Python.

(...) because two symbols with different types are, by definition, used for different purposes and should be named accordingly.

I certainly disagree with this; see the example in the top of this post for a counter example. Furthermore, I agree with the comment by @bluenote10, "Changing the type of a variable is idiomatic Python."

@beauxq
Copy link

beauxq commented Jan 20, 2023

If the account id needs to be int, that means the string version is not an account id, and therefore should not be named account_id.

@niklasekstrom
Copy link

If the account id needs to be int, that means the string version is not an account id, and therefore should not be named account_id.

You are looking at this squarely at the type system abstraction level. The point of having a dynamic language is to avoid this kind of rigid reasoning.

If you raise the abstraction level then you'll see that both the string "123456" and the integer 123456 are two representations of the same account id.

@beauxq
Copy link

beauxq commented Jan 20, 2023

It's not a different level of abstraction.

You're assuming too early that it's an account id. You're calling it an account id before you've even checked to see if all the characters are digits. Even if you had already checked that, I'm pretty sure that not every string of digits represents an account id.

Assumptions like this lead to bugs and security holes. Good programming requires "this kind of rigid reasoning." It can't be avoided.

@bluenote10
Copy link

This makes me think that someone should start a new project that aims to do type checking while embracing idiomatic Python.

Note that mypy does support changing the type of variables, i.e., in contrast to pyright it does not get in the way of idiomatic Python. Since many projects use mypy in their CI pipeline this leads to the awkward situation that their code shows "red wiggly error lines" in VSCode, and yet is correctly typed Python 😞

@erictraut
Copy link
Contributor

Mypy is highly inconsistent in how it handles type declarations for variables. By default, it doesn't allow redeclaration of variables, and it often treats the first assignment as an implied declaration! Pyright is consistent in that it enforces a type declaration if it is present but otherwise allows any value to be assigned to the variable. See this description for details.

If your intent is to allow any value to be assigned to a local variable, omit a type declaration for that variable, and pyright will allow it — and do so in a type safe manner. The only place where this approach breaks down is when dealing with input parameters. There's no syntax available to tell a type checker "enforce this type declaration for callers, but within the function's implementation do not enforce the type declaration". So you need to choose whether you want to omit the type declaration for the parameter (which has downsides) or avoid reusing the parameter name for a different (incompatible) type within the function body.

@PabloLION
Copy link

I just created a new issue #4260 and I think it's related to this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests