Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type guards as assertions #8548

Merged
merged 6 commits into from
May 11, 2016
Merged

Type guards as assertions #8548

merged 6 commits into from
May 11, 2016

Conversation

ahejlsberg
Copy link
Member

@ahejlsberg ahejlsberg commented May 10, 2016

This PR fixes several of the issues we've had around nothing types in control flows. We previously took the view that when the current control flow based type of a variable is disjoint with the type implied by a type guard, we produce type nothing. Effectively, we'd trust that the control flow based type is right and therefore conclude that the type guard must be wrong.

This PR changes the reasoning in favor of the type guard instead. We now say that if narrowing the current control flow type produces nothing, but narrowing the declared type would produce an actual type, then the type guard must be there for a reason (and consequently we must have missed something in the reasoning about the control flow). Therefore, instead of producing nothing, we revert to the declared type and narrow that. Effectively we treat the type guard as an assertion that the condition could be true and respect it if it actually could be true according to the declared type.

Things that could cause the control flow reasoning to disagree with the type guard include an invalid argument passed for a parameter or an assignment to local variable by a nested function. For example:

function foo() {
    let s: string | undefined = undefined
    function inner() {
        // Possibly assign to s
    }
    inner();
    if (s) {
        // Treat s as string instead of nothing
    }
}

Fixes #8513.
Fixes #8429.

@mhegazy
Copy link
Contributor

mhegazy commented May 10, 2016

what about the idea of not using types from nothing branches ?

@ahejlsberg
Copy link
Member Author

@mhegazy The idea of ignoring types from nothing braches is more complicated than it first appears. When computing the control flow based type for a given variable, it isn't enough to just check if the variable is nothing before an assignment as we discussed. A branch might be dead code for other reasons (e.g. a type guard on another variable) and we'd need to analyze that separately. That possibly requires separate passes for loop constructs and creates new opportunities for circularities. Plus it wouldn't actually solve the problem shown in my initial description for this PR (the problem in #8429 which we only sort of solved in cases where variables don't have an initializer or a preceding assignment).

In aggregate I think this new approach is much better. We only produce nothing types when there is no possible way a type guard could be true. In cases where it might be true, we assume it is and do out best to make the types work out.

@DanielRosenwasser
Copy link
Member

I think this is generally a pretty good direction, though I'll probably need to think more on it. I still think it would be worth investigating flagging spots where a condition narrows to nothing if a user is interested.

@mhegazy
Copy link
Contributor

mhegazy commented May 11, 2016

👍

@Arnavion
Copy link
Contributor

Yes, if a branch is inferred to be nothing, it might be dead code. Even more so if the user has added a type guard there. Since the compiler warns about dead code in other forms it would be nice to continue being able to warn about this.

@ahejlsberg
Copy link
Member Author

ahejlsberg commented May 11, 2016

Latest commits add the "revert to declared type" behavior to the ! postfix operator to make it consistent with type guards. This is the previously suggestion option 1 from this discussion. I believe this fixes the issue that still remained in @ivogabe's code.

@ahejlsberg ahejlsberg merged commit 89506c1 into master May 11, 2016
@mhegazy mhegazy deleted the typeGuardAsAssertion branch May 11, 2016 18:29
@malibuzios
Copy link

malibuzios commented May 14, 2016

@ahejlsberg

This seems arbitrary:

function func(x: number | string) {
    x; // 'x' has type 'number | string' here

    if (typeof x !== "number" && typeof x !== "string") {
        x; // why does 'x' get type 'number' here? 
           // why not, say, 'string', 'number | string', or 'nothing' instead?
    }

    x; // 'x' has type 'number | string' here
}

(tested on 1.9.0-dev.20160514)

@ahejlsberg
Copy link
Member Author

Yes, it does seem a bit arbitrary that x gets type number. The reason is that we switch back to the declared type only once we discover that a narrowing operation would produce nothing, and that doesn't happen until the second typeof test. Consider this equivalent code:

if (typeof x !== "number") {
    x;  // string (we're fine here)
    if (typeof x !== "string") {
        x;  // number (we switched back to declared type)
    }
}

The behavior above doesn't seem quite as arbitrary, but from a control flow perspective there's really no difference.

All in all, we're talking about the best possible compromise here. None of this happens as long as you're writing meaningful code and the alternative of nothing exhibits several issues that in aggregate seem worse.

@malibuzios
Copy link

malibuzios commented May 14, 2016

First, I think one confusing part here is why you referred to number, and not number | string as the 'declared type', so perhaps there's some non-ideal terminology here. I believe you mean something like 'origin type', 'source type' or just 'previous type'.

I feel that in order to get a good perspective on this I would need to observe many examples and variations or become familiar with the actual, precise logic that is being used at this point.

Anyway, I tried two more variations and compared them:

Expected:

function func1(x: number | string | boolean) {
    x; // (1) 'x' has type 'number | string | boolean'

    if (typeof x !== "number") {
        x; // (2) 'x' has type 'string | boolean'

        if (typeof x !== "string") {
                x; // (3) 'x' has type 'boolean'
        }
    }
}

Unexpected:

function func2(x: number | string | boolean) {
    x = "hi"; 
    x; // (1) 'x' has type 'string'

    if (typeof x !== "number") {
        x; // (2) 'x' still has type 'string'

        if (typeof x !== "string") {
                x; // (3) 'x' has type 'number | boolean'
        }
    }
}

So it seems that in the second example, the 'surprising' negative assertion for a string type made the compiler somehow suddenly consider number as a valid possibility for (3), unlike in the first example. This is despite the fact that a number typed value was never assigned to x and there wasn't any positive or negative guard for number aside for the one that preceded (2). It looks a bit bizarre. I'm not sure if it makes any sense.

@malibuzios
Copy link

malibuzios commented May 14, 2016

OK, I think I got the idea, it reverts back to the 'declared type' but then applies the negative assertion. So it is more like the Declared Type - negatively asserted type.

I feel that's a very simple (simplistic?) way to approach this scenario. The original intention here seemed to have been to mostly address cases with positive assertions through guards, which seem much simpler.

Negative assertions appear less trivial. I'm trying to think about alternative solutions, hopefully not ones that would require more sophisticated techniques like backtracking etc. - though it may be unavoidable here, I'm not sure at this point.

@malibuzios
Copy link

malibuzios commented May 14, 2016

I think I could isolate the issue here in the following way:

function func2(x: number | string | boolean) {
    x = "hi"; 
    // (1)

    if (typeof x !== "number") {
        // (2)

        if (typeof x !== "string") {
                x; // (3) 'x' has type 'number | boolean'
        }
    }
}

With the new logic, the compiler now makes the 'worst-case' assumption that the mysterious event that caused x not to be a string and eventually pass the if (typeof x !== "string") guard happened somewhere in (2), i.e. after the negative assertion if (typeof x !== "number") was made. However that assumption isn't necessarily correct - it could have happened in (1) as well.

So after looking at it like this, surprisingly I'm starting to think this is somewhat reasonable after all as a rough fallback, for these types of cases.

However the original one, there wasn't any side effect that could (reasonably) possibly happen between the two negative assertions, so I believe it would be better for it to fall back to either number | string or nothing here:

function func(x: number | string) {
    x; // 'x' has type 'number | string' here

    if (typeof x !== "number" && typeof x !== "string") {
        x; // why does 'x' get type 'number' here? 
    }

    x; // 'x' has type 'number | string' here
}

(it would be extremely unusual if x is somehow reassigned (in particular: reassigned to a number!) between the execution of the two typeof predicates - anyway, that would be some serious abuse of the short-circuiting and operator..)

@malibuzios
Copy link

malibuzios commented May 15, 2016

I'll try to 'reboot' once again from a more practical perspective:

function func(x: number | string) {
    if (typeof x !== "number" && typeof x !== "string") {
        // 'x' gets type 'number' here
        //
        // However the programmer's intention was most likely to check if it was
        // null, undefined, or some other type that's not 'number' or 'string'
        // .. 
    }
}

func(<any> undefined); // <any> cast not needed if non-nullability is disabled

The reasoning for this behavior was that it is somehow consistent with nested guards and that the compiler isn't currently 'smart' enough to differentiate between conjunct and nested guards. However, that's not really a satisfying argument - that just explains why it happens, but doesn't show why is that a reasonable outcome for this relatively common scenario (based on objective measures, it most likely isn't).

@malibuzios
Copy link

malibuzios commented May 16, 2016

I had some initial conception of an alternative approach, that's still quite 'rough', but provides somewhat more precise 'awareness' for the soundness, or 'confidence' levels for types. I felt that I'm not in a position to evaluate how useful or suitable something like this would be to augment the current analysis, but I wanted to share it here anyway, in case it does turn out to be of some use (perhaps for even other things like literal types etc.).

Idea sketch: soundness metrics for flow analysis

What if the compiler had implicit traits for analyzed type T being either

  1. sound T
  2. unsound T
  3. sound not T
  4. unsound not T

This includes a granular trait for each individual component of a union. E.g. a union like number | string | boolean which just received a number type through a literal would be internally tracked as:

(sound number, sound not string, sound not boolean)

Some general rules:

  • All unknown or ambient variable or parameter types start as unsound.
  • A variable assigned by a literal receives a sound type.
  • A variable assigned by another variable receives the same type metrics as the type of the secondary variable.
  • A variable tested for a particular type by a sound guard receives a sound trait for the type tested and sound not trait for all other members of the union (an unsound guard's effect is similar but it gives the unsound trait instead).
  • Whenever a function is called, which may have side effects, all variables that could possibly be captured by the function have their analyzed types 'degraded' from sound to unsound (note there's no need to actually analyze every function, or even analyze them at all, e.g. one can assume the worst case where all called functions may reassign all their potential captured variables).

Simple example:

let x: number | string; // x starts as (unsound number, unsound string) 

x = 1; // x is (sound number, sound not string)

function mysteriousFunction() {
    // ..
}

mysteriousFunction(); // all 'sound' types of variables possibly captured by the 
                      // function are degraded to 'unsound' 

x;  // x is (unsound number, unsound not string)

if (typeof x !== "number") { 
    x; // 'x' falls back to (sound not number, unsound not string)
       // which is effectively seen as 'string' for the programmer
}

Another example:

let x: number | string; // x starts as (unsound number, unsound string) 

x = 1; // x is (sound number, sound not string)

if (typeof x !== "number") {
    x; // x changes to (sound not number, sound not string) 
       // which is effectively seen as 'nothing' to the programmer
}

More complex example:

function func(x: number | string | boolean) {

    // x is (unsound number, unsound string, unsound boolean)
    x = "hi";
    // x is (sound not number, sound string, sound not boolean)

    function mysteriousFunction() {
       // ..
    }   
    mysteriousFunction();

    // x is (unsound not number, unsound string, unsound not boolean)

    if (typeof x !== "number") {
        // type of 'x' is (sound not number, unsound string, unsound not boolean)

        if (typeof x !== "string") {
            // type of 'x' is (sound not number, sound not string, unsound not boolean)
            // the type shown to the programmer is 'boolean' - despite the fact it is negated, 
            // the negation isn't considered sound so that's the only remaining option.
        }
    }
}

I cannot predict, at this point, and with my limited knowledge, if, when and how exactly these metrics would/should be applied, though (I mean, actually.. in practice rather than just as an example). I guess it feels too early for that.

[Edits: fixed many small mistakes and inaccuracies]

@malibuzios
Copy link

malibuzios commented May 17, 2016

Just wanted to mention I'm aware of this possibility (as well as other cases like a function being indirectly called using call or apply):

function callTheFunctionInTheBox(box: { secret: Function }) {
    box.secret();
}

function example() {
    let x: number | undefined = 1;

    function imNotCalledDirectly() {
        x = undefined;
    }

    let secretBox = { secret: imNotCalledDirectly };

    callTheFunctionInTheBox(secretBox);
    // x is now undefined
}

However that's not really something that cannot be accounted for as well. Simply assume the worst possible case and always consider a local variable's type as unsound if there's a nested function (that's either named or anonymous, called or uncalled) that has it in scope (actually having the knowledge of whether the function internally references it would be a significant improvement, of course). There are still many useful real-world cases where there are no nested functions at a particular variable's scope.

@microsoft microsoft locked and limited conversation to collaborators Jun 19, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

why does initialization narrow type?
6 participants