Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime check of ascribed types #6790

Merged
merged 36 commits into from
May 30, 2023
Merged

Conversation

JaroslavTulach
Copy link
Member

@JaroslavTulach JaroslavTulach commented May 22, 2023

Pull Request Description

Implements #6682 - by adding a (raw) type check at each ascribed type.

Important Notes

The check is only performed for (x : X) ascription - both in constructor definitions as well as function definitions. That means the check is opt-in for functions - as our functions don't use such ascription so far or only at few places.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • All code follows the
    Scala,
    Java,
    style guides.
  • All code has been tested and improved:
    • Unit tests have been written where possible.
    • Type error: expected column to be Text | Integer, but got Column. Done in: 89a5150
    • QA: verify positive effect on UX in the IDE - it has positive effect

@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented May 22, 2023

ConversionMethodsTest is failing for polyglot Map values. The Enso code is:

 polyglot java import java.util.Map as Java_Map
 import Standard.Base.Data.Map.Map

 type Foo
    Mk_Foo data

 Foo.from (that:Map) = Foo.Mk_Foo that

 main =
     jmap = Java_Map.of "A" 1 "B" 2 "C" 3
     Foo.from jmap . data . size

the problem is that is java.util.Map and not Enso Map and the type check fails. CCing @Akirathan - there may be some inconsistency:

polyglot java import java.util.Map as Java_Map
import Standard.Base.Data.Map.Map
import Standard.Base.Meta

main =
    jmap = Java_Map.of "A" 1 "B" 2 "C" 3
    r1 = Meta.type_of jmap
    r2 = jmap.is_a Java_Map
    r3 = jmap.is_a Map
    [r1, r2, r3]

returns [java.util.ImmutableCollections$MapN, True, False] - e.g. Java_Map is not Map, but the conversion works!

Some consistency is needed:

@JaroslavTulach JaroslavTulach added the CI: No changelog needed Do not require a changelog entry for this PR. label May 23, 2023
@JaroslavTulach JaroslavTulach force-pushed the wip/jtulach/AscribedTypes_6682 branch from dcc3e87 to e2c7ed5 Compare May 23, 2023 10:09
@JaroslavTulach
Copy link
Member Author

Failure after recent integration:

Copy link
Collaborator

@hubertp hubertp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM minus some stye changes

case fn: IR.Function => verifyAscribedArguments(fn.arguments)
case _ => newMethod
}
arr.getClass()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Workaround for annoying "Unused variable warning" that fails Scala compilation. Left over some experiments.

private final ConditionProfile defaultingProfile = ConditionProfile.createCountingProfile();
// XXX: Type in a Node is wrong!!!!!
private final Type expectedType;
Copy link
Member Author

@JaroslavTulach JaroslavTulach May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great and I'm glad we are finally getting some type checking. That should make it so much easier to find many errors! 🎉 🎉

If I understand correctly from our catch-up, @jdunkerley, you suggested that we should raise these Type_Errors as dataflow errors, not panics? Right?

(I think the rationale was that we don't want exposing panics to the user, at least too often, and these errors are going to happen relatively often if the user does something wrong; so indeed a dataflow error may be safer here).

On other hand I'd be actually cautious. Personally I do prefer the current state of things - panicking on type error. That is because for effectful computations (not common in the GUI but pretty common in our libraries), if the type error is a dataflow error, we may not notice it (if we discard the result) - the only but pretty important difference will be that the function did not actually run. This may lead to hard to debug errors.

I'm quite worried that by using dataflow errors instead of panics, we can make it harder to debug issues in our libraries. If it is a necessary sacrifice, so be it. But is it really necessary?

So I'd propose that maybe we should integrate this change keeping these panics as-is for now and we can see how this plays out in the IDE and if it causes any trouble? And if it causes problems we can revisit turning these into dataflow errors in a separate PR?

@radeusgd
Copy link
Member

What do you think about this type error, @radeusgd? Aggregate_Column.Max accepts Integer or Text and something is feeding it with Column - is it good, bad, what should be the fix?

One comment about it.

The error is:

Execution finished with an error: Type error: expected `Argument #1` to be [Text, Integer], but got Column.

Can we do something to make Argument #1 actually be the name of the constructor argument?? That would be much more readable than a number.

Moreover, this is a bit of a nitpick, but to be consistent with the 'Enso-style', we have a proper syntax for sum types, so why write [Text, Integer] if it is not too much effort to write Text | Integer - which will be actually much closer to what was actually in the type signature.

So overall, I think it would be great if we can make the error say:

Execution finished with an error: Type error: expected `column` to be Text | Integer, but got Column.

Can we have that? 🙂

@radeusgd
Copy link
Member

radeusgd commented May 26, 2023

And this inspires me that we should have a few more tests, just to make sure the logic works (if something doesn't work and cannot be fixed in this PR, please let's still add a test, mark it as pending and create a follow up ticket)

I'd like to see a test for:

foo (x : ((Integer | Decimal) | (Integer | Decimal) | (Decimal | Integer) | Integer)) = x+x

with calls

foo 42
foo 1.5
foo "TEXT"

The | operator can be arbitrarily nested - I'm wondering if the current typechecker can correctly deal with that and what the expected type in the printed error will look like (so ideally it would be nice if the test also inspected the type error message).

Also I'm not sure if we have a test for Any, but definitely worth having some:

foo (x : Any) = x
bar (x : Text|Any) = x
...
# These should all work:
foo 42
foo "X"
bar 42
bar "X"
bar Nothing

@GregoryTravis
Copy link
Contributor

I agree with treating type errors as Panics for now, and we'll see how it goes. It's tricky to have one policy for everything, because we really want our library internals to always type-check perfectly, and we don't want to risk having type errors lost because a dataflow error was missed. But we don't want end users to experience this strictness.

@JaroslavTulach
Copy link
Member Author

There is five [FAILED] tests in the latest test run:

Example_Tests are failing because of:

 - [FAILED] should provide various example tables [116ms]
         Reason: An unexpected panic was thrown: (Type_Error.Error Line_Ending_Style Infer 'Argument #9')
         at <enso> Delimited_Format.Delimited(Internal)
         at <enso> case_branch(/Standard/Table/2023.2.1-dev/src/Delimited/Delimited_Format.enso:60:23-52)
         at <enso> Delimited_Format.type.for_file_read(Standard/Table/2023.2.1-dev/src/Delimited/Delimited_Format.enso:59-63)

addressed in 31dff3c

@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented May 28, 2023

There are many failures in Table like failure:

- [FAILED] by an Is_In check, on various types of columns [8ms]
          Reason: An unexpected panic was thrown: (Type_Error.Error Vector (In-Memory Column str) 'Argument #1')
          at <enso> Filter_Condition.Is_In(Internal)
          at <enso> Table_Spec.spec<arg-2>(/runner/_work/enso/enso/test/Table_Tests/src/In_Memory/Table_Spec.enso:752:46-82)
          at <enso> Table_Spec.spec<arg-0>(/runner/_work/enso/enso/test/Table_Tests/src/In_Memory/Table_Spec.enso:752:13-83)
          at <enso> Table_Spec.spec<arg-0>(/runner/_work/enso/enso/test/Table_Tests/src/In_Memory/Table_Spec.enso:752:13-95)

I don't have expertise to know what to do with them. I am adding Any where I can, but that's probably not the right direction. Library guys, your help is needed!

Update on May 30: Done in 5ca27c6

@JaroslavTulach
Copy link
Member Author

Execution finished with an error: Type error: expected column to be Text | Integer, but got Column.

OK, added that as a TCR - a checkbox in the description of the PR.

inspires me that we should have a few more tests

Feel free to add more tests into my branch. More passing tests is always helpful. (Correct) tests that fail would be useful too.

maybe we should integrate this change keeping these panics as-is

@radeusgd, the biggest problem of the PR aren't missing test. We wanted these ascribed types to work as an opt-in and they do - for methods. However the syntax is already heavily used for constructors and thus they are opt-out for constructors!

There are many failures in Table

I don't have expertise to fix them. Ideally I need you to go thru the runtime type errors and fix them. Or we need to find a way to make these constructor checks opt-in too.

@radeusgd
Copy link
Member

Regarding the issues with the tests that you mentioned, they all are related. This is a bit of an issue that we were trying to express stuff that currently Enso does not allow us to. So we either need to add the |Any workaround as you did, or opt-out of typechecking OR figure our something better.

Let me first describe the concrete problems we saw:

  1. Is_In is supposed to take a Vector for Vector.filter operations, but for Table.filter it needs to accept a Vector as well as a Column. We agreed to expose only Vector in type signature, as Filter_Condition is defined in Base from which the Column type is not accessible - so we just describe the situation in the comment. We do not want to introduce a dependency on Table to Base - Base is the basic library that should not have any dependencies.
  2. I'm not sure what happened with Unsupported_Output_Type, TODO we will need to investigate it. This one may be a simple case, possibly.
  3. The Connection | Any case. Here we are simply lacking some form of common type expression. E.g. interfaces could solve this. We have a generic Connection type and want to show on methods/ctors that they accept any connection. But then we have particular connection implementations - Postgres_Connection, SQLite_Connection etc. that have DB-specific behaviour. Currently there is no way to express a common type, so we just define a common type and children with similar names and pretend that Postgres_Connection <: Connection but in reality that is not the case as it's impossible
    to express such a notion in Enso.
  4. The last one is the Aggregate_Columns. I'm tracking a fix for them in Fix types of Aggregate_Column #6866. But the |Column should be enough of a workaround. Turns out it isn't and you need to add |Any. The reason is simple and mostly the same as (3). Enso lacks interfaces. We currently have 2 column types - one from the Table library (for in-memory) and one from Database. They both (are meant to) provide the same public API, but have different underlying implementations. So we need a way to express a common Column type but currently there is none.

One ugly workaround for 4 could be to do a sum type: Table.Column | Database.Column. But that will not work either, because we do not want to introduce a circular dependency of Table depending on Database.

Ideally, we need some kind of better abstraction like interfaces or typeclasses with which we can indicate that both Table.Column and Database.Column implement some kind of common Column API, defined in the Table library. That would also fix the Connection issue from (3).

Alternatively, I think a solution that is available to us without modifying the language, is to create a facade Column type (defined in Table) that has only one field, untyped column_implementation. The facade's only job would be to delegate every operation to column_implementation. Thus we'd have only a single column type but allow for multiple implementations. I personally like this solution. It has the benefit that we can easily check if a given thing is a column regardless of what backend it comes from (as they have the same user-facing type). Then, a very big (IMO) advantage is that this would be the place where we 'store' the API docs for Column operations - since there is just one public entry point - that's the place where the doc comments need to reside in. Currently, for every Column operation we actually need to duplicate all documentation between both implementations (Table.Column and Database.Column). That is quite annoying, because every change of the docs in one backend needs to be reflected in the other one. It is easy to get out of sync. On the other hand, the current state gives us the ability to add some backend-specific documentation, that shows e.g. only for DB columns. However, IMO this is not really a good thing, as if we want to provide a uniform API, the documentation should be synchronized too.

So as for (3) and (4), I'd suggest a rewrite to this form. IMO it would actually improve the clarity of our libraries. @jdunkerley what do you think about that? I remember that you were a bit skeptical about this idea some time ago, although I'm not sure exactly which parts seemed problematic?

As for (1) - I don't know what the right fix is, we probably don't want to move Column to Base. IMO ideally I'd add some parent interface like Can_Be_Used_As_Filter which will be implemented by both Vector and Column, but we currently have no language capacity to do this.

@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented May 29, 2023

... some kind of better abstraction like interfaces or typeclasses ... That would also fix the Connection issue from (3).

This is the way to implement Type classes manually in current Enso. All one needs to do is to have two argument Value with data & implementation.

With Set.from (v: Vector) conversions one also gets a nice way to hide factory methods.

assertEquals("Can read ten", 10, some.getMember("unwrap").asInt());
var lazy = module.invokeMember("eval_expression", "Maybe.Some (2 * 5)");
assertEquals("Can read first time ", 10, lazy.getMember("unwrap").asInt());
assertEquals("Can read second time", 10, lazy.getMember("unwrap").asInt());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further work tracked as #6883

@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented May 30, 2023

Please add a | Column to the signature and possibly a TODO, ... issue to follow up on this.
you need to add |Any

@radeusgd you can use:

$ curl -L https://github.com/enso-org/enso/pull/6790.patch | grep "| *Any"

to identify the places where I had to opt-out from the runtime type checks. Currently I get 34 lines of output.

@JaroslavTulach JaroslavTulach removed the CI: No changelog needed Do not require a changelog entry for this PR. label May 30, 2023
@jdunkerley
Copy link
Member

So as for (3) and (4), I'd suggest a rewrite to this form. IMO it would actually improve the clarity of our libraries. @jdunkerley what do you think about that? I remember that you were a bit skeptical about this idea some time ago, although I'm not sure exactly which parts seemed problematic?

If we can implement a facade type cleanly (not using the map builder set up of Dialect) - and without a massive amount of time as we are very resource light at the moment and velocity is low - am happy for us to prototype for Column.

For 3 (and also things requiring "File_Format" types), implementing them as a "Connection" with overrides removes the possibility of having type specific methods. Likewise it made the implementation far less clear as you end up with needing to have things done the way the Dialect is built with maps building up functions and plugging bits and bobs in here and there. I spent a fair while playing with this when we started on Postgres/SQLite connections.

For the facade of these type, it needs to be straight forward to add new implementations for things and allow for specific methods for the end types. I think this means a facade wouldn't work here and we are back on needing interfaces/typeclasses (which is the correct solution here).

As for 4, we knew that one would bite us when we did it. I think we can restructure the resolve_aggregate easily to return the resolved columns either as a differet type or as a pair with the original Aggregate_Column.

@JaroslavTulach
Copy link
Member Author

I can write fn (x:Number) function and I get early error when I call it in the IDE:

Type check in the IDE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clearly report type errors on API boundary
6 participants