-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Local Type Inference approach to typecheck polymorphic calls #564
Conversation
The function allows it, only the spec didn't.
Consider a function spec: -spec negate(bool()) -> bool(); (integer()) -> integer(). Sometimes, we want to make a single function type out of these two clauses. Using an outer union like fun((bool()) -> bool()) | fun((integer()) -> integer()) leads to a loss of precision, because this type says that we either have a function that can take a bool, or we have a function that can take an integer (and we do not know which case it is). This means that we cannot apply a function of this type to anything unless we assert type or something like that. But we know for sure that it can take both, bools and integers! Therefore, using an inner union like fun((bool() | integer()) -> bool() | integer()) seems better to me as this way we know that it can take both. Of course, an ideal solution would be to use intersection types like fun((bool()) -> bool()) & fun((integer()) -> integer()), but Erlang doesn't have them in its syntax and so don't we in Gradualizer.
Union function types (`fun_ty_union`) do not seem to work. These simple examples should be perfectly valid. I think it throws that error because `expect_fun_type/3` inputs a list of function types (i.e., a clause), containing only a single element (the union of two function types). This leads to expecting an intersection function type later on (which is wrong). Note that these union function types aren't that much useful in practice as when you want to call them, all your argument types must be subtypes of parameter types of all the possible function types. That is, when you have a function of the type fun((integer()) -> integer()) | fun((atom()) -> atom()) then you can only call it with `none()` as we don't know whether the function will accept integers or bools and no other type is a subtype of `integer()` and `atom()`.
@zuiderkwast @erszcz I know you're busy, but do you think you'll have time for a code review? |
@xxdavid Sorry for the lack of responsiveness. Life got surprisingly busy the last few months. I can imagine it being quite frustrating given the terrific effort you've put into this work. That being said, I've just gone through the wiki description - it's excellent! |
No need to apologize! :) It's a volunteer project after all and I can imagine life getting busy. Thank you for your kind words and for planning to go over it. I'll stay tuned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is a fantastic piece of work, it's truly impressive!
I've left a few superficial comments, there's nothing serious I could find.
I've also tried it out on some examples and the new and improved error messages are great!
We use the Local Type Argument Synthesis algorithm from https://www.cis.upenn.edu/~bcpierce/papers/lti-toplas.pdf (a bit extended for our use case). This replaces the global constraint generation and constraint solver. It should result in a much more precise type error messages (pinpointing the exact location, not just the function) and more streamlined code (refactor will be made in a subsequent commit). For more details, see the original paper or https://github.com/josefs/Gradualizer/wiki/Polymorphism#typechecking-polymorphic-calls
The following type spec from gradualizer_db uses a type variables (K) as a bound for another type variable (Key): -spec add_entries_to_map([{Key, Value}], #{K => V}) -> #{K => V} when Key :: K, Value :: V. Until now, we converted all Ks and Vs to rigid vars, but then Key and Value got replaced to plain vars K and S in solve_bounds/2. We must therefore convert the type variables to rigid type variables *after* the call to solve_bounds/2, which is what this commit does.
Flexible type variables may now appear only when typechecking a polymorphic call, in which case they are only in the type of that polymorphic function. No other expression may of be a type that contains a flexible type variable. Constraints are thus needed only in the subtype_with_constraints/3 function that gets called only from type_check_poly_call/4. This means that nothing but the subtyping functions and the type_check_poly_call/4 function needs to deal with constraints. Therefore, I am removing constraints from all the other places. I believe this is a great simplification of the code. It also trivially solves all the TODOs like "Don't drop the constraints". In such a huge refactoring, Gradualizer has proved to be an invaluable help. Via self-gradualization, it discovered numerous errors that I didn't notice. Hopefully, there are no undetected errors left. All tests are passing, the self-gradualization does not crash and 100_000 proptests have also passed.
And also remove the assert_type uses that were needed because of this error.
This quite a trivial change enables us to typecheck polymorphic functions like -spec generic_hd([A,...] | {A, any()}) -> A. or functions from Elixir's Enum module that use protocol polymorphism in addition to the parametric polymorphism: -spec enum_map([A] | map(), fun ((A) -> B)) -> [B]. which is really handy for Gradient and other Elixir frontends! Support for intersection polymorphic functions like -spec generic_hd([A,...]) -> A; ({A, any()}) -> A. is of course still missing. But at least some of these can now be typechecked when using a spec with the unions.
Since now we do not generate type variables, they don't need to be strings anymore.
It is already there since December 2022 (unless "subtype polymorphism" means bounded quantification).
They probably originate from the `fmt_location` description that uses hyphens to list possible values.
Removes the hyphens and add `--solve_constraints` flag.
lists:uniq/1 was introduced in OTP 25.
1ea043c
to
aac8000
Compare
Thank you very much for the review! :) I've fixed all the issues and resolved the threads. Can I merge this pull request now? Or would you also like to take a look, @zuiderkwast? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have time to review properly, but I agree it's impressive.
subtype(Ty1, Ty2, Env) -> | ||
case subtype_with_constraints(Ty1, Ty2, Env) of | ||
{true, _Cs} -> | ||
%% There shouldn't be any constraint really. | ||
%% We could assert that the constraints are empty but this could crash | ||
%% Gradualizer even for end users if there was some error in our code, | ||
%% it's probably safer not to assert and just drop the constraints | ||
true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug logging in verbose mode, would that be useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good idea! I'll try that when I get to some other work on Gradualizer.
I think it's good to merge now 👍 BTW, @xxdavid, it was great to see you in person on Lambda Days! |
@erszcz Thanks for merging the PR, and it was really nice to see you, too! |
Hi!
After months of working on this, I finally come up with a PR that implements the ideas captured in #553. It behaves in the same way as I described in #553 (comment), I only changed the second error message a tiny bit.
The main changes are in the commit b00cadd. It implements the new approach to typechecking polymorphic calls based on Local Type Inference. I think it is quite straightforward but I already know the algorithm well so I am probably biased. However, it should be at least much simpler than the original constraint solver. I tried to document it in the wiki (together with the rigid type variables), so you may read it as documentation to the code. I'm also writing a thesis about this change, so I will link it to the wiki when it's out, to provide another source of documentation.
The main benefit of this change should be error messages that are understandable even when polymorphism is involved. Only in a few cases, a more complex error message shows up but even then, it should be clear what's going on. I think that if this new approach proves successful, we can even enable
--solve_constraints
by default in the future.As a byproduct, all the "shape limitations" of the original constraint solver are gone (this could have been also solved by fixing the constraint generation). A type variable can now be instantiated to both a tuple and an atom without any problems.
Other big changes are in a3029e1. In this commit, I remove the constraints and clauses for flexible type variables from most of the code. Constraints are now generated and propagated only in functions related to typechecking polymorphic calls and functions related to subtyping. This should make the code simpler and easier to read. It also trivially solves all but one "don't drop the constraints" TODOs because constraints are now combined only in the subtyping functions and in
type_check_poly_call/4
. (Please do not squash commits when merging the PR as this commit would make the resulting commit unreadable.)I also fixed some things along the way (fc432f2, 2323f10, e9a0afa) and added a lot of tests.
While the new approach is better in some cases (like the
minus/1
inpoly_fail.erl
), it is also weaker in some other cases. I see the biggest limitation in that the type of anonymous functions passed to polymorphic functions is now inferred viatype_check_expr/2
and nottype_check_expr_in/2
as before, and thus some errors may not be discovered. This can, however, be solved in the future by solving the constraints gathered from all other arguments first and then typechecking the anonymous function using the information from the solved constraints. But I think it is better to have understandable errors than just more of them. Once we have this, let's then focus on inferring the type of anonymous functions passed to polymorphic functions.