Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update trunk AST #407

Merged
merged 4 commits into from
Apr 3, 2023
Merged

Conversation

shym
Copy link
Contributor

@shym shym commented Mar 27, 2023

ocaml/ocaml#12119 modified the parsetree on trunk.
With this PR, ppxlib compiles with trunk and these changes in this PR are enough for our use case. As this use case doesn’t exercise the particular syntax for which parsing changed, those changes might not be enough even for an attempt to get the minimal support for 5.0 code on trunk.
In particular, dune runtest reports a few segfaults.

This change is semantically incorrect but may be enough to support trunk
for 5.0 code
@panglesd
Copy link
Collaborator

Thanks for the PR! Indeed, now that we have a "trunk" branch we should be reactive with parsetree changes...

In this case, I don't think the PR is enough though:

I'll push an updated version fixing those two points soon, on this PR if you don't mind.

About the segfaults in dune runtest, I can confirm. Did you check they are really related to this PR? I'll try to see if they happen with ppxlib#trunk-support on ocaml#trunk-before-the-parsetree-change. They only happen when running the ppx through the -ppx option.

@pitag-ha
Copy link
Member

@shym, thanks for opening the PR! It's great that we have help maintaining this branch!

@panglesd, thanks for looking into the segfaults!

About the migrations, @panglesd: We are in a new situation here. As @shym pointed out, the compiler parsetree change we're adapting to here is a new parsing feature, not a new syntax feature as usual. So we have two options:

  1. We can merge what @shym has done (once we know more about the segfaults), i.e. the PPX driver reverts the change that was made in the compiler: if the code contains syntax, for which the parsing has changed, the PPX driver successfully returns a parsetree, in which the new parsing feature was dropped
  2. We can do what we usually do: if the new feature is encountered, the PPX driver fails.

The downside of 1. is that when using this Ppxlib, people will test trunk only up to this particular parsing change (since Ppxlib reverts that change). The downside of 2. is that Ppxlib stays incompatible with a lot of projects: concretely, with all projects that use the syntax for which the parsing has changed (this is different to the usual situation of a new syntax feature). I'd probably go for @shym's solution. However, I don't know how much we miss by "silently" keeping people from testing this one compiler change. @shym, what is your opinion on this? Was your choice deliberate/conscious?

@shym
Copy link
Contributor Author

shym commented Mar 28, 2023

My choice was not so elaborate, as I was consciously just tricking the typechecker into accepting to build on trunk 😄
If I understand correctly, a proper implementation of your 1. (ie reverting the parse change) would require doing what @panglesd was suggesting, to transform how 5.1 parsed the code into how it would have been parsed by 5.0, and conversely, wouldn’t it?
The commit of this PR was meant as a first step for 1. As the OCaml change is about shifting some code from the parser to the typechecker, I think ppxlib providing a uniform interface to the parsetree (1) is better than rejecting all those programs (2).
Do not hesitate to update this PR!

@pitag-ha
Copy link
Member

would require doing what @panglesd was suggesting, to transform how 5.1 parsed the code into how it would have been parsed by 5.0, and conversely, wouldn’t it?

Oh, is "and conversely" possible? My understanding was that the parsetree was enriched in a way that e.g. (copying Octachron's example from the PR)

let f: type a b c. a -> b -> c = fun x y z -> ...

and

let f: 'a 'b 'c. 'a -> 'b -> 'c = (fun (type a) (type b) (type c) x y z -> ... : a -> b -> c)

yield the same parsetree before the change and different parsetrees after the change. In that case, a perfect mapping from "how it would have been parsed by 5.0" to "how 5.1 parsed it" wouldn't be possible. However, I've only superficially skimmed the parsetree change we're talking about. @panglesd , you've looked into it in more detail, haven't you? Could you let me know if this understanding is right?

I think ppxlib providing a uniform interface to the parsetree (1) is better than rejecting all those programs (2).

👍

@panglesd
Copy link
Collaborator

panglesd commented Mar 28, 2023

Yes. So, in all OCaml version,

let f: type a b c. a -> b -> c = fun x y -> ...

is supposed to be equivalent to

let f: 'a 'b 'c. 'a -> 'b -> 'c = (fun (type a) (type b) (type c) -> (fun x y -> ... : a -> b -> c))

In OCaml 5.0.0 and below, the first version was directly transformed into the second when building the parsetree. So, parsing one or the other would yield the same parsetree.

However, there is also the need to pretty-print the parsetree, and there are some efforts in the <=5.0 pretty-printer to desugar this, to recover the original code. We do not want that:

let f : int -> int = fun x -> x (*short form *)

is parsed and pretty-printed into

let (f : int -> int) = (fun x -> x : int -> int) (* long form *)

The pretty-printer tries to distinguish whether the parsetree comes from a parsed short form or from a parsed long form. For polymorphic types, it's not really possible: so the pretty-printer assumes it comes from the short form whenever that's compatible with the parsetree it has. For non-polymorphic ones, it can distinguish between the two forms, since they are actually encoded differently: the short form uses an empty Ptyp_poly (which cannot come from a parsed long form).

In OCaml 5.1 and above, the two forms are represented differently in the parsetree, (but they are equivalent later on). This has two advantages:

  • the pretty-printer knows which form was written
  • the invariant that there is no empty Ptyp_poly is enforced.

So, migrating from 5.1 to 5.0 is easy, since we know in which situation we are.
Migrating from 5.0 to 5.1 is more involved: we come from a parsetree where two situation are encoded in the same way, to a parsetree where they are encoded differently. However:

  • It actually does not matter too much, since both parsetree are supposed to be functionnally equivalent ; as long as we don't introduce an empty Ptyp_poly, which is easy to do. It still matters a bit, since a PPX author might (rightfully) not treat the two versions as equivalent.
  • We can reuse the code of the pretty-printer to guess in which situation we are, and encode the right form of 5.1. Relevant parts is here: is_desugared_gadt answers in which situation we are.

@panglesd
Copy link
Collaborator

I added the two migrations. Are there some tests checking that the migration are sound?

I haven't looked at the segfaults yet...

Copy link
Member

@pitag-ha pitag-ha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks a lot for taking the time to explain the compiler change in so much detail, @panglesd! So, from what I understand, we do have the two options 1. and 2. I mentioned above, and in the case of parsing features like here, we all agree that 1. is the better option. It is important, though, that we're aware that we're changing the compiler behavior in a way.

Thanks for implementing this in a solid way! I haven't looked through the details, but it seems right. Is there any part you have doubts about?

Are there some tests checking that the migration are sound?

That depends on what you call "sound". Do you mean a test checking that first upward migrating and then downward migrating yields the identity?

let typ_poly =
{
typ with
ptyp_attributes = [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you remove the attributes here and everywhere else? Are they duplicated, and you're de-duplicating them?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is because my code is misleading... I hesitated writing it like this and probably should not, but the result was shorter... (very bad reason)

I'm actually creating a new node, with the same location value as typ. I'm modifying all other fields (attributes and desc), and as you can see, ptyp is used in desc: so the attributes are just in the original node.

Now that I checked, it seems that some of the locations in the crafted nodes should be "ghosted" to make the test "parsing with 5.1 then migrating to 5.0 == parsing with 5.0" pass. But not all of them...

If there is a convenient way to test this, and get a diff, that would be great!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks for explaining!

About the locations: to make sure that the migrated parsetree fulfills the two location invariants "non-ghost children are nested" and "non-ghost siblings are disjoint and ordered", you could run a trivial ppxlib driver with -perform_locations_check. With "trivial" driver I mean one with no PPXs linked. However, I think that those two invariants should be fulfilled here anyways. You'll know better than me! I don't think we have more location checks than that.

If there is a convenient way to test this, and get a diff, that would be great!

You might already have thought about the following, but just in case: The following should work I think. You could write a few-liner program using Ppxlib, which parses a file and pprints its ast. Then, to get the "parsing with 5.1 then migrating to 5.0" AST, you can compile that with OCaml trunk and this Ppxlib branch. And for the "parsing with 5.0", you can compile it with OCaml 5.0 and the main Ppxlib branch. Btw, if we want to take more time at some point, it would also be valuable to make a CI job out of this!

(About the "parsing with 5.1 then migrating to 5.0" part: When pprinting an AST, Ppxlib pprints the AST in its Ppxlib version, i.e. currently 5.0/4.14. In my opinion, that's not an ideal choice of ours, but that's a different topic...Also, in this particular situation it comes in nice.)

@panglesd
Copy link
Collaborator

if the code contains syntax, for which the parsing has changed, PPX driver successfully returns a parsetree, in which the new parsing feature was dropped

I'm not sure what you mean by "feature" the change is just a change in the way some information is encoded in the parsetree: no new functionallity/syntax!
So it's not that the feature is dropped (in the sense of forgotten): the way the information is encoded has to be translated from one to the other.

I don't think ppxlib failing whenever there is a let-binding with a type constraint (such as in let x : int = 5) would be appreciated in the PPX ecosystem ;) So I don't think 2. is an option!

It is important, though, that we're aware that we're changing the compiler behavior in a way.

Could you precise? I'm not sure to understand, since I wouldn't say we are changing the compiler behaviour!

That depends on what you call "sound". Do you mean a test checking that first upward migrating and then downward migrating yields the identity?

That would be one good test, but I would also like to test that parsing with 5.1 is the same as parsing with 5.0 and then migrating to 5.1 ; and similarly, that parsing with 5.1 and migrating to 5.0 is the same as parsing with 5.0!

@shym
Copy link
Contributor Author

shym commented Mar 29, 2023

Regarding the segfault, it indeed appears exactly at the commit that brings the change.
It is not surprising: as the compiler uses the same magic for 5.0 and trunk, Ast_io.read identifies the file as an 5.0 AST and so the type confusion naturally leads to segfault when searching the AST (when mistakenly calling Migrate_500_501 to update it to 5.1 since it was identified as 5.0, I think).
A solution could be for upstream to use a dedicated “unstable” magic on trunk (or already the future magic, they are pretty predictable, if I’m not mistaken ;-), rather than the latest stable magic.

When the magic numbers aren't unique, before this commit, the binary
AST reader would read into the first AST version matching the magic
number. With this commit, it takes the last such version. That's important
for trunk support.

Signed-off-by: Sonja Heinze <[email protected]>
@pitag-ha
Copy link
Member

as the compiler uses the same magic for 5.0 and trunk, Ast_io.read identifies the file as an 5.0 AST and so the type confusion naturally leads to segfault when searching the AST

Thanks, @shym , that's a very very good point I've just pushed a one-liner change that should fix the problem to a separate branch: https://github.com/pitag-ha/ppxlib/tree/shym_trunk_support_update. I don't have a 5.1 switch yet, so I haven't tested it yet, but I'm pretty sure that it should fix the segfaults. I'll test it tomorrow and if it does what I expect, I'll push it to your branch.

@pitag-ha
Copy link
Member

pitag-ha commented Mar 29, 2023

@panglesd , about your answer:

I'm not sure what you mean by "feature" the change is just a change in the way some information is encoded in the parsetree: no new functionallity/syntax!

I was calling the change a "parsing feature"/"parsetree feature", because it enriches the AST by making it more concrete: with the change, the parsetree distinguishes syntax that before wasn't distinguished. I was explicitly saying that this is different from syntax features. We can also simply call it "parsing change" instead of "parsing feature", though.

I don't think ppxlib failing whenever there is a let-binding with a type constraint (such as in let x : int = 5) would be appreciated in the PPX ecosystem ;)

Hah - that's a good point! I was assuming the new pvb_constraint field was only used for type constraints on polymorphic types.

Start edit:
As pointed out in a comment below, it's not actually true that with option 2, ppxlib would fail whenever there is a let binding with type constraint. At least, that's not what I meant with option 2. It would only fail in cases such as

let f: 'a 'b 'c. 'a -> 'b -> 'c = (fun (type a) (type b) (type c) -> (fun x y -> ... : a -> b -> c))

Finish edit

Could you precise? I'm not sure to understand, since I wouldn't say we are changing the compiler behaviour!

We're changing the parsing behavior: When using the trivial Ppxlib driver (i.e. the one without any PPX linked), one would expect the identity driver, and usually, that's the case. However, on this branch, the output parsetree can be different from the input parsetree. So, apart from expanding the AST, the driver also modifies the AST. If you want, I can give you an example!

That would be one good test, but I would also like to test that parsing with 5.1 is the same as parsing with 5.0 and then migrating to 5.1

Are you sure that's the case? If that's the case, what I said in the last paragraph might be different.

and similarly, that parsing with 5.1 and migrating to 5.0 is the same as parsing with 5.0!

Yes, I agree that that'd be great to test! I've pointed out in the code comment above how we could do that. I don't know how much time you want to spend on this, though.

@shym
Copy link
Contributor Author

shym commented Mar 29, 2023

@pitag-ha:

I haven't tested it yet, but I'm pretty sure that it should fix the segfaults

Tested, approved and pushed here 😄

@panglesd
Copy link
Collaborator

Are you sure that's the case? If that's the case, what I said in the last paragraph might be different.

The only case when 5.0 -> 5.1 -> 5.0 is not the identity is for:

let f: 'a 'b 'c. 'a -> 'b -> 'c = (fun (type a) (type b) (type c) -> (fun x y -> ... : a -> b -> c))

(the type has to be explicitly polymorphic, otherwise the migration is the identity)

I'm pretty sure such code appear in no OCaml codebase! But I agree that, if we are breaking a pre-existing invariant about AST migration, we should be cautious about this (we haven't really a choice...)
Also, even though it is not strictly the identity, it is the identity modulo some equivalence relation of "representing the same thing". So I think it is fine!

Nice fix for the segfaults!

@shym
Copy link
Contributor Author

shym commented Mar 29, 2023

Oh! By the way, dune runtest on that branch now segfaults on 5.0, of course.
Maybe another solution would be to bias the search towards the current version:

diff --git c/ast/versions.ml w/ast/versions.ml
index 8c3456dcd9..e5e71b8fe8 100644
--- c/ast/versions.ml
+++ w/ast/versions.ml
@@ -587,5 +587,5 @@ module Find_version = struct
           else
             loop tail
     in
-    loop all_versions
+    loop @@ (module OCaml_current : OCaml_version) :: all_versions
 end

With that I get no segfault on either version.

@pitag-ha
Copy link
Member

Sounds like a very good idea, @shym! I'd do that and soften the strict "ocaml" {= "5.1.0"} constraint in the .opam and dune-project file of this branch and get rid of the paragraph I wrote on this branch's README about the branch being incompatible with stable compiler versions.

@pitag-ha
Copy link
Member

@panglesd, to come back to our conversation about this kind of parsetree change being a new situation for us: I think we were more or less agreeing all way long, but focussing on different perspectives. Both perspectives are important and I'm very glad we've had this long discussion!

To clarify one last misunderstanding: Given that the only case where 5.1 -> 5.0 -> 5.1 is not the identity is

let f: 'a 'b 'c. 'a -> 'b -> 'c = (fun (type a) (type b) (type c) -> (fun x y -> ... : a -> b -> c))

that case would also be the only case in which we would error if we went for option 2. We would not error for every let binding with type constraint!

Copy link
Member

@pitag-ha pitag-ha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's anything blocking this from being merged, is there? I'd merge this so that people have a working trunk-support branch again. We can (and should) fix the potential location problem and add the tests we've talked about once it's merged. Does that sound good to you, @panglesd?

@avsm
Copy link

avsm commented Apr 1, 2023

This branch is working excellently from me on current OCaml trunk and unblocked my development. Thanks!

@panglesd
Copy link
Collaborator

panglesd commented Apr 1, 2023

Does that sound good to you

Yes, that sounds good to me! (Especially since I won't be available to work on this next week, let's not block trunk-support)

@shym
Copy link
Contributor Author

shym commented Apr 3, 2023

My proposal for fixing the test suite on the stable compiler is available in my shym:extend-support branch. I’m fine with either including in this PR (as mostly reviewed already) or postponing it to another PR, along with the other changes suggested.

@pitag-ha pitag-ha merged commit 98e1f9f into ocaml-ppx:trunk-support Apr 3, 2023
@pitag-ha
Copy link
Member

pitag-ha commented Apr 3, 2023

Thank you, @shym! Could you open a PR to main with that change? Once we've merged it into main, I'd rebase this branch, and make the changes to the opam file and README of this branch I mentioned before (to reflect compatibility with stable OCaml versions). Does that sound good to you?

@shym shym deleted the trunk-pvb_constraint branch April 3, 2023 12:07
@shym
Copy link
Contributor Author

shym commented Apr 3, 2023

Perfectly good. I’ve opened #409 with that commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants