Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CMI and PPX generation ordering #193

Closed
xguerin opened this issue Jul 19, 2017 · 44 comments · Fixed by ocaml/opam-repository#12442
Closed

CMI and PPX generation ordering #193

xguerin opened this issue Jul 19, 2017 · 44 comments · Fixed by ocaml/opam-repository#12442

Comments

@xguerin
Copy link

xguerin commented Jul 19, 2017

Hello there,

I am migrating ppx_import to OMP and jbuilder. This extension uses existing cmi files to look-up type definitions present in a local mli and load them into the corresponding ml.

Beside the fact that migrate_driver does not yet accept -L CLI arguments (an attempt at addressing this has been submitted here: ocaml-ppx/ocaml-migrate-parsetree#22), it looks like jbuilder run the pps statements on both mli and ml files before compiling mli files.

Would it be possible to change the flow dependencies so that PPX executions on ml files are run after the cmi for the corresponding mli has been generated ? Concurrently, pending acceptation of the OMP patch, would it be possible to add the -L and -I flags to the PPX call ?

Thanks,

@ghost
Copy link

ghost commented Jul 19, 2017

Does ppx_import calls the OCaml typer? I think it's not only this single cmi that we need to add as a dependency but as well the cmi of all the dependencies. In the end performances might suffer.

@xguerin
Copy link
Author

xguerin commented Jul 19, 2017

Yes. In a nutshell, ppx_import:

  1. gets the module name of the type to import
  2. scan the directories in the load_path for a cmi matching that name (uncapitalized)
  3. call Cmi_format.read_cmi to load that file

So we don't need to reference each cmi per se, only pass the proper load path to the ppx driver (which I believe can be deduced from the libraries statement).

@ghost
Copy link

ghost commented Jul 19, 2017

Well, if the ppx might end up reading arbitrary cmi files, they need to be declared as dependencies of the action running the ppx.

Setting the load path is not going to be useful if it's not populated with the cmi files yet.

@xguerin
Copy link
Author

xguerin commented Jul 19, 2017

That was my original thought. However, I tried adding he cmi in the preprocessor_deps statement and jbuilder complains about circular dependencies. I'll send a terminal excerpt in a bit.

@ghost
Copy link

ghost commented Jul 19, 2017

Yes, that's expected. The pp deps are common to all files and the cmi are built from the preprocessed files.

In the current world it's not possible to run the typer during the preprocessing phase with jbuilder. Personally I think it's cleaner this way. We could change things to make it work. It's an invasive change and we'd need to make sure that it doesn't degrade performances, so it might be a lot of work.

@xguerin
Copy link
Author

xguerin commented Jul 19, 2017

The pp deps are common to all files and the cmi are built from the preprocessed files.

The makes complete sense.

However I can't help but wonder about the reasoning behind running the pps on ml and mli first and then the run compilation steps. Some kind of semantic barrier ?

Surely there is no direct dependency between the preprocessing of interface and implementation files, so why not use a finer grained level of parallelism in that data flow graph ?

Personally I think it's cleaner this way.

Probably, it's hard to judge from my venture point. Is that a working hypothesis behind the tool design ?

It's just a bit disappointing as it breaks a perfectly good extension that works with any other builder under the sun. But jbuilder is marketed as being opinionated so I should probably not be too surprised ;)

In any case, thanks for offering to take a look. I'll try to get more familiar with the code to get a better understanding.

@ghost
Copy link

ghost commented Jul 20, 2017

However I can't help but wonder about the reasoning behind running the pps on ml and mli first and then the run compilation steps. Some kind of semantic barrier ?

It's just to avoid doing the preprocessing three times:

  • once for ocamldep
  • once for ocamlc
  • once for ocamlopt

Surely there is no direct dependency between the preprocessing of interface and implementation files, so why not use a finer grained level of parallelism in that data flow graph ?

This is already the case, ml and mli files can be preprocessed in parallel. If we assume that the preprocessor might run the typer, we need to add a lot more dependencies; basically the same dependencies as for compiling the file.

Probably, it's hard to judge from my venture point. Is that a working hypothesis behind the tool design ?

Yes. That's how the vast majority of ppx rewriters work, so why not take advantage of it.

BTW, if the goal of ppx_import is to avoid copying the same type definition between the .mli and .ml, why not simply write it in a separate .ml file without .mli? We do this all the time at Jane Street; it works fine and it's only using plain OCaml.

If you want to import a type definition coming from an external library, then you can do this with jbuilder, simply not as ppx rewriter.

@xguerin
Copy link
Author

xguerin commented Jul 20, 2017

Thanks for the clarification !

If we assume that the preprocessor might run the typer, we need to add a lot more dependencies; basically the same dependencies as for compiling the file.

I was about to write how all this looks like a simple scheduling problem, but I think I understand your point now. jbuilder considers rewriters as pure functions.

Which explains the lack of support for load_path and include_dirs in the current version of OMP and explains further the lack of necessity to add them.

If my assessment is correct it would probably be fair to document that choice. It's a sane one, but also a radical one as it removes some degrees of freedom in the writing of PPX extensions.

why not simply write it in a separate .ml file without .mli?

I'm not quite sure what you mean. I'm no OCaml expert ;) How would that work when developing a library and these types are part of its external interface?

@ghost
Copy link

ghost commented Jul 20, 2017

This could be documented indeed.

How would that work when developing a library and these types are part of its external interface?

For instance if you have big type definition declared in a toplevel module Foo, instead of writing it in both foo.ml and foo.mli, you can write it once in foo_types.ml as follow:

type t = ... [@@deriving blah]

then in foo.ml you write include Foo_types and in foo.mli you write include module type of Foo_types

@xguerin
Copy link
Author

xguerin commented Jul 20, 2017

Got it. Thanks !

@rgrinberg
Copy link
Member

ppx_import offers more interesting functionality than just copying. For example, you can replace/add ppx attributes on imported fields for example (https://github.com/whitequark/ppx_import/#with-replacements).

@ghost
Copy link

ghost commented Jul 21, 2017

You could do all that for types coming from other libraries with jbuilder, simply not as a ppx

@ejgallego
Copy link
Collaborator

Hi folks, for completeness I am adding my use case for ppx_import, which I am not so sure can be replaced by any of the approaches mentioned in this thread.

The context is, we have a large development not wanting to use ppx [Coq in this case], and we want to instrument it. As we have no access to the Coq codebase, the way we proceed is to alias every Coq module of interest plus its dependencies:

 (* file ser_foo.ml for original module Foo *)
 type t = [%import: Foo.t] [@@deriving sexp]

we also have Bar.t, which uses Foo.t:

 (* ser_bar.ml *)
 module Foo = Ser_foo
 type t = [%import: Bar.t] [@@deriving sexp]

and we continue ad-infinitum. In this case the current behavior of ppx_import is essential and quite a bit hard to emulate without the current semantics. Note that Ser_bar.t_of_sexp will depend on Ser_foo.t_of_sexp. Maybe we could just drop that dependency and re-add the alias at linking time, but things become tricky. I dunno, I never thought a lot about that as it just used to work.

Note that I do not personally think that requiring a change of build system in order to solve non-working ppx is an acceptable requirement in general, even thou I think jbuilder seems very interesting and I am looking foward to try it with some large OCaml codebase. Cheers!

@ghost
Copy link

ghost commented Jan 20, 2018

I was looking at this issue again. We could easily support importing types coming from library dependencies. For instance with something like this:

(library
 ((name foo)
  (preprocessor_uses_typer)
  (preprocess (pps (ppx_import))
  (libraries (a b c))))

you could import types from libraries a, b and c. However, if we want to import types between the modules of library foo that's a much more complicated problem since there is a circular dependency:

  • to compile a .ml or .mli we need to have compiled its dependencies first
  • to compute its dependencies, we need to have preprocessed it first
  • to preprocess it with ppx_import, we need to have compiled its dependencies first (!!)

But I assume that ppx_import is mostly used to import types from other libraries, right? So would supporting this case only be enough?

@xguerin
Copy link
Author

xguerin commented Jan 20, 2018

But I assume that ppx_import is mostly used to import types from other libraries, right? So would supporting this case only be enough?

It can do both.

However, since there is a workaround for sharing declarations within the same compilation unit it is fine if jbuilder (or dune? I was confused by the name change ;)) does not support it.

At least having this feature would enable using ppx_import with jbuilder for most scenarios, which is not the case at the moment.

@ghost
Copy link

ghost commented Jan 20, 2018

Ok, that seems worth adding then. Instead of adding this field in jbuild files using ppx_import, we could add something to the META file of ppx_import, such as use_typer="true".

BTW, I'm curious, how do you deal with the circular dependency?

@ghost
Copy link

ghost commented Jan 20, 2018

Oh, I get it, from ppx_import.cppo.ml:

      if Ast_mapper.tool_name () = "ocamldep" then
        (* Just put it as manifest *)
        if is_self_reference lid then
          { type_decl with ptype_manifest = None }
        else
          { type_decl with ptype_manifest = Some manifest }
      else

the ppx does something different depending on whether it is invoked from ocamldep or ocamlc/ocamlopt...

Well, that makes things simpler actually. We can simply add a new type of preprocessing:

(preprocess (alt_pps (ppx_import)))

alt is for alternative. It will be the same as (pps (...)) except that it will use the -ppx option of ocamldep/ocamlc/ocamlopt. However, ppx_import will still need to use the ocaml-migrate-parsetree driver and install itself as a library to be compatible with dune.

@xguerin
Copy link
Author

xguerin commented Jan 20, 2018

alt is for alternative

Wouldn't your original proposal to add a extra option in the META make the whole experience a bit more standardized ? I fear that having multiple pps operators may confuse users.

ppx_import will still need to use the ocaml-migrate-parsetree driver and install itself as a library to be compatible with dune.

I think I have that in a branch somewhere. I did not propose the change for inclusion at the time as I also migrated the build system to jbuilder and could not make the test work because of the very reason we are discussing :)

@ghost
Copy link

ghost commented Jan 21, 2018

Ok. Yeah, you are right we shouldn't add another pps operator.

That requires changing more things in jbuilder though, as currently we resolve libraries after setting up the rules, while now we'll need to resolve them before. But in the end we are starting to need that for other features such as inline tests, so I think that's worth doing.

@rgrinberg I though a bit about how to do this without breaking things, and I think we should add something like this in build_system.mli:

(** [prefix_rules prefix ~f] implicitly prefix all rules generated by [f] with [prefix]. *)
val prefix_rules : (unit, unit) Build.t -> f:(unit -> unit)

and we would use it this way in library_rules and executables_rules:

let prefix = Build.record_lib_deps libs_written_by_user in
let libs, prefix =
  match resolve_libraries libs_written_by_user with
  | Ok libs -> (libs, prefix)
  | Error e -> ([], prefix >>> Build.fail e)
in
prefix_rules prefix ~f:(fun () -> ...)

that would preserve the current behavior: we only fail about missing libraries while building things, not while setting up the rules.

@rgrinberg
Copy link
Member

One thing that I've been wondering is that this pattern of preprocessors having their own dependencies is not uncommon. We've already encountered it with menhir, and it seems like it would be useful with https://github.com/stedolan/ppx_stage/ as well. I wonder if we should have a more general interface for preprocessors to spit out what dependencies they want. Seems like this should be simple enough to accomplish with the new build API at least.

@ghost
Copy link

ghost commented Jan 21, 2018

Yh, that makes sense. I suppose we could just add an argument --depend to the driver. Technically it could just set the tool name to "ocamldep", and then scan the generated ast and print out the dependencies rather than dump the generated ast. Makedepend is part of compiler-libs.common so that should be easy to to.

In fact, even for normal ppx rewriters that could be useful, the driver could do both the rewriting and spit out the dependencies at the same time, so that we don't need to call ocamldep afterwards. So we could have the two following flags:

  • ppx.exe --only-depend, set the tool to "ocamldep" and only print dependencies
  • ppx.exe --with-depend, doesn't change the tool, dump the ast and the dependencies

The first mode should be opt-in for ppx rewriters that need it, since it will need two calls to the ppx rewriter rather than just 1. So I think we still need to mark such ppx rewriters.

I think we should start with something simple that just switches between the current ppx method and passing a -ppx option to ocamldep/ocamlc/ocamlopt and keep the driver changes as future optimisations.

@rgrinberg
Copy link
Member

@rgrinberg I though a bit about how to do this without breaking things, and I think we should add something like this in build_system.mli: ...

Would resolve_libraries invoke the build system in your example? We do need to build all those .requires files to calculate the closure of those libs for this to work for inline tests for example.

@ghost
Copy link

ghost commented Jan 29, 2018

This is not supported at the moment, you cannot wait for a file while setting up the rules. For instance, that's why you cannot include a generated file in a jbuild tile. This will be supported at some point but it needs some work.

For now, if we want to look at the transitive closure, we have to compute it without going through the build system. So we need to get rid of the .requires files. If there is a performance degradation, we can use some in memory caches. In the end, this should simplify the code since it will be more direct.

I've been thinking of a better API to deal with libraries and name resolution in general. The API will allow to easily switch between doing the computations statically or dynamically through the build system. I'll try to write up the mli files soon.

@ghost ghost mentioned this issue Feb 6, 2018
@samoht
Copy link
Member

samoht commented Apr 20, 2018

Would it be possible to re-open that issue? ppx_import is very useful when used in conjonction of Crowbar to generate random fuzz generators (see OCaml parsetree or x509 generators). That's a very pleasant way to generate these.

Is there any current workaround to use ppx_import in a dune project?

@rgrinberg
Copy link
Member

I recall @diml mentioning that adding support for ppx_import isn't going to be so hard after all. Perhaps you'd like to tackle it @samoht :)

@ghost
Copy link

ghost commented Apr 23, 2018

It should be relatively easy now, and easier once #576 is merged.

@samoht
Copy link
Member

samoht commented May 3, 2018

Unfortunately I don't have time to work on that :-(

But can you re-open the issue so it won't get lost? As I said It's actually a blocker to use ppx_deriving_crowbar on large projects.

@ghost ghost reopened this May 3, 2018
@ghost
Copy link

ghost commented May 3, 2018

I re-opened it. Does ppx_deriving_crowbar uses the OCaml typer as well?

@ghost
Copy link

ghost commented May 3, 2018

It makes me think that it would be good to have a better story to use the typer in ppx rewriters. It's one of the most expansive pass of compilation and it's not really good if we end up doing many more times.

@Drup
Copy link

Drup commented May 3, 2018

Note that in various cases, when a ppx runs the typechecker, it's not necessarily the vanilla unmodified typechecker. See the various libraries by Jun Furuse for concrete examples. For a while, we considered making the new eliom compiler works like that as well.

Having first-class support in jbuilder for such "late ppxs" that need to run right before typechecking would make such experiments quite easier.

@ghost
Copy link

ghost commented May 8, 2018

Ok. But can these libraries read vanilla cmi files?

One thing I'm thinking is that the majority of ppx rewriters are locally expanding an extension point. So instead of having them register a whole pass and have several of them type the whole AST just to untype right after, we could instead modify the typer so that when it encounters an extension point, instead of raising it would call an expander just on this extension point and pass it the typing environment. This would give a clean way for ppx rewriters to access typing information. Especially it would be much faster and would have a better composition semantic.

@Drup
Copy link

Drup commented May 8, 2018

Yes, they read vanilla cmi files. You can introduce some interesting typing additions without changing the typedtree/types datastructure so much.

In theory, your idea is not bad. Unfortunately, I don't think it'll work for OCaml in the short term (ask Leo if you want many arguments against it) and it's rather more subtle to add than you would think.
In all cases, it requires not-so-trivial modifications to the typechecker. I don't think support for PPXs that access CMIs should be based on that.

@ghost
Copy link

ghost commented May 8, 2018

In any case it cannot be based on having individual ppx running the typer as it doesn't work as soon as you use two of them

@Drup
Copy link

Drup commented May 8, 2018

PPX that access CMIs do not always run the typer. This ticket was about ppx_import first. I agree composition is not perfect, but having a mechanism for proper composition (using typechecker hooks, or whatever) seems orthogonal to having the proper build-level dependencies in place.

@ghost
Copy link

ghost commented May 8, 2018

Sure. Making ppx_import work with Dune seems fine. The current plan is as follow: I'm waiting for ocaml/opam-repository#11903 to be merged as it releases a new version of ppxlib with a change that's necessary to unblock #576. Then I'll rebase and merge #576 which refactors the ppx management code. Then we can change a bit the code so that under certain circumstances it uses the -ppx argument of ocamldep, ocamlc and ocamlopt rather than run the ppx rewriters only once manually. That should be enough to get ppx_import to work with Dune.

To select the latter method, I think we'll add (kind ppx_rewriter_using_typer) and (kind ppx_deriver_using_typer) so that Dune knows the ppx rewriter might read cmi files. These will translate to some variable in the META file.

@Drup
Copy link

Drup commented May 8, 2018

Right, that sounds great! :)

@rgrinberg
Copy link
Member

rgrinberg commented May 10, 2018 via email

@ghost
Copy link

ghost commented May 10, 2018

I had something much simpler than implementing classical ppx in mind. I was just thinking of passing the driver via -ppx "ppx.exe ...". Properly implementing classical ppx is a bit more complicated, and there is not enough information in classical ppx meta files to support them smoothly

@Niols
Copy link
Contributor

Niols commented Jul 10, 2018

Hey, apparently, the two PR mentioned by @rgrinberg have been merged. Any chance to have the other modifications? It's not urgent, but that would be great!

@ghost
Copy link

ghost commented Jul 10, 2018

The refactoring of the of ppx system in dune to remove hard-coded stuff was quite painful, so I left the rest on the side. For now we are finalizing the 1.0.0 release of Dune.

If someone is willing to work on this now, I can detail what needs to be done, otherwise one of us will look at this in the coming weeks.

@ghost
Copy link

ghost commented Aug 2, 2018

ppx_import should now be supported via: (staged_pps ppx_import)

rgrinberg added a commit to rgrinberg/opam-repository that referenced this issue Aug 6, 2018
CHANGES:

- Fix lookup of command line specified files when `--root` is given. Previously,
  passing in `--root` in conjunction with `--workspace` or `--config` would not
  work correctly (ocaml/dune#997, @rgrinberg)

- Add support for customizing env nodes in workspace files. The `env` stanza is
  now allowed in toplevel position in the workspace file, or for individual
  contexts. This feature requires `(dune lang 1.1)` (ocaml/dune#1038, @rgrinberg)

- Add `enabled_if` field for aliases and tests. This field controls whether the
  test will be ran using a boolean expression language. (ocaml/dune#819, @rgrinberg)

- Make `name`, `names` fields optional when a `public_name`, `public_names`
  field is provided. (ocaml/dune#1041, fix ocaml/dune#1000, @rgrinberg)

- Interpret `X` in `--libdir X` as relative to `PREFIX` when `X` is relative
  (ocaml/dune#1072, fix ocaml/dune#1070, @diml)

- Add support for multi directory libraries by writing
  `(include_subdirs unqualified)` (ocaml/dune#1034, @diml)

- Add `(staged_pps ...)` to support staged ppx rewriters such as ones
  using the OCaml typer like `ppx_import` (ocaml/dune#1080, fix ocaml/dune#193, @diml)

- Use `-opaque` in the `dev` profile. This option trades off binary quality for
  compilation speed when compiling .cmx files. (ocaml/dune#1079, fix ocaml/dune#1058, @rgrinberg)

- Fix placeholders in `dune subst` documentation (ocaml/dune#1090, @emillon, thanks
  @trefis for the bug report)

- Add locations to errors when a missing binary in PATH comes from a dune file
  (ocaml/dune#1096, fixes ocaml/dune#1095, @rgrinberg)
@ejgallego
Copy link
Collaborator

ejgallego commented Sep 3, 2018

ppx_import should now be supported via: (staged_pps ppx_import)

@diml how is this supposed to work? I have:

(library
 (public_name coq-serapi.serlib)
 (name serlib)
 (synopsis "Serialization Library for Coq")
 (preprocess (staged_pps ppx_import))
 (libraries sexplib))

However I get:

egallego@lasserre:~/research/coq-serapi$ dune build
File "serlib/dune", line 5, characters 13-36:
Error: No ppx driver were found. It seems that ppx_import is not compatible
with Dune. Examples of ppx rewriters that are compatible with Dune are ones
using ocaml-migrate-parsetree, ppxlib or ppx_driver.

I am using ppx_import from git, so maybe more changes upstream are needed?

@ghost
Copy link

ghost commented Sep 4, 2018

Yes, ppx_import needs to be ported to use ocaml-migrate-parsetree. Additionally, porting it to dune would help setting everything so that it can be used with dune.

@ejgallego
Copy link
Collaborator

Yes, ppx_import needs to be ported to use ocaml-migrate-parsetree. Additionally, porting it to dune would help setting everything so that it can be used with dune.

Thanks for the information, I will follow up with upstream.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants