Support for BUILD_PATH_PREFIX_MAP #7540


richardlford
Contributor

@richardlford richardlford commented Apr 13, 2023

Modify Dune to produce BUILD_PATH_PREFIX_MAP mappings that map the paths of a package's built artifacts to abstract paths that mirror the installed hierarchy. For example, if the file

src_root/_build/default/somepaths/somefile

is a part of package pkg and will be installed in a section sec (e.g. lib or doc) at location

$prefix/sec/pkg/anotherpath/renamed_somefile

then the Dune BUILD_PATH_PREFIX_MAP will map that file to abstract path

/install_root/sec/pkg/anotherpath/renamed_somefile

where in this case, I have chosen /install_root as the literal prefix for mappings of things that are part of a package.

If a built artifact is not part of a package, then its mapping is unchanged: the file above, if not part of a package, still maps to

/workspace_root/somepaths/somefile

This is implemented by first using Dune's install logic to get a mapping of all installed files from their location in the build directory to the install location. Then that mapping is abbreviated, taking advantage of the fact that for many artifacts we can do a directory-level mapping rather than mapping each individual file. When the basename of a file changes when installed, the file needs its own mapping.
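The abbreviation step can be sketched roughly as follows. This is not the code from the PR: `abbreviate` and its list-of-pairs representation are hypothetical, and the sketch assumes slash-separated paths. A build directory is collapsed into a single directory-level pair only when every file in it keeps its basename and lands in the same install directory; otherwise its files keep individual pairs.

```ocaml
(* Hypothetical sketch: collapse per-file mappings into directory-level
   mappings where possible. Not the implementation from this PR. *)
module SMap = Map.Make (String)

let abbreviate (files : (string * string) list) : (string * string) list =
  (* Group the (source, target) pairs by the source directory. *)
  let by_dir =
    List.fold_left
      (fun acc (src, dst) ->
        let dir = Filename.dirname src in
        let prev = Option.value (SMap.find_opt dir acc) ~default:[] in
        SMap.add dir ((src, dst) :: prev) acc)
      SMap.empty files
  in
  SMap.fold
    (fun dir entries acc ->
      (* Every group holds at least one entry, so List.hd is safe here. *)
      let target_dir = Filename.dirname (snd (List.hd entries)) in
      let uniform =
        List.for_all
          (fun (src, dst) ->
            Filename.dirname dst = target_dir
            && Filename.basename src = Filename.basename dst)
          entries
      in
      if uniform then (dir, target_dir) :: acc
      else List.rev_append entries acc)
    by_dir []
```

With this sketch, two `.ml` files installed under the same `lib/pkg` directory share one pair, while a renamed file such as `README.md` installed as `README` keeps its own pair.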

In addition, the dune describe command was expanded.

The dune describe workspace command now includes package information.

There is a new dune describe map command that outputs information about the BUILD_PATH_PREFIX_MAP mappings, both full and abbreviated.

The net effect of this change is that for any installed file, its runtime location can be obtained from its abstract path by merely replacing /install_root with the actual install location, e.g. $OPAMROOT/$SWITCH.
Fixes #7413.
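
As a minimal sketch of that last point, a consumer could recover the runtime location like this. The helper `relocate` is hypothetical and not part of Dune; only the literal `/install_root` prefix comes from the description above.

```ocaml
(* Hypothetical helper: turn an abstract path into a runtime path by
   replacing the literal /install_root prefix with the real install prefix. *)
let install_root = "/install_root"

let relocate ~prefix path =
  let n = String.length install_root in
  let len = String.length path in
  if len >= n
     && String.sub path 0 n = install_root
     (* Only accept a match on a whole path component. *)
     && (len = n || path.[n] = '/')
  then Some (prefix ^ String.sub path n (len - n))
  else None
```

Paths that do not start with `/install_root`, such as those under `/workspace_root`, yield `None` and are left for the caller to handle.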

@richardlford richardlford marked this pull request as draft April 13, 2023 00:18
@richardlford
Contributor Author

This is still a draft, as it still has some debugging code in it, and output that is not sanitized. Also, I'm still adding tests.
However, @rgrinberg, could you please take a look to see if I'm on the right track? Thanks.

@richardlford
Contributor Author

I have a separate gitlab PR, https://gitlab.com/gasche/build_path_prefix_map/-/merge_requests/2#7facec255884dc55d3818e4dfbea89b1bdd7f49e, for the vendored build_path_prefix_map changes. This brings over changes that were merged into the ocaml trunk, with some small additions.

@richardlford richardlford marked this pull request as ready for review April 20, 2023 00:58
@richardlford
Contributor Author

richardlford commented Apr 20, 2023

@rgrinberg, I've converted this from draft to ready for review, not because it is perfect, but because I need your help with some memo/fiber issues. Also, for the changes to the vendored build_path_prefix_map code, I have a separate PR out to @gasche, so I realize that those changes would not be checked directly into Dune. Hopefully, that PR will be accepted and merged before this PR is ready.

I took your suggestion of looking at the dune describe command and found it a handy vehicle for testing my mappings. For that purpose, I added a dune describe map command to call the functions to compute the forward and inverse mappings, and also to test them for consistency.

The strategy for computing the forward mappings was to start with a map of every single file that will be installed, mapping from its relative location in the build directory to its abstract installation path. That map is then abbreviated by doing directory-level mapping when possible.

For the inverse map, we first split out the mappings of installed files that exist in the source directory (i.e. are not generated). We do that because, for the inverse mappings, we want to give precedence to the location in the source directory.
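
A rough sketch of that precedence rule, under the assumption that the inverse map is a list of (abstract prefix, real directory) pairs. The names `rewrite_first` and `inverse` are made up for illustration and do not appear in the PR; `List.find_map` needs OCaml 4.10 or later.

```ocaml
(* Hypothetical sketch of an inverse lookup in which source-tree locations
   take precedence over build-directory locations. *)
let is_prefix ~prefix s =
  let n = String.length prefix in
  String.length s >= n && String.sub s 0 n = prefix

(* Rewrite [path] with the first pair whose abstract prefix matches. *)
let rewrite_first pairs path =
  List.find_map
    (fun (abstract, real) ->
      if is_prefix ~prefix:abstract path then
        let n = String.length abstract in
        Some (real ^ String.sub path n (String.length path - n))
      else None)
    pairs

let inverse ~source_pairs ~build_pairs path =
  (* Source entries are tried first, so they win over generated ones. *)
  rewrite_first (source_pairs @ build_pairs) path
```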

One complication that arose was related to sandboxing. Because of it, we could not use the pwd in computing the build directory paths. Instead, we had to take the root that is available in action_exec. That means that when doing a mapping from action_exec, a final adjustment to the map is needed.

All is working well as far as the mapping goes, but there were some technical challenges, and I'm not sure my solution was the correct or best way. In particular, I'm running into an assertion failure which I'll mention below.

But first, let me mention the first challenge. The existing code that does the mapping is in the exec function of dune_engine/action_exec.ml. It called code in the Build_path_prefix_map0 module to encode the desired map. Previously there was only one mapping pair, from the build root to /workspace_root. Now a more complex map must be computed, and the information needed to compute it lives in dune_rules/install_rules, which is not accessible to dune_engine. My solution was to call the functions that compute the maps from dune_rules/main.ml, in the get function that returns the Dune_rules.build_system. At that point, I compute the maps and store them in references defined in dune_util/build_path_prefix_map0.ml, where they are available when needed.

The memo/fiber problem I'm having is showing up when trying to build the lwt library. The assert false in file dune_engine/build_system.ml, in function update_build_progress_exn, line 187 is failing. That function is only supposed to be called when the State is Building, but by adding some print statements, I was able to see that the State was Initializing. So it appears that reset_progress was never called. However, if I delete the calls to compute my maps, the problem goes away. I've spent quite a bit of time studying the Fiber and Memo code, but I think there is still a lot that I don't understand. Perhaps the solution to the problem would be apparent to you.

This could logically be divided into two PRs, one for the forward mapping, and another for the inverse (i.e. the dune debug command). But it made sense to code the algorithms for the forward and inverse mappings in a coordinated way, to ensure they were consistent.

I'll keep doing more testing and debugging, but if you could take a look and help me with the memo/fiber problem, I'd much appreciate it.

Thanks

src/dune_lang/pform.mli Outdated
that is a prefix of the input [path]. If it succeeds,
it replaces this prefix with the corresponding target.
If it fails, it just returns [None]. *)

val rewrite_all : map -> path -> path list
(** [rewrite_all map path] finds all sources in [map]
Member

Are you pulling these changes from upstream? If so, this should be done separately.

Contributor Author

As I mentioned, I have a PR to @gasche, to get this change into his upstream repo. I've not gotten any response from him yet on that PR, however, he was the one that merged that change into the OCaml trunk. I suppose if you wanted to merge this into Dune separately you could, and then when it gets upstream you could go back to getting it there.

Contributor Author

I've not heard from @gasche on the build_path_prefix_map changes, so we may need to merge this as part of the PR and later vendor it when it is ready. Relative to my build_path_prefix_map I have one change, a local copy of List.find_map, in order to be able to build Dune with versions of OCaml that did not yet have List.find_map.
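
For reference, a local copy of that kind can be as small as the sketch below. It is written to behave like `Stdlib.List.find_map` (available from OCaml 4.10) and is not copied from the PR.

```ocaml
(* Local stand-in for List.find_map, for compilers older than OCaml 4.10.
   Returns the first [Some _] produced by [f], or [None] if there is none. *)
let rec find_map f = function
  | [] -> None
  | x :: rest -> (
    match f x with
    | Some _ as result -> result
    | None -> find_map f rest)
```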

bin/debug.ml Outdated
@rgrinberg
Member

The memo/fiber problem I'm having is showing up when trying to build the lwt library. The assert false in file dune_engine/build_system.ml, in function update_build_progress_exn, line 187 is failing. That function is only supposed to be called when the State is Building, but by adding some print statements, I was able to see that the State was Initializing. So it appears that reset_progress was never called. However, if I delete the calls to compute my maps, the problem goes away. I've spent quite a bit of time studying the Fiber and Memo code, but I think there is still a lot that I don't understand. Perhaps the solution to the problem would be apparent to you.

Probably because of the mutation you've introduced. All memo computation must not have effects. We need to think of a different way to thread this map.
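
To illustrate the general hazard (this is a toy memoizer, not Dune's Memo module, and it may not be the exact mechanism behind the assertion failure): when a memoized result is reused, the body is not rerun, so any state it was supposed to set as a side effect can silently go missing. All names below are hypothetical.

```ocaml
(* Toy memoizer, NOT Dune's Memo: it only shows why effects inside
   memoized computations are fragile. *)
let memoize f =
  let table = Hashtbl.create 16 in
  fun x ->
    match Hashtbl.find_opt table x with
    | Some y -> y
    | None ->
      let y = f x in
      Hashtbl.add table x y;
      y

(* Stands in for a global reference that is meant to hold the maps. *)
let saved = ref None

let compute =
  memoize (fun n ->
    (* Effect: runs only when the result is not already cached. *)
    saved := Some n;
    n * 2)

let () =
  ignore (compute 1);
  (* Simulate the global state being reset, e.g. between builds. *)
  saved := None;
  (* Cache hit: the body, and therefore the effect, is skipped. *)
  ignore (compute 1);
  assert (!saved = None)
```

Returning the maps as part of the memoized result, rather than writing them to a reference, avoids this class of problem.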

@richardlford
Contributor Author

richardlford commented Apr 24, 2023

Probably because of the mutation you've introduced. All memo computation must not have effects. We need to think of a different way to thread this map.

I'm open to your suggestions on that. I see that the Dune_rules.Main.get() function returns a memo, but I'm storing some results in ref variables from within that function, so that is an effect in a memo computation.

Is there a way to use a memo but return a non-memo? Do we have to convert the memo to a fiber and then start up the scheduler?

Another question I had: Currently I'm computing the maps up front. Would there be any advantage to computing them lazily?

@Alizter
Collaborator

Alizter commented May 8, 2023

A few comments:

  • The changes to describe you are proposing are similar to Add "dune describe package-entries" #7480. You should probably split it off from this PR too.
  • I'm inclined to think that the command should be in the ocaml command group, so it would be dune ocaml debug. We don't have any other debuggers planned to be added (not even for Coq) so it might seem a little strange. My motivation is that it is a really ocaml-centric feature so should be in that command group.

I'm aware that this PR is still in draft mode, so I won't make many comments on the code itself for now since it might change.

@richardlford
Contributor Author

@Alizter, Thanks for pointing that out. I will split out my "dune describe" changes to show packages. My "dune describe maps" is different and I will keep it, as it is also helpful in validating the maps.

  • I'm inclined to think that the command should be in the ocaml command group, so it would be dune ocaml debug. We don't have any other debuggers planned to be added (not even for Coq) so it might seem a little strange. My motivation is that it is a really ocaml-centric feature so should be in that command group.

OK. Or how would you feel about "dune ocamldebug"? That would be a little more concise (though only by one space, so perhaps your suggestion is better). I'll also be splitting this part into a separate PR.

@Alizter
Collaborator

Alizter commented May 8, 2023

Since command groups have a tree structure, dune ocaml debug is probably better.

@richardlford richardlford marked this pull request as ready for review May 8, 2023 22:58
@@ -7,7 +7,7 @@ in the same dune file, but require different ppx specifications
$ dune build @all --profile release
$ dune ocaml merlin dump-config $PWD
Usesppx1
- ((STDLIB /OCAMLC_WHERE)
+ ((STDLIB /install_root/lib/ocaml)
Collaborator

@Alizter Alizter May 8, 2023

It seems the behaviour of BUILD_PATH_PREFIX_MAP inside a cram test is broken. The setting of the env var above should make this get substituted for OCAMLC_WHERE.

Contributor Author

Actually not. The map that Dune sets supersedes the existing setting. For paths not covered by the Dune mappings, the previous settings would still take effect.

One difference between the current Dune mappings and the previous settings is that before, only paths under the PWD were mapped. Now paths under the opam root are also mapped (to /install_root/...). That is why the prior mappings are not effective. Perhaps these tests should have these mappings removed since they are no longer needed?

@richardlford
Contributor Author

I've converted this back to "ready to review" again. I've removed all of the code that will be split out into a separate PR, and in particular, the code for the "dune ocaml debug" command (and also some debugging code I had added).

I have some questions for the reviewers that will guide any final changes (in addition to the comments reviewers will have):

  • The goal of the current PR is that the rewritten build-time paths for files that will be installed will have the form /install_root/<package>/<package-relative-path>. Sometimes this mapping can be achieved at the directory level, but cases where it cannot include:
    • When the basename of the file installed is different than the file in the build directory.
    • When files in a given build directory have multiple destination directories.
  • As a result of the preceding, for a large project, the mapping can be large, and in some cases we are running into a problem with exceeding the space allocated for arguments and environment variables (when doing execve).

Possible solutions to the above include:

  • Only guarantee abstract paths that mirror the install path for selected paths. The PR currently does this by selecting only OCaml source files (`*.ml`). But since Dune is supposed to be a general purpose build system, that does not seem satisfactory (though it could perhaps be OK for this PR).
  • Like the preceding, but allow the user to select the filter in a stanza. Probably most users would not run into the limitation so we could only filter if requested by the user.
  • Currently, if the project has multiple packages, we make a mapping that includes all of the files in all of the packages. But would it work to have a separate mapping for each package? Then when building something that is part of a package, we would use the mapping for that package.
    • That would reduce the size of the mappings. Then we would only run into size problems when an individual package was very large.
    • Is it true that build artifacts for a given package do not reference files in a different package? If not, then perhaps we need to keep having a single mapping for all packages.
    • If it is OK to have each action use the mapping for its containing package, we need a way to know for which package a given action is being performed. I don't currently know how to do that, though if we knew the input or output files for an action we could perhaps use some maps from files to their package (we would need to make these, but we have that information).
  • We could extend the mapping mechanism to allow for storing maps in a file. For example, if the value of BUILD_PATH_PREFIX_MAP started with @, we could interpret the rest as a path to a file containing the mapping. In order to allow user files that have a @, we would need to introduce an encoding for @ similar to the way :, =, and % are currently encoded.
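
To make the last option concrete, here is a sketch of the existing escaping together with one possible shape for the indirection. The escapes for `%`, `=` and `:` (`%#`, `%+`, `%.`) and the `target=source` pair layout follow the BUILD_PATH_PREFIX_MAP format as I understand it; `map_source` and `classify` are hypothetical and only illustrate the proposed `@` prefix, which is not part of the current format.

```ocaml
(* Escape one path for use inside a BUILD_PATH_PREFIX_MAP value. *)
let encode_path s =
  let buf = Buffer.create (String.length s) in
  String.iter
    (function
      | '%' -> Buffer.add_string buf "%#"
      | '=' -> Buffer.add_string buf "%+"
      | ':' -> Buffer.add_string buf "%."
      | c -> Buffer.add_char buf c)
    s;
  Buffer.contents buf

(* Each pair is written as target=source; pairs are separated by ':'. *)
let encode_map pairs =
  String.concat ":"
    (List.map
       (fun (source, target) -> encode_path target ^ "=" ^ encode_path source)
       pairs)

(* Hypothetical '@' indirection from the proposal: a leading '@' means the
   rest of the value names a file that contains the real map. *)
type map_source =
  | Inline of string
  | From_file of string

let classify value =
  if String.length value > 0 && value.[0] = '@' then
    From_file (String.sub value 1 (String.length value - 1))
  else Inline value
```

Since an encoded pair could itself begin with `@` today, an `Inline` value that starts with `@` would need a new escape, as noted in the bullet above.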

Note that the computation of the maps is now invoked from run_build_system in bin/build_cmd.ml, in this line:

              Install_rules.Build_map.build_and_save_maps scontexts

This is earlier than install information would typically have been computed, but we need to have the mappings available during the build steps. This seems to be having the effect that some messages that previously were being emitted are skipped (though I do not understand exactly why--yet).

That's all for now. I look forward to some feedback from reviewers on the issues above.

-> Install.Entry.Sourced.t list Package.Name.Map.t
-> Build_path_prefix_map.map

val build_all_maps :
Member

Where is this function being used outside of this module? If it isn't, it shouldn't be in the mli.

let* packages = Stanzas_to_entries.stanzas_to_entries sctx in
Memo.return (ctxn, packages))
in
let maps = Context_name.Map.of_list_reduce pairs ~f:(fun old _new -> old) in
Member

The reduce doesn't seem right. How could you have a collision on a context name here? Shouldn't it be Map.of_list_exn?

Contributor Author

I believe that of_list_reduce works correctly, but of_list_exn is more naturally the right function to use, so I will change to use it.
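
The difference between the two constructors can be sketched with the standard library's `Map`. The functions below are stand-ins written for illustration, not Stdune's implementations: the reducing version silently merges duplicate keys, while the strict version treats a duplicate as an error, which documents the expectation that context names are unique.

```ocaml
module SMap = Map.Make (String)

(* Stand-in for a reducing constructor: duplicates are merged with [f]. *)
let of_list_reduce pairs ~f =
  List.fold_left
    (fun acc (k, v) ->
      SMap.update k
        (function None -> Some v | Some old -> Some (f old v))
        acc)
    SMap.empty pairs

(* Stand-in for a strict constructor: a duplicate key is reported. *)
let of_list_exn pairs =
  List.fold_left
    (fun acc (k, v) ->
      if SMap.mem k acc then invalid_arg ("duplicate key: " ^ k)
      else SMap.add k v acc)
    SMap.empty pairs
```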

@rgrinberg
Member

Currently, if the project has multiple packages, we make a mapping that includes all of the files in all of the packages. But would it work to have a separate mapping for each package? Then when building something that is part of a package, we would use the mapping for that package.

If this is possible, I think it would be preferable. Although it does seem a bit more difficult. So what you have currently is fine too.

Note that the references introduced in Dune_util.Build_path_prefix_map0 are a no-go. I would suggest to focus on getting rid of them. For example, by extending Execution_parameters to allow the necessary information to be passed without mutation.

Also, could you move as much of the logic as possible outside of install_rules and into its own module?

Those changes in Dune_util.Build_path_prefix_map probably don't belong there either. I understand you wanted to change a single code path that will work for both dune_engine and dune_rules, but this is all quite specialized to our rules. So it should live in dune_rules with sensible hooks in dune_engine.

bin/describe.ml Outdated
@@ -817,6 +1001,7 @@ module What = struct
| External_lib_deps
| Opam_files
| Pp of string
| Map
Member

"map" is a rather general name for a command that does something quite specific. How about "build-path-prefix-map"?

Contributor Author

Done (in next push)

package installation.

For the testcase for issue ocaml#7413, the issue is resolved with the
current Dune changes. The result of the grep now gives a count of 1.

For the deployment phase, we need the inverse of the
BUILD_PATH_PREFIX_MAP. I've decided that this should use a different
environment variable, so I've chosen DEPLY_PATH_PREFIX_MAP.
The reason two variables are needed is that a new dune command,
`dune ocaml debug`, will be added. It is similar to `dune exec`,
except that instead of invoking the executable directly, it
gives the executable to the OCaml debugger. But before doing that,
it builds the executable, if necessary, to make sure it is up-to-date.
During the build, BUILD_PATH_PREFIX_MAP maps build-time
absolute paths to abstract paths, but the debugger needs the inverse
mapping, so it will use DEPLY_PATH_PREFIX_MAP.

So forward and inverse mappings are computed, and the
`dune describe map` command was added to display and test
the forward and reverse maps.

Promote tests with expected results for new mapping

Add find_map locally to enable build with early OCamls

Guard against invalid existing map in order to guarantee
we always emit a valid map.

Only map .ml files exactly, to decrease map size.
This is a temporary solution to avoid exceeding
the argument/environment size for execve (exhibited
in the benchmarks). Maybe make user-configurable,
or have an indirection feature in the mapping
environment variable so the full mapping can be
in a file.

Use String.is_suffix, as it is available in stdune
for all versions of OCaml.

Signed-off-by: Richard L Ford <[email protected]>
@richardlford
Contributor Author

Note that the references introduced in Dune_util.Build_path_prefix_map0 are a no-go. I would suggest to focus on getting rid of them. For example, by extending Execution_parameters to allow the necessary information to be passed without mutation.

@rgrinberg, I'm having a difficult time finding an alternative that does not require mutation. I spent all last week trying to come up with a solution.

Currently, the default execution parameters are set in bin/common.ml, function init. This function also calls Dune_rules.Main.init which sets the Build_config execution_parameters field to a function that computes the execution parameters for each directory (updating them using the information in the dune-project file).

Computing the mappings requires reading all the dune and dune-project files, as well as the opam files (as is done by Dune_load.load, for example in Dune_rules.Main.get ()). I experimented with inserting the following code at the end of Common.init:

  let build_sys =
    (* Here we make the assumption that this computation doesn't yield. *)
    Fiber.run
      (Memo.run (Dune_rules.Main.get ()))
      ~iter:(fun () -> 
        Dune_util.Log.info [Pp.text "common.ml:init, Fiber.run failed"];
        assert false)
  in
  ignore build_sys;

This builds, but when I run the tests I get a lot of crashes, though I don't see my message in the log. So it seems that computation of the maps must be done later, but since Common.init determines the Execution_parameters, I don't see how I can get the maps into the Execution_parameters without mutation.

I'm considering abandoning these changes and instead adding Dune documentation that if a package wants its code to be debuggable after installation, then it needs to arrange its source hierarchy to mirror the installation-time hierarchy.

Do you have any other suggestions?

@richardlford
Contributor Author

I've pushed changes to reflect some of the review comments, but have not yet solved the issue of mutation (as mentioned above).

@richardlford
Contributor Author

I'm changing jobs and will not have time to complete this PR. It has proved more complex than expected. If someone else thinks this is worthwhile, they are welcome to continue this effort. As a simpler alternative, I have #7741. It fixes the gap in reproducibility, but the abstract paths will only work for installed libraries if the library authors choose a source layout that mirrors the installation layout.

Successfully merging this pull request may close these issues.

Debug mapping is sometimes inconsistent with installed locations
4 participants