-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reproducible command line + determinism #450
Comments
This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed. cc @rust-lang/compiler @rust-lang/compiler-contributors |
@rustbot second I think it's generally a good idea to provide this functionality. Let's move forward with implementing this as an unstable feature so we can get some real-world experience with it. Note that stabilizing this feature needs a separate decision by the whole compiler team. |
And thanks a lot for the detail proposal, @danakj! |
@rustbot label -final-comment-period +major-change-accepted |
…haelwoerister Introduce -Z remap-cwd-prefix switch This switch remaps any absolute paths rooted under the current working directory to a new value. This includes remapping the debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`. Importantly, this flag does not require passing the current working directory to the compiler, such that the command line can be run on any machine (with the same input files) and produce the same results. This is critical property for debugging compiler issues that crop up on remote machines. This is based on adetaylor's rust-lang@dbc4ae7 Major Change Proposal: rust-lang/compiler-team#450 Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
…haelwoerister Introduce -Z remap-cwd-prefix switch This switch remaps any absolute paths rooted under the current working directory to a new value. This includes remapping the debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`. Importantly, this flag does not require passing the current working directory to the compiler, such that the command line can be run on any machine (with the same input files) and produce the same results. This is critical property for debugging compiler issues that crop up on remote machines. This is based on adetaylor's rust-lang@dbc4ae7 Major Change Proposal: rust-lang/compiler-team#450 Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
…haelwoerister Introduce -Z remap-cwd-prefix switch This switch remaps any absolute paths rooted under the current working directory to a new value. This includes remapping the debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`. Importantly, this flag does not require passing the current working directory to the compiler, such that the command line can be run on any machine (with the same input files) and produce the same results. This is critical property for debugging compiler issues that crop up on remote machines. This is based on adetaylor's rust-lang@dbc4ae7 Major Change Proposal: rust-lang/compiler-team#450 Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
…haelwoerister Introduce -Z remap-cwd-prefix switch This switch remaps any absolute paths rooted under the current working directory to a new value. This includes remapping the debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`. Importantly, this flag does not require passing the current working directory to the compiler, such that the command line can be run on any machine (with the same input files) and produce the same results. This is critical property for debugging compiler issues that crop up on remote machines. This is based on adetaylor's rust-lang@dbc4ae7 Major Change Proposal: rust-lang/compiler-team#450 Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
…haelwoerister Introduce -Z remap-cwd-prefix switch This switch remaps any absolute paths rooted under the current working directory to a new value. This includes remapping the debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`. Importantly, this flag does not require passing the current working directory to the compiler, such that the command line can be run on any machine (with the same input files) and produce the same results. This is critical property for debugging compiler issues that crop up on remote machines. This is based on adetaylor's rust-lang@dbc4ae7 Major Change Proposal: rust-lang/compiler-team#450 Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
Proposal
Hello,
We in Chromium are working on integrating Rust into our distributed build system. A key component of that is reproducible builds. Currently, we can not achieve the level of reproducibility required by our infrastructure teams with rustc, and we would like to propose a command line flag that would get us there.
Background
How Rust is being invoked in Chromium
The Chromium build system consists of 3 major components:GN: this generates the set of inputs and outputs and the command lines to get from one to the other. The output of GN feeds into Ninja.
Ninja: this orders and distributes the set of actions to run across many processes, in order to build in a highly efficient parallel way.
Goma: this acts as a compiler proxy on developers machines, and on build bots. Goma provides a
rustc
binary that proxies the inputs to a distributed build system, which is distributed across a fleet of machines. It also acts as a build cache, such that most compilation steps, for most developers, simply hit the cache and can immediately send back the resulting build output.Our builds are run by developers and bots running Linux, Mac, and Windows, and Goma supports the same platforms.
There are 4 forms of build determinism, or reproducibility:
Basic: building the project twice produces the same output.
Incremental: rebuilding just part of the project (change the timestamp on a single file for example) produces the same output.
Local: the build output does not depend on where the source code, build directory, or output is located
Universal: the build output does not depend on any machine-specific state, and produces the same output on any machine.
These are explained in more detail in this blog post: https://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html
Like Gecko, we have no external crates, they will all be under a single source root.
This was originally brought up on #38322, and a separate issue was created at #87325, which has now moved to this MCP.
Chromium bug: https://crbug.com/1230302
Reproducibility Issue
When building with rustc, the (absolute path to the) current working directory must appear either:
in the build outputs (
DW_AT_comp_dir
,DW_AT_decl_file
, and coverage output), oron the rustc command line.
The current working directory is a machine-specific (and typically user-specific) state. Thus it should not appear in any build output from the Rust compiler when determinism is needed. The compiler provides a
--remap-path-prefix
switch that can replace any path prefix, including the current working directory, within all paths. It was designed and implemented on issue #38322.While
--remap-path-prefix
is often used specifically to strip off the current working directory, it is designed to be more general, requiring the user to specify the path to be remapped. This was done to support external crates located in paths that are not all relative to a single root. However, this places a requirement on the build system to include the (absolute path to the) current working directory in the rustc command line in order to rewrite paths in build outputs to be relative.In a distributed build system, the actual
rustc
invocation is not run locally, but is proxied over to another machine. With--remap-path-prefix
, the developer’s command line includes their own current working directory, so this directory must also be proxied over to the build bot and replace the PWD environment variable there for the command line to work correctly. This places user/machine specific information in both the command line and the environment.It is a desirable property of a compiler to provide reproducible builds across build environments.
These are critically important for debugging distributed build systems, in order to locally reproduce and debug problems as they arise on build bots. In particular, this includes if a bug should occur in rustc that needs to be diagnosed. For example, to reproduce and debug a build on a bot means grabbing logs, generated with ltrace, which shows the exact command lines being run. We do this to reproduce builds on a developer machine, and we expect the same commands to run exactly the same way there. There should be nothing machine-specific in the invocation or execution of rustc or the commands can fail to correctly reproduce failures. For this reason, we have a single source root, and Chromium always passes relative paths to the compiler. But by introducing the current working directory into the command line for
--remap-path-prefix
, this property would be lost.Secondly, any build cache must maintain a mapping from its inputs to the build output. One of those inputs is the command line, as compiler flags can change the output. A build cache requires the input command line to be machine- and user-independent in order to share the cache. Such a cache will not be reliable unless it is keyed off the fully-resolved command line. If it used a command line with shell invocations embedded (such as
--remap-path-prefix=$(pwd)=.
), then it would produce incorrect results in other situations like-C debuginfo=$(determine_debug_info.sh)
. If a script were used to wrap the compiler, not only does this increase the complexity of the build system, but if any flags introduced in the wrapper script would not be visible to the build cache, which could produce caching mistakes.Wrapper scripts impose other complexity and problems which are not warranted and can be avoided with fairly simple
rustc
support. As a cross-platform distributed build system, many platforms are supported, and the runtime characteristics and limitations of each platform differ. Of note is that on Windows, spawning a subprocess is expensive, and at scale would be both visible in time and cost dollars. Second, the--remap-path-prefix
must be part of the command line on developers’ machines as well, in case the compilation is not proxied (a distributed build decides this at compile time, while command lines are decided at an earlier step), which prevents the server from injecting the flag itself. And beyond that, it does not change the actual invocation of rustc, which continues to interfere with the ability to reproduce a build from a buildbot on a developer machine, as captured rustc executions would be incorrect.We see the reproducibility of compiler execution as an important property of the Rust compiler in order to scale into large development infrastructure.
Prior work
The same set of problems exist for us with Clang, which must also generate a
DW_AT_comp_dir
record in its output. Clang provides a-fdebug-compilation-dir
which allows the user to specify what should be written as the current working directory for that field explicitly. The flag is described as:This flag supports reproducible command lines by not having the current working directory appear as part of its input. Only the desired
DW_AT_comp_dir
(andDW_AT_decl_file
) path is specified, not the source path.There exists a related RFC #3127 for cargo, as mentioned by @jyn514, to remove absolute paths in its build. However this does not resolve this issue, as Chromium will not be using cargo to invoke rustc and we are already passing relative paths to rustc with our GN-generated rules.
There is an issue #64839 which states that there are privacy concerns with
--remap-path-prefix
requiring the user’s absolute path to their project. This proposal may solve that issue as well, as no absolute paths need to be passed, and they will also be stripped from the compiler’s output.Proposal
There are a few high-level approaches that can resolve this issue.
A command line switch to modify
DW_AT_comp_dir
andDW_AT_decl_file
specifically. This would be similar to Clang’s-fdebug-compilation-dir
flag. Internally, the flag’s value would replace rustc asking for thestd::env::current_dir()
. However, this would not change all outputs, such as coverage output, and--remap-path-prefix
already provides machinery for this with the ability to reach everything.When no external crates are present, and all paths provided to the compiler are relative, then all absolute paths come with a common prefix of the current working directory. Thus the missing functionality for universally reproducible command lines is a way to map path prefixes of the current working directory.
This could be done with a special codeword in
--remap-path-prefix
such as--remap-path-prefix=%CWD%=.
Then the compiler would substitute%CWD%
withstd::env::current_dir()
when doing file path mapping.--remap-cwd-prefix
(based on naming feedback from @jsgf). In order to experiment with the flag, we would land it initially as-Zremap-cwd-prefix
. This flag would take precedence over any --remap-path-prefix switches, and otherwise behave exactly like--remap-path-prefix=$(pwd)=
, mapping the current working directory prefix to a specified value without requiring shell expansion.@jsgf flagged the question of if the current working directory should include the trailing path separator. The concern raised was that if the current working directory is
/home/me
it would also remap/home/meal
. The implementation of remapping is found inFilePathMapping::map_prefix(&self, path: PathBuf)
, incompiler/rustc_span/src/source_map.rs
. The remap is done only ifPathBuf::strip_prefix()
succeeds, which only maps prefixes that are full path components, meaning that/home/meal
would (correctly) not be matched by this proposal without any extra work.Thus, based on the discussion so far, we believe that the last option seems like the simplest and best path forward. This is done in PR #87320.
Future work
Clang has recently landed a flag to rewrite relative paths into generated coverage profiling data, now known as
-fcoverage-compilation-dir
. Clang also added a switch that rewrites the current working directory in both debug and coverage data. With the--remap-cwd-prefix
proposal, Rust should already solve this issue, as long as the coverage profiling data is written with FileName::prefer_remapped(). If not, it should be a simple correction.Thank you for reading.
Mentors or Reviewers
@michaelwoerister
Process
The main points of the Major Change Process are as follows:
@rustbot second
.-C flag
, then full team check-off is required.@rfcbot fcp merge
on either the MCP or the PR.You can read more about Major Change Proposals on forge.
Comments
This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.
The text was updated successfully, but these errors were encountered: