Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: trace config [env] table in dep-info. #14701

Merged
merged 3 commits into from
Oct 25, 2024

Conversation

linyihai
Copy link
Contributor

What does this PR try to resolve?

The env defined in config.toml or --config env are stripped before recording in the dep-info file. The absence of the env table doesn't rebuild when env changed

By tracing the env table into the dep-info file, If the envs changed then the rebuild can be triggered.

Fixes #13280

How should we test and review this PR?

One commit add the test, the latter commit update the test through the fixes.

Additional information

The PR only fixed the env table changes can't trigger a rebuild, other CARGO-like envs
doesn't take into account.

@rustbot
Copy link
Collaborator

rustbot commented Oct 18, 2024

r? @ehuss

rustbot has assigned @ehuss.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added A-build-execution Area: anything dealing with executing the compiler A-rebuild-detection Area: rebuild detection and fingerprinting S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 18, 2024
//
// For issue#13280, We trace env vars that are defined in the `config.toml`.
on_disk_info.env.retain(|(key, _)| {
!rustc_cmd.get_envs().contains_key(key) || key == CARGO_ENV || config_envs.contains(key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, though here is a caveat of this solution:

If people define a Cargo internal env in [env] table, such as

[env]
CARGO_MANIFEST_DIR = "whatever"

Before this patch, Cargo's depinfo won't track CARGO_MANIFEST_DIR. After it tracks and records as CARGO_MANIFEST_DIR=/path/to/manifest-dir.

I wouldn't advertise Cargo supports tracking internal envs, but this could be a hacky workaround for #14154 — by putting an internal env key with random value in [env], a user can force Cargo to track it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One clumsy solution is to filter these internal_envs out of [env] table

 let internal_envs = LazyCell::new(|| HashSet::from([
        "CARGO",
        "CARGO_MANIFEST_DIR",
        "CARGO_MANIFEST_PATH",
        "CARGO_PKG_VERSION",
        "CARGO_PKG_VERSION_MAJOR",
        "CARGO_PKG_VERSION_MINOR",
        "CARGO_PKG_VERSION_PATCH",
        "CARGO_PKG_VERSION_PRE",
        "CARGO_PKG_AUTHORS",
        "CARGO_PKG_NAME",
        "CARGO_PKG_DESCRIPTION",
        "CARGO_PKG_HOMEPAGE",
        "CARGO_PKG_REPOSITORY",
        "CARGO_PKG_LICENSE",
        "CARGO_PKG_LICENSE_FILE",
        "CARGO_PKG_RUST_VERSION",
        "CARGO_PKG_README",
        "CARGO_CRATE_NAME",
        "CARGO_BIN_NAME",
        "OUT_DIR",
        "CARGO_PRIMARY_PACKAGE",
        "CARGO_TARGET_TMPDIR",
        "CARGO_RUSTC_CURRENT_DIR",
        paths::dylib_path_envvar(),
    ]));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like it is actually doing #14701 (comment) 😆.

Comment on lines 333 to 339
let config_envs: HashSet<String> = build_runner
.bcx
.gctx
.env_config()?
.keys()
.cloned()
.collect();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clone happens for every single unit of work. It may hurt the performance a bit when the [env] table is large. That's a tradeoff.

An alternative is using Arc to avoid cloning the entire HashSet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way is we could have a set of internal env + a set of env from cargo::rustc-env build script invocation. dep_info should first remove those and anything left is what we're interested in (including [env]).

See the PoC

  • Can avoid cloning cargo::rustc-env by acquiring mutex again, though this is just a PoC so didn't do that.
  • Should also need to filter out CARGO_BIN_<exe>
  • Need an debug_assertion to check exhaustiveness of internal_envs in rustc invocation.
diff --git a/src/cargo/core/compiler/fingerprint/mod.rs b/src/cargo/core/compiler/fingerprint/mod.rs
index e6fff90932a..03489d2c99a 100644
--- a/src/cargo/core/compiler/fingerprint/mod.rs
+++ b/src/cargo/core/compiler/fingerprint/mod.rs
@@ -361,8 +361,10 @@
 
 mod dirty_reason;
 
+use std::cell::LazyCell;
 use std::collections::hash_map::{Entry, HashMap};
 
+use std::collections::HashSet;
 use std::env;
 use std::fmt::{self, Display};
 use std::fs::{self, File};
@@ -2130,7 +2132,7 @@ pub fn translate_dep_info(
     rustc_cwd: &Path,
     pkg_root: &Path,
     target_root: &Path,
-    rustc_cmd: &ProcessBuilder,
+    envs_from_build_script_output: &HashSet<String>,
     allow_package: bool,
 ) -> CargoResult<()> {
     let depinfo = parse_rustc_dep_info(rustc_dep_info)?;
@@ -2140,6 +2142,33 @@ pub fn translate_dep_info(
     let mut on_disk_info = EncodedDepInfo::default();
     on_disk_info.env = depinfo.env;
 
+    let internal_envs = LazyCell::new(|| HashSet::from([
+        "CARGO",
+        "CARGO_MANIFEST_DIR",
+        "CARGO_MANIFEST_PATH",
+        "CARGO_PKG_VERSION",
+        "CARGO_PKG_VERSION_MAJOR",
+        "CARGO_PKG_VERSION_MINOR",
+        "CARGO_PKG_VERSION_PATCH",
+        "CARGO_PKG_VERSION_PRE",
+        "CARGO_PKG_AUTHORS",
+        "CARGO_PKG_NAME",
+        "CARGO_PKG_DESCRIPTION",
+        "CARGO_PKG_HOMEPAGE",
+        "CARGO_PKG_REPOSITORY",
+        "CARGO_PKG_LICENSE",
+        "CARGO_PKG_LICENSE_FILE",
+        "CARGO_PKG_RUST_VERSION",
+        "CARGO_PKG_README",
+        "CARGO_CRATE_NAME",
+        "CARGO_BIN_NAME",
+        "OUT_DIR",
+        "CARGO_PRIMARY_PACKAGE",
+        "CARGO_TARGET_TMPDIR",
+        "CARGO_RUSTC_CURRENT_DIR",
+        paths::dylib_path_envvar(),
+    ]));
+
     // This is a bit of a tricky statement, but here we're *removing* the
     // dependency on environment variables that were defined specifically for
     // the command itself. Environment variables returned by `get_envs` includes
@@ -2170,7 +2199,10 @@ pub fn translate_dep_info(
     // not tracked elsewhere in the fingerprint.
     on_disk_info
         .env
-        .retain(|(key, _)| !rustc_cmd.get_envs().contains_key(key) || key == CARGO_ENV);
+        .retain(|(key, _)| 
+            !internal_envs.contains(key.as_str()) && !envs_from_build_script_output.contains(key)
+            || key == CARGO_ENV
+        );
 
     let serialize_path = |file| {
         // The path may be absolute or relative, canonical or not. Make sure
diff --git a/src/cargo/core/compiler/mod.rs b/src/cargo/core/compiler/mod.rs
index ee5fdbbecd6..2f3e547c762 100644
--- a/src/cargo/core/compiler/mod.rs
+++ b/src/cargo/core/compiler/mod.rs
@@ -337,8 +337,11 @@ fn rustc(
         // directory is present.
         if artifact.is_true() {
             paths::create_dir_all(&root)?;
+
         }
 
+        let mut envs_from_build_script_output = HashSet::new();
+
         // Only at runtime have we discovered what the extra -L and -l
         // arguments are for native libraries, so we process those here. We
         // also need to be sure to add any -L paths for our plugins to the
@@ -359,7 +362,7 @@ fn rustc(
                 )?;
                 add_plugin_deps(&mut rustc, &script_outputs, &build_scripts, &root_output)?;
             }
-            add_custom_flags(&mut rustc, &script_outputs, script_metadata)?;
+            add_custom_flags(&mut rustc, &script_outputs, script_metadata, &mut envs_from_build_script_output)?;
         }
 
         for output in outputs.iter() {
@@ -456,7 +459,7 @@ fn rustc(
                 &cwd,
                 &pkg_root,
                 &target_dir,
-                &rustc,
+                &envs_from_build_script_output,
                 // Do not track source files in the fingerprint for registry dependencies.
                 is_local,
             )
@@ -880,6 +883,7 @@ fn rustdoc(build_runner: &mut BuildRunner<'_, '_>, unit: &Unit) -> CargoResult<W
             &mut rustdoc,
             &build_script_outputs.lock().unwrap(),
             script_metadata,
+            &mut HashSet::new(),
         )?;
 
         // Add the output of scraped examples to the rustdoc command.
@@ -1525,6 +1529,7 @@ fn add_custom_flags(
     cmd: &mut ProcessBuilder,
     build_script_outputs: &BuildScriptOutputs,
     metadata: Option<Metadata>,
+    envs: &mut HashSet<String>,
 ) -> CargoResult<()> {
     if let Some(metadata) = metadata {
         if let Some(output) = build_script_outputs.get(metadata) {
@@ -1536,6 +1541,7 @@ fn add_custom_flags(
             }
             for (name, value) in output.env.iter() {
                 cmd.env(name, value);
+                envs.insert(name.to_owned());
             }
         }
     }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very appreciate your examples. I'm sorry, I'm not clear on some questions

Should also need to filter out CARGO_BIN_<exe>

Does CARGO_BIN_<exe> means that environment vars prefixed with CARGO_BIN except CARGO_BIN_NAME?

I found these enivironments that meet the contions

  • CARGO_BIN_EXE_<name>
  • CARGO_<ARTIFACT-TYPE>_DIR_<DEP>
  • CARGO_<ARTIFACT-TYPE>_FILE_<DEP>_<NAME>

See,

These environments vars has placehold parts, so that we just need to strip them, and it is imporsible to check exhaustiveness of them in rustc invocation. ?

Need an debug_assertion to check exhaustiveness of internal_envs in rustc invocation.

Does it means that check all environment variables in internal_var are present in rust innovations?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the original purpose of excluding Cargo internal environment variables from Cargo's dep-info is just they are already tracked by Cargo, so not necessary. (Though we know the statement is not entirely true according to #14154.)

If we choose to use a patch similar to the one I shared in #14701 (comment), it might become a headache for syncing between the hash set and env vars actually applied to rustc invocations. That's why I propose to check the exhaustiveness in debug_assertion. That will help at least when we run cargo's test suite.

Does it means that check all environment variables in internal_var are present in rust innovations?

We'll need to do the other way round — check every env var in rustc_cmd.envs() is an element in internal env var hash set. However, as you've noticed, dynamic ones are hard to track, and may need some more refactors. Yet I think here we focus on environment variables set for crates, not set for running build scripts. Environment variables for artifact-dependency do not seem to belong to this category?


At this stage, I am fine with Arc on gctx.env_config to reduce the clone cost, if the second approach looks too brittle and need more explorations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for you explanations. The second approach seems require more preparation and prior knowledge for me.

Besides, I had apply the Arc on gctx.env_config.

@rustbot rustbot added the A-configuration Area: cargo config files and env vars label Oct 22, 2024
Copy link
Member

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Only one tiny suggestion.

src/cargo/core/compiler/mod.rs Outdated Show resolved Hide resolved
Copy link
Member

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this patch!


"#]])
.with_stderr_data(str![[r#"
[COMPILING] foo v0.5.0 ([ROOT]/foo)
Copy link
Member

@weihanglo weihanglo Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is funny that if you change the env config to ENV_TEST = { value = "from-config", force = true } in this test, this cargo invocation triggers a rebuild, which I expect no.

[DIRTY] foo v0.5.0 ([ROOT]/foo): the environment variable ENV_TEST changed

Before this patch, it coinciddentally didn't trigger a rebuild because [env] was not tracked in depinfo. After this rebuilds because [env] doesn't take into account here when checking the old depinfo:

} else {
gctx.get_env(key).ok()
};

I think we should add a test case covering force = true scenario, and fix it. I have one solution in mind:

  • In GlobalContext::env_config we filter out env vars that are not force = true and already exist in the env snapshot (GlobalContext::get_env). That is, env_config() always return the set of envs we need to apply.
  • Could add a doc comment for env_config() pointing out the returning is pre-filtered against env snapshot, not the literal table value in [env].
  • In addition, because of this change, we could remove this condition in apply_env_config().
  • Also, everything, including the filter of disallowed envs, should be inside the try_borrow_with if possible to avoid non-necessary computation. Maybe we could go further that also resolve values inside it, so subsequent calls don't need to do it.

What do you think?

(I am afraid of missing any more thing. Environment variables in Cargo are a bit messy)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry that I discovered this problem so late 🙇🏾.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

} else {
gctx.get_env(key).ok()
};

Thank your for pointing out these. Here shoudld also apply the contions to figure out which env change should trigger a rebuild.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the PR so swiftly!

@linyihai linyihai force-pushed the trace-env-table branch 2 times, most recently from d126728 to 5bf7c92 Compare October 24, 2024 09:10
src/cargo/util/context/mod.rs Outdated Show resolved Hide resolved
src/cargo/core/compiler/fingerprint/mod.rs Outdated Show resolved Hide resolved
src/cargo/core/compiler/fingerprint/mod.rs Outdated Show resolved Hide resolved
tests/testsuite/cargo_env_config.rs Show resolved Hide resolved
Cargo fails to detect environment variable is newly set and rebuild if the environment variable is set in `config.toml`
@weihanglo
Copy link
Member

@bors r+

Thanks for working on this along with us!

Note that this fix has a caveat #14701 (comment). If a user sets cargo internal env like CARGO_MAINFEST_DIR in [env] it will track that env var in Cargo's dep info with the "correct" value, not the value specified in [env]. This is not a feature we advertise and considered as a bug, though it could potentially be used as a workaround for #14154.

@bors
Copy link
Contributor

bors commented Oct 25, 2024

📌 Commit 668d82e has been approved by weihanglo

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 25, 2024
@bors
Copy link
Contributor

bors commented Oct 25, 2024

⌛ Testing commit 668d82e with merge 8315873...

@bors
Copy link
Contributor

bors commented Oct 25, 2024

☀️ Test successful - checks-actions
Approved by: weihanglo
Pushing 8315873 to master...

@bors bors merged commit 8315873 into rust-lang:master Oct 25, 2024
22 checks passed
bors added a commit that referenced this pull request Oct 25, 2024
refactor(env): remove unnecessary clones

### What does this PR try to resolve?

This was found during the review of #14701.

Not every environment variable access requires an owned type.
To be precise, most of our usages don't.

### How should we test and review this PR?

Refactor only. Should not have any change in behavior.

One drawback is that the fn signatures deviate from `std::env::var{_os}`.
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 26, 2024
Update cargo

14 commits in cf53cc54bb593b5ec3dc2be4b1702f50c36d24d5..e75214ea4936d2f2c909a71a1237042cc0e14b07
2024-10-18 13:56:15 +0000 to 2024-10-25 16:34:32 +0000
- refactor(env): remove unnecessary clones (rust-lang/cargo#14730)
- test(install): Verify 2024 edition / resolver=3 doesn't affect resolution (rust-lang/cargo#14724)
- Fix: trace `config` `[env]` table in dep-info. (rust-lang/cargo#14701)
- Added unstable-schema generation for Cargo.toml (rust-lang/cargo#14683)
- fix: add source replacement info when no matching package found (rust-lang/cargo#14715)
- feat(complete): Include descriptions in zsh (rust-lang/cargo#14726)
- refactor(fingerprint): avoid unnecessary fopen calls (rust-lang/cargo#14728)
- docs(resolver): Make room for v3 resolver (rust-lang/cargo#14725)
- test: add fixes in the sat resolver (rust-lang/cargo#14707)
- docs(ci): Don't constrainty latest_deps job by MSRV (rust-lang/cargo#14711)
- refactor: use `Iterator::is_sorted` (rust-lang/cargo#14702)
- refactor(rustfix): minor refactors (rust-lang/cargo#14710)
- chore(deps): update msrv (rust-lang/cargo#14705)
- fix(renovate): Switch matchPackageNames to matchDepNames (rust-lang/cargo#14704)
@rustbot rustbot added this to the 1.84.0 milestone Oct 26, 2024
@linyihai linyihai deleted the trace-env-table branch November 9, 2024 06:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-build-execution Area: anything dealing with executing the compiler A-configuration Area: cargo config files and env vars A-rebuild-detection Area: rebuild detection and fingerprinting S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
5 participants