Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add safe.directories config #10736

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

Conversation

ehuss
Copy link
Contributor

@ehuss ehuss commented Jun 8, 2022

This is an implementation of RFC 3279.

This only uses the CARGO_UNSTABLE_SAFE_DIRECTORIES=true environment variable instead of a -Z flag because -Z flags are loaded after the config files have been loaded.
I didn't feel comfortable trying to allow that to work (config is loaded early to handle aliases).

The config code needing some changing around so that the checks could be placed in one location.
There are two uses here. The normal config loading, and the config loading for the cargo config command.
This also changed because it needs to load the home directory config first instead of last, in order to fetch the safe.directories settings.

@rust-highfive
Copy link

@ehuss: no appropriate reviewer found, use r? to override

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 8, 2022
Comment on lines +1690 to +1691
// FIXME(ehuss): Loading and parsing manifests just to find the root seems
// very inefficient. I think this should be reconsidered.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance of this was tested before merging in this PR. There is some hit to the performance but not enough that it was a concern at the time. There was hope that any concerns about performance would be found by people testing the feature but none have come up so far.

If it is something that needs to be worked on the easiest fix would be to record workspace root paths once they have been found. Then each time a workspace root is needed you check if an already existing workspace root is the correct one for the current manifest. The idea is similar to what I described in this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did a quick test of using workspace inheritance in the rust workspace. It increased the startup time from 358.8 ms to 449.1 ms. That's a pretty substantial increase. I didn't do any profiling to see if this particular thing is the culprit, but I suspect it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't mind sharing what you did for the test, I would like to take a look at it. That is much higher than I have seen but I think that is because of the nesting in the rust repo.

I think caching paths is the best way to eliminate most of that but I don't know for certain. I will take a look at it and see what I can do. I don't want to clutter this PR so if a separate issue is better for further discussion I would be happy to open one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Extract this file in benches/workspaces
    rust2.zip

    This is just a quick hack that adds inheritance to a bunch of packages, but I didn't really put much thought into setting it up.

  2. Extract rust.tgz in that directory, too.

  3. Prime the target directory with a cache of rustc info. In rust and rust2, run: cargo c -p linkchecker. Otherwise it would be measuring rustc overhead.

  4. Test out some command that spends most of its time loading the workspace. There aren't very many good candidates, but cargo metadata or cargo tree can work. I use hyperfine for quick measurements. For example:

    hyperfine "cd rust; $(rustup which cargo) metadata" "cd rust2; $(rustup which cargo) metadata"

We really should add workspace initialization benchmarks. If you want to work on that, that would be wonderful. I think that would just require adding a new benchmark to benches/benchsuite/benches that essentially just calls cargo::core::Workspace::new. Might have to factor out the common logic from resolve.rs that sets up the workspaces.

ehuss added 9 commits June 9, 2022 17:04
This centralizes the workspace discovery code into one location
`find_workspace_root_with_loader` so that it isn't duplicated.
This reworks the walk_tree so that the home config can be loaded first
in order to check safe_directories.
This adds a function that will convert an OwnershipError to a message
explaining to the user what happened and how to fix it. This will be
shared by both the config validation and the manifest validation.
This loads safe.directories from the home config, and then checks
the ownership of all other directories.
This adds a check for file ownership when locating Cargo.toml files.
Comment on lines +1116 to +1118
// Note: Unlike other Cargo environment variables, this does not
// assume paths relative to the current directory (only absolute paths
// are supported). This is intended as an extra layer of safety.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I didn't see this in the RFC
  • For my use of git's equivalent, I did use a . though I did it through -c rather than an env variable
  • We are only enforcing this contract by the fact that the a . won't match anything, correct? Should we more explicitly tell the user, either via a warning or error, that their safe directory setting will do nothing?

* Tracking Issue: TODO
* RFC: [#3279](https://github.com/rust-lang/rfcs/pull/3279)

The `CARGO_UNSTABLE_SAFE_DIRECTORIES=true` environment variable enables a mode where Cargo will check the ownership of `Cargo.toml` and `config.toml` files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once this is merged, should we do a Call for Testing in TWiR?

@bors
Copy link
Contributor

bors commented Jun 17, 2022

☔ The latest upstream changes (presumably #10764) made this pull request unmergeable. Please resolve the merge conflicts.

bors added a commit that referenced this pull request Jun 18, 2022
Add a benchmark for workspace initialization

It [was suggested](#10736 (comment)) that a benchmark for workspace initialization should be added. This was suggested because there were issues with the performance of [workspace inheritance](#10747) as well as a general way to track the workspace initialization time across cargo changes

### Changes
- Moved common functions out of `resolve.rs` to a shared `lib.rs`
- Added a new struct to be used when creating a new benchmark
  -  This was done because `env!("CARGO_TARGET_TMPDIR")` would fail to compile when put inside of the new `lib.rs`
- Added a new workspace test for workspace inheritance
  - This new workspace does not have a repo that it was built from and if one needs to be made I can change that
bors added a commit that referenced this pull request Jul 5, 2022
add a cache for discovered workspace roots

## History
`@ehuss` [noticed that](#10736 (comment)) workspace inheritance caused a significant increase in startup times when using workspace inheritance. This brought up the creation of #10747.

When using a similar test setup [to the original](#10736 (comment)) I got
```
Benchmark 1: cd rust; ../../../target/release/cargo metadata
  Time (mean ± σ):     149.4 ms ±   3.8 ms    [User: 105.9 ms, System: 31.7 ms]
  Range (min … max):   144.2 ms … 162.2 ms    19 runs

Benchmark 2: cd rust-ws-inherit; ../../../target/release/cargo metadata
  Time (mean ± σ):     191.6 ms ±   1.4 ms    [User: 145.9 ms, System: 34.2 ms]
  Range (min … max):   188.8 ms … 193.9 ms    15 runs
```

This showed a large increase in time per cargo command when using workspace inheritance.

During the investigation of this issue, other [performance concerns were found and addressed](#10761). This resulted in a drop in time across the board but heavily favored workspace inheritance.
```
Benchmark 1: cd rust; ../../../target/release/cargo metadata
  Time (mean ± σ):     139.3 ms ±   1.7 ms    [User: 99.8 ms, System: 29.4 ms]
  Range (min … max):   137.1 ms … 144.5 ms    20 runs

Benchmark 2: cd rust-ws-inherit; ../../../target/release/cargo metadata
  Time (mean ± σ):     161.7 ms ±   1.4 ms    [User: 120.4 ms, System: 31.2 ms]
  Range (min … max):   158.0 ms … 164.6 ms    18 runs
```

## Performance after changes
`hyperfine --warmup 10 "cd rust; ../../../target/release/cargo metadata" "cd rust-ws-inherit; ../../../target/release/cargo metadata" --runs 40`
```
Benchmark 1: cd rust; ../../../target/release/cargo metadata
  Time (mean ± σ):     140.1 ms ±   1.5 ms    [User: 99.5 ms, System: 30.7 ms]
  Range (min … max):   137.4 ms … 144.0 ms    40 runs

Benchmark 2: cd rust-ws-inherit; ../../../target/release/cargo metadata
  Time (mean ± σ):     141.8 ms ±   1.6 ms    [User: 100.9 ms, System: 30.9 ms]
  Range (min … max):   138.4 ms … 145.4 ms    40 runs
```

[New Benchmark](#10754)
`cargo bench -- workspace_initialization/rust`
```
workspace_initialization/rust
    time:   [14.779 ms 14.880 ms 14.997 ms]
workspace_initialization/rust-ws-inherit
    time:   [16.235 ms 16.293 ms 16.359 ms]
```

## Changes Made
- [Pulled a commit](bbd41a4) from `@ehuss` that deduplicated finding a workspace root to make the changes easier
- Added a cache in `Config` to hold found `WorkspaceRootConfig`s
  - This makes it so manifests should only be parsed once
- Made `WorkspaceRootConfig` get added to the cache when parsing a manifest

## Testing Steps
To check the new benchmark:
1. `cd benches/benchsuite`
2. `cargo bench -- workspace_initialization/rust`

Using `hyperfine`:
1. run `cargo build --release`
2. extract `rust` and `rust-ws-inherit` in `benches/workspaces`
3. cd `benches/workspaces`
4. Prime the target directory with a cache of `rustc` info. In `rust` and `rust-ws-inherit`, run: `cargo +nightly c -p linkchecker`. Otherwise it would be measuring `rustc` overhead.
4. run `hyperfine --warmup 10 "cd rust; ../../../target/release/cargo metadata" "cd rust-ws-inherit; ../../../target/release/cargo metadata" --runs 40`

closes #10747
Copy link
Member

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch is really comprehensive and covers the current RFC well. Thank you!

One thing I noticed: If both .cargo/config and .cargo/config.toml exist, the current implementation seems check .cargo/config only for backward compatible reason. This behaviour is not new for this PR and looks good to me, though not written in the RFC. OTOH, the RFC does state that rustup would check rust-toolchain and rust-toolchain.toml if both exist. Is this deviation intentional?

Edited: I was wrong. The current PR against rustup doesn't check both of those files.

config: &Config,
) -> anyhow::Error {
let to_approve = if std::env::var_os("RUSTUP_TOOLCHAIN").is_some() {
// FIXME: shell_escape doesn't handle powershell (and possibly other shells)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be overthinking this. Will there be a case that a malicious attacker creates a path that fails to escape in a specific shell, and then a user runs that without scrutiny?

Comment on lines 1015 to +1021
let mut result = Vec::new();
let mut seen = HashSet::new();
let home = self.home_path.clone().into_path_unlocked();
self.walk_tree(&self.cwd, &home, |path| {
let mut cv = self._load_file(path, &mut seen, false)?;
for mut cv in cvs {
if self.cli_unstable().config_include {
self.load_unmerged_include(&mut cv, &mut seen, &mut result)?;
}
result.push(cv);
Ok(())
})
.with_context(|| "could not load Cargo configuration")?;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional) could we avoid unnecessary vec allocation here, as config-include is still unstable?

Suggested change
let mut result = Vec::new();
let mut seen = HashSet::new();
let home = self.home_path.clone().into_path_unlocked();
self.walk_tree(&self.cwd, &home, |path| {
let mut cv = self._load_file(path, &mut seen, false)?;
for mut cv in cvs {
if self.cli_unstable().config_include {
self.load_unmerged_include(&mut cv, &mut seen, &mut result)?;
}
result.push(cv);
Ok(())
})
.with_context(|| "could not load Cargo configuration")?;
}
if self.cli_unstable().config_include {
let mut result = Vec::new();
for mut cv in cvs {
self.load_unmerged_include(&mut cv, &mut seen, &mut result)?;
result.push(cv);
}
Ok(result)
} else {
Ok(cvs)
}

}
}
// This is used for testing to simulate a failure.
let simulate = match env::var_os("__CARGO_TEST_OWNERSHIP") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to ship this in production? I was thinking that maybe putting this behind debug_assertion.

fn get_username(uid: u32) -> String {
unsafe {
let amt = match libc::sysconf(libc::_SC_GETPW_R_SIZE_MAX) {
n if n < 0 => 512 as usize,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry a stupid question. Where does 512 come from?

let result = GetNamedSecurityInfoW(
path_w.as_ptr(),
SE_FILE_OBJECT,
OWNER_SECURITY_INFORMATION | DACL_SECURITY_INFORMATION,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need DACL_SECURITY_INFORMATION here?

}
}

#[cfg(windows)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ehuss for building this up.

To whom that is not familiar with winapi like me, here are some important links to API used in this PR.

Each API handles errors and allocations a bit differently. For me, I usually heed these things:

  • Do we need to call a special finalizer for a certain value, e.g. CloseHandle and LocalFree?
  • Does the API return an error or do we need to retrieve from GetLastError?
  • Mind the encoding difference. UTF-16 is prevalent on Windows.

Just my personal note. Hope it helps other reviewers.

.file("src/lib.rs", "")
.file("myconfig.toml", "build.jobs = 1")
.build();
p.cargo("check --config myconfig.toml -Zunstable-options")
Copy link
Member

@weihanglo weihanglo Jul 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--config is about to stabilize in 1.63.0

(and ditto throughout)

Suggested change
p.cargo("check --config myconfig.toml -Zunstable-options")
p.cargo("check --config myconfig.toml")

@weihanglo weihanglo added S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 15, 2022
@weihanglo weihanglo added S-blocked and removed S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. labels Dec 20, 2022
@weihanglo weihanglo added S-blocked-external Status: ❌ blocked on something out of the direct control of the Cargo project, e.g., upstream fix and removed S-blocked labels Apr 18, 2023
@weihanglo weihanglo added A-configuration Area: cargo config files and env vars A-rustup Area: rustup interaction labels May 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-configuration Area: cargo config files and env vars A-rustup Area: rustup interaction S-blocked-external Status: ❌ blocked on something out of the direct control of the Cargo project, e.g., upstream fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants