Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make more cargo-as-a-library functions pub #10414

Closed
wants to merge 3 commits into from

Conversation

jonhoo
Copy link
Contributor

@jonhoo jonhoo commented Feb 22, 2022

What does this PR try to resolve?

Crates that want to consume Cargo as a library and implement their own functionality on top of it will often want to do some of what Cargo does, but not all of it. Often, this requires copy-pasting a bunch of code from Cargo since the "high level" functions do too much, and the "mid level" functions aren't pub. This PR marks a few more mid-level Cargo operations pub.

Additional information

The things I've made pub here are all functions that I currently have copies of in a (sadly private) code base that wraps Cargo for certain operations. The primary reason for many of them is that I'm working in a sandboxed environment where Cargo's download_accessible will not work. Instead, I construct a local registry that is populated by a binary that uses Cargo as a library by walking Resolves and downloading each indicated crate version through a specially-allowed ingest mechanism.

@rust-highfive
Copy link

r? @ehuss

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 22, 2022
Jon Gjengset added 3 commits February 22, 2022 15:13
This allows tools that consume Cargo as a library to locate a `std`
Workspace without _also_ having to resolve it and possibly download all
of its dependencies (as `resolve_ws_with_opts` does).
Many of these functions were already `pub`, but weren't re-exported by a
`pub` module.

These functions help users who consume Cargo as a library if they want
to implement variants of `cargo install` themselves.
Previously, users consuming Cargo as a library had to either use the
very low-level `resolve_with_previous`, or the "do everything"
`resolve_ws_with_opts`, which includes downloading the source tarballs
of most crates in the resolved dependency closure. This patch exposes a
version of `resolve_ws_with_opts` that does everything _up to_ anything
that requires a source tarball download, as well as `resolve_registry`,
which is a building block of other resolver functions that may be of
use.
@ehuss
Copy link
Contributor

ehuss commented Mar 8, 2022

These changes seem to be a little more intrusive than I would like to include. I'm fine with adding the occasional pub, but these are some really low-level parts that are getting exposed. Perhaps you can say more about what the scenario is where cargo is sandboxed in such a way that it can't download or can't use vendored packages, or can't access the data from the CLI (such as cargo metadata)?

@jonhoo
Copy link
Contributor Author

jonhoo commented Mar 8, 2022

Sure! So, the build setting here is one where there is no network access for any part of the build pipeline, except for a particular service that allows fetching allow-listed artifacts. For a given build, the build system has access to a Rust toolchain, the source code of the crate to build, a handle to the artifact service, and a copy of the crates.io index metadata (basically https://github.com/rust-lang/crates.io-index). To build, the build system generates a .cargo/config.toml pointing Cargo at a local registry that initially holds just the index and no .crate files. It then uses Cargo as a library to produce a Resolve for the crate being built, and walks that Resolve to find all the .crate files it should download through the artifact service into the local registry in the appropriate location. Once that's done, the build system just delegates to cargo build, which shouldn't need to download anything (and would fail if it tried).

So, for why it can't just download, it's prevented by the network sandbox, which is there for security isolation. We don't want builds to be able to fetch arbitrary things from the internet, and we also don't want builds to be able to send information out either.

As for why not vendor, it's not clear who would do the vendoring. It would require that developers working on Rust crates would always need to vendor their dependencies into the git repository for their crates from, say, their developer environments, which is not only extra work, but also tends to lead to those dependencies to grow stale over time as developers won't think to re-vendor periodically. The current solution is effectively invisible to developers, yet gives both sandboxing and avoids the need for developers to think about vendoring or be bound by its downsides.

I'm not entirely sure how cargo metadata would help here? From what I've seen of the code, I think cargo metadata without --no-deps will also try to go download all the .crate files it needs, which won't work due to the network sandbox. And with --no-deps it won't give the information necessary to fetch all the .crate files.

Where build-std makes this more complicated is that the Resolve for a crate isn't enough on its own, since we also need to fetch the .crate files needed for std. Admittedly I don't actually need the Workspace for std, so I suppose I could change the PR to just expose the Resolve if you'd prefer that, but that seems less generalizeable?

@jonhoo
Copy link
Contributor Author

jonhoo commented Mar 9, 2022

As a separate observation, I think the standard_lib function only takes ws: &Workspace to get the &Config here:

let config = ws.config();

What do you think of changing it to just take &Config directly?

@epage
Copy link
Contributor

epage commented Mar 9, 2022

@jonhoo would a cargo subcommand to ensure all of the crates have been downloaded and then doing all other operations with --offline get what you are wanting?

I have several use cases revolving around the index and accessing crate source. So far I've been using a mix of cargo_metadata and crates-index but the latter only covers parts of cargo's functionality and frequently I get bug reports because of that. I'd like to use cargo's registry code but as part of the whole "call a cargo command instead of linking against cargo" trend, I've been considering a cargo subcommand that let's you query the index. For when I need crate source, I could then have a flag to ensure the crate is downloaded and provide the path to it.

I could see this having a --manifest-path flag that would resolve the dependencies of the selected manifest and use that for the query. It seems like at this point, it would do what you are looking for unless I've missed something about your use case.

@epage
Copy link
Contributor

epage commented Mar 9, 2022

I think the answer to my question is tied into this

I'm not entirely sure how cargo metadata would help here? From what I've seen of the code, I think cargo metadata without --no-deps will also try to go download all the .crate files it needs, which won't work due to the network sandbox. And with --no-deps it won't give the information necessary to fetch all the .crate files.

Wouldn't you run cargo metadata to download the files outside of the sandbox so you can then use them in the sandbox? I feel there is a part of the requirements here I'm not understanding.

Similarly, cargo vendor seems like it could be run outside of the sandbox and then make the results available inside the sandbox.

@jonhoo
Copy link
Contributor Author

jonhoo commented Mar 9, 2022

That's basically what I've been writing — a Cargo subcommand that computes a Resolve, walks it, fetches it using the "special" way to download that's supported internally, and then everything else just delegates to Cargo proper. I don't even really need --offline since network access would just fail if Cargo tried to anyway.

The challenge lies in implementing that subcommand, since Cargo doesn't currently expose the necessary bits in its library to just resolve (hence this PR). If there was a Cargo subcommand that just printed out the entire Resolve, I could use that instead, sure, but I don't know if that's better than exposing it as library bits?

Wouldn't you run cargo metadata to download the files outside of the sandbox so you can then use them in the sandbox? I feel there is a part of the requirements here I'm not understanding.

Nothing gets to run outside of the sandbox is the simple answer. The slightly more convoluted answer is that there is a separate service that downloads basically every .crate from crates.io and sticks it in the special service that's available inside the sandbox, but that service does not build any source code — it just downloads/uploads. So there's nowhere that both has access to the network and gets to build untrusted code/even run Cargo.

@epage
Copy link
Contributor

epage commented Mar 9, 2022

So if I'm understanding correctly, the big issue is that you have a custom index / artifact repository that is available to the sandbox that you need to implement fetching support for because cargo doesn't understand how to. Its in this code that you are resolving and doing all of the other steps.

Whats preventing creating a registry facade around this artifact repository so you just take advantage of all of cargo's existing functionality?

@jonhoo
Copy link
Contributor Author

jonhoo commented Mar 9, 2022

It's not impossible to do so, but it's a real mess since Cargo requires "real" registries to be git-based. It'd be a whole lot easier (and likely what I'll end up doing) once we get HTTP registry support :)

@epage
Copy link
Contributor

epage commented Mar 9, 2022

Thanks for taking the time to help me understand where you are coming from with this.

So if I understand it, you are facing the trade off between

  • The mess to make the git registry proxy
  • Keep fighting cargo's internals, including what is available for reuse, to implement your own resolve + fetch
  • Waiting on / helping with the HTTP registry and then implementing an HTTP proxy (probably too far out there for you to be waiting on)

@jonhoo
Copy link
Contributor Author

jonhoo commented Mar 9, 2022

Exactly right. And 1) I've already discarded as being too much of a hassle to be worth it especially since it'll be thrown away once we get HTTP registries, 2) is actually not that bad modulo the fact that I have to copy-paste a small number of Cargo functions that aren't exposed as pub, and 3) I've already been helping with (#8890) and will probably pick back up once #10064 lands :)

@bors
Copy link
Contributor

bors commented May 4, 2022

☔ The latest upstream changes (presumably #10129) made this pull request unmergeable. Please resolve the merge conflicts.

@weihanglo weihanglo added the A-cargo-api Area: cargo-the-library API and internal code issues label Apr 25, 2023
@epage
Copy link
Contributor

epage commented Mar 6, 2024

Closing due to inactivity. If there are requests for exposing more of cargo's API, feel free to create an issue.

@epage epage closed this Mar 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cargo-api Area: cargo-the-library API and internal code issues S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants