Command for introspecting on the index #11034

Open
epage opened this issue Aug 30, 2022 · 15 comments
Labels
A-new-subcommand Area: new subcommand C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-triage Status: This issue is waiting on initial triage.

Comments

@epage
Contributor

epage commented Aug 30, 2022

Problem

Cargo does not prioritize Rust APIs. Instead it prioritizes CLI plumbing, like cargo-metadata, on top of which APIs, like cargo_metadata, can be built.

Currently, there is no CLI plumbing for the index, and the Rust API hole is being filled by crates-index, which has its own set of issues (not fully implementing cargo's logic, not implementing auth, no CLI fallback, etc).

Soon, there will be sparse registries. While crates.io might continue to support git, alternative registries might not.

Proposed Solution

Create a command that provides a way to query the index

Strawman sketch

$ cargo index-api --update
... updates the index for git
$ cargo index-api --crates
... lists crate names
$ cargo index-api --crate serde
... lists versions of the crate
$ cargo index-api --crate serde --yanked
... lists versions of the crate
$ cargo index-api --crate serde@=1.2.3
... lists metadata
$ cargo index-api --crate serde@=1.2.3 --download
... path to extracted crate

The main challenge will be an API that works well for both git and sparse registries as they have different network connection models.

Notes

No response

@epage epage added A-new-subcommand Area: new subcommand C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` labels Aug 30, 2022
@epage
Contributor Author

epage commented Aug 30, 2022

@kornelski thought it'd be good to get your input on this

@Eh2406
Contributor

Eh2406 commented Aug 31, 2022

"lists crate names" is one of the most commonly requested APIs (used for autocomplete in Cargo.toml). Unfortunately, the first release of sparse registries is unlikely to have a way to list all crate names. Similarly, the perfectly reasonable action "update the index" does not have clear semantics under the first version of sparse registries.

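For context on why listing is awkward: going by the documented index layout, a sparse client fetches one file per crate, at a path derived from the crate name, so nothing in that scheme enumerates names. Below is a minimal sketch of the path rule. `index_path` is a made-up helper name, not a cargo API, and it assumes ASCII crate names.

```rust
/// Relative path of a crate's file inside a registry index, following the
/// documented layout (1/, 2/, 3/{first char}/, {first two}/{next two}/).
/// Hypothetical helper for illustration; returns None for empty or
/// non-ASCII names, which this sketch does not try to handle.
fn index_path(name: &str) -> Option<String> {
    if name.is_empty() || !name.is_ascii() {
        return None;
    }
    // Index paths use the lowercased crate name.
    let lower = name.to_ascii_lowercase();
    Some(match lower.len() {
        1 => format!("1/{lower}"),
        2 => format!("2/{lower}"),
        3 => format!("3/{}/{lower}", &lower[..1]),
        _ => format!("{}/{}/{lower}", &lower[..2], &lower[2..4]),
    })
}

fn main() {
    for name in ["a", "nb", "syn", "serde"] {
        println!("{name} -> {:?}", index_path(name));
    }
}
```

A tool can only ask "does this exact name exist?" one path at a time, which is why a "list crate names" operation needs something extra from the registry.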
@epage
Contributor Author

epage commented Aug 31, 2022

Similarly, the perfectly reasonable action "update the index" does not have clear semantics under the first version of sparse registries.

This is part of the challenge we'll have to navigate between the two types of registries and the fact that the API is stateless (can't know about prior runs).

For a git registry, we need an explicit update operation so that future commands can be run in offline mode, or else the commands could be too slow.

For a sparse registry, the update operation is unneeded but each of the following commands needs to run without offline mode so they can update the registry as needed.
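To make the asymmetry concrete, here is a rough sketch of which step touches the network under each protocol. All type and function names (`Protocol`, `Network`, `update_step`, `query_step`) are invented for illustration and are not part of cargo.

```rust
/// The two registry protocols discussed in this thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Protocol {
    Git,
    Sparse,
}

/// Network behaviour a step would need (hypothetical classification).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Network {
    /// Fetch the whole index up front.
    FetchWholeIndex,
    /// Fetch only the files a query touches.
    FetchOnDemand,
    /// No network access needed; the step can run offline.
    NotNeeded,
}

/// What an explicit "update" step would do.
fn update_step(protocol: Protocol) -> Network {
    match protocol {
        Protocol::Git => Network::FetchWholeIndex,
        Protocol::Sparse => Network::NotNeeded,
    }
}

/// What each later query step would do.
fn query_step(protocol: Protocol) -> Network {
    match protocol {
        Protocol::Git => Network::NotNeeded,
        Protocol::Sparse => Network::FetchOnDemand,
    }
}

fn main() {
    for protocol in [Protocol::Git, Protocol::Sparse] {
        println!(
            "{protocol:?}: update = {:?}, query = {:?}",
            update_step(protocol),
            query_step(protocol)
        );
    }
}
```

Because the command is stateless, the caller would have to pick the right combination itself, which is the design tension described above.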

@kornelski
Contributor

The CLI API could have an update that takes a list of packages to update. That would be compatible with both protocols.

cargo index-api --update "serde,[email protected],tokio"
cargo index-api --crate serde

This could return the entire JSON file from the index, which includes all information for all the versions. This way you wouldn't need granular commands for working with individual versions, selecting if they're yanked or not, etc.
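As a sketch of what "no granular commands" could look like on the caller's side: once a tool has every per-version entry, filtering is a few lines of client code. The struct below keeps only the `vers` and `yanked` fields that index entries carry (real entries have more, such as dependencies and checksums). The helper names and the sample data are made up, and JSON parsing is left out.

```rust
/// Subset of one per-version index entry. Only `vers` and `yanked` are
/// modelled here; this is an illustrative type, not cargo's.
#[derive(Debug, Clone, PartialEq, Eq)]
struct IndexEntry {
    vers: String,
    yanked: bool,
}

/// Versions that have not been yanked, in index order.
fn unyanked(entries: &[IndexEntry]) -> Vec<&str> {
    entries
        .iter()
        .filter(|e| !e.yanked)
        .map(|e| e.vers.as_str())
        .collect()
}

/// Look up a single exact version.
fn exact<'a>(entries: &'a [IndexEntry], vers: &str) -> Option<&'a IndexEntry> {
    entries.iter().find(|e| e.vers == vers)
}

/// Made-up sample data standing in for one crate's parsed index file.
fn sample_entries() -> Vec<IndexEntry> {
    [("1.0.0", false), ("1.0.1", true), ("1.1.0", false)]
        .into_iter()
        .map(|(vers, yanked)| IndexEntry { vers: vers.to_string(), yanked })
        .collect()
}

fn main() {
    let entries = sample_entries();
    println!("unyanked: {:?}", unyanked(&entries));
    println!("1.0.1: {:?}", exact(&entries, "1.0.1"));
}
```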

@kornelski
Contributor

kornelski commented Sep 2, 2022

I am concerned about programmatic API. For native Rust programs a CLI interface is cumbersome and slow when querying many crates.

However, cargo as a whole is a pretty heavy dependency, without a 1.x stable API, and has a high MSRV. So I think there's still a need for a smaller crate focused on working just with the index. This could be the crates-index crate, or perhaps index-related code could be extracted from cargo itself into another crate.

@epage
Contributor Author

epage commented Sep 2, 2022

The CLI API could have an update that takes a list of packages to update. That would be compatible with both protocols.

Mostly, and it is a good idea. The one case I can think of where this would still fall flat is if you were to implement something like the resolver, where you need to get the information from one crate to then know what further crates to look up.

I am concerned about programmatic API. For native Rust programs a CLI interface is cumbersome and slow when querying many crates.

We are limping along with cargo-metadata.

How frequent is it that a caller is doing a lot of queries? I'm not familiar enough with all of the use cases to really say.

So I think there's still a need for a smaller crate focused on working just with the index. This could be the crates-index crate, or perhaps index-related code could be extracted from cargo itself into another crate.

I would consider the independent crate a successful failure: successful in that it works well enough to show the need, but otherwise it is missing a lot of functionality, and the maintainers have been passing the buck on resolving some of these gaps (authentication).

As for cargo splitting out a crate, there are a couple of things that we might run into:

  • We don't have a workspace because of the rust-lang workspace, which makes scaling up less than ideal. We will need to create a new nested workspace feature for this
  • Our focus is on regularly making breaking changes, and we don't have processes set up to identify when to break or not, especially as we scale up to more crates. My hope is cargo-semver-checks will help with this
  • Sometimes the internal needs are different enough from external needs that it becomes a problem (this is why there isn't more sharing with cargo_metadata). This is less likely to be the case here but still a caution
  • Generally applicable APIs take on more of a maintenance burden for the cargo team which is already low on resources. If others could step in to do community management for the API and extend it as needed for the wider community, that would be a help

@kornelski
Contributor

Sparse index does a greedy fetch, so if you ask it to update tokio, it will update recursively tokio-macros, mio, parking_lot, etc. For this interface it would have to assume all features and all targets.

How frequent is it that a caller is doing a lot of queries?

For example tools may want to scan every dependency you have in your project, and if that's a recursive scan, that can be hundreds of queries.

@epage
Contributor Author

epage commented Sep 2, 2022

Sparse index does a greedy fetch, so if you ask it to update tokio, it will update recursively tokio-macros, mio, parking_lot, etc. For this interface it would have to assume all features and all targets.

Are you saying it does today or that you propose the command should do a recursive fetch?

In the cases where I've been using crates_index, I've not needed a recursive fetch.

@Eh2406
Contributor

Eh2406 commented Sep 2, 2022

Sparse index does a greedy fetch, ... For this interface it would have to assume all features and all targets.

I don't know if this is actually relevant to this discussion, but the current implementation of sparse indexes does not have to do a greedy fetch. It ends up doing waves. It resolves based on everything it currently knows about, using a stub dependency for things not yet fetched. (This lets it skip unused features.) It then looks at the result, requests all of the packages that have been stubbed out, and redoes resolution. Of course the current implementation uses internal APIs. If you wanted to reimplement it over this CLI, it could be done with "only" as many calls as the dependency tree is deep.
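A toy model of the wave idea, for anyone building on top of a CLI: each wave requests every crate that is referenced but not yet known, then repeats. This is only a sketch. The `Index` map stands in for the registry, the crate names are placeholders, and versions and features are ignored, so it cannot skip unused dependencies the way the real resolver-driven approach does.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory stand-in for a registry:
/// crate name -> names of its dependencies.
type Index = BTreeMap<&'static str, Vec<&'static str>>;

/// Fetch in waves and return the crates requested in each wave.
/// One wave corresponds to one batched query against the index.
fn fetch_in_waves(index: &Index, roots: &[&'static str]) -> Vec<Vec<&'static str>> {
    let mut known: BTreeSet<&'static str> = BTreeSet::new();
    let mut pending: BTreeSet<&'static str> = roots.iter().copied().collect();
    let mut waves = Vec::new();

    while !pending.is_empty() {
        // "Request" everything that is currently stubbed out.
        let wave: Vec<&'static str> = pending.iter().copied().collect();
        known.extend(wave.iter().copied());

        // Look at what came back and collect dependencies still unknown.
        let mut next: BTreeSet<&'static str> = BTreeSet::new();
        for name in &wave {
            if let Some(deps) = index.get(name) {
                for dep in deps {
                    if !known.contains(dep) {
                        next.insert(*dep);
                    }
                }
            }
        }

        waves.push(wave);
        pending = next;
    }
    waves
}

/// Placeholder dependency graph: app -> {a, b}, a -> {c}, b -> {c}.
fn sample_index() -> Index {
    BTreeMap::from([
        ("app", vec!["a", "b"]),
        ("a", vec!["c"]),
        ("b", vec!["c"]),
        ("c", vec![]),
    ])
}

fn main() {
    let waves = fetch_in_waves(&sample_index(), &["app"]);
    for (i, wave) in waves.iter().enumerate() {
        println!("wave {}: {:?}", i + 1, wave);
    }
}
```

With this graph there are four crates but only three waves, one per level of the tree, which is the "depth of the dependency tree" bound mentioned above.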

Thinking about building an outside tool to do resolution based on this CLI, it would be really helpful if it didn't just take registries but also sources. I.e., if I have a git dependency, I may also want to be able to find out what version is there, what its dependencies are, and where it was extracted.

@kornelski
Contributor

If you wanted to reimplement something like cargo tree, you'd need a recursive scan. cargo outdated can show status of all dependencies recursively too. If you want to check if any dependencies are yanked, you need to check them all. cargo crev also checks all the dependencies recursively (currently it uses cargo's Rust API).

@Eh2406 It's nice that it's integrated with the resolver! When I wrote the RFC I assumed this would be difficult, so I suggested a greedy fetch approach that first fetches everything without resolving exact versions, so that the precise resolver can later run offline and safely assume it already has all the files it needs.

Anyway, even if the implementation proper is smart enough to be conservative, the tooling-oriented CLI can still fetch all versions with all features for all platforms. I'd just provide a separate switch for updating with dev deps. It will be some extra overhead, but I don't expect too much, especially since these crates are cached, so it'll be mostly a one-time cost of a few usual suspects like winapi and redox_syscall. In any case it will be less than the full git index :)

@kornelski
Contributor

What would be use-cases for listing all crates? I need it in https://lib.rs, but I also need all the crate metadata, so for this the git index works better.

cargo add needs to normalize _ vs -, but it could just make extra requests (and give up on crate names written in morse code), or it could use the cargo search API.
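A sketch of the "extra requests" option: generate every spelling of a name with `-` or `_` at each separator and probe each one. The number of candidates doubles per separator, hence giving up on morse-code names. `separator_variants` and its cap are hypothetical, not how cargo does it.

```rust
/// All spellings of `name` with `-` or `_` chosen at each separator.
/// Returns None when the name has more than `max_separators` separators
/// (or more than 16, a hard cap for this sketch), because the candidate
/// count is 2^n and each candidate would cost one index request.
fn separator_variants(name: &str, max_separators: usize) -> Option<Vec<String>> {
    // Byte offsets of every `-` or `_` in the name.
    let positions: Vec<usize> = name
        .char_indices()
        .filter(|&(_, c)| c == '-' || c == '_')
        .map(|(i, _)| i)
        .collect();
    let n = positions.len();
    if n > max_separators || n > 16 {
        return None;
    }

    let mut out = Vec::with_capacity(1usize << n);
    for mask in 0u32..(1u32 << n) {
        let mut bytes = name.as_bytes().to_vec();
        for (bit, &pos) in positions.iter().enumerate() {
            // Bit set means underscore, bit clear means hyphen.
            bytes[pos] = if (mask & (1u32 << bit)) != 0 { b'_' } else { b'-' };
        }
        // Only single-byte ASCII separators were swapped, so this stays valid UTF-8.
        out.push(String::from_utf8(bytes).expect("ASCII-for-ASCII replacement"));
    }
    Some(out)
}

fn main() {
    println!("{:?}", separator_variants("serde_json", 4));
    println!("{:?}", separator_variants("a-b-c-d-e-f", 4));
}
```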

Would it be useful to list all the crates currently in the cache?

@epage
Contributor Author

epage commented Sep 15, 2022

What would be use-cases for listing all crates?

Listing all crates can be useful for #10655

cargo add needs to normalize _ vs -, but it could just make extra requests (and give up on crate names written in morse code), or it could use the cargo search API.

I believe cargo has everything it needs for cargo add to fix any normalization issues.

@kornelski
Contributor

kornelski commented Sep 15, 2022

As noted in the issue, I don't think typosquatting can be handled client-side. You can detect similar names, but the index lacks the information required to decide which of the similar names is the right one, and without that you may end up hurting users by recommending squatted crates when they correctly add good ones.

@Nemo157
Member

Nemo157 commented Jan 27, 2023

I have a use case which doesn't need recursive querying and would prefer to have -_ normalization handled automatically (especially because IIRC that's a per-registry choice, so having one place that knows crates.io allows normalization would be better): cargo-dl. If there's appetite for even more public plumbing commands: that CLI also currently attempts to pull data out of cargo's download cache. If there was some way to ask cargo to download a crate into the cache and provide the path/tarball, then I could leave all networking up to cargo and just have a trivial wrapper to automate extracting that for inspection.

@epage
Contributor Author

epage commented Jan 27, 2023

@Nemo157 that command might be of interest to #1861.

4 participants