Command for introspecting on the index #11034

epage · 2022-08-30T21:58:38Z

Problem

Cargo does not prioritize Rust APIs; instead it prioritizes CLI plumbing, like cargo-metadata, on top of which APIs, like cargo_metadata, can be built.

Currently, there is no CLI plumbing for the index, and the Rust API hole is being filled by crates-index, which has its own set of issues (not fully implementing cargo's logic, not implementing auth, no CLI fallback, etc.).

Soon, there will be sparse registries. While crates.io might continue to support git, alternative registries might not.

Proposed Solution

Create a command that provides a way to query the index.

Strawman sketch

$ cargo index-api --update
... updates the index for git
$ cargo index-api --crates
... lists crate names
$ cargo index-api --crate serde
... lists versions of the crate
$ cargo index-api --crate serde --yanked
... lists versions of the crate, including yanked ones
$ cargo index-api --crate serde@=1.2.3
... lists metadata
$ cargo index-api --crate serde@=1.2.3 --download
... path to extracted crate

The main challenge will be an API that works well for both git and sparse registries as they have different network connection models.
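One piece that both registry types already share is the layout of index files, so a query command could locate a crate's entry the same way for either. A minimal sketch, assuming a non-empty ASCII crate name: a git registry stores the file at this relative path in its checkout, and a sparse registry serves the same path over HTTP relative to its index URL. `index_path` is a hypothetical helper name, not a cargo API.

```rust
/// Relative path of a crate's file in a registry index.
///
/// Assumes `name` is a non-empty ASCII crate name; the byte-offset
/// slicing below relies on that.
fn index_path(name: &str) -> String {
    // Index file names use the lowercased crate name.
    let name = name.to_ascii_lowercase();
    match name.len() {
        // 1- and 2-character names live in directories named after their length.
        1 => format!("1/{name}"),
        2 => format!("2/{name}"),
        // 3-character names are additionally grouped by their first character.
        3 => format!("3/{}/{name}", &name[..1]),
        // Longer names are grouped by their first two and next two characters.
        _ => format!("{}/{}/{name}", &name[..2], &name[2..4]),
    }
}
```

For example, `serde` maps to `se/rd/serde` and `syn` to `3/s/syn`. The network model is where the two protocols differ, not the file addressing.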


epage · 2022-08-30T21:59:07Z

@kornelski thought it'd be good to get your input on this

Eh2406 · 2022-08-31T02:05:58Z

"lists crate names" is one of the most commonly requested APIs (used for autocomplete in Cargo.toml); unfortunately, the first release of sparse registries is unlikely to have a way to list all crate names. Similarly, the perfectly reasonable action "update the index" does not have clear semantics under the first version of sparse registries.

epage · 2022-08-31T11:44:57Z

> Similarly, the perfectly reasonable action "update the index" does not have clear semantics under the first version of sparse registries.

This is part of the challenge we'll have to navigate between the two types of registries and the fact that the API is stateless (can't know about prior runs).

For a git registry, we need an explicit update operation so that future commands can run in offline mode; otherwise the commands could be too slow.

For a sparse registry, the update operation is unneeded, but each subsequent command needs to run with network access (not in offline mode) so it can update the registry as needed.

kornelski · 2022-09-02T12:00:20Z

The CLI API could have an update operation that takes a list of packages to update. That would be compatible with both protocols.

cargo index-api --update "serde,[email protected],tokio"
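A minimal sketch of splitting such a list into crate names and optional version requirements, assuming comma-separated `name` or `name@req` items (the exact syntax was never settled). With a sparse registry each resulting name maps to one index file to fetch, while a git registry could ignore the list and fetch the whole index. `PackageSpec` and `parse_update_list` are hypothetical names.

```rust
/// One entry of a hypothetical update list: a crate name plus an
/// optional version requirement, e.g. `tokio` or `tokio@1.0`.
#[derive(Debug, PartialEq)]
struct PackageSpec {
    name: String,
    version_req: Option<String>,
}

/// Parses a comma-separated list such as "serde,tokio@1.0,rand".
/// Whitespace around items is ignored, as are empty items.
fn parse_update_list(list: &str) -> Result<Vec<PackageSpec>, String> {
    list.split(',')
        .map(str::trim)
        .filter(|item| !item.is_empty())
        .map(|item| {
            // Split `name@req` at the first `@`; a bare name has no requirement.
            let (name, req) = match item.split_once('@') {
                Some((name, req)) => (name, Some(req)),
                None => (item, None),
            };
            if name.is_empty() {
                return Err(format!("missing crate name in `{item}`"));
            }
            if req == Some("") {
                return Err(format!("missing version requirement in `{item}`"));
            }
            Ok(PackageSpec {
                name: name.to_string(),
                version_req: req.map(str::to_string),
            })
        })
        // Collecting into Result stops at the first malformed item.
        .collect()
}
```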

cargo index-api --crate serde

This could return the entire JSON file from the index, which includes all information for all the versions. This way you wouldn't need granular commands for working with individual versions, selecting if they're yanked or not, etc.
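Returning the whole index file turns each of the strawman's per-version queries into a filter on the caller's side. A minimal sketch, assuming the file's lines (one JSON object per published version) have already been parsed: only the `name`, `vers`, and `yanked` fields are modeled, JSON parsing is left out to keep the example dependency-free, and `--yanked` is assumed to mean "include yanked versions". `IndexEntry`, `list_versions`, and `exact_version` are hypothetical names.

```rust
/// A few of the fields of one line of a registry index file. Real lines are
/// JSON objects that also carry `deps`, `features`, `cksum`, and more.
#[derive(Debug, Clone)]
struct IndexEntry {
    name: String,
    vers: String,
    yanked: bool,
}

/// Like `--crate serde` (or, with `include_yanked`, `--crate serde --yanked`):
/// the versions listed in the file, in file order.
fn list_versions(entries: &[IndexEntry], include_yanked: bool) -> Vec<&str> {
    entries
        .iter()
        .filter(|e| include_yanked || !e.yanked)
        .map(|e| e.vers.as_str())
        .collect()
}

/// Like `--crate serde@=1.2.3`: the entry for one exact version, if present.
fn exact_version<'a>(entries: &'a [IndexEntry], vers: &str) -> Option<&'a IndexEntry> {
    entries.iter().find(|e| e.vers == vers)
}
```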

kornelski · 2022-09-02T12:03:51Z

I am concerned about programmatic API. For native Rust programs a CLI interface is cumbersome and slow when querying many crates.

However, cargo as a whole is a pretty heavy dependency, without a 1.x stable API, and has a high MSRV. So I think there's still a need for a smaller crate focused on working just with the index. This could be the crates-index crate, or perhaps index-related code could be extracted from cargo itself into another crate.

epage · 2022-09-02T12:19:38Z

> The CLI API could have an update that takes a list of packages to update. That would be compatible with both protocols.

Mostly, and it is a good idea. The one case I can think of where this would still fall flat is if you were to implement something like the resolver, where you need the information from one crate to know which further crates to look up.

> I am concerned about programmatic API. For native Rust programs a CLI interface is cumbersome and slow when querying many crates.

We are limping along with cargo-metadata.

How frequent is it that a caller is doing a lot of queries? I'm not familiar enough with all of the use cases to really say.

> So I think there's still a need for a smaller crate focused on working just with the index. This could be the crates-index crate, or perhaps index-related code could be extracted from cargo itself into another crate.

I would consider the independent crate a successful failure: successful in that it works well enough to show the need, but otherwise it is missing a lot of functionality, and the maintainers have been passing the buck on resolving some of these gaps (authentication).

As for cargo splitting out a crate, a couple of things that we might run into:

- We don't have a workspace because of the rust-lang workspace, which makes scaling up less than ideal. We will need to create a new nested workspace feature for this.
- Our focus is on regularly making breaking changes, and we don't have processes set up to identify when we are or are not making one, especially as we scale up to more crates. My hope is that cargo-semver-checks will help with this.
- Sometimes the internal needs are different enough from external needs that it becomes a problem (this is why there isn't more sharing with cargo_metadata). This is less likely to be the case here, but it is still a caution.
- Generally applicable APIs take on more of a maintenance burden for the cargo team, which is already low on resources. If others could step in to do community management for the API and extend it as needed for the wider community, that would help.

kornelski · 2022-09-02T13:30:35Z

Sparse index does a greedy fetch, so if you ask it to update tokio, it will update recursively tokio-macros, mio, parking_lot, etc. For this interface it would have to assume all features and all targets.

> How frequent is it that a caller is doing a lot of queries?

For example tools may want to scan every dependency you have in your project, and if that's a recursive scan, that can be hundreds of queries.

epage · 2022-09-02T14:41:48Z

> Sparse index does a greedy fetch, so if you ask it to update tokio, it will update recursively tokio-macros, mio, parking_lot, etc. For this interface it would have to assume all features and all targets.

Are you saying it does today or that you propose the command should do a recursive fetch?

In the cases where I've been using crates_index, I've not needed a recursive fetch.

Eh2406 · 2022-09-02T14:42:12Z

> Sparse index does a greedy fetch, ... For this interface it would have to assume all features and all targets.

I don't know if this is actually relevant to this discussion, but the current implementation of sparse indexes does not have to do a greedy fetch. It ends up doing waves: it resolves based on everything it currently knows about, using a stub dependency for things not yet fetched (this lets it skip unused features). It then looks at the result, requests all of the packages that have been stubbed out, and redoes resolution. Of course, the current implementation uses internal APIs. If you wanted to reimplement it over this CLI, it could be done with "only" as many calls as the depth of the dependency tree.
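To see why the number of calls scales with the depth of the tree rather than with the number of crates, here is a minimal sketch of batched, wave-by-wave fetching. It assumes a hypothetical command that accepts a batch of crate names and reports each one's direct dependencies (modeled by the `fetch_batch` closure). Unlike the implementation described above, it does not re-run resolution between waves, so it follows every listed dependency instead of skipping unused features.

```rust
use std::collections::BTreeSet;

/// Collects every crate reachable from `root`, one batched request per wave.
/// `fetch_batch` stands in for one call to a hypothetical index command: it
/// receives a batch of crate names and returns `(name, direct_dependencies)`
/// for each. Returns the set of fetched crates and the number of calls made.
fn fetch_in_waves<F>(root: &str, mut fetch_batch: F) -> (BTreeSet<String>, usize)
where
    F: FnMut(&[String]) -> Vec<(String, Vec<String>)>,
{
    let mut fetched = BTreeSet::new();
    let mut pending = vec![root.to_string()];
    let mut calls = 0;
    while !pending.is_empty() {
        // One call covers everything discovered in the previous wave.
        calls += 1;
        let results = fetch_batch(pending.as_slice());
        fetched.extend(pending.drain(..));
        // The next wave is every dependency not fetched yet; the set
        // deduplicates names reached through several parents.
        let mut next = BTreeSet::new();
        for (_name, deps) in results {
            for dep in deps {
                if !fetched.contains(&dep) {
                    next.insert(dep);
                }
            }
        }
        pending = next.into_iter().collect();
    }
    (fetched, calls)
}
```

With a tree three levels deep, this makes three calls no matter how many crates sit at each level.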

Thinking about building an outside tool to do resolution based on this CLI, it would be really helpful if it didn't just take registries but also sources, i.e. if I have a git dependency, I may also want to be able to find out what version is there, what its dependencies are, and where it was extracted.

kornelski · 2022-09-02T21:29:16Z

If you wanted to reimplement something like cargo tree, you'd need a recursive scan. cargo outdated can show status of all dependencies recursively too. If you want to check if any dependencies are yanked, you need to check them all. cargo crev also checks all the dependencies recursively (currently it uses cargo's Rust API).

@Eh2406 It's nice that it's integrated with the resolver! When I wrote the RFC I assumed this would be difficult, so I suggested greedy fetch approach that first fetches everything without resolving exact versions, so that the precise resolver can later run offline, and safely assume it already has all files it needs.

Anyway, even if the implementation proper is smart enough to be conservative, the tooling-oriented CLI can still fetch all versions with all features for all platforms. I'd just provide a separate switch for updating with dev deps. It will be some extra overhead, but I don't expect much, especially since these crates are cached, so it'll mostly be a one-time cost for a few usual suspects like winapi and redox_syscall. In any case, it will be less than the full git index :)
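A minimal sketch of which dependency edges such a tooling-oriented update would follow, assuming the `kind`, `optional`, and `target` information that index entries record for each dependency: optional ones are always kept (all features), target-specific ones are always kept (all platforms), and dev-dependencies only behind a switch. `DepKind`, `IndexDep`, and `deps_to_follow` are hypothetical names; the returned names would feed the next round of a recursive fetch.

```rust
/// Dependency kinds as recorded in index entries.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum DepKind {
    Normal,
    Build,
    Dev,
}

/// A few fields of one dependency in an index entry. `optional` and `target`
/// are kept to show that this policy deliberately ignores them.
#[allow(dead_code)]
struct IndexDep {
    name: String,
    kind: DepKind,
    optional: bool,
    target: Option<String>,
}

/// Names to fetch next when assuming all features and all targets:
/// optional and platform-specific dependencies are always followed, and
/// dev-dependencies only when `include_dev` is set.
fn deps_to_follow(deps: &[IndexDep], include_dev: bool) -> Vec<&str> {
    deps.iter()
        .filter(|d| include_dev || d.kind != DepKind::Dev)
        .map(|d| d.name.as_str())
        .collect()
}
```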

kornelski · 2022-09-02T21:34:58Z

What would be use-cases for listing all crates? I need it in https://lib.rs, but I also need all the crate metadata, so for this the git index works better.

cargo add needs to normalize _ vs -, but it could just make extra requests (and give up on crate names written in morse code), or it could use the cargo search API.
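To put a number on the "extra requests" option: a client that does not know a crate's canonical spelling could try every combination of `-` and `_` at the separator positions, which doubles the number of index lookups per separator. A minimal sketch (the function name `separator_variants` is hypothetical, and it applies no cap on the number of attempts):

```rust
/// Every spelling of `name` obtained by independently choosing `-` or `_`
/// at each separator position. Each spelling is one candidate index lookup.
fn separator_variants(name: &str) -> Vec<String> {
    // Start from one empty prefix and extend every prefix character by character.
    let mut variants = vec![String::new()];
    for ch in name.chars() {
        if ch == '-' || ch == '_' {
            // Each separator can be spelled either way, doubling the set.
            variants = variants
                .into_iter()
                .flat_map(|prefix| [format!("{prefix}-"), format!("{prefix}_")])
                .collect();
        } else {
            for v in &mut variants {
                v.push(ch);
            }
        }
    }
    variants
}
```

For `serde_json` this yields two spellings; a name with ten separators would need 1,024 lookups.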

Would it be useful to list all the crates currently in the cache?

epage · 2022-09-15T14:26:42Z

> What would be use-cases for listing all crates?

Listing all crates can be useful for #10655

> cargo add needs to normalize _ vs -, but it could just make extra requests (and give up on crate names written in morse code), or it could use the cargo search API.

I believe cargo has everything it needs for cargo add to fix any normalization issues.

kornelski · 2022-09-15T15:36:17Z

As noted in the issue, I don't think typosquatting can be handled client-side. You can detect similar names, but the index lacks the information required to decide which of the similar names is the right one, and without that you may end up hurting users by recommending squatted crates when they correctly add good ones.

Nemo157 · 2023-01-27T22:57:27Z

I have a use case which doesn't need recursive querying and would prefer to have -/_ normalization handled automatically (especially because, IIRC, that's a per-registry choice, so having one place that knows crates.io allows normalization would be better): cargo-dl. If there's appetite for even more public plumbing commands: that CLI also currently attempts to pull data out of cargo's download cache. If there were some way to ask cargo to download a crate into the cache and provide the path/tarball, then I could leave all networking up to cargo and just have a trivial wrapper to automate extracting it for inspection.
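For reference, a minimal sketch of where a tool like this currently has to look, assuming cargo's present on-disk layout (an implementation detail rather than a stable interface, which is the argument for having cargo hand back the path): downloaded tarballs under `$CARGO_HOME/registry/cache/<registry-dir>/` and extracted sources under `$CARGO_HOME/registry/src/<registry-dir>/`. The per-registry directory name includes a hash that cargo computes internally, so the sketch takes it as an input; `cached_crate_paths` is a hypothetical helper.

```rust
use std::path::{Path, PathBuf};

/// Expected locations of a registry crate in cargo's download cache:
/// the `.crate` tarball and the directory it is extracted into.
/// `registry_dir` is the per-registry directory name, passed in because
/// its hash suffix is computed internally by cargo.
fn cached_crate_paths(
    cargo_home: &Path,
    registry_dir: &str,
    name: &str,
    version: &str,
) -> (PathBuf, PathBuf) {
    let stem = format!("{name}-{version}");
    let registry = cargo_home.join("registry");
    let tarball = registry
        .join("cache")
        .join(registry_dir)
        .join(format!("{stem}.crate"));
    let extracted = registry.join("src").join(registry_dir).join(&stem);
    (tarball, extracted)
}
```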

epage · 2023-01-27T23:04:44Z

@Nemo157 that command might be of interest to #1861.

epage added the A-new-subcommand and C-feature-request labels Aug 30, 2022

epage mentioned this issue Sep 1, 2022: cargo search return non-zero exit code when no results found #11037 (Closed)

epage mentioned this issue Sep 5, 2022: Allow restricting search to crates with binaries for CLI completion purposes #11052 (Open)

epage mentioned this issue Dec 16, 2022: Please provide a subcommand to refresh the crates.io index #3377 (Closed)

weihanglo mentioned this issue Aug 18, 2023: Index at github.com-1ecc6299db9ec823 isn't updated #12523 (Closed)

epage mentioned this issue Nov 3, 2023: Provide access to cargo's local index. #7824 (Closed)

epage mentioned this issue Nov 25, 2023: Cargo add/Cargo install + tab should list all available crates #13043 (Closed)

epage mentioned this issue Sep 3, 2024: cargo info doesn't allow to specify output-format #14469 (Open)

epage added the S-triage label Nov 20, 2024
