Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cargo publish with internal path dependencies #2224

Closed
wants to merge 4 commits into from
Closed

cargo publish with internal path dependencies #2224

wants to merge 4 commits into from

Conversation

jakwings
Copy link

This RFC proposes the support for publishing a crate with path dependencies.

Implementation: rust-lang/cargo#4735

Rendered

@matklad matklad added the T-cargo Relevant to the Cargo team, which will review and decide on the RFC. label Nov 23, 2017
@matklad
Copy link
Member

matklad commented Nov 23, 2017

cc @rust-lang/cargo

@est31
Copy link
Member

est31 commented Nov 24, 2017

Some points:

  1. the current proposal would allow for a way for crates with publish = false to be uploaded to crates.io, e.g. because the author misunderstood something. Right now I see publish=false as the "proprietary code" anchor that you can use to prevent accidental leakage of your code to the public. With this RFC that property is weakened.

  2. This RFC sort of ignores all collision issues that occur if we just allow any crate name to be specified via cargo. E.g. think of a crate foo that has an internal crate called util (I'll call it foo/util from now on to avoid confusion) that is packaged inside the same crate. A collision would happen now if an application uses foo and util (from crates.io) as its dependencies. This collision wouldn't be an issue for normal compilation as long as you also include foo into the hash of foo/util. However, it would be a big issue with cargo doc output, because here the url for an index.html for each crate is just target/doc/crate_name/index.html. For util and foo/util the url would be the same.

  3. I'm generally more in favour of bigger crates when it comes to frameworks, but I'm still not really convinced of the usefulness of this feature in the first place. I mean I'm pretty sure the first feature request you'll have when this RFC is implemented is "please let cargo only download that path dependency of this crate, I don't need more", and then you'll have completed the circle :).

I would really love to have namespacing, whether it is project based or user based, but this is no right solution to that IMO.

@matklad
Copy link
Member

matklad commented Nov 24, 2017

This RFC sort of ignores all collision issues that occur if we just allow any crate name to be specified via cargo.

I think what's crucial here (and what is completely different from any namespacing solutions) is that such path dependencies are completely invisible to clients, they are a private implementation detail. So, logically, there's no question about collisions, because they can't happen. The practical point about rustdoc is valid though! It's interesting to see that similar collisions are possible today: for example, you can (transitively) depend on two different major versions of a crate, or you can use path dependency yourself. In this case, rustdoc seems to silently leave only one copy of crates docs, overwriting another.

I'm still not really convinced of the usefulness of this feature in the first place. I mean I'm pretty sure the first feature request you'll have when this RFC is implemented is "please let cargo only download that path dependency of this crate, I don't need more", and then you'll have completed the circle

Hm, my understanding is that, because path dependencies are internal and invisible, the said feature request wont' happen: downstream crate is not allowed to us an upstream's path dependency at all, even if the whole upstream is downloaded.

I personally find this feature very useful. I love splitting my code into crates, whenever possible, because it provides physical separation, separate compilation and enforces acyclic logical dependencies. But in today's Cargo, extracting a crate is a lot of work: you have to come up with a version, you have to fill-in Cargo.toml's fields, and, crucially, you have to publish this crate separately, which forces a lot of maintenance work onto you. And if later you decide that the particular split into crates was not ideal, you are stuck with old crates published to crates.io. The core of these problems is that the way your project is split into crates is a part of it's public API, because all crates are published can be used independently, and this RFC I think provides a way to hide these details from clients completely.

@SimonSapin
Copy link
Contributor

At the moment the terms "crate" or "package" are used somewhat interchangeably to describe both the things that crates.io knows about (with a https://crates.io/crates/{} URL each), the things that Cargo knows about (with a Cargo.toml file each), and the things rustc knows about (that each can be used with extern crate).

With this RFC, they might need to become separate concepts.

@jakwings
Copy link
Author

@est31

  1. Please do not bikeshed on the key name, it is just a simple change for the code.
  2. Why not try to fix cargo doc?
  3. I guess you won't want to download an internal path dependency. That is an irrelevant issue.

@SimonSapin
Copy link
Contributor

The publish key already exists and has prior meaning, this is not just a naming bikeshed.

@jakwings
Copy link
Author

I have no objection to change the key name. I hope we can discuss more important issues first.

@est31
Copy link
Member

est31 commented Nov 24, 2017

@matklad

I think what's crucial here (and what is completely different from any namespacing solutions) is that such path dependencies are completely invisible to clients, they are a private implementation detail.

I've run a small test locally with a crate with internal path dependencies, and it turns out that rustdoc is only the smallest of our issues.

I did the following:

Click to expand

foo/Cargo.toml:

[package]
name = "foo"
version = "0.1.0"
[dependencies]
bar = { path = "bar" }

foo/bar/Cargo.toml

[package]
name = "bar"
version = "0.1.0"

bar/Cargo.toml

[package]
name = "bar"
version = "0.1.0"

binory/Cargo.toml

[package]
name = "binory"
version = "0.1.0"
authors = ["est31 <[email protected]>"]

[dependencies]
foo = { path = "../foo" }
bar = { path = "../bar" }

foo/bar/src/lib.rs

/// This is the bar struct
pub struct Bar;

impl Bar {
	pub fn get() -> Bar {
		println!("Getting bar");
		Bar
	}
}

bar/src/lib.rs

/// This is the bar struct
pub struct Bar2;

impl Bar2 {
	pub fn get() -> Bar2 {
		println!("Getting bar2");
		Bar2
	}
}

foo/src/lib.rs

pub extern crate bar;

/// This is the foo struct
pub struct Foo;

binory/src/main.rs

extern crate foo;
extern crate bar;

use bar::Bar2;
use foo::bar::Bar;

fn main() {
	Bar::get();
	Bar2::get();
}

What was my output when I tried to do cargo run inside the binory directory?

$ cargo run
   Compiling bar v0.1.0 ($dir/foo/bar)
   Compiling bar v0.1.0 ($dir/bar)
   Compiling foo v0.1.0 ($dir/foo)
   Compiling binory v0.1.0 ($dir/binory)
    Finished dev [unoptimized + debuginfo] target(s) in 0.52 secs
     Running `target/debug/binory`
Getting bar
Getting bar2

Sounds great! Now let's try it again!

$ cargo run
error: failed to parse lock file at: $dir/binory/Cargo.lock

To learn more, run the command again with --verbose.

So cargo runs into issues the moment you have a Cargo.lock commited to git. Let's look at Cargo.lock:

[[package]]
name = "bar"
version = "0.1.0"

[[package]]
name = "bar"
version = "0.1.0"

[[package]]
name = "binory"
version = "0.1.0"
dependencies = [
 "bar 0.1.0",
 "foo 0.1.0",
]

[[package]]
name = "foo"
version = "0.1.0"
dependencies = [
 "bar 0.1.0",
]

There seems to be no way it distinguishes between bar and foo/bar!

So is foo/bar is definitely not "just an internal thing that is not exposed at all". It is present inside Cargo.lock and it can cause real issues if you try to use a crate with the same name. Do you want people to file bugs to crates with internal dependencies, asking them to rename them so that they are usable?

Another place where this RFC doesn't answer any questions at all is crates.io. Right now I can go to crates.io and see all dependencies of a crate, some being marked with optional. But if we now upload foo to crates.io, what will it show in that list? The most consistent thing would be to specify bar = "*", but that would be easy to mix up with bar from crates.io. So should it be mentioned at all? What about the dependencies of foo/bar, where should they be shown? What if foo/bar is an optional dependency, what when foo/bar has optional dependencies of its own which you may or may not be able to control as user of foo?
What happens when foo/bar has a different license field than foo? I think that people deserve to know when foo/bar is GPL licensed while foo is MIT licensed...

@matklad
Copy link
Member

matklad commented Nov 24, 2017

Hm, an interesting bug @est31!

It is present inside Cargo.lock and it can cause real issues if you try to use a crate with the same name.

Yeah, I think in the ideal world we don't want to put such private path dependencies inside the lockfile at all, because there's nothing to lock, their version is irrelevant.

By the same token, I think that license field and quite probably a ton of other fields from Cargo.toml do not make sense for such packages either.

The question about dependencies is interesting! Logically, dependencies of such private packages should be attached to the parent package, and private package itself should not be mentioned in the dependencies at all.

@est31
Copy link
Member

est31 commented Nov 24, 2017

At the moment the terms "crate" or "package" are used somewhat interchangeably to describe both the things that crates.io knows about (with a https://crates.io/crates/{} URL each), the things that Cargo knows about (with a Cargo.toml file each), and the things rustc knows about (that each can be used with extern crate).

This is another good point. There is a bit of discussion on the distinction here. It seems that cargo/crates.io can already now distinguish between crates and packages. This RFC seems to introduce a new concept, with the ability to upload multiple packages per .crate file (the .crate extension adds now even more to the confusion :p). The proposed category would be very similar to what we have with workspaces today, but it would not be the same, at least not if you read the RFC, because the RFC doesn't mandate the [workspace] key to be present in the parent Cargo.toml. So you'd probably have separate Cargo.lock files and separate directories for artifacts. These things probably don't matter much if you use all these crates as dependencies. I'm not aware of further differences.

AFAIK you can have a [package] field in the top Cargo.toml in workspaces, so maybe it would be an idea to require people to upload workspaces instead of the current approach. But beyond "we don't have to come up with a term for the new category" I'm not aware of any advantages of this.

@est31
Copy link
Member

est31 commented Nov 24, 2017

By the same token, I think that license field and quite probably a ton of other fields from Cargo.toml do not make sense for such packages either.

I think that approach to the question has a lot of merit. It fixes many of my concerns :). I think to make 100% sure that authors are not confused, we should check for such keys during publishing and error if they are present. I think the RFC should be amended with that.

As an alternative proposal to this RFC, what about allowing multiple [[lib]] sections inside a crate like we already now allow multiple [[bin]] sections? @matklad @iology would this cover your use cases?

@jakwings
Copy link
Author

This RFC seems to introduce a new concept, with the ability to upload multiple packages per .crate file

There is still only one top-level package. Others are viewed as internal dependencies. Unless a package can be both "internal" and "publishable", I don't think there will be much trouble about that.

As an alternative proposal to this RFC, what about allowing multiple [[lib]] sections inside a crate like we already now allow multiple [[bin]] sections?

Interesting. An obvious question: how many public libs should we have? If there are many, then that really looks like we can upload multiple packages per .crate file (and bigger chance to have naming conflicts with other crates). If there is at most one public, it seems easier to implement with less trouble.

It seems that cargo/crates.io can already now distinguish between crates and packages.

I'm not sure, but the generated docs have a column named "Crates" (e.g. regex). So maybe [[lib]]?

@kornelski
Copy link
Contributor

I'm strongly against using publish key for this. I use publish=false to protect proprietary code from leaking and it's very important for me that these crates are never uploaded anywhere.

Other than that I really like this feature. Just please use another field. Ideally in the parent, so that the dependency can be unmodified (e.g. be a git submodule).

[dependencies.foo]
path = "foo/"
bundle = true

@kornelski
Copy link
Contributor

kornelski commented Nov 26, 2017

I have two use-cases for this:

I have a crate c3 that uses a sub-crate in its workspace (https://crates.io/crates/c3_clang_extensions) to build and patch clang+llvm. Having this helps me separate code better and avoid rebuilding slowest and most fragile part of the code often. But whenever I publish I have to publish twice, and it pollutes crates.io with unnecessary "internal" crate.

I would also use this to temporarily fork other crates. I often need an urgent fix or addition of a small thing, but the original crate author is slow to respond. So I'd like to be able to temporarily fork and bundle a dependency (and keep ability to unbundle it), without publishing the fork on crates.io as a stand-alone crate, because I can't commit to maintain long term every crate I patch. Often authors do respond to my patches, but with a 3-6 month delay, so it's really a waste to pollute crates.io with crate forks that are abandoned after 3-6 months.

@matklad
Copy link
Member

matklad commented Nov 26, 2017

So I'd like to be able to temporarily fork and bundle a dependency (and keep ability to unbundle it), without publishing the fork on crates.io as a stand-alone crate, because I can't commit to maintain long term every crate I patch.

Hm, interesting 🤔 I would say that this is the use case we want to explicitly not support :) When depending on the crate via usual dependency mechanism, you give downstream crates a lot of control over your dependencies. For example, downstream crate would typically pick the latest minor version (which might be greater then at the time of publishing your crate), or it can even completely swap out the dependency with something like replace.

If you vendor a particular version, you are completely stuck with it, so, for example, if a critical vulnerability is then discovered in vendored dependency, consumers of your crate are forced to silently use the vulnerable version.

I think there's a lot of wisdom in the fact that crates from crates.io can depend only on other published crates, and stuff like git-dependencies is forbidden.

Of course, there are cases when you need to get stuff done, and you can't wait for the upstream to merge such changes. This typically happens when you are developing and shipping an application. Here, you can use patch section of Cargo.toml which gives you complete control over (transitive) dependencies, but which prevents you from publishing crates with modified dependencies to crates.io.

So, coming back to the RFC, I would love to see a mechanism preventing you from vendoring a crate from crates.io :)

@est31
Copy link
Member

est31 commented Nov 30, 2017

Another effect this RFC might have is that -sys crates will now be included into the package... I wouldn't like that as I think this is against modularity.

@sfackler
Copy link
Member

Another effect this RFC might have is that -sys crates will now be included into the package

Why would that be any more likely than the contents of the sys crates being folded into the main crate already?

@est31
Copy link
Member

est31 commented Nov 30, 2017

@sfackler good point, but consider that people could just use it for the sake of using modern features. I wonder what for most people the reason is to have a separate -sys crate and whether this RFC changes something for them.

@vmx
Copy link

vmx commented Dec 3, 2017

I'm also in favour of this RFC, mostly because I'd like to have forks with changes only needed for the main crate. More about it can be found in the forums: https://users.rust-lang.org/t/how-to-work-with-a-forked-dependency/13338

@jakwings
Copy link
Author

jakwings commented Dec 3, 2017

@vmx Nice, there mentioned a discussion thread about [[lib]], so I paste it here:

https://internals.rust-lang.org/t/how-about-changing-lib-to-lib-to-allow-multiple-library-in-a-crate/2022

Concerns in that thread:

This was initially considered when designing cargo, but we chose to only allow one for a variety of reasons:

  • If there are two libraries, how do you select only one to build if you don’t want both?
  • Encouraging separate projects for each crate means dependencies are generally more re-usable than if they were bundled together. Crates which want to provide multiple libraries almost always end up getting large enough that they should be separated anyway as well.
  • With multiple libraries in one package there would need to be a method of specifying dependencies amongst them.
  • A number of features have been added after-the-fact which have made multiple libraries tricky. For example, if a package has a build script, how does it know which crate to link the native libraries into?

@vmx
Copy link

vmx commented Dec 4, 2017

If you vendor a particular version, you are completely stuck with it, so, for example, if a critical vulnerability is then discovered in vendored dependency, consumers of your crate are forced to silently use the vulnerable version.

That's even true today, I would even argue it's worse. If I need a patched version of a dependency, I need to fork that dependency and publish it. With this RFC there won't be random crates/packages floating around that are just forks for one specific use case.

@epage
Copy link
Contributor

epage commented Dec 22, 2017

RE Motivation

For me, the motivation read as describing "managing workspaces or path dependencies is hard, let's make it easier" but instead the RFC is specifically about creating the concept of private crates. I feel like that is what needs to be addressed in the motivation.

The motivations I found in this thread:

@matklad

I personally find this feature very useful. I love splitting my code into crates, whenever possible, because it provides physical separation, separate compilation and enforces acyclic logical dependencies. But in today's Cargo, extracting a crate is a lot of work: you have to come up with a version, you have to fill-in Cargo.toml's fields, and, crucially, you have to publish this crate separately, which forces a lot of maintenance work onto you. And if later you decide that the particular split into crates was not ideal, you are stuck with old crates published to crates.io. The core of these problems is that the way your project is split into crates is a part of it's public API, because all crates are published can be used independently, and this RFC I think provides a way to hide these details from clients completely.

@pornel

I have a crate c3 that uses a sub-crate in its workspace (https://crates.io/crates/c3_clang_extensions) to build and patch clang+llvm. Having this helps me separate code better and avoid rebuilding slowest and most fragile part of the code often. But whenever I publish I have to publish twice, and it pollutes crates.io with unnecessary "internal" crate.
I would also use this to temporarily fork other crates. I often need an urgent fix or addition of a small thing, but the original crate author is slow to respond. So I'd like to be able to temporarily fork and bundle a dependency (and keep ability to unbundle it), without publishing the fork on crates.io as a stand-alone crate, because I can't commit to maintain long term every crate I patch. Often authors do respond to my patches, but with a 3-6 month delay, so it's really a waste to pollute crates.io with crate forks that are abandoned after 3-6 months.

My paraphrasing / summary

  • Code organization
    • particularly to avoid unintentionally introducing cyclic dependencies
  • Build times
  • Private forks

For code organization: is supporting this extra feature worth it over just using mod? Will the module proposals being worked on help with regards to making it easier to catch yourself adding cycles?

For build times: This sounds like a workaround for incremental compilation which is on track for moving towards stable. Should we just wait until then?

@epage
Copy link
Contributor

epage commented Dec 22, 2017

Should the RFC address [patch] / [replace]?

I assume the way to patch a private crate is by patching the public crate but I feel like that should be made explicit in the RFC.

@kornelski
Copy link
Contributor

In my case incremental compilation doesn't help, because the sub crate is to avoid running build.rs unnecessarily. It's much easier to use a crate than to make build.rs clever and faster.

@matklad
Copy link
Member

matklad commented Dec 22, 2017

@vmx I believe "private forks" of crates is a use-case which we deliberately want not to support in Cargo, here are some thoughts as to why this is a good policy: #2224 (comment). This is my personal interpretation though, other members of @rust-lang/cargo may disagree with it :)

@jakwings
Copy link
Author

@vmx Yeah, this feature could also be used for that, and I didn't mention it in the RFC because my very first goal is better modularization. :P I do think this may be the best solution for vendoring.

@vmx
Copy link

vmx commented Dec 22, 2017

@matklad I replied to that with #2224 (comment). I don't see how publishing your fork does have any advantage over having it as a sub-crate (e.g. https://crates.io/crates/noise_search_deps_librocksdb-sys in my case).

@vmx
Copy link

vmx commented Dec 22, 2017

@iology That's good to hear. It makes sense to concentrate on one thing, but it's great if it also works for others :)

@matklad
Copy link
Member

matklad commented Dec 22, 2017

@vmx sorry, I've missed your reply!

I think the crucial difference here is that when you publish you fork to crates.io, downstream users still have control over it: they can use [patch] or [replace] section to redirect your's crate dependency to their fork, for example. However, if you vendor the actual source code of your dependency as a part your crate, then you are stuck with this fixed version of source code.

@vmx
Copy link

vmx commented Dec 22, 2017

@matklad Good point to keep in mind. So if the sub-crates behave like path dependencies do today, I think even [patch]/[replace] should work.

@kornelski
Copy link
Contributor

@matklad I think it'd be good to find ways how private/temporary forking can be done better to fix the downsides that you list.

For example, [replace] should be made to work on privately forked crates too. When there's enough metadata to track where it was forked from, tools may even be able to "unfork" it automatically or at least warn about outdated forks.

The alternative is the status quo of public forks, and that's not good either. Public forks also become abandoned and outdated.

Public forks pollute crates.io with confusing duplicates. A change of policy & UI around yanking and deleting of crates could help to clean them up after they're abandoned, but IMHO it's even better if the fork never becomes publicly visible, so there won't be a risk that other people inadvertently used it and have their dependencies outdated or deleted.

@rnewman
Copy link

rnewman commented Mar 9, 2018

I thought I'd share my experience.

I have a workspace containing 16 crates: one at the top level, one CLI tool, and 14 sub-crates.

This workspace totals about 35,000 lines of Rust, excluding dependencies.

More than a year ago we adopted sub-crates in order to achieve incremental building, incremental testing, and to enforce modularity, as already expressed earlier in this thread.

We also expect to produce multiple binaries and libraries that are composed from these crates: e.g., a transaction log replicator that can exclude the entire query subsystem.

It's possible that some of these sub-crates will in the future be independently published for reuse, but at present they are not intended for public consumption, are not independently versioned, and have no documented or stable API surface.

It's non-trivial to convert this workspace into nested modules: I started, and have about 800 compiler errors left to fix (so far!).

However, as far as I can tell the only way I can publish the top-level crate on crates.io is to publish all of its internal crates or merge everything into a single crate. Permanently publishing implementation details into a flat global namespace seems like a bad decision and poor stewardship, but the alternatives are also unpleasant.

I would be delighted if this RFC were adopted.

@matklad
Copy link
Member

matklad commented Mar 26, 2018

Cargo team discussed this RFC during the latest meeting. Overall it is
clear that the problem is important, and that some solution is
needed. On the other hand, the problem isn't too pressing: the
workaround of publishing through-away internal crates, while being
rather unpleasent, works.

It is also clear that the solution should be rather more nuanced than
currently expressed in the RFC and in the implementation PR. Here are
some unresolved questions:

  • Cargo.toml format for internal crate should be specified, literally
    using publish = false is probably not what we want.

  • What to do, if workspace has two publishable packages, foo and
    bar, which both depend on a private path-dependency baz? Should
    baz be duplicated, or should this situation be completely
    forbidden?

  • How dependencies from private path-dependencies propagate to the
    parent crate? Presumably, at least on crates.io, we want to be able
    to see all dependencies of a crate, even if they are hidden
    behind private sub-crate.

With all this in mind, the team leans towards postponing the RFC at
this moment.

@rfcbot postpone

However, the team would be delighted to accept an experimental
implementation of private path dependencies as an unstable feature, if
it clearly outlines unresolved questions and problematic edge cases.

@matklad
Copy link
Member

matklad commented Mar 26, 2018

@rfcbot fcp postpone

1 similar comment
@matklad
Copy link
Member

matklad commented Apr 2, 2018

@rfcbot fcp postpone

@rfcbot
Copy link
Collaborator

rfcbot commented Apr 2, 2018

Team member @matklad has proposed to postpone this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Apr 2, 2018
@rfcbot
Copy link
Collaborator

rfcbot commented Apr 13, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Apr 13, 2018
@rfcbot
Copy link
Collaborator

rfcbot commented Apr 23, 2018

The final comment period is now complete.

@matklad matklad added the postponed RFCs that have been postponed and may be revisited at a later time. label Apr 24, 2018
@matklad matklad closed this Apr 24, 2018
@jhpratt
Copy link
Member

jhpratt commented Feb 4, 2020

@matklad Have there been any updates in the past 2ish years in this area? I imagine it's more of an issue than ever with proc macros.

@matklad
Copy link
Member

matklad commented Feb 5, 2020

not that I am aware of, but I was not actively involved in cargo for 2ish years :)

@haraldh
Copy link

haraldh commented Aug 20, 2020

a pitty, that there is no solution

haraldh added a commit to haraldh/cargo that referenced this pull request Aug 21, 2020
Rebase of: rust-lang#4735

RFC: rust-lang/rfcs#2224

Before this patch, all path deps must be published before the root crate
is published to the registry.

This patch allow `cargo package` to include files of path dependencies
in the generated package if the dependencies have the manifest key
"package.publish" set to false, which means the crate cannot be
registered as a sharable crate by itself at a registry, but still
allowed to be published along with the root crate.

Co-authored-by: J.W <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. postponed RFCs that have been postponed and may be revisited at a later time. T-cargo Relevant to the Cargo team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.