RFC: Nested Cargo packages #3452

Open: wants to merge 34 commits into `master`

Changes from 5 commits

Commits (34):
- `80b9ddf` Create 0000-nested-publish.md (kpreid, Jul 1, 2023)
- `126b2b4` Clarify some terms and discuss workspaces and normalization (kpreid, Jul 1, 2023)
- `ef9a157` Discuss trait coherence relaxation (kpreid, Jul 1, 2023)
- `a1d4425` Reposition footnote-links (kpreid, Jul 1, 2023)
- `35184e0` Add PR number (kpreid, Jul 1, 2023)
- `c4dd372` Clarify `path` behavior and existing dev-dependency behavior (kpreid, Jul 1, 2023)
- `e71a953` Avoid "lockstep" and discuss a version duplication hazard (kpreid, Jul 1, 2023)
- `3da9eeb` Add Definitions section. (kpreid, Jul 6, 2023)
- `7f9c09f` Discuss more alternatives for marking packages; define "nested package" (kpreid, Jul 6, 2023)
- `130ceb0` Clarify “is private” (kpreid, Jul 6, 2023)
- `42bdb61` Require dependencies to be explicitly nested. (kpreid, Feb 5, 2024)
- `25d86b1` More motivation and drawbacks. (kpreid, Feb 5, 2024)
- `743d531` Define “nested publishing”. (kpreid, Feb 5, 2024)
- `34a19dc` Discuss postponed RFC 2224 as prior art. (kpreid, Feb 5, 2024)
- `b75a519` Mention vendoring. (kpreid, Feb 5, 2024)
- `d7e8dea` Discuss “subcrate dependencies” in prior art. (kpreid, Feb 5, 2024)
- `6670428` Discuss “Inline crates” in prior art. (kpreid, Feb 5, 2024)
- `da547de` Rewrite reference-level explanation to focus more on effects than cha… (kpreid, Feb 5, 2024)
- `b4923ac` Expand alternatives and move inline-crates discussion there. (kpreid, Feb 5, 2024)
- `68ad634` Move license and version ideas. (kpreid, Feb 11, 2024)
- `33d1c6e` Specify that package names must be unique. (kpreid, Feb 11, 2024)
- `680709c` Typo (kpreid, Feb 11, 2024)
- `2af4921` Mention feature flattening. (kpreid, Feb 11, 2024)
- `0deba10` Discuss workspace inheritance. (kpreid, Feb 11, 2024)
- `b999976` Move `dependencies.*.publish = false` to future possibilities. (kpreid, Feb 11, 2024)
- `3edb308` Replace `package.publish = "nested"` with `package.publish.nested = t… (kpreid, Feb 11, 2024)
- `dd90f20` Refine explanation of `package.publish` being a table. (kpreid, Feb 11, 2024)
- `07b058d` Rephrase name conflict rule to avoid "transitive closure". (kpreid, Mar 11, 2024)
- `2c220ee` Always error on `workspace.dependencies.*.publish`. (kpreid, Mar 11, 2024)
- `4bc77cb` Rewrite feature flattening section. (kpreid, Mar 11, 2024)
- `e470314` Rationale for name uniqueness. (kpreid, Mar 13, 2024)
- `2a5474e` Polishing. (kpreid, Mar 13, 2024)
- `db9e7fa` Update comparison with packages-as-namespaces given that that RFC has… (kpreid, Mar 13, 2024)
- `4817861` Explicitly state that nested names are non-unique *outside* of the pa… (kpreid, Mar 13, 2024)
183 changes: 183 additions & 0 deletions text/0000-nested-publish.md
Member

A minor question: if a Git repository contains a package with nested packages, can another package depend on any of those nested packages as a Git dependency? Currently, a Git dependency searches the repository recursively for a package whose name matches.

@@ -0,0 +1,183 @@
- Feature Name: `nested_publish`
- Start Date: 2023-06-30
- RFC PR: [rust-lang/rfcs#3452](https://github.com/rust-lang/rfcs/pull/3452)
- Rust Issue: ...

# Summary
[summary]: #summary

Allow Cargo packages to be bundled within other Cargo packages when they are published (not just in unpublished workspaces).

# Motivation
[motivation]: #motivation

There are a number of reasons why a Rust developer currently may feel the need to create multiple library crates, and therefore multiple Cargo packages (since one package contains at most one library crate). These multiple libraries could be:

* A trait declaration and a corresponding derive macro (which must be defined in a separate proc-macro library).
Contributor

Should `-sys` packages be listed as a motivation, with this being the recommended approach, or should we discourage using this with `-sys` packages?

This was discussed briefly at #2224 (comment).

* A library that uses a build script that uses another library or binary (e.g. for precomputation or bindings generation).
* A logically singular library broken into multiple parts to speed up compilation.
Comment on lines +16 to +18
Member

While I understand the value of this RFC, and these pain points are truly painful, I am a bit uncomfortable with the complexity exposed to Cargo users. With the upcoming public/private dependencies in Edition 2024, the situation becomes quite awkward.

# in a `foo` package
foo-priv-types = { path = "priv-types", public = false, publish = "nested" }
foo-core = { version = "0.1", path = "core", public = true }
foo-util = { version = "0.1", path = "util", public = false }
foo-derive = { path = "derive", public = true, publish = "nested" }

The above example is very likely to happen, but the mixed meaning of `public` and nested is not immediately clear.

* `public = false` + nested
  * types are not exposed (private) in the public API, and the package is published as a private module
  * 👍🏾 makes sense
* `public = true` + `version`
  * types are exposed in the public API, and the package is published separately
  * 👍🏾 makes sense
* `public = false` + `version`
  * types are not exposed in the public API, and the package is published separately
  * 🤔 seems awkward; this RFC addresses it
* `public = true` + nested
  * types are exposed in the public API, but the package is published as if it were a private module
  * 🤔 looks a bit more awkward; this RFC addresses it

I may have over-complicated the situation, but combining these concepts does introduce cognitive overhead. I don't know how complex inline modules would be, but that might be a chance to change the compilation unit from crate to module (don't bash me, just an idea).

Member

(Not to mention when open namespaces come and join the party. While that is a fairly independent feature, the learning curve doesn't look too good once everything comes together…)

Contributor

For me, the big concern with public + nested is when two workspace members do that for the same dependency. Locally, they will be interchangeable. When published, they will not. This delays testing and could confuse users.

Comment on lines +14 to +18
Contributor

Another use case is that this provides another way for us to break dependency cycles that involve dev-dependencies.

Currently, the solution involves dropping the dependency on publish (by not specifying a `version`). This was hard to discover, so by default `cargo add` does it for all path dev-dependencies. This hurts crater: for any package with a dev-dependency cycle, or where `cargo add` was used to add a path dev-dependency, we lose out on a lot of testing with crater.

With this feature, we can instead nest the path dev-dependency.

Contributor

> A logically singular library broken into multiple parts to speed up compilation.

Similarly, a binary might want to split out a library for local development and testing without considering it public or offering semver guarantees for it. `cargo-edit` and `cargo-release` are like this.


Currently, developers must publish these packages separately. This has several disadvantages (see the [Rationale](#rationale-and-alternatives) section for further details):

* Clutters the public view of the registry with packages not intended to be usable on their own, and which may even become obsolete as internal architecture changes.
* Requires multiple `cargo publish` operations (this could be fixed with bulk publication) and writing public metadata for each package.
* Can result in semver violations and thus compilation failures, due to the developer not thinking about semver compatibility within the group.
Contributor

While some of these indirectly touch on it, one I'd explicitly add is sheer boilerplate.

In working on #3424, one thing I've noticed is commentary from people looking to further reduce boilerplate. This also came up in a recent blog post and the HN discussion of it.

Contributor Author

This proposal will still require Cargo.tomls for each nested package. What boilerplate do you see removing (besides e.g. explanatory README.md files)?

Contributor

Any of the standard manifest fields that crates.io requires. Granted, workspace inheritance helps with those (and will automatically be used by `cargo new` in 1.72), but it would be much nicer if we could just leave them out.

Combine that with "cargo script" (if we support `[lib]` packages) and you might not even need manifests (not even ones embedded in source).

Contributor Author

@kpreid kpreid Feb 5, 2024

I couldn't find a definitive statement of exactly which fields crates.io requires to compare to when discussing boilerplate reduction. https://doc.rust-lang.org/cargo/reference/publishing.html#before-publishing-a-new-crate implies it is one, but isn't really (e.g. homepage is not mandatory).


This RFC will allow developers to avoid all of these inconveniences and hazards by publishing a single package.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

By default (and always, prior to this RFC's implementation):

* If your package contains any sub-packages, Cargo [excludes](https://doc.rust-lang.org/cargo/reference/manifest.html#the-exclude-and-include-fields) them from the `.crate` archive file produced by `cargo package` and `cargo publish`.
* If your package contains any non-`dev` dependencies which do not give a `version = "..."`, it cannot be published to `crates.io`.

(By “**sub-package**” we mean a package (directory with `Cargo.toml`) which is a subdirectory of another package. We shall call the outermost such package, the package being published, the “**parent package**”.)

You can change this default by placing the following in the manifest (`Cargo.toml`) of a sub-package:

```toml
[package]
publish = "nested"
```

If this is done, Cargo's behavior changes as follows:

* If you publish the parent package, the sub-package is included in the `.crate` file (unless overridden by explicit `exclude`/`include`) and will be available to the parent package whenever the parent package is downloaded and compiled.
* The parent package may have a `path =` dependency upon the sub-package. (This dependency must not specify `version =`.)
* You cannot `cargo publish` the sub-package, just as if it had `publish = false`. (This is a safety measure against accidentally publishing the sub-package separately when this is not intended.)

Nested sub-packages may be freely placed within other nested sub-packages.
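As a rough illustration of the definitions of "sub-package" and "parent package", including sub-packages nested inside other sub-packages, the following Rust sketch treats the source tree as a pre-computed list of directories that contain a `Cargo.toml`. The helpers `sub_packages` and `enclosing_package` are hypothetical names for illustration, not Cargo APIs, and the sketch assumes relative, already-normalized paths.

```rust
use std::path::{Path, PathBuf};

/// All package directories strictly below `parent`, at any depth.
/// `package_dirs` lists every directory containing a `Cargo.toml`
/// (a hypothetical, pre-computed input).
fn sub_packages<'a>(parent: &Path, package_dirs: &'a [PathBuf]) -> Vec<&'a Path> {
    package_dirs
        .iter()
        .map(PathBuf::as_path)
        // `Path::starts_with` compares whole components, so `foobar`
        // is not considered to be inside `foo`.
        .filter(|dir| *dir != parent && dir.starts_with(parent))
        .collect()
}

/// The nearest package enclosing `dir`: its longest strict ancestor among
/// `package_dirs`, or `None` if `dir` is not inside any other package.
fn enclosing_package<'a>(dir: &Path, package_dirs: &'a [PathBuf]) -> Option<&'a Path> {
    package_dirs
        .iter()
        .map(PathBuf::as_path)
        .filter(|candidate| *candidate != dir && dir.starts_with(candidate))
        .max_by_key(|candidate| candidate.components().count())
}
```

For example, with packages at `foo`, `foo/macros`, `foo/core`, and `foo/core/inner`, all three inner directories are sub-packages of `foo`. The nearest enclosing package of `foo/core/inner` is `foo/core`.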

When a group of packages is published in this way, and depended on, this has a number of useful effects (which are not things that Cargo explicitly implements, just consequences of the system):

* The packages are versioned in lockstep; there is no way for a version mismatch to arise since all the code was published together. Version resolution does not apply (in the same way that it does not for any other `path =` dependency).
* The sub-package is effectively “private”: it cannot be named by any other package on `crates.io`, only by its parent package and sibling sub-packages.

## Example: trait and derive macro

Suppose we want to declare a trait-and-derive-macro package. We can do this as follows. The parent package would have this manifest `foo/Cargo.toml`:

```toml
[package]
name = "foo"
version = "0.1.0"
edition = "2021"
publish = true

[dependencies]
foo-macros = { path = "macros", package = "macros" } # newly permitted
```

The sub-package manifest `foo/macros/Cargo.toml`:

```toml
[package]
name = "macros" # this name need not be claimed on crates.io
version = "0.1.0" # this version is not used for dependency resolution
edition = "2021"
publish = "nested" # new syntax

[lib]
proc-macro = true
```

Then you can `cargo publish` from within the parent `foo` directory, and this will create a single `foo` package on `crates.io`, with no `macros` (or `foo-macros`) package visible except when inspecting the source code or in compilation progress messages.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

The following changes must be made across Cargo and `crates.io`:

* **Manifest schema**
  * The Cargo manifest now allows `"nested"` as a value for the `package.publish` key.
Contributor

As it is already a term in cargo, I actually lean towards "vendor"

Contributor Author

I get the similarity, but vendoring normally means making a copy of a package that is available by other means, and one of the design goals here is to discourage any such copies from existing (because they are likely to be accidental, and if they aren't, they may create the same kinds of problems as multiple major versions do). I think reusing the term would create more confusion than it avoids.

Contributor

> vendoring normally means making a copy of a package that is available by other means, an

The perspective I was using when I came up with "vendor" was that instead of getting a dependency through the registry, we are copying it into our package. It's not vendored within the repo but in the `.crate` file.

This also ties into whether we should generalize this across dependency sources, at which point the term feels even more applicable.

Contributor Author

I still don't think that vendoring is the right term, especially as one of the things that came up during my review of prior art is the concept of using nested-packages-or-whatever for vendoring, e.g. to fix a bug before upstream accepts the patch — I think these need to be kept distinct ideas.

That being considered, what do we need to do here with the RFC text to resolve this thread? Should there be an unresolved question for terminology, or can we just proceed as-is?

@sam0x17 sam0x17 Feb 6, 2024

I think "vendoring" doesn't cover all the situations where one might use this. Usually vendoring refers to "taking some third-party dependency's entire source code and jamming it into some vendor/whatever directory (or sometimes a git submodule) for reasons". I don't think it makes sense to use this term in scenarios where the nested crate isn't third party, which I think will actually be most of the time with this feature, so "vendoring" would at best be a misnomer, and at worst might carry negative connotations for people recalling some really crazy repo setups.

Contributor Author

I haven't seen any "seconding" of support for the "vendor" name, so I'm going to keep "nested" and resolve this. (Of course, there might be some third or fourth better idea to be found…)

Contributor

I'd at least keep this unresolved to centralize any naming bikeshedding.

* **`cargo package` & `cargo publish`**
  * Should refuse to publish a package if that package (not its sub-packages) has `publish = "nested"`.
  * Exclude/include rules should, upon finding a sub-package, check whether it is `publish = "nested"` and, if so, not automatically exclude it. Instead, they should treat it like any other subdirectory; in particular, it should be affected by explicitly specified exclude/include rules.
  * Nested `Cargo.toml`s should be normalized in the same way as the root `Cargo.toml` if they declare `publish = "nested"`, and not otherwise.
    * This avoids modifying the publication behavior for existing packages, even if they contain project templates or invoke `cargo` to compile sub-packages to probe the behavior of the compiler.
    * If a nested `Cargo.toml` has a syntax error such that its `package.publish` value cannot be determined, then emit an error if it is depended upon; if it is not, emit a warning and do not normalize it.
* **`crates.io`**
  * Should allow `path` dependencies that were previously prohibited, at least provided that the named package in fact exists in the `.crate` archive file. The path must not contain any upward traversal (`../`) or other hazardous or non-portable components.
* **Build process**
  * Probably some messages will need to be adjusted; currently, `path` dependencies' full paths are always printed in progress messages, but they would be long noise here (`/home/alice/.cargo/registry/src/index.crates.io-6f17d22bba15001f/...`). Perhaps progress for sub-packages could look something like “`Compiling foo/macros v0.1.0`”.
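The `cargo package`/`cargo publish` rules in the list above amount to a small decision procedure. The Rust sketch below models that procedure, with several assumptions:

* `Publish`, `SubPackageAction`, `check_root`, and `plan_sub_package` are hypothetical names, not Cargo internals.
* `publish: None` stands for a sub-package manifest too broken to read `package.publish`.
* In the unreadable case where the sub-package is not depended upon, the sketch assumes the directory keeps today's default exclusion. The text above only says to warn and not normalize.

```rust
/// Hypothetical model of the `package.publish` manifest value, extended with
/// the proposed `"nested"` value. Not Cargo's actual internal type.
#[derive(Debug, Clone, PartialEq)]
enum Publish {
    /// `publish = true` or `publish = false` (existing)
    Flag(bool),
    /// `publish = ["some-registry"]` (existing)
    Registries(Vec<String>),
    /// `publish = "nested"` (proposed)
    Nested,
}

/// What `cargo package` would do with one sub-package directory.
#[derive(Debug, PartialEq)]
enum SubPackageAction {
    /// Copy it into the `.crate` and normalize its `Cargo.toml`.
    IncludeAndNormalize,
    /// Leave it out of the `.crate` (today's behavior for every sub-package).
    Exclude,
    /// Abort packaging with this message.
    Error(String),
    /// Print this warning, then exclude it without normalizing
    /// (assumption: the RFC only says "warn and do not normalize").
    WarnAndExclude(String),
}

/// The package being published must not itself be `publish = "nested"`.
fn check_root(publish: &Publish) -> Result<(), String> {
    match publish {
        Publish::Nested => Err("`publish = \"nested\"` packages are only published inside a parent".to_string()),
        Publish::Flag(false) => Err("package has `publish = false`".to_string()),
        // Registry-list checks are omitted from this sketch.
        _ => Ok(()),
    }
}

/// Decide how to treat one sub-package found while packaging the parent.
///
/// * `publish` is `None` when the sub-package's manifest has a syntax error
///   that prevents reading `package.publish`.
/// * `depended_on` is true if the parent (or a sibling) depends on it.
/// * `explicitly_excluded` is true if the parent's own `exclude`/`include`
///   rules leave this directory out.
fn plan_sub_package(
    publish: Option<&Publish>,
    depended_on: bool,
    explicitly_excluded: bool,
) -> SubPackageAction {
    match publish {
        None if depended_on => SubPackageAction::Error(
            "cannot read `package.publish` of a sub-package that is depended upon".to_string(),
        ),
        None => SubPackageAction::WarnAndExclude(
            "cannot read `package.publish`; sub-package left unnormalized".to_string(),
        ),
        // Nested sub-packages behave like ordinary subdirectories, so explicit
        // exclude/include rules still apply to them.
        Some(Publish::Nested) if explicitly_excluded => SubPackageAction::Exclude,
        Some(Publish::Nested) => SubPackageAction::IncludeAndNormalize,
        // Every other sub-package keeps today's automatic exclusion.
        Some(_) => SubPackageAction::Exclude,
    }
}
```

Keeping the root check separate from the per-sub-package decision mirrors the text. A package marked `"nested"` is rejected only when it is the one being published, never when it is encountered inside a parent.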

The presence or absence of a `[workspace]` has no effect on the new behavior, just as it has no effect on existing package publication.
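The `crates.io` requirement in the list above says an accepted `path` must contain no upward traversal or other hazardous or non-portable components. That check could look roughly like the Rust sketch below. It is only a sketch: `is_acceptable_nested_path` and `is_portable_name` are hypothetical. The allowed character set and the reserved-name list are conservative assumptions, not rules stated in this RFC. Checking that the named package actually exists in the `.crate` archive is not modeled.

```rust
use std::path::{Component, Path};

/// Hypothetical `crates.io`-side check for a `path = "..."` dependency in an
/// uploaded manifest: only plain, portable, downward relative paths pass.
fn is_acceptable_nested_path(raw: &str) -> bool {
    let mut named_parts = 0;
    for component in Path::new(raw).components() {
        match component {
            // A leading `./` is harmless.
            Component::CurDir => {}
            Component::Normal(part) => {
                // Reject names that are not valid UTF-8.
                let Some(name) = part.to_str() else {
                    return false;
                };
                if !is_portable_name(name) {
                    return false;
                }
                named_parts += 1;
            }
            // `..` (upward traversal), a leading `/`, or a Windows drive/UNC prefix.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return false,
        }
    }
    // The path must name a subdirectory, not the parent package's own root.
    named_parts > 0
}

/// Conservative portability rule (an assumption, not a documented crates.io rule):
/// ASCII letters, digits, `-`, `_`, `.`, and not one of a few names that
/// Windows reserves (this list is deliberately incomplete).
fn is_portable_name(name: &str) -> bool {
    const RESERVED: [&str; 4] = ["CON", "PRN", "AUX", "NUL"];
    // `split` always yields at least one item, so this never falls back.
    let stem = name.split('.').next().unwrap_or(name);
    !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.'))
        && !RESERVED.iter().any(|r| r.eq_ignore_ascii_case(stem))
}
```

`Path::components` does not resolve `..`, so `macros/../../escape` is still seen as containing upward traversal and is rejected.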

# Drawbacks
[drawbacks]: #drawbacks

* This increases the number of differences between “Cargo package (on disk)” and “Cargo package (that may be published in a registry, or downloaded as a unit)” in a way which may be confusing; it would be good if we had different words for these two entities, but we don't.
* If Cargo were to add support for multiple libraries per package, that would be largely redundant with this feature.
* It is not possible to publish a bug fix to a sub-package without republishing the entire parent package.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

The reason for doing anything at all in this area is that publishing multiple packages is often a bad solution to the problems that motivate it; in particular:

* Non-lockstep versioning risk: If you publish `foo 1.0.0` and `foo-macros 1.0.0`, then later publish `foo 1.1.0` and `foo-macros 1.1.0`, then it is _possible_ for users' `Cargo.lock`s to get into a state where they select `foo-macros 1.1.0` and `foo 1.0.0`, and this then breaks because `foo-macros 1.1.0` assumed that items from `foo 1.1.0` would be present. Arguably, this is a deficiency in the proc-macro system (`foo-macros` has a _de facto_ dependency on `foo` but does not declare it), but not one that is likely to be corrected any time soon. This can be worked around by having `foo` specify an exact dependency `foo-macros = "=1.0.0"`, but this is a subtlety that library authors do not automatically think of; semver is easy to get wrong silently.
* The crates.io registry may be cluttered with many packages that are not relevant to users browsing packages. (Of course, there are many other reasons why such clutter will be found.)
* When packages are implementation details, publishing them leaves a permanent mark on the `crates.io` registry even if the implementation of the parent package stops needing that particular subdivision. By allowing sub-packages, we let package authors create whatever sub-packages they imagine might be useful, and delete them in later versions with no consequences.
* It is possible to depend on a published package that is intended as an implementation detail. Ideally, library authors would document this clearly and library users would obey the documentation, but that doesn't always happen. By allowing nested packages, we introduce a simple “visibility” system that is useful in the same way that `pub` and `pub(crate)` are useful within Rust crates.
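The non-lockstep hazard described in the first item above comes from Cargo's default caret requirements. `foo-macros = "1.0.0"` means `^1.0.0`, which `foo-macros 1.1.0` satisfies, whereas `=1.0.0` does not. The Rust sketch below shows this. It models only plain `MAJOR.MINOR.PATCH` versions with caret and `=` requirements, with no pre-releases, ranges, or wildcards. `parse_version` and `satisfies` are illustrative helpers, not the `semver` crate's API.

```rust
/// A plain `MAJOR.MINOR.PATCH` version; tuples compare lexicographically,
/// which matches semver precedence when there are no pre-release tags.
type Version = (u64, u64, u64);

fn parse_version(text: &str) -> Option<Version> {
    let mut parts = text.split('.').map(|p| p.parse::<u64>().ok());
    let version = (parts.next()??, parts.next()??, parts.next()??);
    // Reject extra components such as `1.2.3.4`.
    if parts.next().is_some() {
        return None;
    }
    Some(version)
}

/// Does `candidate` satisfy `req`? Supports only `=X.Y.Z` (exact) and
/// `X.Y.Z` / `^X.Y.Z` (caret, Cargo's default). Returns `None` on bad input.
fn satisfies(req: &str, candidate: &str) -> Option<bool> {
    let v = parse_version(candidate)?;
    if let Some(exact) = req.strip_prefix('=') {
        return Some(parse_version(exact)? == v);
    }
    let r = parse_version(req.strip_prefix('^').unwrap_or(req))?;
    // Caret: the left-most non-zero component must stay the same.
    let compatible = match r {
        (0, 0, _) => v == r,
        (0, minor, _) => v.0 == 0 && v.1 == minor,
        (major, _, _) => v.0 == major,
    };
    Some(compatible && v >= r)
}
```

With these rules, `satisfies("1.0.0", "1.1.0")` is `Some(true)`, so a lockfile may pair `foo 1.0.0` with `foo-macros 1.1.0`. `satisfies("=1.0.0", "1.1.0")` is `Some(false)`, which is why the exact-version workaround prevents that pairing.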

The alternative to nested packages that I have heard suggested is to support multiple library targets per package. That would arguably be cleaner, but it has these disadvantages:

* It would require new manifest syntax, not just for declaring the multiple libraries, but for referring to them, and for making per-target dependencies (e.g. only a proc-macro lib should depend on `proc-macro2`+`quote`+`syn`, not the rest of the libraries in the package).
* It would require many new mechanisms in Cargo.
* It might have unforeseen problems; by contrast, nested packages are compiled exactly the same way `path` dependencies currently are, and the only new element is the ability to publish them, so the risk of surprises is lower.

Also, nested packages enable nesting *anything* that Cargo packages can express, now and in the future; the feature is composable with other Cargo functionality.

We could also do nothing except warn the authors of paired macro crates that they should use exact version dependencies. The consequence would be continued hassle for developers; some useful proc-macro features might never be written simply because the author does not want to manage a second package.

## Details within this proposal

Instead of introducing a new value for the `publish` key, we could simply allow sub-packages to be published when they would previously be errors. However, this would be problematic when an existing package has a dev-dependency on a sub-package; either that sub-package would suddenly start being published as nested, or there would be no way to specify that the sub-package *should* be published.

We could also introduce an explicit `[subpackages]` table in the manifest. However, I believe `publish = "nested"` has the elegant and worthwhile property that it simultaneously enables nested publication and prohibits accidental un-nested publication of the sub-package.

# Prior art
[prior-art]: #prior-art

I am not aware of other package systems that have a relevant similar concept, but I am not broadly informed about package systems. I have designed this proposal to be a **minimal addition to Cargo**, building on the existing concept of `path` dependencies to add lots of power with little implementation cost; not necessarily to make sense from a blank slate.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

I see no specific unclear design choices, but we might want to incorporate one or more of the below _Future possibilities_ into the current RFC, particularly omitting version numbers.

# Future possibilities
Contributor

Combined with bin deps, we could allow delegating build scripts to a nested package, allowing a more complete environment for its development.

[future-possibilities]: #future-possibilities

## Omit version numbers

Nested packages don't really have any use for version numbers; arguably, they should be omitted and even prohibited, since they may mislead a reader into thinking that the version numbers are used for some kind of version resolution. However, this is a further change to Cargo that is not strictly necessary to solve the original problem, and it disagrees with the precedent of how local `path` dependencies currently work (local packages must have version numbers even though they are not used).

## Nested packages with public binary targets

One common reason to publish multiple packages is in order to have a library and an accompanying tool binary, without causing the library to have all of the dependencies that the binary does. Examples: `wasm-bindgen` (`wasm-bindgen-cli`), `criterion` (`cargo-criterion`), `rerun` (`rerun-cli`).

This RFC currently does not address that — if nothing is done, then `cargo install` will ignore binaries in sub-packages. It would be easy to make a change which supports that; for example, `cargo install` could traverse sub-packages and install all found binaries — but that would also install binaries which are intended as testing or (once [artifact dependencies] are implemented) code-generation helpers, which is undesirable. Thus, additional design work is needed to support `cargo install`ing from subpackages:

* Should there be an additional manifest key which declares the binary target “public”?
* Should targets be explicitly “re-exported” from the parent package?
* Should there be an additional option to `cargo install` which picks subpackages? (This would cancel out the user-facing benefit from having a single package name.)

## Nested packages with public library targets

Allowing nested libraries to be named and used from outside the package would allow use cases which are currently handled by Cargo `features` and conditional compilation (optional functionality with nontrivial costs in dependencies or compilation time) to be instead handled by defining additional public libraries within one package.

This would allow library authors to avoid writing fragile and hard-to-test conditional compilation, and allow library users to avoid accidentally depending on a feature being enabled despite not having enabled it explicitly. It would also allow compiling the optional functionality and its dependencies with maximum parallelism, by not introducing a single `feature`-ful library crate which acts as a single node in the dependency graph.

However, it requires additional syntax and semantics, and these use cases might be better served by [#3243 packages as namespaces] or some other namespacing proposal, which would allow the libraries to be published independently. (I can also imagine a world in which both of these exist, and the library implementer can transparently use whichever publication strategy best serves their current needs.)

## Additional privileges between crates

Since nested packages are versioned as a unit, we could relax the trait coherence rules and allow implementations that would otherwise be prohibited.

This would be particularly useful when implementing traits from large optional libraries; for example, package `foo` with subpackages `foo_core` and `foo_tokio` could have `foo_tokio` write `impl tokio::io::AsyncRead for foo_core::DataSource`. This would improve the dependency graph compared to `foo_core` having a dependency on `tokio` (which is the only way to do this currently), though it would not have the maximum possible benefit unless we also added public library targets as above, since the package as a whole still only exports one library and thus forms one dependency graph node.

[artifact dependencies]: https://github.com/rust-lang/rfcs/pull/3028
[#3243 packages as namespaces]: https://github.com/rust-lang/rfcs/pull/3243