-
Notifications
You must be signed in to change notification settings - Fork 603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tarball: Use cargo_toml
to parse Cargo.toml
file
#6914
Conversation
I just ran the
and
(ironic, I know... 😂) |
Err(de::Error::invalid_value(value, &expected)) | ||
} | ||
} | ||
use cargo_toml::{Dependency, DepsSet, Error, Inheritable, Manifest, Package}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the extra validation worth it?
I'm concerned about us (very easily) adding new things that will then fail on publish. The person publishing would then need to figure out it is a bug in crates.io, report it, someone respond, create a PR upstream with cargo_toml
, wait for a fix and publish, then updates and deploy crates.io.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
our long term goal is to reduce the reliance on the metadata JSON blob that is being sent together with the tarball.
an attacker could potentially publish something where the metadata says foo and the tarball says bar, and we would prefer to treat the tarball as the primary source of truth.
this however requires us to parse the Cargo.toml
file and if parsing fails we can't accept the upload. we could keep rolling our own structure definitions, but as we've already seen with the readme
field, this is quite error-prone.
as the PR description says cargo_toml
works for almost all existing crates, so there isn't really a reason to write everything ourselves.
the extra validation is technically not needed if we assume that only cargo
is used to publish new crates, but only relying on client-side validation seems quite risky given scenarios like https://blog.vlt.sh/blog/the-massive-hole-in-the-npm-ecosystem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be reasonable to add logging / alerting such that we could detect an increase in failed toml parsing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, that seems like a reasonable idea :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about us (very easily) adding new things that will then fail on publish.
Wouldn't a breaking change like that cause bigger problems anyway, since crates.io is by no means the only downstream parser and consumer of cargo manifests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not even talking about breaking backwards compatibility but breaking forward compatibility.
Think of cases like:
- Add new supported values
- Add new supported types
Cargo needs to have a strict parser for these because an unknown value could have a major impact on behavior. If another tool is acting off of the field, they should hard error as well. However, we need to be able to add new values or else cargo is stuck as-is. This is possible to support in serde using a catch-all variant. cargo_toml
instead errors on unknown values.
A trickier one is adding new types which we've done but cargo_metadata
also has hard errors for that.
So assuming we only maintain backward compatibility. Most likely we'll be playing whack-a-mole with cargo_toml
and cargo_metadata
for them to properly handle new values being emitted.
We then have to deal with making sure they are updated when the new values are out (at minimum, a strong communication path).
Tying this back to this PR, making most of cargo_toml
loosey-goosey for compatibility removes most of the validation that crates-io can do.
Independent of all of that., in my opinion, is that pushing responsibility for this stuff to third-parties has made it so we are unaware of the problems their users face and it is easier to make innocuous changes that break them
we've discussed this PR in the crates.io team meeting last friday and decided to move ahead with this change. we might eventually migrate the code to use https://github.com/LukeMathWalker/cargo-manifest instead, but since it is currently still a bit rough around the edges this will be a follow-up PR. |
@Turbo87 The cargo team had some concerns over the specifics of this approach and how it could cause problems for introducing changes in cargo. I was wondering if you would be willing to discuss possible changes to help avoid problems for us and our users? For example, this would have prevented people from publishing if they started using the new debug profile settings that were recently introduced in cargo. We don't have any mechanism in place to identify when there will be problems. And users won't hit it until usually after stabilization, which could result in a frustrating experience for users when a new feature doesn't work (and doesn't work until the very last step of development with the publishing step). Using third-party crates out of our control also means there would be an indefinite timeline for fixing it, or require more work on your part to fork and patch things. Are there options or ideas that we could discuss to try to avoid these problems? I think a long-term goal is possibly for the cargo team to provide a more stable set of serde-based structures, and then we could have a process to immediately push those changes to crates.io and get it updated here. But I think that will likely be a long while for that to settle. Are there alternatives, like possibly using untyped parsers like |
you can always try to publish to the crates.io staging environment :)
our assumption is that the cargo team notifies us about planned changes to the manifest format, so that we can implement them on our side. I don't expect this to take more than a couple of days, so by the time a change gets stabilized our server should support it.
yes, we are aware of this tradeoff. the upside is that we don't have to maintain all of it ourselves if there already exist at least three such libraries on crates.io. we also discussed migrating to If we indeed see that the maintainers of these crates are indeed not responsive to planned changes we can always fork, but until then I'd rather not increase our maintenance burden unnecessarily.
👍
that again would mean maintaining yet another implementation, while these third party crates would have to eventually be updated anyway to keep working. |
As another failure mode of this situation, which comes across as an endorsement from the Project when the Cargo team has specifically not endorsed it. |
This PR switches the
crates_io_tarball
package over to using the https://crates.io/crates/cargo_toml forCargo.toml
file parsing.This should significantly lower the chances of invalid
Cargo.toml
files ending up on crates.io, since a lot more fields are now properly validated.Note that
cargo_toml
supports workspace inheritance by default, so we have to run an additional validation step to ensure that our server only accepts manifest files without any workspace inheritance.Similarly, the crate accepts
rust-version
as any randomString
, so I had to move our validation check out of theserde
deserialization code path.