Generate .cargo_vcs_info.json and include in `cargo package` #5786

dekellum · 2018-07-24T23:16:19Z

Implements #5629

Edit 1, implementation notes:

I elected to generate the JSON with link in filesystem as target/.cargo_vcs_info.json (compare to target/.rustc_info.json). This makes it usable as a sentinel file. It also makes windows-mandory vs unix-advisory FileLocks a relevent distinction, which required some trial and error on appveyor CI here, now all sorted.

I found one test case (cargo test package::package_verbose) with different output on windows, to quote b553d72:

The package_verbose test has different output on windows, and from that output I can only conclude that on windows, and windows only, cargo is failing to find the parent repo. Note that difference. As the new .cargo_vcs_info.json of PR #5786 is best effort and informative only, this independent problem on windows seems acceptable [for] the feature.

Let me know if you would also like a new issue for that test difference. Cargo windows bug?

Edit 2, more implementation notes:

Note that I needed to adjust the handling of listing in ops/cargo_package vs checks such as what is now check_repo_state (previously named check_not_dirty). This is so that .cargo_vcs_info.json, if it can be produced, is included in the listing. (It would seem poor to omit there.) A side effect of this, is that a user might need to use the --allow-dirty flag to get a listing if the repo is dirty. That doesn't seem onerous to me. One test case reflects that change.

…vior

rust-highfive · 2018-07-24T23:16:31Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

dekellum · 2018-07-25T04:52:26Z

The windows CI failures are real. I'll fix that with additional commits here.

The package_verbose test has different output on windows, and from that output I can only conclude that on windows, and windows only, cargo is failing to find the parent repo. Note that difference. As the new .cargo_vcs_info.json of PR rust-lang#5786 is a best effort, informative only, this independent problem on windows seems acceptable to the feature.

dekellum · 2018-07-27T12:35:29Z

Ping @alexcrichton, this is ready for review.

alexcrichton

Awesome, thanks for this! Could a test also be added to ensure that this file exists and has an expected structure?

Also cc @rust-lang/cargo, do any of y'all have feelings about this? This is a pretty minor feature but seems like it could be useful in the future! We may also want to have a bit more discussion about exactly what goes into this file

And finally, could documentation be added for this as well? Just to document the json format and such.

alexcrichton · 2018-07-27T23:27:44Z

src/cargo/ops/cargo_package.rs

+        // Manual JSON format is safe given sha1 hash is hex encoded.
+        writeln!(ifile, r#"{{"git_head_revision_sha1":"{}"}}"#, hsh)?;
+        ifile.flush()?;
+        ifile.file().sync_all()?;


Hm could you elaborate a bit on why the file locks and syncs are needed here? I was thinking we'd just synthesize this file directly into the tarball (like Cargo.toml.orig without actually putting it on the filesystem)

I figured early on that I wanted a sentry file to try and make it at least inconvenient to spoof the hash, for example by protecting against interference from another cargo package process. The changes would be reduced without the sentry file however. Do you think the anti-spoof efforts don't justify the increased complexity? Note: The test failure by platform diff will remain an issue even without the linked sentry file.

Another early thought was that cargo package was doing something sentry file like, with the tarball produced being valided before being moved in place, so this was for parity.

alexcrichton · 2018-07-27T23:28:49Z

tests/testsuite/package.rs

+        );
+    }
+
+    // FIXME: From this required change in the test (omits final


Hm this seems like it's a bit worrisome, do we know what changed from before to cause this test to fail on windows?

No we don't know that anything changed. the prior check_not_dirty would not bail! (which is avoided for the tests on both platforms) if either no-git-repo-found or git-repo-clean. So if there has been a change with this PR, for windows, then its a change between one of these results to the other. There is no existing test to pin down which one it is.

Also I've changed the order of checks (in cargo_package) vs list output, so another side-effect could be an earlier failure in a check, but that is not apparent. With the PR changes, cargo package just won't generate a .cargo_vcs_info.json file in the repo-not-found or git-repo-clean cases, which makes the output, coincidentally, the same as before.

So I'm asking to move forward with merging this, then followup with a specific test for this which can be backported if necessary for personal diagnostic purposes. Thoughts?

Also: I have a logic flaw in the code comment, clarified above, which I'll fix. When I manually reviewed test fixtures, it looked like the sub-repo was dirty, so that makes me think this the no-git-repo-found case.

joshtriplett · 2018-07-27T23:59:14Z

I'm thrilled to have this information available.

I do think we should generate this directly into the tarball, if we can.

dekellum · 2018-07-28T04:51:27Z

We certainly can, per my understanding of the Tar interface, but the somewhat higher bar, ¿neo-? ¿psuedo-? rhetorical question for you both (@joshtriplett @alexcrichton)—what are the odds someone will later file an issue, and subsequent PR to fix the direct impl. with a locking sentry file? Example titles:

Issue: Cargo VCS Info File is vulnerable to spoofing/forgery attacks
Pull: Restore sentry file target/.cargo_vcs_info.json for security and parity with tarball safety

As I personally think the odds of the above items are quite high, by objective review of past issues/PRs, I respectfully request reconsideration of this PR in its original form.

While you consider that, I will be attempting a reliable test of the no-git-repo-found or git-repo-clean conditions on Windows. :-)

joshtriplett · 2018-07-28T04:56:59Z

It's a client-side mechanism, it's always going to be possible to fake it; let's not pretend it's a security mechanism. It's helpful to inform people, and if the build is reproducible then it can be cross-checked by independent builds.

Some basic guard to avoid unintentional concurrent builds in the same target directory seems like a feature, but that's orthogonal to this.

dekellum · 2018-07-28T14:12:12Z

I accept that the sentry aspect is security in depth at best. I thought I was careful not to suggest greater sanctimony, see above:

make it at least inconvenient to spoof the hash,

...and its not entirely orthogonal. That would require significant independent new feature development. As in stands in master, the tarball acts like a guard but check_not_dirty doesn't somehow reserve that file before the git-repo-clean check. Please accept my attempts to preserve the sentry aspect in this PR, despite it being of limited value.

github cc: rust-lang#5786

dekellum · 2018-07-28T15:04:32Z

Prior to this PR and with #5820, the package_verbose tests now fails with the warning on windows:

https://ci.appveyor.com/project/rust-lang-libs/cargo/build/1.0.5293/job/ug1evef71f6d6pav#L1656

While the test succeeds without warning on not(windows):

https://travis-ci.org/rust-lang/cargo/builds/409298348?utm_source=github_status&utm_medium=notification

So the no-git-repo-found case is causing the disparity in the package_verbose test, as I feared. Suggested plan of action:

~~1. I'll cherry-pick and adjust that warning here, and add the warning output to the adjusted package_verbose test, windows variant.~~ (edit: now a potential merge rather than cherry-pick)
~~2. I'll open a new issue, working title: "no-git-repo-found on windows-only, for package_verbose test"~~ (opened: Issue: #5823 PR: #5820)
3. I'll add an explicit test of the vcs info file content
4. You can merge this, without any remaining concerns on regressions.

@alexcrichton : reasonable plan?

joshtriplett · 2018-07-28T18:16:38Z

@dekellum You were clear in your original post; I was responding to your comment at #5786 (comment) .

I don't have any objection to attempting to detect concurrent builds; my only objection is that doing that only in the specific case of building in a git repository, rather than in every case, seems wrong to me.

joshtriplett · 2018-07-28T18:23:42Z

On a different note: we don't currently have a check for packages already containing a file Cargo.toml.orig; as far as I can tell, we don't have any precedent in Cargo for "you can't include a file named like this". It seems a bit unfortunate to add such a precedent.

I understand the rationale for doing so; it just seems unfortunate.

dekellum · 2018-07-28T18:55:40Z

@joshtriplett, regarding comment I would say the strategy is instead: write the .cargo_vcs_info.json under a lock, ensuring the same file makes it into the tarball. It just so happens that that file only contains metadata from git currently. In the future (see discussion in #5629) it may be extended to other (D)VCS like hg, or with other cargo assertions/metadata. With this reasoning there is no precedent set as this being a git-specific feature.

On the latter point: It seems completely reasonable to me for any package manager, cargo included, to reserve certain file names for which it is to control the providence, format and meaning. Related, Cargo.toml and Cargo.lock are parsed and reformatted. For consistency, should I add the same check for Cargo.toml.orig, erroring if that is present in the source? That could also be added separate from this PR and that would be my preferred route.

Ok with me proceeding as per plan? Note I'm also trying in parallel to just fix the windows repo difference noted above.

joshtriplett · 2018-07-28T19:00:13Z

@dekellum This shouldn't be specific to packages stored in a VCS, either. Why not just have a general check to avoid concurrent builds, rather than as a side effect of a lock on a particular file?

(I'm not objecting to using open_rw and similar to obtain locks; I'm just suggesting that the exclusion of concurrent builds should be handled in a way that applies to all builds, not just VCS builds.)

As for reserving filenames: right now, as far as I can tell, Cargo doesn't reserve any names, even those it uses (which is potentially a problem). Changing that could potentially introduce compatibility issues, as well as interesting questions. What happens, for instance, if someone wants to have a directory of files used in test cases?

I'm not trying to come up with arbitrary corner cases, and I'm not necessarily objecting to blocking the use of certain filenames, but we'd need to handle that carefully and think about it explicitly. What could potentially go wrong with starting to reserve filenames and prevent them from being in a Cargo crate?

dekellum · 2018-07-28T19:20:07Z

This PR agrees I think: check_vcs_file_collision is run regardless of if the project is in git.

dekellum · 2018-07-28T19:21:21Z

The general check would be a /var/run (or ./target/lock) lock file for cargo. That's beyond the scope of both this PR and the original feature request (#5629)

…t-head-sha1

dekellum · 2018-07-28T20:49:12Z

Its probably best to defer this until #5820 is merged. I can then resolve the conflicts here and avoid the unfortunate windows specific test allowance.

alexcrichton · 2018-07-30T22:59:27Z

@dekellum let's remove the synchronization logic here. As @joshtriplett mentioned this isn't buying us any security really and this file is largely going to only be informative to other tools downloading from the registry.

dekellum · 2018-07-30T23:23:08Z

I strongly disagree with "this isn't buying us any security really" part of your statement, for the reasons previously described. Does your preferred direction also include hamstringing (and/or subverting) the feature by allowing the same filename in the source?

alexcrichton · 2018-07-31T14:36:31Z

@dekellum this is not a security feature, full stop. This is an informational feature for downloading crates from crates.io and allowing tools to opportunistically connect the source of a downloaded crate to the exact commit of a git repository.

Arbitrary tarballs can be uploaded to the registry at any time, there's practically no restrictions other than file size. We do not have a threat model here, nor are we trying to develop a threat model. Publishing and packaging are "best effort" when it comes to concurrent publication because it almost never reasonably happens, so the file locking here is not necessary.

dekellum · 2018-07-31T15:43:09Z

Fun fact: When new people come to your project, they frequently attempt to emulate design choices made in the existing code. May I ask why there is the (somewhat inflexible) flock utility in cargo and why there is specific effort made to move the tarball into position only after it is completely built? Why aren't you clamoring to yank that sentry aspect out? And before your full stop, you haven't even illuminated me as to your position for reserving the file name/tarball path. I guess "help wanted" on the original feature request was a good bit more provisional than I originally imagined.

dekellum · 2018-07-31T19:50:34Z

With my efforts in #5820 decisively thwarted, I think this PR is now similarly rendered immovable and you can close it if you wish, as in full stop.

I just want to leave one last note here for posterity: another way to achieve my initial (apparently not entirely embraced) objective of a reasonably reliable git sha1 hash recording, would be to change fn tar to use the git local repo for the file contents, at the same sha1, falling back to the working tree if-and-only-if an include'd file is not found in the repo at that sha1. This makes the process more like an atomic snapshot and no sentry file or other locking (here or on cargo master for that matter) would be required.

Best of luck.

alexcrichton · 2018-07-31T22:15:50Z

@dekellum the flock utilities are there to handle concurrent invocations of cargo build and concurrent updates of the registry and to avoid having instances of Cargo stomp all over one another, it's not meant for intra-process synchronization at all. The tarball is only in place after it's built to avoid using a stale tarball if you ctrl-c one cargo package process and then cargo publish again. That's been a bug that's historically bitten Cargo. The sha1 hash for a git repo is always calculated, so it doesn't suffer from either of these problems.

I do not have an opinion about the file/tarball path option. It's fine to go either way, I was trying to be clear about a different aspect of this PR before moving on to that.

dekellum · 2018-08-13T18:01:27Z

Closing in favor of #5886.

@alexcrichton

Generate .cargo_vcs_info.json and include in `cargo package` (take 2) Implements #5629 and supersedes #5786, with the following improvements: * With an upstream git2-rs fix (tracked #5823, validated and min version update in: #5858), no longer requires windows/unix split tests. * Per review in #5786, drops file system output and locks for .cargo_vcs_info.json. * Now uses serde `json!` macro for json production with appropriate escaping. * Now includes a test of the output json format. Possible followup: * Per discussion in #5786, we could improve reliability of both the VCS dirty check and the git head sha1 recording by preferring (and/or warning otherwise) the local repository bytes of each source file, at the same head commit. This makes the process more appropriately like an atomic snapshot, with no sentry file or other locking required. However given my lack of a window's license and dev setup, as exhibited by troubles of #5823, this feel intuitively like too much to attempt to change in one iteration here. I accept the "best effort" concept of this feature as suggested in #5786, and think it acceptable progress if merged in this form. @alexcrichton r? @joshtriplett cc

dekellum added 4 commits July 24, 2018 15:41

Add generated .cargo_vcs_info.json (git hash) file to cargo package

5e31236

Only list_files once in package, since it warns as side-effect

fdd333f

Fix tests broken by new package vcs info file, and adjusted list beha…

ba64367

…vior

Add test package::vcs_file_collision

fb1bfdc

rust-highfive assigned alexcrichton Jul 24, 2018

dekellum changed the title ~~Generate .cargo_vcs_info.json and include in cargo package~~ [WiP] Generate .cargo_vcs_info.json and include in cargo package Jul 25, 2018

Downgrade the vcs_info flock to shared, read access

7e3cdcb

dekellum force-pushed the record-package-git-head-sha1 branch from f5e6562 to 7eb7720 Compare July 27, 2018 09:51

dekellum force-pushed the record-package-git-head-sha1 branch from 7eb7720 to b553d72 Compare July 27, 2018 09:56

dekellum changed the title ~~[WiP] Generate .cargo_vcs_info.json and include in cargo package~~ Generate .cargo_vcs_info.json and include in cargo package Jul 27, 2018

alexcrichton reviewed Jul 27, 2018

View reviewed changes

dekellum added a commit to dekellum/cargo that referenced this pull request Jul 28, 2018

In check_not_dirty, WARN if no (git) VCS is recognized; fix tests.

adc5074

github cc: rust-lang#5786

dekellum mentioned this pull request Jul 28, 2018

Improve check_not_dirty git repo detection and status shell output of results. #5820

Closed

dekellum mentioned this pull request Jul 28, 2018

No git repo found on windows (only) for package_verbose test #5823

Closed

Merge remote-tracking branch 'upstream/master' into record-package-gi…

cd21233

…t-head-sha1

This was referenced Aug 1, 2018

[WIP] Improve package check git logging and workaround windows issue #5848

Closed

Improve verbose console and log for finding git repo in package check #5858

Merged

dekellum mentioned this pull request Aug 13, 2018

Generate .cargo_vcs_info.json and include in cargo package (take 2) #5886

Merged

dekellum closed this Aug 13, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generate .cargo_vcs_info.json and include in `cargo package` #5786

Generate .cargo_vcs_info.json and include in `cargo package` #5786

dekellum commented Jul 24, 2018 •

edited

Loading

rust-highfive commented Jul 24, 2018

dekellum commented Jul 25, 2018 •

edited

Loading

dekellum commented Jul 27, 2018

alexcrichton left a comment

alexcrichton Jul 27, 2018

dekellum Jul 27, 2018 •

edited

Loading

alexcrichton Jul 27, 2018

dekellum Jul 27, 2018 •

edited

Loading

dekellum Jul 27, 2018 •

edited

Loading

joshtriplett commented Jul 27, 2018

dekellum commented Jul 28, 2018 •

edited

Loading

joshtriplett commented Jul 28, 2018

dekellum commented Jul 28, 2018 •

edited

Loading

dekellum commented Jul 28, 2018 •

edited

Loading

joshtriplett commented Jul 28, 2018

joshtriplett commented Jul 28, 2018 •

edited

Loading

dekellum commented Jul 28, 2018 •

edited

Loading

joshtriplett commented Jul 28, 2018

dekellum commented Jul 28, 2018

dekellum commented Jul 28, 2018 •

edited

Loading

dekellum commented Jul 28, 2018

alexcrichton commented Jul 30, 2018

dekellum commented Jul 30, 2018 •

edited

Loading

alexcrichton commented Jul 31, 2018

dekellum commented Jul 31, 2018

dekellum commented Jul 31, 2018 •

edited

Loading

alexcrichton commented Jul 31, 2018

dekellum commented Aug 13, 2018

Generate .cargo_vcs_info.json and include in cargo package #5786

Generate .cargo_vcs_info.json and include in cargo package #5786

Conversation

dekellum commented Jul 24, 2018 • edited Loading

Edit 1, implementation notes:

Edit 2, more implementation notes:

rust-highfive commented Jul 24, 2018

dekellum commented Jul 25, 2018 • edited Loading

dekellum commented Jul 27, 2018

alexcrichton left a comment

Choose a reason for hiding this comment

alexcrichton Jul 27, 2018

Choose a reason for hiding this comment

dekellum Jul 27, 2018 • edited Loading

Choose a reason for hiding this comment

alexcrichton Jul 27, 2018

Choose a reason for hiding this comment

dekellum Jul 27, 2018 • edited Loading

Choose a reason for hiding this comment

dekellum Jul 27, 2018 • edited Loading

Choose a reason for hiding this comment

joshtriplett commented Jul 27, 2018

dekellum commented Jul 28, 2018 • edited Loading

joshtriplett commented Jul 28, 2018

dekellum commented Jul 28, 2018 • edited Loading

dekellum commented Jul 28, 2018 • edited Loading

joshtriplett commented Jul 28, 2018

joshtriplett commented Jul 28, 2018 • edited Loading

dekellum commented Jul 28, 2018 • edited Loading

joshtriplett commented Jul 28, 2018

dekellum commented Jul 28, 2018

dekellum commented Jul 28, 2018 • edited Loading

dekellum commented Jul 28, 2018

alexcrichton commented Jul 30, 2018

dekellum commented Jul 30, 2018 • edited Loading

alexcrichton commented Jul 31, 2018

dekellum commented Jul 31, 2018

dekellum commented Jul 31, 2018 • edited Loading

alexcrichton commented Jul 31, 2018

dekellum commented Aug 13, 2018

Generate .cargo_vcs_info.json and include in `cargo package` #5786

Generate .cargo_vcs_info.json and include in `cargo package` #5786

dekellum commented Jul 24, 2018 •

edited

Loading

dekellum commented Jul 25, 2018 •

edited

Loading

dekellum Jul 27, 2018 •

edited

Loading

dekellum Jul 27, 2018 •

edited

Loading

dekellum Jul 27, 2018 •

edited

Loading

dekellum commented Jul 28, 2018 •

edited

Loading

dekellum commented Jul 28, 2018 •

edited

Loading

dekellum commented Jul 28, 2018 •

edited

Loading

joshtriplett commented Jul 28, 2018 •

edited

Loading

dekellum commented Jul 28, 2018 •

edited

Loading

dekellum commented Jul 28, 2018 •

edited

Loading

dekellum commented Jul 30, 2018 •

edited

Loading

dekellum commented Jul 31, 2018 •

edited

Loading