Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn about cargo publish with git submodules not checked out #8635

Open
joshtriplett opened this issue Aug 19, 2020 · 12 comments
Open

Warn about cargo publish with git submodules not checked out #8635

joshtriplett opened this issue Aug 19, 2020 · 12 comments
Labels
A-git Area: anything dealing with git C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` Command-publish S-needs-mentor Status: Issue or feature is accepted, but needs a team member to commit to helping and reviewing.

Comments

@joshtriplett
Copy link
Member

joshtriplett commented Aug 19, 2020

I've managed to publish crates without checking out the git submodules first, and I've seen others do the same. Could cargo publish check if you have a git submodule but don't have it checked out?

As an added bonus, we could only check that if the submodule isn't excluded (or not-included) from the published crate, but that isn't necessary for a first implementation since you can always run cargo publish --allow-dirty.

@joshtriplett joshtriplett added the C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` label Aug 19, 2020
@ehuss ehuss added A-git Area: anything dealing with git Command-publish labels Aug 19, 2020
@joshtriplett
Copy link
Member Author

Tagging as E-help-wanted. A quick guide to implementation: cargo already checks on cargo publish for a "dirty" git repository, meaning you have changes that aren't committed. Implementing this change would involve additionally iterating over git submodules and checking if they're checked out. Our current status check should verify that a checked-out submodule is checked out at the version from the current commit (though we should confirm that). The added check should cover the case where the submodule isn't checked out at all.

@hbina
Copy link
Contributor

hbina commented Aug 20, 2020

Hi, I am trying to debug this problem.

Are you possibly referring to this snippet here? It seems to have already checked for dirty submodules but maybe the logic is flawed?

fn git(
p: &Package,
src_files: &[PathBuf],
repo: &git2::Repository,
) -> CargoResult<Option<String>> {
let workdir = repo.workdir().unwrap();
let mut sub_repos = Vec::new();
open_submodules(repo, &mut sub_repos)?;
// Sort so that longest paths are first, to check nested submodules first.
sub_repos.sort_unstable_by(|a, b| b.0.as_os_str().len().cmp(&a.0.as_os_str().len()));
let submodule_dirty = |path: &Path| -> bool {
sub_repos
.iter()
.filter(|(sub_path, _sub_repo)| path.starts_with(sub_path))
.any(|(sub_path, sub_repo)| {
let relative = path.strip_prefix(sub_path).unwrap();
sub_repo
.status_file(relative)
.map(|status| status != git2::Status::CURRENT)
.unwrap_or(false)
})
};
let dirty = src_files
.iter()
.filter(|file| {
let relative = file.strip_prefix(workdir).unwrap();
if let Ok(status) = repo.status_file(relative) {
if status == git2::Status::CURRENT {
false
} else if relative.file_name().and_then(|s| s.to_str()).unwrap_or("")
== "Cargo.lock"
{
// It is OK to include this file even if it is ignored.
status != git2::Status::IGNORED
} else {
true
}
} else {
submodule_dirty(file)
}
})
.map(|path| {
path.strip_prefix(p.root())
.unwrap_or(path)
.display()
.to_string()
})
.collect::<Vec<_>>();
if dirty.is_empty() {
let rev_obj = repo.revparse_single("HEAD")?;
Ok(Some(rev_obj.id().to_string()))
} else {
anyhow::bail!(
"{} files in the working directory contain changes that were \
not yet committed into git:\n\n{}\n\n\
to proceed despite this and include the uncommitted changes, pass the `--allow-dirty` flag",
dirty.len(),
dirty.join("\n")
)
}
}

In particular, it will perform the submodule check iff this match fails

if let Ok(status) = repo.status_file(relative) {

And I dont quite understand how git internally works...

Edit: Also, do you have a project where I can reproduce this by changing one of its submodule? I tried creating a dummy project with a submodule but it does not work? Specifically, the submodule's source code is not included in here

let src_files = src.list_files(pkg)?;

@ehuss
Copy link
Contributor

ehuss commented Aug 20, 2020

@hbina The issue is this snippet:

// Ignore submodules that don't open, they are probably not initialized.
// If its files are required, then the verification step should fail.
if let Ok(sub_repo) = submodule.open() {
open_submodules(&sub_repo, sub_repos)?;
sub_repos.push((sub_repo.workdir().unwrap().to_owned(), sub_repo));
}

The comment there is somewhat incorrect. Some projects are structured in such a way that even the verification step won't fail if the files are missing. For example, some C library bindings may use the system library by default, and only use the local code (which is often included via a git submodule) when a certain feature is enabled.

The intent of this issue is that call to submodule.open() should be treated as an error if it fails and --allow-dirty is not specified and that path would normally be included in the project. That last point may be tricky to determine.

There are several projects you can try that have a submodule. Some examples:

These should all fail if the submodule has not been initialized.

@ehuss
Copy link
Contributor

ehuss commented Aug 21, 2020

It's not too complicated, but I don't think that there is any function available to get the answer easily. I'm not sure, but I think the best approach would be to test the submodule path against the include and exclude fields of the package (you can get those from pkg.manifest().include() and pkg.manifest().exclude()). Those values are gitignore-style paths. The code that normally handles those is here. Unfortunately, that code is structured to return a list of files from the filesystem, but when the submodule doesn't exist, it silently skips over the submodules.

I would create the GitignoreBuilder like these lines, and just test if the submodule path matches the include or exclude list. There are some edge cases to consider, like if both lists are empty (in which case it should check if the submodule directory is below the package root), or if both are set (I think exclude is ignored?).

@hbina
Copy link
Contributor

hbina commented Aug 21, 2020

Thanks, I will try this.

@fechu
Copy link

fechu commented Sep 18, 2021

@ehuss Is this still an issue? If yes, I'd like to tackle it. I fee this is a good issue to get started to contribute.

@ehuss
Copy link
Contributor

ehuss commented Sep 20, 2021

Yea, sure! This may not be terribly easy, but the comments above should be a guide on where to get started.

@fechu
Copy link

fechu commented Sep 23, 2021

I would create the GitignoreBuilder like these lines, and just test if the submodule path matches the include or exclude list. There are some edge cases to consider, like if both lists are empty (in which case it should check if the submodule directory is below the package root), or if both are set (I think exclude is ignored?).

I looked into this more closely and it turns out it is unfortunately not as straight forward as running the submodule path against the include and exclude list. The include list could look like: include = ["lib1/file.rs"] (assuming lib1 is a submodule). In that case we would check the path "lib1" (the path of the submodule) against the pattern lib1/file.rs which is obviously no match. It would however work for the more general case if the include list is include = ["lib1"]

I don't see a way to resolve this problem. For the general case there is simply no way to know whether there is something in the submodule that is needed for the package build.

What is your oppinion @ehuss? Might a better solution for example be to print a warning that a not initialized submodule was encountered?

@ehuss
Copy link
Contributor

ehuss commented Sep 26, 2021

Hm, good point!

Thinking about this some more, there are a couple different issues:

  • If you list a file or directory of a submodule in the include list, and the submodule isn't checked out, there is no warning. This is Cargo should warn when "include" doesn't match any files, or filters files from parent directories #9667. I think that should be fixed independently (and would be good to fix!).
  • If you don't list an include or exclude, then all directories in the package will be implicitly included. However, if there is a missing submodule, it is silent. I think in that case, it would be good to error if a submodule would normally be included, but it is missing (unless --allow-dirty is passed).

Does that sound feasible? I haven't looked at how that could be implemented, but I'm guessing it shouldn't be too hard to check if the submodule path is under the package directory, and check if the submodule is initialized.

@Rustin170506
Copy link
Member

@rustbot claim

@ruabmbua
Copy link

ruabmbua commented Jan 3, 2022

I ran into an issue, where I manually merged a PR for a project of mine, but forgot to also update the git submodule. I published a new version of the crate to crates.io, and cargo publish did not complain about the dirty submodule, although it was shown as such in git status.

@ruabmbua
Copy link

ruabmbua commented Jan 3, 2022

I think this can happen quite easily, when one is not aware of this problem, and simply trusts cargo publish.

Imagine someone pushing a security update for a crate, which just updates a submodule dependency. Unfixed security issue on crates.io, and nobody notices. Especially since I do not know a way to check the actual source on crates.io. I usually just check the referenced repository, only sometimes do I bother to lookup the crate source code in my ~/.cargo folder.

@Rustin170506 Rustin170506 removed their assignment Feb 22, 2022
jean-airoldie added a commit to jean-airoldie/zeromq-src-rs that referenced this issue Dec 16, 2022
It seems that if a git submodule is not properly initialize on the machine that will publish the crate, cargo won't warn or detect it, as described in this issue rust-lang/cargo#8635. This is hard to detect because the CI is setup to properly `git clone --recursive` and therefore should not detect this issue.

To reduce the likelyhood of #23 from happening again in the future, the `testcrate` was added as a dev-dedependecy to the root crate and the tests were moved to the root crate so that if `cargo test` is ran (even without the --workspace flag), the tests won't compile and the error should be detected.
@weihanglo weihanglo added S-needs-mentor Status: Issue or feature is accepted, but needs a team member to commit to helping and reviewing. and removed E-help-wanted labels Apr 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-git Area: anything dealing with git C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` Command-publish S-needs-mentor Status: Issue or feature is accepted, but needs a team member to commit to helping and reviewing.
Projects
None yet
Development

No branches or pull requests

7 participants