Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Git submodules in source-package-repository #7625

Merged
merged 3 commits into from
Oct 22, 2021

Conversation

akshaymankar
Copy link
Collaborator

@akshaymankar akshaymankar commented Sep 7, 2021

This PR tries to solve #5536. I tried to use --recurse-submodules flags in clone and reset but they don't work very well when the repo is cloned with --no-checkout. So, I just implemented the solution discussed in the issue, this includes changing checkout to reset --hard.


Please include the following checklist in your PR:

Please also shortly describe how you tested your change. Bonus points for added tests!

I manually tested this in two ways:

  1. Without a dist-newstyle in a project which depends on a repository with a submodule. I used cabal from this fork and see if the repository gets cloned correctly and the project compiles.
  2. With an already existing dist-newstyle (compiled by cabal 3.6.0.0) of the same project. This dist-newstyle had the previously cloned the dependency without the submodules checked out. After upgrading to this fork, it was able to use the old directory without any problems.

I am not sure where to add automated tests for this. I would be happy to add them if someone points me to the place to add them.

akshaymankar added a commit to akshaymankar/cabal that referenced this pull request Sep 7, 2021
akshaymankar added a commit to akshaymankar/cabal that referenced this pull request Sep 7, 2021
@Mikolaj
Copy link
Member

Mikolaj commented Sep 7, 2021

If you run git grep -30 "git'\? ", does any of tests that turn up seem a good candidate to copy-paste and modify? I'd avoid tests that invoke v1- commands and probabably also avoid Setup commands, but stick to v2- commands instead.

akshaymankar added a commit to akshaymankar/cabal that referenced this pull request Sep 11, 2021
@akshaymankar akshaymankar force-pushed the submodules branch 2 times, most recently from f07c5d7 to cf5e49a Compare September 11, 2021 16:21
@akshaymankar
Copy link
Collaborator Author

@Mikolaj Thanks for the pointer. I found some property tests and added submodule support in there. It uncovered an edge case in git's handling of submodules. Fortunately I was able to work around that 😄

@Mikolaj
Copy link
Member

Mikolaj commented Sep 13, 2021

@akshaymankar: do the CI failures look relevant or is this only some transient breakage?

@akshaymankar
Copy link
Collaborator Author

Looks like GHC 8.8 and below couldn't infer the kind of the type level variable I introduced. I have added an explicit annotation now. It at least compiles with 8.8.

@akshaymankar akshaymankar force-pushed the submodules branch 2 times, most recently from 6265820 to 62b6df0 Compare September 15, 2021 05:52
@akshaymankar
Copy link
Collaborator Author

akshaymankar commented Sep 15, 2021

The older git (2.17.1) in the bionic container is stricter about git submodule foreach argument ordering than the one on my computer (2.32.0). I have fixed that by changing the order.

I also noticed that I wasn't respecting the verboseArg in the new commands, I have also fixed that.

Copy link
Member

@Mikolaj Mikolaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the tests.

}
| tag <- maybeToList (srpTag repo) ]
++ [ git (resetArgs tag) | tag <- maybeToList (srpTag repo) ]
++ [ git (["submodule", "sync", "--recursive"] ++ verboseArg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much does this, and other more complex git commands you introduced, slow down the cases where there are no submodules or where submodules are handled fine already (are there such cases?). What other risks there are from these more complex commands?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much does this, and other more complex git commands you introduced, slow down the cases where there are no submodules?

I just tried timing the new commands on the cabal repo and got:

$ time git submodule deinit --force --all
git submodule deinit --force --all  0.03s user 0.04s system 113% cpu 0.056 total

$ time git submodule sync
git submodule sync  0.04s user 0.03s system 109% cpu 0.062 total

$ time git submodule update --init --recursive --force
git submodule update --init --recursive --force  0.03s user 0.04s system 109% cpu 0.066 total

$ time git submodule foreach --recursive "git clean -ffdxq"
git submodule foreach --recursive "git clean -ffdxq"  0.03s user 0.04s system 110% cpu 0.061 total

$ time git clean -ffdxq
git clean -ffdxq  0.00s user 0.01s system 96% cpu 0.011 total

So, with this imprecise test, it seems like this PR may have added extra latency of ~0.15 seconds per repository (which doesn't have submodules).

or where submodules are handled fine already (are there such cases?)

I don't think there were any cases where submodules worked.

What other risks there are from these more complex commands?

I can see this becoming a breaking change for some setup where a package repository submodules a private repository which is not needed for compiling haskell. To support this, a flag might be required to disable submodule fetching.

I can't think of any other risks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see this becoming a breaking change for some setup where a package repository submodules a private repository which is not needed for compiling haskell. To support this, a flag might be required to disable submodule fetching.

How do you feel about waiting with that flag until we get actual reports that people need it? Is there a workaround for such people in the absence of the flag?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am OK with waiting for the flag.
As a work around maybe people could maintain forks of of depedencies which remove such submodules, but this would be very inconvenient for repositories which need to be updated a lot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So finally we will not have the flag to hide the new feature?

Out of curiosity, did you test it (performance oriented) with large git repos with many submodules?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it with a large git repo, there wasn't much difference, most time was just spent first time cloning the base repo. The repo had 3 submodules, which were all small, so fetching them was comparatively quick.

Copy link
Member

@Mikolaj Mikolaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and the overhead and risks seem acceptable, though my knowledge of the matter is limited (e.g., I can't imagine corner cases and I have no idea how many projects currently use the features and may either benefit from this PR or may need to adjust to it).

@Mikolaj
Copy link
Member

Mikolaj commented Sep 30, 2021

If nobody reviews this soon, I'm going to abuse my privilege and merge it alone. Please don't make me a worse person.

@jneira jneira changed the title Support Git submodules in source-package-respository Support Git submodules in source-package-repository Oct 1, 2021
@jneira
Copy link
Member

jneira commented Oct 1, 2021

If nobody reviews this soon, I'm going to abuse my privilege and merge it alone. Please don't make me a worse person.

i am gonna try it in windows with a hls version which had submodules: https://github.com/haskell/haskell-language-server/tree/e4f677e1780fe85a02b99a09404a0a3c3ab5ce7c (only one)

do you know some haskell repo with a good amount of submodules?

@jneira jneira linked an issue Oct 1, 2021 that may be closed by this pull request
@jneira
Copy link
Member

jneira commented Oct 1, 2021

The build has been succesful in my windows machine. 💯
However i've observed the submodules are initialized even if you dont use a submodule as subdir or you dont have subdirdefined at all, would be ask for too much only trigger the submodules stuff only when the subdir is one of them (afaik it would not be needed in other case)?
That would make the perf penalty not noticed for existing use cases, which cant use submodules right now.

@jneira
Copy link
Member

jneira commented Oct 1, 2021

would be ask for too much only trigger the submodules stuff only when the subdir is one of them (afaik it would not be needed in other case)? That would make the perf penalty not noticed for existing use cases, which cant use submodules right now.

well a corner case would be use a s-r-p without subdir which has source dirs (or other dirs needed for the build) as submodules (or use a subdir which has source dirs as submodules) 🤔

maybe a flag to enable submodules would make more sense

out of scope of this pr (and not asking for a change): maybe the more generic way would be let user tell cabal what concrete git commands would be needed to get the s-r-p, via a script or something

@Mikolaj
Copy link
Member

Mikolaj commented Oct 4, 2021

BTW, I've been told this project is a good test of a possible slowdown for projects with no submodules, because it has so many source-repository-package which are slow already, in particular when rebuilding things:

https://github.com/input-output-hk/plutus-pioneer-program/blob/main/code/week06/cabal.project

If this test shows there is no problem, I'm fine accepting the PR and we'd worry when users with unused submodules complain (but they may be willing to accept some slowdown in unused modules for convenience in used submodules; also the scheme with submodules tends to make caching rebuilds harder, so that's not the route to take if somebody prioritizes (re)compilation speed --- the link above [edit:probably] is the current optimum, as verbose as it is). [Edit: also, if I understand correctly, checking out unused submodules happens only once, right? Not on rebuilds of the project?]

@jneira
Copy link
Member

jneira commented Oct 6, 2021

@akshaymankar i labeled with info-needed to reflect @Mikolaj request for manual testing perf over a repo with lot of s-r-p

@Mikolaj
Copy link
Member

Mikolaj commented Oct 6, 2021

@akshaymankar: I'm aware this review drags on for a month now. Please let us know if you have a deadline or if you are just fed up already. :) We are quite stretched, hence the delay, but we appreciate contributions all the more and we would hate to discourage yours.

@akshaymankar
Copy link
Collaborator Author

I understand that OSS contributions can take some time to get responses. So, I am not fed up yet 😄
Unfortunately, I have had to travel on short notice and I am working and living in different time zones, so I have not had time to look at this. I will try to get to this by the end of the weekend 🤞

@akshaymankar
Copy link
Collaborator Author

Hello, I finally found time!
I ran a couple of tests on a new t4g.nano instance in AWS which never had a cabal update. Not running cabal update ensured that cabal stopped right after getting the source-repository-package, so i could use the time command to measure the time. Here are my observations with the project suggested by @Mikolaj:

  1. I first ran rm -rf dist-newstyle && time cabal build --verbosity=0 three time. Here are the times:

    Cabal 3.4 Cabal from this branch
    attempt 1 70.102s 72.241s
    attempt 2 71.059s 71.971s
    attempt 3 72.656s 73.870s
  2. I ran time cabal build --verbosity=0 between each clean attempt, cabal still ran some git commands. I am not sure why cabal does this, but this happens even in cabal 3.4. Any subsequent cabal build runs do no do this. Here are the times for that:

    Cabal 3.4 Cabal from this branch
    attempt 1 7.408s 9.258s
    attempt 2 9.070s 10.760s
    attempt 3 9.140s 9.888s

This shows about 1-2 second slowdown on first two runs of cabal build for a cabal.projectwith 14 source-repository-packages. This seems OK to me.

@Mikolaj
Copy link
Member

Mikolaj commented Oct 22, 2021

@akshaymankar: amazing. Let's fire the rockets!

@Mikolaj Mikolaj added the merge me Tell Mergify Bot to merge label Oct 22, 2021
@Mikolaj
Copy link
Member

Mikolaj commented Oct 22, 2021

@Mergifyio rebase

The `RepoRecipe` type now takes a type level argument which tells the arbitrary
instance whether to allow submodules in the recipe or not.
@mergify
Copy link
Contributor

mergify bot commented Oct 22, 2021

rebase

✅ Branch has been successfully rebased

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge me Tell Mergify Bot to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cabal.project: source-package-repository with submodule
5 participants