source-repository-package generates fat clones #7264

Closed
andreasabel opened this issue Jan 28, 2021 · 22 comments · Fixed by #10254

Comments

@andreasabel
Member

Fetching a source-repository-package seems to make a full clone of the repo:

$ cabal --version
cabal-install version 3.5.0.0
compiled using version 3.5.0.0 of the Cabal library 

$ cat cabal.project 
optional-packages: Agda

source-repository-package
  type: git
  location: https://github.com/agda/agda.git
  tag: e12f391d2539b62a62d18aef74149a9a4695a871

package Agda
  ghc-options: -fno-expose-all-unfoldings -fno-specialise-aggressively

$ cabal build all
... cloning ...

$ du -d3 -h
...
221M	./dist-newstyle/src/agda-eef06c3b3e56f437
...

Isn't there a more economical (and at the same time faster) way to clone a git repo if one only wants the version at a specific commit?
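
For reference, plain git can fetch a single commit without its history. The sketch below only illustrates that git feature; it is not what cabal does. `shallow_fetch_commit` is a hypothetical helper name, and the sketch assumes the server permits fetching a commit by its SHA (some servers reject this unless configured to allow it).

```shell
# Fetch exactly one commit, without its history, into a fresh directory.
# Hypothetical helper; usage: shallow_fetch_commit <url> <commit-sha> <dest-dir>
shallow_fetch_commit() {
  url=$1
  sha=$2
  dest=$3
  git init -q "$dest" &&
  git -C "$dest" remote add origin "$url" &&
  # --depth 1 truncates history at the requested commit
  git -C "$dest" fetch -q --depth 1 origin "$sha" &&
  # FETCH_HEAD points at the commit that was just fetched
  git -C "$dest" checkout -q --detach FETCH_HEAD
}
```

With the cabal.project above, this would be called as `shallow_fetch_commit https://github.com/agda/agda.git e12f391d2539b62a62d18aef74149a9a4695a871 agda-src`.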

@fgaz
Member

fgaz commented Jan 28, 2021

I think it was done like this to allow fast tag switching and global caching like cargo does (I don't think the latter was ever completed). As a workaround, if you already cloned that repo in some other path, you could set location to that path. The referenced clone will be treated as a remote and won't be modified.

@fgaz
Member

fgaz commented Jan 28, 2021

I guess there could be an option to create a local shallow clone instead, but we'd have to think about how it'd interact with other repo types, submodules...

@fgaz
Member

fgaz commented Jan 28, 2021

Relevant old discussion starting from this comment: #5586 (comment)

@dysinger

dysinger commented Jun 9, 2022

1GB+ of Amazonka because I need a specific sha (which is handed to cabal)

--depth=1 works wonders at not cloning a 1.3GB repo fully

@Mikolaj
Member

Mikolaj commented Jun 9, 2022

Is --depth=1 what @fgaz called a "shallow clone" above? Does the workaround work for you?

@jchia

jchia commented Jul 19, 2022

> Is --depth=1 what @fgaz called a "shallow clone" above? Does the workaround work for you?

What is --depth=1? A cabal build option? I don't see it in the help.

If --depth=1 refers to the git clone option, how do you apply it to a git source-repository-package in a cabal file to make a 'workaround'?

@Mikolaj
Member

Mikolaj commented Jul 19, 2022

Yes, that's a git option. I don't think it can be applied now, but @fgaz said "I guess there could be an option to create a local shallow clone instead" and we are asking whether that would suffice (also I'm asking whether "shallow clone" is the --depth=1 clone). If that's what users need, perhaps let's open a new ticket with that specific task and we'd signal that a PR implementing the ticket would likely be accepted.

@avanov

avanov commented Jul 19, 2022

@Mikolaj yes that would suffice. A shallow clone is a repository instance with a truncated history down to the specified --depth N entries, where N=1 is the smallest possible. Many CI pipelines use predefined explicit 1 < depth < 100 to allow for immediate (right after cloning) local branch checkouts while still optimising for bandwidth and time savings on large repository clones. Shallow clones obviously have a few downsides around history availability compared to regular full clones, but for the purpose of cabal --depth 1 or --depth 20 would work without issues. Besides, every shallow repository can later be programmatically converted into a full repository via either git pull --unshallow or git fetch --unshallow.
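
To make the above concrete, here is a minimal sketch of creating a shallow clone with a given depth and later converting it to a full one. The function names (`shallow_clone`, `is_shallow`, `unshallow`) are hypothetical wrappers introduced for illustration; the git options they call are the ones named above, plus `git rev-parse --is-shallow-repository` to inspect the result.

```shell
# Clone with truncated history; depth defaults to 1 (the smallest possible).
# Hypothetical helper; usage: shallow_clone <url> <dest-dir> [depth]
shallow_clone() {
  git clone -q --depth "${3:-1}" "$1" "$2"
}

# Print "true" if the repository is shallow, "false" otherwise.
is_shallow() {
  git -C "$1" rev-parse --is-shallow-repository
}

# Download the missing history, turning the shallow clone into a full one.
unshallow() {
  git -C "$1" fetch -q --unshallow
}
```

Note that git ignores `--depth` when cloning from a plain local path; a `file://` URL is needed to get a shallow clone of a local repository.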

FYI: Nix had to implement support for the flag some time ago as well - NixOS/nix#4455

@Mikolaj
Member

Mikolaj commented Jul 19, 2022

Sounds good. What is the option called in Nix? Any other package managers or tools that do that and have good names? Do they take the depth parameter? Should we rather specify that in cabal.project or somewhere where the repo address is specified? Any other preliminary bikeshedding before we move for the main one to a new ticket?

@avanov

avanov commented Jul 19, 2022

Nix uses shallow = true to enable a hardcoded --depth 1 option, i.e. it doesn't allow specifying a custom depth. TravisCI allows for a custom depth config option. Note, however, that git allows shallow clones to be created via:

It needs a further discussion to decide whether one or all of the methods should be supported, but the important part here is that the depth should be aligned with the checkout tag/branch option of source-repository-package:

Subsequent tag changes and repository fetches between cabal v2-build calls should be handled gracefully as well.

I assume a new source repository property depth and/or shallow-since could be added to indicate the depth in this case. Let's say something like:

source-repository-package
  type: git
  location: https://github.com/ucsd-progsys/liquidhaskell
  tag: b8dc0c2bdff8e6ea9ec4a9fc2439e89fdcd73b69
  depth: 1
  subdir:
       liquid-base
       liquid-prelude
       liquid-ghc-prim

Alternatively, if cabal uses libgit internally (I haven't checked), it can try to utilise the same API call as Rust's Cargo here to perform shallow cloning implicitly via a new API option. As this is a relatively new option, git servers answering the call should support recent protocol versions for the option to work as expected.
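
On handling tag changes between builds: with the git CLI, an existing shallow checkout can be pointed at a different commit without re-cloning and without pulling in history. This is a sketch under assumptions, not a description of cabal's behaviour; `retarget_shallow` is a hypothetical name, the remote is assumed to be called `origin`, and the server must permit fetching by SHA.

```shell
# Move an existing (shallow) checkout to another commit, fetching only that commit.
# Hypothetical helper; usage: retarget_shallow <repo-dir> <new-commit-sha>
retarget_shallow() {
  # Fetch just the requested commit, again with truncated history
  git -C "$1" fetch -q --depth 1 origin "$2" &&
  # Discard local state and check out the freshly fetched commit
  git -C "$1" reset -q --hard FETCH_HEAD
}
```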

@andreasabel
Member Author

I would simply add --depth 1 for all git cloning that cabal initiates (in all cases where this works). This should be the default. After all, you typically just want to read the repo contents for a specific commit, rather than having a clone with history and all that which you can use for blame etc. And, if needed, one can always manually unshallow.

@ulysses4ever
Collaborator

Agree that --depth 1 should be the default.

@ParetoOptimalDev

ParetoOptimalDev commented Jul 27, 2022

I vote:

  1. Add a depth option to the cabal file
  2. Wait a release
  3. Set the default depth to 1

@Mikolaj
Member

Mikolaj commented Jul 27, 2022

> I vote:
>
> 1. Add a depth option to the cabal file
> 2. Wait a release
> 3. Set the default depth to 1

If there is a warning about that in the depth option description and possibly elsewhere, including the cabal manual, then IMHO this is a very civilized way of introducing the breaking change. In other words, I vote to either go full hog on preventive warnings (do we have a volunteer for that?) or make the change in one fell swoop, which we have the right to do in a major version with a proper changelog. Half-measures are a waste of effort IMHO.

@andreasabel
Member Author

Before we settle on depth: <natural number> we should have a field study of what different VCSs have, with the hope of finding an interface that not just git supports (an alt could be shallow: <boolean>).

@ffaf1
Collaborator

ffaf1 commented Jul 27, 2022

> Before we settle on depth: we should have a field study of what different VCSs have

Sadly, most projects have switched to git. darcs has --lazy, I don't believe Mercurial has a similar thing (without using extensions). shallow seems to capture the idea and adding a "downloads a shallow clone, if possible" in the option documentation should be enough.

Data point for what Haskell devs use for versioning.

@andreasabel
Member Author

andreasabel commented Jul 27, 2022

Mercurial seems to have shallow clone via the --root <rev> option: https://www.mercurial-scm.org/wiki/ShallowClone
Sorry, this was just a proposal; it seems Mercurial doesn't support it.

@ParetoOptimalDev

> Before we settle on depth: <natural number> we should have a field study of what different VCSs have, with the hope of finding an interface that not just git supports (an alt could be shallow: <boolean>).

I don't think you want to abstract this detail. For git give depth, darcs give lazy, etc.

I guess the current interface tries to abstract away the dvcs details though?

@ffaf1
Collaborator

ffaf1 commented Jul 27, 2022

> I don't think you want to abstract this detail. For git give depth, darcs give lazy, etc.

I suspect fat/shallow is abstractable (“give me just enough to build this project with”). Whether that is good UX I cannot say, as I have never used the feature!
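
As a rough sketch of what such an abstraction could look like: one "shallow if possible" request is mapped to whatever each VCS offers. Everything here is hypothetical (the function name, the set of supported types, and the fallback policy). It only prints the command it would run, using `git clone --depth 1` and `darcs clone --lazy` as mentioned in this thread, and a plain full clone for Mercurial.

```shell
# Print the clone command for a "shallow if possible" request.
# Hypothetical helper; usage: shallow_clone_cmd <vcs-type> <url> <dest-dir>
shallow_clone_cmd() {
  case $1 in
    git)   printf 'git clone --depth 1 %s %s\n' "$2" "$3" ;;
    darcs) printf 'darcs clone --lazy %s %s\n' "$2" "$3" ;;
    # No built-in shallow mode assumed here: fall back to a full clone
    hg)    printf 'hg clone %s %s\n' "$2" "$3" ;;
    *)     printf 'unsupported repository type: %s\n' "$1" >&2
           return 1 ;;
  esac
}
```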

@andreasabel
Member Author

@fgaz wrote

> As a workaround, if you already cloned that repo in some other path, you could set location to that path. The referenced clone will be treated as a remote and won't be modified.

This still clones the whole thing. Even if it clones from a local source, it copies everything, swallowing disk space.

alt-romes added a commit to alt-romes/cabal that referenced this issue Aug 7, 2024
Cloning the entire repository for the purpose of compiling packages
specified in source-repository-packages is wasted effort. To read and
compile the package, we need only the HEAD of the repository, thus a
shallow clone is sufficient.

Fixes haskell#7264
alt-romes added a commit to alt-romes/cabal that referenced this issue Aug 16, 2024
Cloning the entire repository for the purpose of compiling packages
specified in source-repository-packages is wasted effort. To read and
compile the package, we need only the HEAD of the repository, thus a
shallow clone is sufficient.

Note that this doesn't change the behaviour of `cabal get -s` which
still does a full clone (--depth=1 is only used in vcsSyncRepo, not in
vcsCloneRepo)

Fixes haskell#7264
@TeofilC
Collaborator

TeofilC commented Oct 18, 2024

FWIW nowadays there are also partial clones: https://github.blog/open-source/git/get-up-to-speed-with-partial-clone-and-shallow-clone/

This lets you basically clone lazily, just getting the bits you want. It combines the speed of shallow clones with the ability to have global caching.

Since the global caching thing was never implemented I don't think there's a pressing need to use this instead of shallow. But it's good to keep it in mind for the future
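
For comparison with `--depth 1`, a blobless partial clone looks like the sketch below. `partial_clone` is a hypothetical wrapper name. `--filter=blob:none` is a real `git clone` option, but it only takes effect if the server supports object filtering; otherwise git falls back to a normal clone.

```shell
# Clone the full commit history but fetch file contents (blobs) lazily, on demand.
# Hypothetical helper; usage: partial_clone <url> <dest-dir>
partial_clone() {
  git clone -q --filter=blob:none "$1" "$2"
}
```

Unlike a shallow clone, the result keeps every commit, so switching to another tag only needs the blobs for that tag to be downloaded.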

@alt-romes
Collaborator

(This is fixed by #10254)

@ulysses4ever ulysses4ever linked a pull request Nov 11, 2024 that will close this issue
@mergify mergify bot closed this as completed in #10254 Nov 17, 2024
@mergify mergify bot closed this as completed in b6c28ee Nov 17, 2024