Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pull request: Add the 'fork' command to cabal-install. #2

Closed
wants to merge 1 commit into from
Closed

Pull request: Add the 'fork' command to cabal-install. #2

wants to merge 1 commit into from

Conversation

jmillikin
Copy link

This command reads the source-repositories from a package's description,
determines which VCS to use, and then creates a local branch or checkout
of the package's repository.

@dagit
Copy link
Collaborator

dagit commented Apr 24, 2012

+1 (adding this in case it helps influence this code getting accepted, if it's just obnoxious then I apologize!)

@dcoutts
Copy link
Contributor

dcoutts commented Apr 24, 2012

Hmm. From a UI perspective, I wonder if we should share this with the existing "unpack" command.

@vincenthz
Copy link
Contributor

I don't really want to bikeshed, but it really sounds like 'cabal branch' should be some sub-option of 'cabal fetch' (--from-vcs maybe).

Also 'branch' is only used by bzr and mercurial, git uses 'checkout' and darcs 'get'. And by popularity git, then darcs are the main used vcs, which have a very big majority (6000 packages) compared to bzr/mercurial (600 packages), per @jmillikin 's numbers.

Otherwise the feature seems like a worthy addition.

@dcoutts
Copy link
Contributor

dcoutts commented Apr 24, 2012

So the existing UI has "fetch" which gets tarballs and stashes them in the download cache. It doesn't yet but could have a -o flag or similar to stick the tarball in the current dir or wherever. Then we also have "unpack" which downloads a tarball and unpacks it (by default in the current subdir).

@jmillikin
Copy link
Author

I strongly prefer making this a separate command, because it's doing something different from either fetch or unpack. Giving two separate operations the same command name, then distinguishing based on a flag, leads to UI catastrophes like Git.

The term "branch" is fairly universal among version control systems. Bazaar, Git, Mercurial, and Subversion all use it for basically the same operation (creating a new lightweight fork from an existing line of development). The only significant VCS that does not use "branch" is Darcs, but we shouldn't use the Darcs terminology because cabal get some-package would be confusing.

The only workable alternative to "branch" is "clone", but I think cabal clone some-package would also be confusing.

@dagit
Copy link
Collaborator

dagit commented Apr 25, 2012

I agree with John's justifications.

The only thing I would change, is to go back in time and implement a plugin
architecture for cabal so that this functionality could be implemented as a
plugin instead of part of cabal directly. In my idealized world with cabal
plugins the plugins would be similar to git plugins. Which is to say there
would be a directory somewhere that cabal searches when the user types
"cabal branch ..." and when it finds "cabal-branch" it invokes that with
the supplied command line arguments. We would also be able to teach cabal
how to download plugins from hackage and install them when they are
missing. My motivation for mentioning the non-existant plugins is that I'm
trying to find further evidence to justify them. Maven (a java build
system) uses plugins to great success and extensibility.

Jason

On Tue, Apr 24, 2012 at 6:05 PM, jmillikin <
[email protected]

wrote:

I strongly prefer making this a separate command, because it's doing
something different from either fetch or unpack. Giving two separate
operations the same command name, then distinguishing based on a flag,
leads to UI catastrophes like Git.

The term "branch" is fairly universal among version control systems.
Bazaar, Git, Mercurial, and Subversion all use it for basically the same
operation (creating a new lightweight fork from an existing line of
development). The only significant VCS that does not use "branch" is Darcs,
but we shouldn't use the Darcs terminology because cabal get some-package
would be confusing.

The only workable alternative to "branch" is "clone", but I think cabal clone some-package would also be confusing.


Reply to this email directly or view it on GitHub:
#2 (comment)

@sol
Copy link
Member

sol commented Apr 25, 2012

I'd also strongly prefer a separate command (for the same reasons @jmillikin pointed out).

At first I found branch very confusing, but I probably could cope with it. Anyway, I think it is fair to say that it is biased towards bzr (and it has a completely different meaning in git).

For comparison:

git clone
hg clone
darcs get
bzr branch
svn checkout

What about cabal fork? It is generic in the way that none of the supported systems use it.

@dagit Same thought here about plugins. Looking at sbt shows how awesome it can get.

@dcoutts
Copy link
Contributor

dcoutts commented Apr 25, 2012

I'm not sure I'm convinced. Keep in mind that cabal is not itself a dvcs so the verbs that are common for a dvcs are not necessarily appropriate, simply because the user has a different context in mind. For a dvcs the implicit context is operations on repositories, for cabal or other package management tools, the implicit context is operations on packages. Branching a repo makes sense, I'm not sure branching a package makes sense.

Is it really doing something different from unpack? Isn't it doing the same thing but for a different target? Both give you the source tree locally, but one comes from a tarball and the other from a source repo. I admit that unpack is a verb we use on tarballs and not on remote source repos, but it seems like we ought to be able to find something that covers both.

Or alternatively, perhaps we want a new sub command for source repo operations. In that case the context changes from operations on packages to operations on repos, and checkout/get/clone etc become appropriate terms.

@kosmikus
Copy link
Contributor

I think I'd vote for clone or get. To me, merging it with unpack in some way sounds like a good idea as well. I don't like branch very much, because while it exists as terminology in most VCS, it actually has quite different meanings. Perhaps we can find a name that is fresh. I like the fork suggestion, too.

@vincenthz
Copy link
Contributor

I don't particularly want cabal to become a vcs abstraction, and this is why i suggested fetch with an option in the first place. particularly in this case, i imagined it as "you'll fetch the package's repository instead of fetching a package's tarball".

I'm also fine with unpack (which i didn't know hence my suggestion of fetch) although maybe "unpacking a package's repository" is a bit odd.

clone as in "cloning a package" seems to make perfect sense as cloning the whole history of the package hence the repository.

@jmillikin
Copy link
Author

I'd be fine with fork -- you're right, it's generic across versioning systems, and means pretty much exactly what's being done in this command.

The reason I don't want to make this part of unpack is because I consider the final product to be different. Someone who wanted to get a versioned copy of a branch's source would be unhappy if they ended up with just a directory grabbed from a tarball.

@jmillikin
Copy link
Author

Gentle ping -- @dcoutts, would renaming the command to fork satisfy your concerns, or do you really want it to be a sub-sub-command of something else?

@kosmikus
Copy link
Contributor

I've briefly scanned, but not tested the code. It looks ok to me. I'd be willing to merge if this is renamed to fork, if @dcoutts is ok with it, too.

@jmillikin
Copy link
Author

Apologies for the delay; I got distracted by another project.

Pull request updated to use 'fork' instead of 'branch'. I left one of the internal types (Brancher) unchanged, because otherwise it'd be called a Forker and that's just silly.

This command reads the source-repositories from a package's description,
determines which VCS to use, and then creates a local repository or
branch of the package's repository.
@kosmikus
Copy link
Contributor

kosmikus commented May 7, 2012

@dcoutts and I had more naming discussion, and I think we came to the conclusion that we'd like to name this get rather than fork now, and that while we can then merge your change under that name for now, we'd like to aim for a future where get takes over the responsibilities of unpack, too.

@jmillikin
Copy link
Author

I'd really rather not call this get, or merge it with unpack, because the two commands are doing different things.

cabal fork creates a version-controlled branch of a package's source repository.

cabal unpack creates an unversioned directory containing the contents of the tarball uploaded to Hackage.

The two "types" of source can be very different, such as when the Hackage tarball is generated with a preprocessor, or when multiple packages share the same source repository (example: Cabal and cabal-install).

Since they do different things, the commands should not share a name just because they both happen to be dealing with some sort of source code.

As to the command name "get", that's just plain confusing. Come on. If you tell someone "cabal supports branching a package from its source-repo", they ought to be able to guess the command to use in only 2-3 tries. If the command is "get", the only way to discover it is cabal --help | less.

@kosmikus
Copy link
Contributor

kosmikus commented May 8, 2012

They're doing different but similar things. I think there's value in keeping the number of top-level commands small. Tarballs can be seen as snapshots of the repository at certain times, so I don't see a huge problem in joining the two under a common top-level command. I think we have to think carefully about how to design the user interface so that it does not become confusing, which is why I'd suggest for get and unpack to coexist for a potentially long transition period.

As to branch vs. fork vs. get: I simply disagree. I don't think I'd ever just blindly type in a guessed command I've never used before and expect it to work correctly. But even if I did: I'd never be able to guess branch or fork, whereas get is quite descriptive to me. You're saying it's confusing. That suggests you'd associate a completely different meaning with get? Such as?

@sol
Copy link
Member

sol commented May 8, 2012

I really care about this feature, and not so much about how it's named.

Anyway, rather than building something that may be turned into a consistent UI eventually, it makes more sense to me to build that consistent UI right now.

So I'll throw in an other suggestion (in the hope that this will not turn into extensive bikeshedding).

cabal get <package>
cabal get <package> --upstream

Where get is the same as unpack and get --upstream is get/clone/branch/fork/whatever.

Would that make sense?

@jmillikin If I understand correctly cabal fork foo puts it's result into directory ./foo-x.y.z, wouldn't it make more sense to put it into ./foo instead? And have you considered a --dry-run option? I think that could be useful if you want to know in advance what vcs/repo will be used.

@bmillwood
Copy link
Contributor

I agree with get --upstream, for what it's worth. I see a prominent use of this command being not forking at all, but just getting the latest source code to look at it or test it for compatibility with something else. Hence fork is kind of inappropriate (and I would never guess it, either).

@dcoutts
Copy link
Contributor

dcoutts commented May 10, 2012

I think I agree with sol's suggestion here.

cabal get <package>
cabal get <package> --source-repository
cabal get <package> --source-repository=[head|this|...]

(Note that if you think it's too long, don't worry, you can use any unique prefix, e.g. --source)

The only difference here is instead of --upstream, we're using the exact same terminology as we already have in .cabal files, namely source-repository. For example in .cabal files we have:

source-repository head
  type:     darcs
  location: http://darcs.haskell.org/cabal/
  subdir:   Cabal

And we have pre-defined meanings for different repo kinds, "head", "this" and also allow other named kinds.

So cabal get <package> --source-repository=[head|this|...] lets you be explicit about which one. When you don't specify then we should pick one sensibly: if there's just one declared use that, if there's several, pick 'head' then 'this' then the first other one specified.

The idea of allowing --dry-run here may be useful, however in practice we may not have enough info to know where we'll unpack to until we do it. We could probably say something useful though.

This was referenced May 24, 2012
@byorgey
Copy link
Member

byorgey commented Jul 25, 2012

+1 for this feature. cabal get <package> --source-repository sounds good to me, though honestly I really don't care too much what it's named.

@tibbe
Copy link
Member

tibbe commented Aug 13, 2012

@jmillikin It seems like there is an agreement on what is to be done, are you interested in pushing through, making the agreed changes and have this committed?

@jmillikin
Copy link
Author

I disagree with the proposed user interface changes, so I won't be changing my patch to integrate them. Feel free to use the current state of the patch as a starting place in implementing cabal get, if cabal fork is rejected.

@23Skidoo
Copy link
Member

If nobody wants to take this on, I can make the necessary changes once I'm done with the sandbox patches. This code could be useful for implementing references to remote source repositories from sandboxes (a-la cabal-meta).

@tibbe
Copy link
Member

tibbe commented Aug 15, 2012

I think fork is the widely accepted terminology in distributed versions control, where everything is a fork. It's used for this action on both GitHub and BitBucket.

@dag
Copy link

dag commented Aug 25, 2012

I propose cabal source: it's clear that you're getting the source (unlike cabal get which could mean anything), doesn't suffer from VCS terminology issues and is consistent with e.g. apt-get source and yumdownloader --source. It also goes well with the existing terminology in Cabal around "source repositories".

@simonmar
Copy link
Member

I don't particularly like any of the previous suggestions. How about cabal get-repo?

@gregwebs
Copy link

gregwebs commented Oct 1, 2012

I like simonmar's suggestion. In the future we may want to support additional vcs commands. So it might be best to have cabal repo as a base, with cabal repo get as the command discussed here.

I would be fine with anything here being implemented though :)

@23Skidoo
Copy link
Member

@sol sol mentioned this pull request Oct 30, 2012
@sol
Copy link
Member

sol commented Oct 30, 2012

Not sure if it makes any difference, but I don't like the overloaded semantics of that UI.

@dcoutts even if you say that you agree with my suggestion, it is different from what I proposed.

Please note that my last suggestion is not my favorite. It was just an attempt to foster consensus, and more importantly to prevent a state of "we do it like that for now, and fix it later" (which IMHO is almost always a bad idea).

That said, I think whatever UI is agreed upon, the person who did the initial investment (@jmillikin) has to be ok with it (it does not have to be his favorite, but it still has to be ok with him). Then, even if I don't agree with the final result, I can live with it, because he did the work.

This also has practical advantages. If writing code entitles me to have a say when it comes to bikeshedding, then there is an incentive to write code. But if I have to be prepared that I do the work, and then it will end up as something I just don't like, then I'd rather think twice before investing any of my time.

@23Skidoo
Copy link
Member

@sol I'd actually prefer having a separate cabal fork command, but I think that it's better to have this included in some form instead of not including at all. Supporting command aliases looks like a good compromise.

@dcoutts
Copy link
Contributor

dcoutts commented Nov 1, 2012

So I think we can close this topic now, 23Skidoo has made a new patch and that's now close to going in.

That said, I don't mean to close down the UI discussion. There's some interesting suggestions for future directions here with more vcs integration. So don't assume this is now set in stone. We can make UI changes in a somewhat backwards compat way (like leaving unpack as an alias for the new command).

@dcoutts dcoutts closed this Nov 1, 2012
grayjay added a commit to grayjay/cabal that referenced this pull request Dec 19, 2016
Previously, the solver only checked for cycles after it had already found a
solution. That reduced the number of times that it performed the check in the
common case when there were no cycles. However, when there was a cycle, the
solver could spend a lot of time searching subtrees that already had a cyclic
dependency and therefore could not lead to a solution. This is part of
haskell#3824.

Changes in this commit:
- Store the reverse dependency map on all choice nodes in the search tree, so
  that 'detectCyclesPhase' can access it at every step.
- Check for cycles incrementally at every step. Any new cycle must contain the
  current package, so we just check whether the current package is reachable
  from its neighbors.
- If there is a cycle, we convert the map to a graph and find a strongly
  connected component, as before.
- Instead of using the whole strongly connected component as the conflict set,
  we select one cycle. Smaller conflict sets are better for backjumping.
- The incremental cycle detection automatically fixes a bug where the solver
  filtered out the message about cyclic dependencies when it summarized the full
  log. The bug occurred when the failure message was not immediately after the
  line where the solver chose one of the packages involved in the conflict. See
  haskell#4154.

I tried several approaches before I found something with reasonable performance.
Here is a comparison of runtime and memory usage. I turned off assertions when
building cabal.

Index state: index-state(hackage.haskell.org) = 2016-12-03T17:22:05Z
GHC 8.0.1

Runtime in seconds:
Packages                    Search tree depth   Trials   master   This PR   haskell#1      haskell#2
yesod                       343                 3        2.00     2.00      2.13    2.02
yesod gi-glib leksah        744                 3        3.21     3.31      4.10    3.48
phooey                      66                  3        3.48     3.54      3.56    3.57
stackage nightly snapshot   6791                1        186      193       357     191

Total memory usage in MB, with '+RTS -s':
Packages                                        Trials   master    This PR   haskell#1     haskell#2
yesod                                           1         189       188       188     198
yesod gi-glib leksah                            1         257       257       263     306
stackage nightly snapshot                       1        1288      1338      1432   12699

haskell#1 - Same as master, but with cycle checking (Data.Graph.stronglyConnComp) after
     every step.
haskell#2 - Store dependencies in Distribution.Compat.Graph in the search tree, and
     check for cycles containing the current package at every step.
grayjay added a commit to grayjay/cabal that referenced this pull request Dec 19, 2016
Previously, the solver only checked for cycles after it had already found a
solution. That reduced the number of times that it performed the check in the
common case where there were no cycles. However, when there was a cycle, the
solver could spend a lot of time searching subtrees that already had a cyclic
dependency and therefore could not lead to a solution. This is part of
haskell#3824.

Changes in this commit:
- Store the reverse dependency map on all choice nodes in the search tree, so
  that 'detectCyclesPhase' can access it at every step.
- Check for cycles incrementally at every step. Any new cycle must contain the
  current package, so we just check whether the current package is reachable
  from its neighbors.
- If there is a cycle, we convert the map to a graph and find a strongly
  connected component, as before.
- Instead of using the whole strongly connected component as the conflict set,
  we select one cycle. Smaller conflict sets are better for backjumping.
- The incremental cycle detection automatically fixes a bug where the solver
  filtered out the message about cyclic dependencies when it summarized the full
  log. The bug occurred when the failure message was not immediately after the
  line where the solver chose one of the packages involved in the conflict. See
  haskell#4154.

I tried several approaches and compared performance when solving for
packages with different numbers of dependencies. Here are the results. None of
these runs involved any cycles, so they should have only tested the overhead of
cycle checking. I turned off assertions when building cabal.

Index state: index-state(hackage.haskell.org) = 2016-12-03T17:22:05Z
GHC 8.0.1

Runtime in seconds:
Packages                    Search tree depth   Trials   master   This PR   haskell#1      haskell#2
yesod                       343                 3        2.00     2.00      2.13    2.02
yesod gi-glib leksah        744                 3        3.21     3.31      4.10    3.48
phooey                      66                  3        3.48     3.54      3.56    3.57
Stackage nightly snapshot   6791                1        186      193       357     191

Total memory usage in MB, with '+RTS -s':
Packages                                        Trials   master    This PR   haskell#1     haskell#2
yesod                                           1         189       188       188     198
yesod gi-glib leksah                            1         257       257       263     306
Stackage nightly snapshot                       1        1288      1338      1432   12699

haskell#1 - Same as master, but with cycle checking (Data.Graph.stronglyConnComp) after
     every step.
haskell#2 - Store dependencies in Distribution.Compat.Graph in the search tree, and
     check for cycles containing the current package at every step.
ezyang pushed a commit that referenced this pull request Dec 27, 2016
Previously, the solver only checked for cycles after it had already found a
solution. That reduced the number of times that it performed the check in the
common case where there were no cycles. However, when there was a cycle, the
solver could spend a lot of time searching subtrees that already had a cyclic
dependency and therefore could not lead to a solution. This is part of
#3824.

Changes in this commit:
- Store the reverse dependency map on all choice nodes in the search tree, so
  that 'detectCyclesPhase' can access it at every step.
- Check for cycles incrementally at every step. Any new cycle must contain the
  current package, so we just check whether the current package is reachable
  from its neighbors.
- If there is a cycle, we convert the map to a graph and find a strongly
  connected component, as before.
- Instead of using the whole strongly connected component as the conflict set,
  we select one cycle. Smaller conflict sets are better for backjumping.
- The incremental cycle detection automatically fixes a bug where the solver
  filtered out the message about cyclic dependencies when it summarized the full
  log. The bug occurred when the failure message was not immediately after the
  line where the solver chose one of the packages involved in the conflict. See
  #4154.

I tried several approaches and compared performance when solving for
packages with different numbers of dependencies. Here are the results. None of
these runs involved any cycles, so they should have only tested the overhead of
cycle checking. I turned off assertions when building cabal.

Index state: index-state(hackage.haskell.org) = 2016-12-03T17:22:05Z
GHC 8.0.1

Runtime in seconds:
Packages                    Search tree depth   Trials   master   This PR   #1      #2
yesod                       343                 3        2.00     2.00      2.13    2.02
yesod gi-glib leksah        744                 3        3.21     3.31      4.10    3.48
phooey                      66                  3        3.48     3.54      3.56    3.57
Stackage nightly snapshot   6791                1        186      193       357     191

Total memory usage in MB, with '+RTS -s':
Packages                                        Trials   master    This PR   #1     #2
yesod                                           1         189       188       188     198
yesod gi-glib leksah                            1         257       257       263     306
Stackage nightly snapshot                       1        1288      1338      1432   12699

#1 - Same as master, but with cycle checking (Data.Graph.stronglyConnComp) after
     every step.
#2 - Store dependencies in Distribution.Compat.Graph in the search tree, and
     check for cycles containing the current package at every step.
@emilypi emilypi mentioned this pull request Jan 30, 2021
emilypi added a commit to emilypi/cabal that referenced this pull request Feb 1, 2021
* Fix bug causing spurious app or lib dirs to be generated by
libraries and executables respectively.
* Add unit tests for init flag correctness
* Export useful testing functions from `Init/Command.hs` and `Init.Types.hs`
* Modify `cabal-install.cabal.dev` to reflect change in test file name
for Init/FileCreators -> Init.hs
*
@emilypi emilypi mentioned this pull request Aug 25, 2021
@sol sol mentioned this pull request Nov 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.