Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gc: implement basic GC for Git backend #2659

Merged
merged 1 commit into from
Dec 3, 2023
Merged

Conversation

martinvonz
Copy link
Member

This adds an initial jj util gc command, which simply calls git gc when using the Git backend. That should already be useful in non-colocated repos because it's not obvious how to GC (repack) such repos. In my own jj repo, it shrunk .jj/repo/store/ from 2.4 GiB to 780 MiB, and jj log --ignore-working-copy was sped up from 157 ms to 86 ms.

I haven't added any tests because the functionality depends on having git binary on the PATH, which we don't yet depend on anywhere else. I think we'll still be able to test much of the future parts of garbage collection without a git binary because the interesting parts are about manipulating the Git repo before calling git gc on it.

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes

@martinvonz martinvonz force-pushed the push-mnqqwqopplzk branch 2 times, most recently from 8816e37 to 0e4e81b Compare December 2, 2023 01:06
@ilyagr
Copy link
Contributor

ilyagr commented Dec 2, 2023

Non-blocking: Perhaps we should also run git pack-refs --all as discussed in https://martinvonz.github.io/jj/prerelease/git-compatibility/#co-located-jujutsugit-repos? Another option is git maintenance run which, AFAIK, does both, but I'm not 100% sure.

@martinvonz
Copy link
Member Author

Non-blocking: Perhaps we should also run git pack-refs --all as discussed in https://martinvonz.github.io/jj/prerelease/git-compatibility/#co-located-jujutsugit-repos? Another option is git maintenance run which, AFAIK, does both, but I'm not 100% sure.

git gc actually does both. I'll update that doc to suggested jj util gc instead of git pack-refs --all. Thanks for pointing that out.

@ilyagr
Copy link
Contributor

ilyagr commented Dec 2, 2023

Good point, apparently there is a gc.packRefs option that defaults to true.

OTOH, I am only distantly familiar with this (reading manpages) and am quite possibly confused, but AFAICT there's still an issue. We want to run jj pack-refs --all and git gc, I would guess, only runs jj pack-refs without --all (the docs are not entirely clear to me). According to the man page, that packs only those refs that are already packed (so, probably, not new jj/ refs since the last jj pack refs --all).

Fortunately, if you don't already know the answer to this, there are plenty of friendly people we could ask...

@martinvonz
Copy link
Member Author

I agree that it sound like --all is required but it seems to pack the ref/jj/keep/ refs in practice:

martinvonz in 🌐 vonz in /tmp/tmp.lfndcPTqYY
❯ find .jj/repo/store/git/
.jj/repo/store/git/
.jj/repo/store/git/HEAD
.jj/repo/store/git/hooks
.jj/repo/store/git/hooks/pre-rebase.sample
.jj/repo/store/git/hooks/pre-commit.sample
.jj/repo/store/git/hooks/docs.url
.jj/repo/store/git/hooks/prepare-commit-msg.sample
.jj/repo/store/git/hooks/commit-msg.sample
.jj/repo/store/git/hooks/pre-merge-commit.sample
.jj/repo/store/git/hooks/pre-applypatch.sample
.jj/repo/store/git/hooks/applypatch-msg.sample
.jj/repo/store/git/hooks/fsmonitor-watchman.sample
.jj/repo/store/git/hooks/pre-push.sample
.jj/repo/store/git/hooks/post-update.sample
.jj/repo/store/git/description
.jj/repo/store/git/refs
.jj/repo/store/git/refs/jj
.jj/repo/store/git/refs/jj/keep
.jj/repo/store/git/refs/jj/keep/ec7edd0fea21f9d251cf573773763692bd28039e
.jj/repo/store/git/refs/jj/keep/5201cdb62b19daa1ad7f5c7b6098c5eb11664cc0
.jj/repo/store/git/refs/jj/keep/edae7f918335743e6139146828d495b75cf0e116
.jj/repo/store/git/refs/jj/keep/9e8c63da8a97ab003b6d3d0d035de0dc435cc31d
.jj/repo/store/git/refs/tags
.jj/repo/store/git/refs/heads
.jj/repo/store/git/objects
.jj/repo/store/git/objects/52
.jj/repo/store/git/objects/52/01cdb62b19daa1ad7f5c7b6098c5eb11664cc0
.jj/repo/store/git/objects/ec
.jj/repo/store/git/objects/ec/7edd0fea21f9d251cf573773763692bd28039e
.jj/repo/store/git/objects/9e
.jj/repo/store/git/objects/9e/8c63da8a97ab003b6d3d0d035de0dc435cc31d
.jj/repo/store/git/objects/pack
.jj/repo/store/git/objects/info
.jj/repo/store/git/objects/ed
.jj/repo/store/git/objects/ed/ae7f918335743e6139146828d495b75cf0e116
.jj/repo/store/git/config
.jj/repo/store/git/info
.jj/repo/store/git/info/exclude

martinvonz in 🌐 vonz in /tmp/tmp.lfndcPTqYY
❯ jj util gc
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 72 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (5/5), done.
Building bitmaps: 100% (4/4), done.
Total 5 (delta 3), reused 0 (delta 0), pack-reused 0

martinvonz in 🌐 vonz in /tmp/tmp.lfndcPTqYY
❯ find .jj/repo/store/git/
.jj/repo/store/git/
.jj/repo/store/git/HEAD
.jj/repo/store/git/hooks
.jj/repo/store/git/hooks/pre-rebase.sample
.jj/repo/store/git/hooks/pre-commit.sample
.jj/repo/store/git/hooks/docs.url
.jj/repo/store/git/hooks/prepare-commit-msg.sample
.jj/repo/store/git/hooks/commit-msg.sample
.jj/repo/store/git/hooks/pre-merge-commit.sample
.jj/repo/store/git/hooks/pre-applypatch.sample
.jj/repo/store/git/hooks/applypatch-msg.sample
.jj/repo/store/git/hooks/fsmonitor-watchman.sample
.jj/repo/store/git/hooks/pre-push.sample
.jj/repo/store/git/hooks/post-update.sample
.jj/repo/store/git/description
.jj/repo/store/git/refs
.jj/repo/store/git/refs/jj
.jj/repo/store/git/refs/tags
.jj/repo/store/git/refs/heads
.jj/repo/store/git/objects
.jj/repo/store/git/objects/pack
.jj/repo/store/git/objects/pack/pack-a26a8762e5623580d0513825f05a946a5b328be6.bitmap
.jj/repo/store/git/objects/pack/pack-a26a8762e5623580d0513825f05a946a5b328be6.idx
.jj/repo/store/git/objects/pack/pack-a26a8762e5623580d0513825f05a946a5b328be6.rev
.jj/repo/store/git/objects/pack/pack-a26a8762e5623580d0513825f05a946a5b328be6.pack
.jj/repo/store/git/objects/info
.jj/repo/store/git/objects/info/packs
.jj/repo/store/git/config
.jj/repo/store/git/packed-refs
.jj/repo/store/git/info
.jj/repo/store/git/info/refs
.jj/repo/store/git/info/exclude

@ilyagr
Copy link
Contributor

ilyagr commented Dec 2, 2023

Nice! (if a bit strange) I'm out of my depth now; if you don't, I might ask on some Git forum at some point. In any case, jj util gc from this PR would be useful as is.

@yuja
Copy link
Contributor

yuja commented Dec 2, 2023

FWIW, packing refs/jj/keep wouldn't matter much. git::import_refs() ignores the directory at all. libgit2 would scan the refs on jj git fetch, but peeling commit objects is way more expensive than scanning loose refs.

@martinvonz
Copy link
Member Author

Nice! (if a bit strange) I'm out of my depth now; if you don't, I might ask on some Git forum at some point. In any case, jj util gc from this PR would be useful as is.

After some more reading and testing, it seems clear that git pack-refs (without --all) does not pack loose refs, but git gc does. That's good enough for me; it seems very unlikely that git gc would stop packing refs in the future.

FWIW, packing refs/jj/keep wouldn't matter much. git::import_refs() ignores the directory at all. libgit2 would scan the refs on jj git fetch, but peeling commit objects is way more expensive than scanning loose refs.

I hope to fix that soon by replacing lots of refs by a single refs to a merge commit.

@yuja
Copy link
Contributor

yuja commented Dec 2, 2023

I hope to fix that soon by replacing lots of refs by a single refs to a merge commit.

OT, but I'm not sure if that's better because the dummy merge would be visible in git log --all?

@martinvonz
Copy link
Member Author

OT, but I'm not sure if that's better because the dummy merge would be visible in git log --all?

Good question. I wonder if people with colocated repos use git log --all sometimes. They would already see tons of uninteresting heads. Maybe they have already gotten used to passing --exclude=refs/jj/keep/*?

@yuja
Copy link
Contributor

yuja commented Dec 2, 2023

OT, but I'm not sure if that's better because the dummy merge would be visible in git log --all?

Good question. I wonder if people with colocated repos use git log --all sometimes. They would already see tons of uninteresting heads.

It does contain lots of empty working-copy heads, but the topology is correct. It would be messier if all orphaned heads are merged.

No idea if people use git log --all. I occasionally do that, but only for debugging purpose.

lib/src/git_backend.rs Outdated Show resolved Hide resolved
cli/src/commands/util.rs Outdated Show resolved Hide resolved
lib/src/git_backend.rs Outdated Show resolved Hide resolved
lib/src/store.rs Outdated Show resolved Hide resolved
@martinvonz martinvonz changed the base branch from main to push-uyoxmxlyxkuw December 2, 2023 18:38
Base automatically changed from push-uyoxmxlyxkuw to main December 3, 2023 00:37
@martinvonz
Copy link
Member Author

It does contain lots of empty working-copy heads, but the topology is correct. It would be messier if all orphaned heads are merged.

I asked for suggestions in Git's Discord server. @dscho suggested using reflogs. That seems like a great idea to. Then we can create a merge commit with thousands of parents, point a refs/jj/keep/ reference to it, then immediately point it to some ancestors commit instead (perhaps one of the parents of the merge). That should prevent GC of the commits while still hiding most or all of the obsolete commits created by jj.

This adds an initial `jj util gc` command, which simply calls `git gc`
when using the Git backend. That should already be useful in
non-colocated repos because it's not obvious how to GC (repack) such
repos. In my own jj repo, it shrunk `.jj/repo/store/` from 2.4 GiB to
780 MiB, and `jj log --ignore-working-copy` was sped up from 157 ms to
86 ms.

I haven't added any tests because the functionality depends on having
`git` binary on the PATH, which we don't yet depend on anywhere
else. I think we'll still be able to test much of the future parts of
garbage collection without a `git` binary because the interesting
parts are about manipulating the Git repo before calling `git gc` on
it.
@yuja
Copy link
Contributor

yuja commented Dec 3, 2023

I asked for suggestions in Git's Discord server. @dscho suggested using reflogs. That seems like a great idea to. Then we can create a merge commit with thousands of parents, point a refs/jj/keep/ reference to it, then immediately point it to some ancestors commit instead (perhaps one of the parents of the merge). That should prevent GC of the commits while still hiding most or all of the obsolete commits created by jj.

Interesting. I didn't know that reflog also prevents GC.

That will hide anonymous branches from git log --all, which seems a bit unfortunate, but I don't know if people would want to see all jj's heads in git log --all.

@ilyagr ilyagr mentioned this pull request Dec 3, 2023
@ilyagr
Copy link
Contributor

ilyagr commented Dec 3, 2023

I haven't found that discussion yet, but does Git have a setting on the amount of reflog history garbage collection preserves? Could this setting be set to 0 sometimes (so GC only preserves the currently existing branch positions)?

If Git preserved all of the reflog from GC, it would seem like a GC wouldn't be able to do much a lot of the time, but I might be missing something.

@dscho
Copy link

dscho commented Dec 3, 2023

Reflogs don't need extra merge commits to be created: just add them individually to the reflog.

The reflog's expiry is time-based, read: you may want to "refresh" the reflog. (I don't know whether the hack to add "future" reflog entries would upset Git or not.)

@martinvonz martinvonz merged commit 1cc2714 into main Dec 3, 2023
15 checks passed
@martinvonz martinvonz deleted the push-mnqqwqopplzk branch December 3, 2023 15:40
@martinvonz
Copy link
Member Author

I haven't found that discussion yet, but does Git have a setting on the amount of reflog history garbage collection preserves? Could this setting be set to 0 sometimes (so GC only preserves the currently existing branch positions)?

Yes, that is my biggest concern with the reflog approach. We'd effectively have to tell users that they're not allowed to run git gc -c gc.reflogExpire=now. We could make jj util gc print such a message, but it's still a risk. Johannes's hack might be an okay solution if it works.

Reflogs don't need extra merge commits to be created: just add them individually to the reflog.

Sorry, the context that I didn't share here is that I'm thinking of replacing refs pointing to thousands of heads by a single ref pointing to a merge commit with thousands of parents. The main reason for that is to make fetching from a remote faster (#293).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants