pack-objects: don't reuse deltas with path walk
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.

While this config does enable the path-walk feature, it does not lead to
the expected levels of compression in the cases it was designed to
handle. This is because the --reuse-delta option is implied by default,
and because of auto-GC.

In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, I
assumed that when the objects were loose from client-side operations,
better deltas would be computed during this operation. This wasn't
confirmed because the test process used data that was fetched from real
repositories and thus existed in packed form only.

I was able to confirm that the poor compression does not reproduce when
the objects to push are loose. Making the pushed commit unreachable and
then loosening its objects via 'git repack -Ad' confirmed my suspicions
here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users unless they choose to opt in to background maintenance (and even
then, an incremental repack could cost them efficiency).

In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
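
Detecting the "not provided" case needs a third state besides on and
off. A minimal sketch of that idea, assuming a hand-rolled argument
scan: parse_reuse_delta() and REUSE_DELTA_UNSET are hypothetical names
for illustration only, and Git itself uses its parse-options API rather
than a loop like this.

```c
#include <string.h>

/* Tri-state: -1 = not given on the command line, 0 = off, 1 = on. */
enum { REUSE_DELTA_UNSET = -1 };

/*
 * Hypothetical helper (not part of Git): scan argv for an explicit
 * --reuse-delta or --no-reuse-delta. The last occurrence wins; if
 * neither appears, the sentinel is returned so that a later step can
 * pick the default.
 */
static int parse_reuse_delta(int argc, const char **argv)
{
	int reuse_delta = REUSE_DELTA_UNSET;
	int i;

	for (i = 1; i < argc; i++) {
		if (!strcmp(argv[i], "--reuse-delta"))
			reuse_delta = 1;
		else if (!strcmp(argv[i], "--no-reuse-delta"))
			reuse_delta = 0;
	}
	return reuse_delta;
}
```

Keeping the sentinel separate from both explicit values is what lets an
explicit --reuse-delta survive even when path-walk is enabled.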

As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:

 1. The code in send-pack.c that executes 'git pack-objects' is ignorant
    of whether the current process is a client pushing to a remote or a
    remote sending a fetch or clone to a client.

 2. For servers, it is critical that they trust the previously computed
    deltas whenever possible, or they could overload their CPU
    resources.

There's also the consideration that most servers use repacking logic
that will replace any bad deltas that are sent by clients (or at least,
that's the hope; we've seen that repacks can also pick bad deltas).

The --path-walk option is not currently compatible with reachability
bitmaps, so it is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option indicates a
client-side scenario, either a push or a repack. The repack case will
be explicit about whether the --reuse-delta option is used.

One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
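
The resulting decision can be sketched as a small pure function. This is
an illustration under stated assumptions, not the code in
builtin/pack-objects.c: struct pack_ctx, its field names, and
resolve_reuse_delta() are hypothetical, and the sketch leaves out the
check that a repository (gitdir) is present.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the state consulted when choosing the
 * default; the field names are illustrative, not Git's.
 */
struct pack_ctx {
	int reuse_delta;       /* -1 = unset, 0 = --no-reuse-delta, 1 = --reuse-delta */
	bool revs_given;       /* --revs was passed, so an object walk happens */
	bool path_walk_config; /* pack.usePathWalk=true */
};

/* Returns the effective reuse-delta setting (0 or 1). */
static int resolve_reuse_delta(const struct pack_ctx *ctx)
{
	if (ctx->reuse_delta >= 0)
		return ctx->reuse_delta; /* an explicit choice always wins */
	if (ctx->revs_given && ctx->path_walk_config)
		return 0; /* path-walk is effective: recompute deltas */
	return 1; /* otherwise keep the historical default */
}
```

With these assumptions, a push-like call (--revs plus the config) gets
fresh deltas, while a call that feeds an explicit object list, as
background maintenance does, keeps reusing existing ones.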

Alternative options considered included:

 * Adding _another_ config ('pack.reuseDelta=false') to opt in to this
   choice. However, we already have pack.usePathWalk=true as an opt-in
   to "do the right thing to make my data small" as far as our internal
   users are concerned.

 * Modifying the chain between builtin/push.c, transport.c, and
   builtin/send-pack.c to communicate that we are in "push" mode, not
   within a fetch or clone. However, this seemed like overkill. It may
   be beneficial in the future to pass through a mode like this, but it
   does not meet the bar for the immediate need.

Signed-off-by: Derrick Stolee <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
derrickstolee authored and dscho committed Dec 9, 2024
1 parent e1c75f9 commit 39887e4
Showing 1 changed file with 14 additions and 1 deletion.
15 changes: 14 additions & 1 deletion builtin/pack-objects.c
@@ -196,7 +196,7 @@ static struct bitmap_index *bitmap_git;
 static uint32_t write_layer;
 
 static int non_empty;
-static int reuse_delta = 1, reuse_object = 1;
+static int reuse_delta = -1, reuse_object = 1;
 static int keep_unreachable, unpack_unreachable, include_tag;
 static timestamp_t unpack_unreachable_expiration;
 static int pack_loose_unreachable;
@@ -4788,6 +4788,19 @@ int cmd_pack_objects(int argc,
 		path_walk = git_env_bool("GIT_TEST_PACK_PATH_WALK", 0);
 	}
 
+	if (reuse_delta < 0) {
+		/*
+		 * If we are using the --revs option and path-walk is _implied_
+		 * then use --no-reuse-delta by default.
+		 */
+		if (use_internal_rev_list &&
+		    the_repository->gitdir &&
+		    the_repository->settings.pack_use_path_walk)
+			reuse_delta = 0;
+		else
+			reuse_delta = 1;
+	}
+
 	if (depth < 0)
 		depth = 0;
 	if (depth >= (1 << OE_DEPTH_BITS)) {
