fast-import: avoid making replace refs point to themselves #1824

newren · 2024-11-07T22:36:02Z

Changes since v1: Clarified wording in the commit message, and added a little more text at the end of the commit message to address questions that came up in review.

cc: Elijah Newren [email protected]

newren · 2024-11-14T18:46:58Z

/submit

gitgitgadget · 2024-11-14T18:48:02Z

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1824/newren/fast-import-self-pointing-replace-ref-v1

To fetch this version to local tag pr-1824/newren/fast-import-self-pointing-replace-ref-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1824/newren/fast-import-self-pointing-replace-ref-v1

gitgitgadget · 2024-11-15T00:33:56Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Elijah Newren via GitGitGadget" <[email protected]> writes:

> Most git commands that you try to run in such a repository with a
> self-pointing replace object will result in an error:
>
>     $ git log
>     fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264

After reading up to this point, with "Most git commands", I was
afraid that you were going to say "... but I found this command that
fails to stop gracefully, and instead infinitely spin".

If that is not a case, then I am happy ;-) but in that case probably
"Most" -> "All" is warranted.

> Avoid such problems by deleting replace refs that will simply end up
> pointing to themselves at the end of our writing.  Warn the users when
> we do so, unless they specify --quiet.

This may need a bit of rephrasing.

Even when they specify "--quiet", you'd refrain from creating a
self-referencing replace entry, which makes sense, but it was not
clear with only the above description.  I had to read the patch text
to find it out.

Is it reasonable to expect that a self referencing replace entry is
the most common thing to happen, or loops with more than one
elements are equally plausible to happen but the self-referencing
one is singled out by this patch because it is trivial to notice,
unlike other forms of equally problematic loops?

> diff --git a/builtin/fast-import.c b/builtin/fast-import.c
> index 76d5c20f141..51c8228cb7b 100644
> --- a/builtin/fast-import.c
> +++ b/builtin/fast-import.c
> @@ -179,6 +179,7 @@ static unsigned long branch_load_count;
>  static int failure;
>  static FILE *pack_edges;
>  static unsigned int show_stats = 1;
> +static unsigned int quiet;
>  static int global_argc;
>  static const char **global_argv;
>  static const char *global_prefix;
> @@ -1602,7 +1603,19 @@ static int update_branch(struct branch *b)
>  	struct ref_transaction *transaction;
>  	struct object_id old_oid;
>  	struct strbuf err = STRBUF_INIT;
> -
> +	static const char *replace_prefix = "refs/replace/";
> +
> +	if (starts_with(b->name, replace_prefix) &&

Curious how the "diff" machinery decided to represent the hunk like
this, instead of say

>  	struct strbuf err = STRBUF_INIT;
> +	static const char *replace_prefix = "refs/replace/";
> 
> +	if (starts_with(b->name, replace_prefix) &&

Of course that has nothing to do with this patch or its review.

> +	    !strcmp(b->name + strlen(replace_prefix),
> +		    oid_to_hex(&b->oid))) {
> +		if (!quiet)
> +			warning("Dropping %s since it would point to "
> +				"itself (i.e. to %s)",
> +				b->name, oid_to_hex(&b->oid));
> +		refs_delete_ref(get_main_ref_store(the_repository),
> +				NULL, b->name, NULL, 0);
> +		return 0;

I am not sure if a warning is even warranted.  If you decide to
replace an object A with the same object A, the result ought to be a
no-op.  I wonder if it is makes more sense to

 (1) do this unconditionally and silently, and
 (2) when we prepare the replace_map, ignore self-referencing ones.

instead.  If (2) makes sense entirely depends on the answer of an
earlier question (i.e. "is there a reason why self-reference is more
common than general loop?").

Thanks.

gitgitgadget · 2024-11-15T01:10:03Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <[email protected]> writes:

> I am not sure if a warning is even warranted.  If you decide to
> replace an object A with the same object A, the result ought to be a
> no-op.  I wonder if it is makes more sense to
>
>  (1) do this unconditionally and silently, and
>  (2) when we prepare the replace_map, ignore self-referencing ones.
>
> instead.  If (2) makes sense entirely depends on the answer of an
> earlier question (i.e. "is there a reason why self-reference is more
> common than general loop?").

Forgot to add.  (1) could be done even at a lower layer, i.e. the
ref API could be told to reject such a replace ref creation/update
that maps an object name to itself.

gitgitgadget · 2024-11-15T03:19:08Z

On the Git mailing list, Elijah Newren wrote (reply to this):

On Thu, Nov 14, 2024 at 4:31 PM Junio C Hamano <[email protected]> wrote:
>
> "Elijah Newren via GitGitGadget" <[email protected]> writes:
>
> > Most git commands that you try to run in such a repository with a
> > self-pointing replace object will result in an error:
> >
> >     $ git log
> >     fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264
>
> After reading up to this point, with "Most git commands", I was
> afraid that you were going to say "... but I found this command that
> fails to stop gracefully, and instead infinitely spin".
>
> If that is not a case, then I am happy ;-) but in that case probably
> "Most" -> "All" is warranted.

Some git commands are not bothered by the existence of the problematic
replace ref; e.g.:

$ git show-ref
64026eaa60e0208a00946cdd3cb41b059c6675bd refs/heads/main
fb92ebc654641b310e7d0360d0a5a49316fd7264
refs/replace/fb92ebc654641b310e7d0360d0a5a49316fd7264
fb92ebc654641b310e7d0360d0a5a49316fd7264 refs/tags/msg
14f901a95ebae912feb4805f40ef68f15b0192c2 refs/tags/test

So, it's not "all" git commands.  Maybe I should rephrase as:

"""
Some git commands will ignore such replace refs, but many will fail
with an error:

$ git log
fatal: replace depth too high for object
fb92ebc654641b310e7d0360d0a5a49316fd7264
"""

?

> > Avoid such problems by deleting replace refs that will simply end up
> > pointing to themselves at the end of our writing.  Warn the users when
> > we do so, unless they specify --quiet.
>
> This may need a bit of rephrasing.
>
> Even when they specify "--quiet", you'd refrain from creating a
> self-referencing replace entry, which makes sense, but it was not
> clear with only the above description.  I had to read the patch text
> to find it out.

Would reordering the two phrases of the last sentence do it?

"""
Avoid such problems by deleting replace refs that will simply end up
pointing to themselves at the end of our writing.  Unless users specify --quiet,
warn the users when we do so.
"""

> Is it reasonable to expect that a self referencing replace entry is
> the most common thing to happen, or loops with more than one
> elements are equally plausible to happen but the self-referencing
> one is singled out by this patch because it is trivial to notice,
> unlike other forms of equally problematic loops?

Loops with more than one element are possible, but I couldn't see an
angle by which they are plausible.  In particular, I know how a user
can (and did) accidentally trigger a self-referencing replace ref (if
you're curious, see
https://github.com/newren/git-filter-repo/issues/401; it's much less
likely to trigger now due to an unrelated change to the default for
`--replace-refs`, but could still be triggered in combination with
that option).  There are many variations on their theme of "do a
change, then undo it" (in combination with the old `--replace-refs`
default), but I don't see how to either vary their testcase or come up
with some other way to get git-filter-repo to trigger a replace ref
loop with more than one element.

> > diff --git a/builtin/fast-import.c b/builtin/fast-import.c
> > index 76d5c20f141..51c8228cb7b 100644
> > --- a/builtin/fast-import.c
> > +++ b/builtin/fast-import.c
> > @@ -179,6 +179,7 @@ static unsigned long branch_load_count;
> >  static int failure;
> >  static FILE *pack_edges;
> >  static unsigned int show_stats = 1;
> > +static unsigned int quiet;
> >  static int global_argc;
> >  static const char **global_argv;
> >  static const char *global_prefix;
> > @@ -1602,7 +1603,19 @@ static int update_branch(struct branch *b)
> >       struct ref_transaction *transaction;
> >       struct object_id old_oid;
> >       struct strbuf err = STRBUF_INIT;
> > -
> > +     static const char *replace_prefix = "refs/replace/";
> > +
> > +     if (starts_with(b->name, replace_prefix) &&
>
> Curious how the "diff" machinery decided to represent the hunk like
> this, instead of say
>
> >       struct strbuf err = STRBUF_INIT;
> > +     static const char *replace_prefix = "refs/replace/";
> >
> > +     if (starts_with(b->name, replace_prefix) &&
>
> Of course that has nothing to do with this patch or its review.

Oh, that is kinda weird.  It's not a whitespace-change issue either;
both the removed and added blank-looking lines really are blank lines.
No idea why the diff machinery does that.

> > +         !strcmp(b->name + strlen(replace_prefix),
> > +                 oid_to_hex(&b->oid))) {
> > +             if (!quiet)
> > +                     warning("Dropping %s since it would point to "
> > +                             "itself (i.e. to %s)",
> > +                             b->name, oid_to_hex(&b->oid));
> > +             refs_delete_ref(get_main_ref_store(the_repository),
> > +                             NULL, b->name, NULL, 0);
> > +             return 0;
>
> I am not sure if a warning is even warranted.  If you decide to
> replace an object A with the same object A, the result ought to be a
> no-op.  I wonder if it is makes more sense to
>
>  (1) do this unconditionally and silently, and
>  (2) when we prepare the replace_map, ignore self-referencing ones.
>
> instead.  If (2) makes sense entirely depends on the answer of an
> earlier question (i.e. "is there a reason why self-reference is more
> common than general loop?").
>
> Thanks.

Since you added more in a subsequent response, I'll go respond to that.

gitgitgadget · 2024-11-15T03:19:10Z

User Elijah Newren <[email protected]> has been added to the cc: list.

gitgitgadget · 2024-11-15T03:43:07Z

On the Git mailing list, Elijah Newren wrote (reply to this):

On Thu, Nov 14, 2024 at 5:08 PM Junio C Hamano <[email protected]> wrote:
>
> Junio C Hamano <[email protected]> writes:
>
> > I am not sure if a warning is even warranted.  If you decide to
> > replace an object A with the same object A, the result ought to be a
> > no-op.

Do you mean that the result ought to be that there is no replacement?
If so, I agree, but that's not always equivalent to a no-op...

> >  I wonder if it is makes more sense to
> >
> >  (1) do this unconditionally and silently, and
> >  (2) when we prepare the replace_map, ignore self-referencing ones.
> >
> > instead.  If (2) makes sense entirely depends on the answer of an
> > earlier question (i.e. "is there a reason why self-reference is more
> > common than general loop?").
>
> Forgot to add.  (1) could be done even at a lower layer, i.e. the
> ref API could be told to reject such a replace ref creation/update
> that maps an object name to itself.

Perhaps rejecting the creation or update of a replace ref would make
sense at a lower level, but a no-op isn't what I want here -- it'd
just result in a different bug.  In particular, note that my patch
modifies fast-import to issue a delete where it would otherwise issue
an update instead of merely rejecting the update.  If the repository
had a value for the replace ref before fast-import was run, and the
replace ref was explicitly named in the fast-import stream, I don't
want the replace ref to be left with a pre-fast-import value.

gitgitgadget · 2024-11-15T03:54:51Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

Elijah Newren <[email protected]> writes:

> ...  If the repository
> had a value for the replace ref before fast-import was run, and the
> replace ref was explicitly named in the fast-import stream, I don't
> want the replace ref to be left with a pre-fast-import value.

Ah, right.  We want to honor the user's latest wish, i.e., if they
create a replace ref that maps A to A, when they ask for object A,
instead of returning it, we need to return what the replace ref
refers to, which happens to be object A.  So you are right. "You are
trying to map A to A, so we'll ignore that request" is a wrong thing
to do.  "In order to give you A when you ask for A, we will remove
the existing mapping for A" is absolutely the right thing to do,
which is what your patch does.

Thanks.

If someone replaces a commit with a modified version, then builds on that commit, and then later decides to rewrite history in a format like git fast-export --all | CMD_TO_TWEAK_THE_STREAM | git fast-import and CMD_TO_TWEAK_THE_STREAM undoes the modifications that the replacement did, then at the end you'd get a replace ref that points to itself. For example: $ git show-ref | grep replace fb92ebc654641b310e7d0360d0a5a49316fd7264 refs/replace/fb92ebc654641b310e7d0360d0a5a49316fd7264 Git commands which pay attention to replace refs will die with an error when a self-referencing replace ref is present: $ git log fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264 Avoid such problems by deleting replace refs that will simply end up pointing to themselves at the end of our writing. Unless users specify --quiet, warn them when we delete such a replace ref. Two notes about this patch: * We are not ignoring the problematic update of the replace ref (turning it into a no-op), we are replacing the update with a delete. The logic here is that if the repository had a value for the replace ref before fast-import was run, and the replace ref was explicitly named in the fast-import stream, we don't want the replace ref to be left with a pre-fast-import value. * While loops with more than one element (e.g. refs/replace/A points to B, and refs/replace/B points to A) are possible, they seem much less plausible. It is pretty easy to create a sequence of git-filter-repo commands that will trigger a self-referencing replace ref, but I do not know how to trigger a scenario with a cycle length greater than 1. Signed-off-by: Elijah Newren <[email protected]>

newren · 2024-11-18T22:19:03Z

/submit

gitgitgadget · 2024-11-18T22:19:56Z

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1824/newren/fast-import-self-pointing-replace-ref-v2

To fetch this version to local tag pr-1824/newren/fast-import-self-pointing-replace-ref-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1824/newren/fast-import-self-pointing-replace-ref-v2

gitgitgadget · 2024-11-19T00:47:54Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Elijah Newren via GitGitGadget" <[email protected]> writes:

> +		grep "Dropping.*since it would point to itself" msgs &&
> +		git show-ref >refs &&
> +		! grep refs/replace refs

"test_grep !" may make a failure case easier to diagnose.

Other than that, looking good.

Thanks, will queue.

gitgitgadget · 2024-11-19T07:47:14Z

This patch series was integrated into seen via git@6fbd691.

gitgitgadget · 2024-11-20T06:47:52Z

This patch series was integrated into seen via git@c5784eb.

gitgitgadget · 2024-11-20T06:47:52Z

This patch series was integrated into next via git@751ee6b.

gitgitgadget · 2024-11-21T06:31:23Z

This patch series was integrated into seen via git@4cace49.

gitgitgadget · 2024-11-22T08:13:40Z

This patch series was integrated into seen via git@81a7312.

newren force-pushed the fast-import-self-pointing-replace-ref branch from 7402518 to cc77a00 Compare November 18, 2024 22:08

gitgitgadget bot added the seen label Nov 19, 2024

gitgitgadget bot added the next label Nov 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fast-import: avoid making replace refs point to themselves #1824

fast-import: avoid making replace refs point to themselves #1824

newren commented Nov 7, 2024 •

edited

Loading

newren commented Nov 14, 2024

gitgitgadget bot commented Nov 14, 2024

gitgitgadget bot commented Nov 15, 2024

gitgitgadget bot commented Nov 15, 2024

gitgitgadget bot commented Nov 15, 2024

gitgitgadget bot commented Nov 15, 2024

gitgitgadget bot commented Nov 15, 2024

gitgitgadget bot commented Nov 15, 2024

newren commented Nov 18, 2024

gitgitgadget bot commented Nov 18, 2024

gitgitgadget bot commented Nov 19, 2024

gitgitgadget bot commented Nov 19, 2024

gitgitgadget bot commented Nov 20, 2024

gitgitgadget bot commented Nov 20, 2024

gitgitgadget bot commented Nov 21, 2024

gitgitgadget bot commented Nov 22, 2024

fast-import: avoid making replace refs point to themselves #1824

Are you sure you want to change the base?

fast-import: avoid making replace refs point to themselves #1824

Conversation

newren commented Nov 7, 2024 • edited Loading

newren commented Nov 14, 2024

gitgitgadget bot commented Nov 14, 2024

gitgitgadget bot commented Nov 15, 2024

gitgitgadget bot commented Nov 15, 2024

gitgitgadget bot commented Nov 15, 2024

gitgitgadget bot commented Nov 15, 2024

gitgitgadget bot commented Nov 15, 2024

gitgitgadget bot commented Nov 15, 2024

newren commented Nov 18, 2024

gitgitgadget bot commented Nov 18, 2024

gitgitgadget bot commented Nov 19, 2024

gitgitgadget bot commented Nov 19, 2024

gitgitgadget bot commented Nov 20, 2024

gitgitgadget bot commented Nov 20, 2024

gitgitgadget bot commented Nov 21, 2024

gitgitgadget bot commented Nov 22, 2024

newren commented Nov 7, 2024 •

edited

Loading