Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix propagation of conflict resolution with tree-level conflicts #2287

Merged
merged 2 commits into from
Sep 28, 2023

Conversation

martinvonz
Copy link
Member

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes

All non-test callers already have a `Merge` object, so let's pass that
instead. We thereby simplify the callers a little, and we enforce the
"adds.len() == removes.len() + 1" constraint in the type.
Copy link
Contributor

@yuja yuja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm, but can you take another look, @ilyagr ?

@ilyagr
Copy link
Contributor

ilyagr commented Sep 24, 2023

This is a confusing bug. It might take me a bit of time to write everything down and think through what simplifications happen when.

(I'm assuming this is not urgent. If it is annoying enough, we could merge this and then think about moving/removing some simplifications later. We could also add a jj debug conflict simplify command in case similar bugs show up later).

@martinvonz martinvonz force-pushed the push-olnnyyklvlmp branch 2 times, most recently from bbfe3a8 to 82b54d2 Compare September 24, 2023 22:23
@martinvonz
Copy link
Member Author

(I'm assuming this is not urgent. If it is annoying enough, we could merge this and then think about moving/removing some simplifications later. We could also add a jj debug conflict simplify command in case similar bugs show up later).

Not urgent, but I would like to have it fixed before we recommend enabling tree-level conflicts to people. I think I've run into this bug twice myself, but I may misremember. If no one else who has enabled tree-level conflicts has run into it, then it's probably low priority.

lib/src/tree.rs Outdated Show resolved Hide resolved
@ilyagr
Copy link
Contributor

ilyagr commented Sep 27, 2023

I think I figured out a simpler way to understand this bug.

Ignoring the difference between Merge<Tree> and MergedTree, they have a .simplify() operation that simplifies identical trees and a .resolve() operation that tries to simplify conflicts in each file (hunk-by-hunk). The problem as I see it is that .resolve().simplify() is different from .resolve(). (The test in the PR uses MergedTree::merge(). That's simplified inside merge_trees, which is equivalent to .resolve() IIRC; I'm writing this from memory). If those operations were the same, the most worrying part of the bug you saw -- that the conflict was simplified differently in the working copy than in the repo -- would be impossible (unless I missed something).

I have a demo in https://github.com/ilyagr/jj/commits/my-pr/2287, see especially the commit with the artful name "resolve 1".

I think we should make it a rule that when we have two operations that simplify to different degrees, both should be idempotent, and the combination of the two in either order should be equivalent to the stronger of the two. (In principle, Merge<Tree> could have some intermediate degrees of simplification between simplify and resolve, like resolving only identical files file-by-file. If there are more than 2 levels of simplification, the rule should apply to each pair of them).

In the test you wrote, your fix fixes the problem I described as well as the bug you saw. However, I think that perhaps we should solve this in a way that more clearly corresponds to the problem as I described it. One alternative that seems to then exist is to put a loop inside merge_trees so that it simplifies the result and, if simplification changed something, reruns itself on the simplified result. This would also affect .resolve(). It should both solve the bug that you saw and perhaps related bugs (if they exist).

Your solution might easily be more efficient, but it feels like black magic and I can't think of a clear invariant it creates or decide whether it makes .resolve().simplify() equivalent to .resolve(). If we test that those two are the same, and your trick makes something faster, we could call it an optimization.

WDYT?

@martinvonz
Copy link
Member Author

I think I figured out a simpler way to understand this bug.

Ignoring the difference between Merge<Tree> and MergedTree, they have a .simplify() operation that simplifies identical trees and a .resolve() operation that tries to simplify conflicts in each file (hunk-by-hunk). The problem as I see it is that .resolve().simplify() is different from .resolve(). (The test in the PR uses MergedTree::merge(). That's simplified inside merge_trees, which is equivalent to .resolve() IIRC; I'm writing this from memory). If those operations were the same, the most worrying part of the bug you saw -- that the conflict was simplified differently in the working copy than in the repo -- would be impossible (unless I missed something).

I think that's correct, except that merge_trees() doesn't do simplification (because it's called recursively, and we need a consistent number of sides across the whole tree).

By the way, we don't actually use .resolve() anywhere right now. I think I had imagined we'd use it in MergedTree::merge() but ended up not needing it there.

I have a demo in https://github.com/ilyagr/jj/commits/my-pr/2287, see especially the commit with the artful name "resolve 1".

I think we should make it a rule that when we have two operations that simplify to different degrees, both should be idempotent, and the combination of the two in either order should be equivalent to the stronger of the two. (In principle, Merge<Tree> could have some intermediate degrees of simplification between simplify and resolve, like resolving only identical files file-by-file. If there are more than 2 levels of simplification, the rule should apply to each pair of them).

In the test you wrote, your fix fixes the problem I described as well as the bug you saw. However, I think that perhaps we should solve this in a way that more clearly corresponds to the problem as I described it. One alternative that seems to then exist is to put a loop inside merge_trees so that it simplifies the result and, if simplification changed something, reruns itself on the simplified result. This would also affect .resolve(). It should both solve the bug that you saw and perhaps related bugs (if they exist).

Right, that's one of the solutions I mentioned, except that I didn't suggest doing it in a loop. It was my first thought as well. However, I think the only thing that can be different on the next attempt to resolve conflicts is hunk-level merging. I don't think TreeValue merging can produce different results, because we usually don't recurse into them. When we do, it's because they're they're all trees, but there should be no difference after simplification of those, except for the file-level merging that my fix addressed.

Your solution might easily be more efficient, but it feels like black magic and I can't think of a clear invariant it creates or decide whether it makes .resolve().simplify() equivalent to .resolve(). If we test that those two are the same, and your trick makes something faster, we could call it an optimization.

Hopefully my explanation above make it feel a bit less like black magic.

@ilyagr
Copy link
Contributor

ilyagr commented Sep 28, 2023

Thank you for the explanation!

I think that's correct, except that merge_trees() doesn't do simplification (because it's called recursively, and we need a consistent number of sides across the whole tree).

It's a very good point that merge_trees can't easily do simplification. Fortunately, it's a non-public function. We could do the simplification in merge(), as I think you mentioned below.

By the way, we don't actually use .resolve() anywhere right now. I think I had imagined we'd use it in MergedTree::merge() but ended up not needing it there.

Idle thoughts: Ultimately, we could write Merge::new(vec![base_tree], vec![tree1, tree2]).resolve() instead of base_tree.merge(...). That requires either unifying MergedTree and Merge<Tree> or making the former into something like Enum(Merge<Tree>, LegacyTree).

Right, that's one of the solutions I mentioned, except that I didn't suggest doing it in a loop. It was my first thought as well.

Ah, I didn't realize that, but it would make sense.

However, I think the only thing that can be different on the next attempt to resolve conflicts is hunk-level merging. I don't think TreeValue merging can produce different results, because we usually don't recurse into them. When we do, it's because they're they're all trees, but there should be no difference after simplification of those, except for the file-level merging that my fix addressed.

This is not obvious; I will need to think about this more. Assuming you are correct, I'd still prefer some more explicit explanations of this invariant and verification that it holds. We could, for example, put it some debug_assert-s or mention it in tests.

BTW, the statement that .simplify().resolve() is equivalent to .resolve() would need to be tested separately. Perhaps we can have debug-assert-s for that too, though they would be less efficient.

At this point, my suggestions are a bit vague, so I leave it up to you to what degree (if at all) you want to address them in this PR. At this point, I think we're mostly on the same page at least about what the intention is of how the bugfix should work, and we can change things up beyond this PR.

Hopefully my explanation above make it feel a bit less like black magic.

Not right at this point, but maybe in a bit of time :).

I ran into a bug the other day where `jj status` said there was a
conflict in a file but there were no conflict markers in the working
copy. The commit was created when I squashed a conflict resolution
into the commit's parent. The rebased child commit then ended up in
this state. I.e., it looked something like this before squashing:

```
C (no conflict)
|
| B conflict
|/
A conflict
```

The conflict in B was different from the conflict in A. When I
squashed in C, jj would try to resolve the conflicts by first creating
a 7-way conflict (3 from A, 3 from B, 1 from C). Because of the exact
content-level changes, the 7-way conflict couldn't be automatically
resolved by `files::merge()` (the way it currently works
anyway). However, after simplifying the conflict, it could be
resolved. Because `MergedTree::merge()` does another round of conflict
simplification of the result at the end of the function, it was the
simplifed version that actually got stored in the commit. So when
inspecting the conflict later (e.g. in the working copy, as I did), it
could be automatically resolved.

I think there are at least two ways to solve this. One is to call
`merge_trees()` again after calling `tree.simplify()` in
`MergedTree::merge()`. However, I think it would only matter in the
case of content-level conflicts. Therefore, it seems better to make
the content-level resolution solve this case to start with. I've done
that by simplifying the conflict before passing it into
`files::merge()`. We could even do the fix in `files::merge()`, but
doing it before calling it has the advantage that we can avoid reading
some unchanged content from the backend.
@martinvonz
Copy link
Member Author

However, I think the only thing that can be different on the next attempt to resolve conflicts is hunk-level merging. I don't think TreeValue merging can produce different results, because we usually don't recurse into them. When we do, it's because they're they're all trees, but there should be no difference after simplification of those, except for the file-level merging that my fix addressed.

This is not obvious; I will need to think about this more. Assuming you are correct, I'd still prefer some more explicit explanations of this invariant and verification that it holds. We could, for example, put it some debug_assert-s or mention it in tests.

Yeah, that makes sense. I've added a debug_assert! for the idempotence. I'll merge this PR with that addition. Thanks for reviewing (both of you!) and thinking about these these.

@martinvonz martinvonz enabled auto-merge (rebase) September 28, 2023 05:06
@martinvonz martinvonz merged commit 7fda80f into main Sep 28, 2023
15 checks passed
@martinvonz martinvonz deleted the push-olnnyyklvlmp branch September 28, 2023 05:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants