Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chunk mode, recursive mode, and Octopus with recursive fallback mode #31

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

benba
Copy link
Contributor

@benba benba commented Mar 18, 2017

First a big thanks for the go rewrite, it such a breeze to read and test compared to the bash version!

This PR contains several changes:

  • Chunk mode as in the bash version
  • Bug fix in mergeHeads: the octopus was failing silently when merge-base was returning more than one base
  • Recursive mode
  • "Mixed" mode (octopus merge with a fallback to recursive mode)

Rational for the recursive mode:

After 9 month of octopus workflow usage, we sometimes encountered some weird merge conflicts due to an improper merge base selection by the octopus strategy
We also had some unfixable conflicts (with the current resolution recording mechanisms).

One example:

Let's say we have a file F renamed in branch B1 and modified in branch B2.
Octopus merge produce a conflict in that case.
Branch B2 is already rebased on top of another feature (B3)
Branch B2 is risky and may not make it to production
So we can't rebase B1 on top of B2 (because we are not sure to deliver it).
We can't rebase B2 in top of B1 because we would also need to rebase B3 on top of B1
And we can't merge B1 into B2 because then the octopus fail as well (in our specific case because of a wrong merge base selection)
The frustrating part is that in that particular case, a default recursive merge was working well, without any conflicts.

So there is a new option "-r" that allows to merge all branches, one by one with a standard merge.
It's implemented in a way that it autocommits conflicts resolved from a rerere resolution (so this modes allow conflict resolution as well).

It works fine, but the time to make a big merge tends to be much longer than the octopus strategy (150 branches with 6 conflicts in 7m15s instead of 4m30s for the bash version on my machine)

So this PR also include a combination of both mode if -r is specified with -s (let's say -s3 for example)
In that case the script will try to an octopus by chunks of 3, and if the octopus merge fails for one chunk it will rollback to the previous state, and retry with 3 recursive merge (one for each branch)
Surprisingly, this mode is faster than the bash version: 3m55s
I suspect the auto conflict resolution to be much faster in rerere.c than in the git-apply-conflict-resolution)

Summary

option result
default octopus strategy
-s num octopus strategy by chunks of num
-r recursive strategy (one branch at a time)
-r s num octopus strategy by chunks of num, if a chunk fails then recursive strategy for each branches in the chunk

I'll put it live on our CI on march 27, and will let you know if we reduce the number of unnecessary conflicts (and maybe make some improvements)

Last, I've made a lot of duplication in unit tests, if I have some spare time I'll factor them a bit.


This change is Reviewable

@codecov-io
Copy link

Codecov Report

Merging #31 into master will increase coverage by <.01%.
The diff coverage is 95%.

@@            Coverage Diff             @@
##           master      #31      +/-   ##
==========================================
+ Coverage    94.7%   94.71%   +<.01%     
==========================================
  Files           4        4              
  Lines         170      246      +76     
==========================================
+ Hits          161      233      +72     
- Misses          5        7       +2     
- Partials        4        6       +2
Impacted Files Coverage Δ
config/config.go 95.91% <100%> (+0.17%)
run/run.go 92.61% <94.8%> (+1.95%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4de89a5...8f20620. Read the comment docs.

@apflieger apflieger self-requested a review March 18, 2017 18:27
@apflieger
Copy link
Contributor

Hi @benba! It's been a while
You are tackling difficult stuff here, go easy on aspirin :)
I have a couple of questions and remarks

  • Do you currently use the go binary in production ?
  • The conflict mechanism is not implemented in go yet. Is this blocker for you to use it if you don't already ?
  • The performance you have is really bad. I guess it's windows. This should take less than a minute on linux.
  • rerere.c is definitely faster than our conflict resolution mechanism. That's not a surprise :)

Beside that, I give you my general thoughts on this.
You mentioned 2 problems actually:

  • Sometimes, dumb conflicts happen because of bad merge base detection. These conflicts are properly handled by the recursive merge strategy. We encountered this issue very early when we set up git-octopus. I didn't try to fix it at the time, instead, we chose to discourage the use of merge commits in our feature branches because merge commits are the cause of the bad merge base detection.
    Reproducing recursive merge is a good fix to support that though. I'll review the implementation in the review section of the pull request.
  • The conflict resolution mechanism has limits that makes some easy conflicts impossible to record. I know this issue as well. Here's the explanation for the example you described:
B1  B2
|   |
|   |
|   B3
|  /
| /
|/

Let's say that you recorded a conflict between B1 and B3 (call it C13). Then you have a 2nd conflict between B1 and B2. To record it you'll have to setup the conflict state by using git conflict from B2 with B1. At this point, the conflict state will be the combined conflict of C13 and you're new conflict. The resulting conflict resolution C12 will embed C13.
Still following me? :)
Now you run an octopus. Depending on the name of the branches, let's say that the merge order is B1 then B3 then B2.
HEAD is fast forwarded to B1.
B3 is merged. C13 is applied.
B2 is merged. The conflict signature will not be C12 because the C13 part of it has already been applied. git octopus won't find any resolution corresponding to this conflict and will fail.

This problem has several declination but it always comes back to branch order in the merge.
I think that the solution is to change the signature calculation of conflict but I don't plan to work on it any soon :/

@benba
Copy link
Contributor Author

benba commented Mar 19, 2017

Hi @apflieger,

Do you currently use the go binary in production ?

No yet, as I said I will deploy it next week (march 27th)

The conflict mechanism is not implemented in go yet. Is this blocker for you to use it if you don't already ?

I intend to use the octopus with recursive fallback strategy, so we will use rerere to record/resolve conflict.
I like the original script conflict resolution for it's easy sharing of conflicts, but rerere is working in all cases (rename, delete), it's more straightforward to view how the conflict was resolved, and you only redo one conflict at a time if a resolution break, so it seems good enough for us.
But maybe I'm missing something ? What was the motivation for the custom conflict resolution ?

The performance you have is really bad. I guess it's windows. This should take less than a minute on linux.
rerere.c is definitely faster than our conflict resolution mechanism. That's not a surprise :)

Windows ;(

we chose to discourage the use of merge commits in our feature branches because merge commits are the cause of the bad merge base detection

Didn't view it that way, but I guess it would be hard for us to do the same, because we build some feature branches on top of others (when we do some structural changes for example).
So downstream branches need to see changes in upstream one.
The solution is not git related though and would requires a better modularity of our app (so that we can depend on a versioned jar for example) but we not quite here yet.

Here's the explanation for the example you described:

Thanks for the explanation, if I'm understanding correctly it means that it could be less frequent with rerere since conflict signature are per conflict and not per branch.
It could even works for extreme cases, if we record the conflict by merging B3 to B1 first and B2 after (or even straight from the octopus merge, just after it breaks, because in our case we always merge in the same order to master).
C12 for a marker would then only contains the resolution from what's in conflict between B1+B3 and B2.
It would break if we remove B3 from the octopus, but could be again be solved with a second C12' resolution for that marker that exists only in that branch order (B1 then B2).
I start to getting very complicated but at a marker level this should be quite rare.
What do you think?

@benba
Copy link
Contributor Author

benba commented Mar 19, 2017

It would break if we remove B3 from the octopus, but could be again be solved with a second C12' resolution for that marker that exists only in that branch order (B1 then B2).

Actually forget about that part, it does not make any sense to remove B3 since it's contained in B2.
I guess I'm overheating ;-)

@apflieger
Copy link
Contributor

The motivation of conflict resolution was to push resolutions to the remote. How do you do that with rerere? I know some people who commit the rerere directory but it's a bit hacky.
This custom conflict mechanism is more extensible as well. By changing the conflict signature, as I mentioned, it could allow to do much more powerful things, such as changing code when 2 branches are merged even if there's no conflict.

I don't remember if rerere records at patch level (what you called marker I guess) or at file level.
In both cases, is has the same problem I described but the smaller the granularity of the resolution is, the fewer problem you will encounter. The current conflict signature is the worse case.

Copy link
Contributor

@apflieger apflieger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a bit rough but I don't think recursive merge should be an option because the usage it confusing. How do you choose to use it or not ? Somebody who installs octopus at first will probably not put any option. Then he will encounter those dumb conflict and activate -r and never come back to the standard strategy so the default becomes useless. This is weird.
It could be simply integrated in the standard strategy as a fallback.
The current merge logic is (https://github.com/lesfurets/git-octopus/blob/master/run/run.go#L116):

  • git read-tree: this merges trees but not files. Conflicting files are marked as conflicts in the index.
  • git merge-index: this tries to do simple merge on conflicting files from the index.
  • git apply-conflict-resolution: executed if the remaining conflict has a corresponding resolution. Not implemented in go yet.

We could add a step before git apply-conflict-resolution. git merge-file may do the job. I never used it but the doc says:

git merge-file is designed to be a minimal clone of RCS merge; that is, it implements all of RCS merge's functionality which is needed by git[1].

I understand it as "Implement git-merge at a file level"

_, err = context.Repo.Git("read-tree", "-u", "-m", "--aggressive", common, mrt, lsRemoteEntry.Sha1)
commonArray := strings.Split(common, "\n")
_, err = context.Repo.Git(append([]string{"read-tree", "-u", "-m", "--aggressive"},
append(commonArray, mrt, lsRemoteEntry.Sha1)...)...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems weird because read-tree can't take more than 3 trees. In case of commonArray having more than one tree, this might end up with unexpected result. I'm not sure of what we should do instead but taking the first tree of commonArray could be ok.

_, err = context.Repo.Git("read-tree", "-u", "-m", "--aggressive", commonArray[0], mrt, lsRemoteEntry.Sha1)

The response might be in git's code because they certainly handle this case on the merge command.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's in https://github.com/git/git/blob/master/builtin/read-tree.c#L216
I'm not sure to fully get it but I guess it skips the first trees and only keep the last 3 one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Actually, it behaves the same way I was suggesting. The order of read-tree args is . The code you linked put head at max_args - 1. Then the 3 assignations are here https://github.com/git/git/blob/6e3a7b3398559305c7a239a42e447c21a8f39ff8/unpack-trees.c#L1720

  • head at L1740. Takes args[max_args - 2] from your link.
  • remote at L1720. Takes args[max_args - 1]
  • mergeBase at L1739 surprisingly called index. Takes args[0]

We'd better put what I suggested rather than relying on this hidden behaviour.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you wish ;)
But git-merge-octopus.sh don't bother with such care, it just pass all the bases: https://github.com/git/git/blob/master/git-merge-octopus.sh#L72

@benba
Copy link
Contributor Author

benba commented Mar 19, 2017

I perfectly understand your concern.
The way I view it is that the script allows to merge several branches to do automated merges.
The fact that it's done with a single merge point (octopus) or by several merging point (recursive) is an implementation detail.
In the end, the choice depends on the constraint you put on your workflow, and what you want to obtain in your "master" branch.
But It's true that the idea is not to switch back and forth between the strategies to solve a conflict, it's a choice that should be made at the beginning and not changed after (especially if conflict resolution recording mechanism is not the same).
So maybe it's best to implement this option in a separate command, that would share stuff with the octopus one (exclusions, pattern matching on branches, ...).

We could add a step before git apply-conflict-resolution. git merge-file may do the job.

I will try to put some thought by inserting a alternative merge strategies between the two commands, but I'm not sure git-merge-file is the answer though, since commit-tree is called at the very end, we will not be able to use a tool like merge-base to find a proper base between the current state of the octopus and the branch being merged.

I don't remember if rerere records at patch level

Yes patch level

How do you do that with rerere? I know some people who commit the rerere directory but it's a bit hacky.

There are several techniques, the one I lean to for the moment is to commit it but in separate repo (or in the same but in an orphan branch).

@apflieger
Copy link
Contributor

I don't get why it couldn't work in a unified program. How does it relates to the use of commit-tree? The biggest concern I have is how behaves git merge-file. For instance, it doesn't implement rerere so it has to be added on our side. But that's still simplier than having multiple strategies.

@benba
Copy link
Contributor Author

benba commented Mar 19, 2017

ok I'll try to put some thought on it this week if I have time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants