Git Source Code Study Series (8): git rebase #167

soapgu · 2022-08-24T08:05:09Z

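Preface

I have finally reached rebase. Taking on what may be git's hardest command to understand is a little exciting.
First, though, I need to catch up on something I skipped in an earlier lesson.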

echo $result_commit > "$GIT_DIR"/HEAD
git-diff-tree -p $head $result_commit | git-apply --stat

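This is also the last step of the git merge script. We did not pay attention to these statements before;
they run after the commit has been made.

My mental picture was that once the commit succeeds, checking out the committed index is all that is left to do.
That is not what these lines do.

Build the patch
git-diff-tree -p $head $result_commit
This turns the changes from $head to $result_commit into a patch, which is what the -p option is for.
Feed the patch to git-apply
git-apply --stat
The patch is piped into git-apply. Note that with --stat, git-apply only prints a diffstat summary of the patch and does not modify the working tree or the index; without that option it would apply the patch to the working tree, which is what the exercise below does.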
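git patch exercise
Goal and setup

Suppose the last change I made on the rebase branch needs to "fly" over to the master branch, and that this change causes no conflicts (to keep the difficulty down).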

guhui@guhuideMacBook-Pro GitLearn % git diff-tree 996261f8cbfc5a5345d050c98b0947f6947bff35 -p
996261f8cbfc5a5345d050c98b0947f6947bff35
diff --git a/meeting.js b/meeting.js
index 5ea0546..9d4943e 100644
--- a/meeting.js
+++ b/meeting.js
@@ -1,3 +1,4 @@
+//just add a commit
 var express = require('express');
 var router = express.Router();
 var M = require('../models/meeting');
guhui@guhuideMacBook-Pro GitLearn % git diff-tree 996261f8cbfc5a5345d050c98b0947f6947bff35 -p > mypatch.patch

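1. Create the patch
The modern command differs little from the 0.99 one; the long SHA-1 is the commit I want to carry over.
Note that git diff-tree normally takes two trees. Given a single commit, it compares that commit with its parent.
cat mypatch.patch shows that the patch now exists as a real file.

2. Switch to the master branch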

guhui@guhuideMacBook-Pro GitLearn % git status
位于分支 master
您的分支与上游分支 'origin/master' 一致。

未跟踪的文件:
  （使用 "git add <文件>..." 以包含要提交的内容）
	.DS_Store
	mypatch.patch

提交为空，但是存在尚未跟踪的文件（使用 "git add" 建立跟踪

3. Apply the patch

guhui@guhuideMacBook-Pro GitLearn % git apply mypatch.patch 
guhui@guhuideMacBook-Pro GitLearn % git status
位于分支 master
您的分支与上游分支 'origin/master' 一致。

尚未暂存以备提交的变更：
  （使用 "git add <文件>..." 更新要提交的内容）
  （使用 "git restore <文件>..." 丢弃工作区的改动）
	修改：     meeting.js

未跟踪的文件:
  （使用 "git add <文件>..." 以包含要提交的内容）
	.DS_Store
	mypatch.patch

修改尚未加入提交（使用 "git add" 和/或 "git commit -a"）

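Running git apply <patchfile> applies the change directly to the working tree; as the status output shows, nothing is staged.
That was a little reckless. I should have run git apply --check first to see whether the patch applies cleanly, and only then applied it.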

4. Commit and push to the remote

Omitted.

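Rebase source code analysis
Setting up the battlefield

Take the main branch and a downstream branch named rebase. Since they forked, master has gained 3 commits and rebase has gained 4.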

guhui@guhuideMacBook-Pro GitLearn % git log --graph --oneline --decorate --boundary master...rebase
* 5d36a9c (HEAD -> rebase, origin/rebase) add file-a
* 996261f modify meetting.js
* 0d0bdfd modify my sh
* 4c853fa add sh file
| * df4f112 (origin/master, origin/HEAD, master) new line for readme
| * 70e5d66 apply patch for modify meeting.js
| * c700556 add 1111111
|/  
o f7943c3 add file

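The commit at the bottom is the fork point and can be ignored.
70e5d66 apply patch for modify meeting.js and 996261f modify meetting.js contain the same change; the former was made with git apply.
The remaining commits are independent of one another.

The goal is to rebase the rebase branch onto master.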

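git-rebase-script
To my great surprise, an operation this complex is handled by a shell script of only 49 lines.
Short code does not mean easy code!

Parsing the upstream and head arguments
This is done with git-rev-parse, which converts a ref name into its SHA-1.
What the arguments mean:
upstream/head: the upstream branch / the branch being rebased.
In everyday work, for a remote-tracking branch the upstream is origin/aaa and the head is aaa; master and the rebase branch forked from it are another upstream/downstream pair. So upstream/head adapts to different scenarios.
Preparing the cache and switching HEAD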

git-read-tree -m -u $junio $linus &&
echo "$linus" >"$GIT_DIR/HEAD" || exit

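This is the first place I got stuck,
because read-tree is called with two arguments,
which means the two-way merge logic is used.
Two questions here kept me stuck for a long time:
(1) Since rebasing takes upstream as the new base, and HEAD is pointed straight at it, is there any point in passing the head branch in as well? The content will come from upstream anyway, so if nothing is merged the merge looks pointless. Why bother?
(2) The documentation and the call seem to contradict each other.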

Two Tree Merge
~~~~~~~~~~~~~~

Typically, this is invoked as "git-read-tree -m $H $M", where $H
is the head commit of the current repository, and $M is the head
of a foreign tree, which is simply ahead of $H (i.e. we are in a
fast forward situation).

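The documentation says $H to $M is a fast-forward, yet my scenario clearly cannot be fast-forwarded! I searched the shell script and the C code back and forth and found no fast-forward check either. If the actual arguments do not meet the stated requirement, how could Linus Torvalds let such inconsistent code in? I could not make sense of it.

Interlude: Two Tree Merge and the "carry forward" rule

These two questions only clicked for me yesterday, after thinking about them until I left work.
Look closely at the merge rules: in a Two Tree Merge, no content from $H is ever "carried over".

Two Tree Merge is really a three-way comparison. The third party is the index.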

Read the documentation closely:
(1) The current index and work tree is derived from $H, but
the user may have local changes in them since $H;
(2) The user wants to fast-forward to $M.
In this case, the git read-tree -m $H $M command makes sure that no local change is lost as the result of this "merge".

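I had the focus of the merge wrong: the third party in this comparison is the index! I had taken for granted that index == HEAD, that the two are in sync. The rule also covers the case where the index is ahead of HEAD, and even the case where the working tree is not clean.
The reasoning is deductive: when $H is swapped for $M, is it reasonable to carry the index along?
That is, $H -> $M
-> index
A three-cornered comparison is needed. Some examples:
If $H and $M are equal, the two can be treated as the same tree, so the index is newer than $M and is kept.
If the index and $H are equal, $M is the newer one and the index entry is replaced without hesitation.
If all three differ, there is no way to tell which is newer, and the only option is to fail with a conflict.
So git-read-tree is doing three things here:

(1) Switch the index to $M, which is the main purpose.
(2) Carry along whatever changes in the cache can be carried.
(3) Whether $H to $M is really a fast-forward does not matter; $H mainly serves as the reference for deciding what in the index can be carried. In other words, $H and $M are merged as if it were a fast-forward. Since nothing from $H gets merged in, no harm is done.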
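The three-cornered comparison can be written down as a tiny decision function. This is only a sketch for a single path: the function name, the argument values and the result strings are made up for illustration, the "index equals $M" branch is taken from the case table in the git-read-tree documentation rather than from the text above, and the real command also looks at missing entries and the state of the working tree.

```shell
#!/bin/sh
# Toy model of the two-tree "carry forward" decision for ONE path.
# H, M and I stand for the blob IDs of that path in $H, $M and the index.
# two_way_decide and its result strings are hypothetical names for this sketch.
two_way_decide() {
    H=$1 M=$2 I=$3
    if test "$I" = "$H"; then
        echo "take M"        # index is clean relative to $H: move to $M
    elif test "$H" = "$M"; then
        echo "keep index"    # path unchanged between $H and $M: carry the local change
    elif test "$I" = "$M"; then
        echo "keep index"    # the index already matches $M
    else
        echo "fail"          # all three differ: refuse rather than lose a change
    fi
}

two_way_decide a1 b2 a1   # clean index, upstream changed the file
two_way_decide a1 a1 c3   # local change, file untouched between $H and $M
two_way_decide a1 b2 c3   # local change collides with the switch
```

Read this way, $H never contributes content. It is only the yardstick that tells a local change apart from an upstream change.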

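Back to rebase: entering git cherry

Yesterday I could not even sort out Two Tree Merge, git cherry was completely unreadable, and my morale collapsed.
Everything before and after it is setup and cleanup. git cherry is the real "core" of rebase.
That core is also a shell script! 86 lines. Short code does not mean easy code! Short code does not mean easy code! Short code does not mean easy code!
I stared at the comments in the 0.99 source for a long time without understanding them.
Fortunately the documentation on the official git site is much easier to follow. The code has changed a lot since then, but the functionality has not!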

git-cherry

The 0.99 description:
  __*__*__*__*__> <upstream>
            /
  fork-point
            \__+__+__+__+__+__+__+__> <head>

Each commit between the fork-point and <head> is examined, and
compared against the change each commit between the fork-point and
<upstream> introduces.  If the change does not seem to be in the
upstream, it is shown on the standard output.
The description on the official site:
Determine whether there are commits in <head>..<upstream> that are equivalent to those in the range <limit>..<head>.
The equivalence test is based on the diff, after removing whitespace and line numbers. git-cherry therefore detects when commits have been "copied" by means of [git-cherry-pick[1]](https://git-scm.com/docs/git-cherry-pick), [git-am[1]](https://git-scm.com/docs/git-am) or [git-rebase[1]](https://git-scm.com/docs/git-rebase).

Outputs the SHA1 of every commit in <limit>..<head>, prefixed with - for commits that have an equivalent in <upstream>, and + for commits that do not.

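You really need to read both together. I read them over and over, many times!
Key points

What is returned? The commits in <limit>..<head>. In the 0.99 version, those are the commits between the fork-point and <head>.
Commits that have an equivalent in <upstream> are "picked out".
"git-cherry therefore detects when commits have been "copied"": that is, commits made earlier via a patch or a cherry-pick. Their commit IDs differ, but the change they introduce is the same.

Implementation

Collect the commits between the fork-point and <upstream>
Collect the commits between the fork-point and <head>

I will cover these two together. In the shell code they look like this: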

# Note that these list commits in reverse order;
# not that the order in inup matters...
inup=`git-rev-list ^$junio $linus` &&
ours=`git-rev-list $junio ^$linus` || exit

That this is a range I understand, but what on earth is the ^ in front?
This needs a careful read of the documentation:
gitrevisions

The ... (three-dot) Symmetric Difference Notation

    A similar notation r1...r2 is called symmetric difference of r1 and r2 and is defined as r1 r2 --not $(git merge-base --all r1 r2). It is the set of commits that are reachable from either one of r1 (left side) or r2 (right side) but not from both.

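This connects with what I learned earlier about merge: it is really a history made up of (r1, base, r2).
Note: the base itself is excluded.

Now to the main question: what do ^$junio $linus and $junio ^$linus express, given that they differ only in where the ^ goes?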

^<rev> (caret) Notation

    To exclude commits reachable from a commit, a prefix ^ notation is used. E.g. ^r1 r2 means commits reachable from r2 but exclude the ones reachable from r1 (i.e. r1 and its ancestors).

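The English here is straightforward, so I will skip the translation and go straight to a picture.

The base point is excluded as well, so the two expressions describe exactly the upstream-only commits and the downstream-only commits. That fully explains the syntax.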
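To make the two expressions concrete without a repository, here is a toy model of the exclusion. The ancestor lists are written by hand from the log shown earlier and are cut off at the fork point, exclude_reachable is a made-up helper, and I am assuming that $linus holds the upstream (master) and $junio the head (rebase), as the read-tree call suggests. In a real repository git-rev-list does this work.

```shell
#!/bin/sh
# Toy model of "^A B": commits reachable from B minus commits reachable from A.
# Hand-written ancestor lists (newest first), truncated at the fork point f7943c3.
master_reach="df4f112 70e5d66 c700556 f7943c3"
rebase_reach="5d36a9c 996261f 0d0bdfd 4c853fa f7943c3"

# exclude_reachable EXCLUDED INCLUDED
# Print every entry of INCLUDED that does not appear in EXCLUDED.
exclude_reachable() {
    for c in $2
    do
        case " $1 " in
        *" $c "*) ;;          # reachable from the excluded side: drop it
        *) echo "$c" ;;
        esac
    done
}

echo "inup (^rebase master):"
exclude_reachable "$rebase_reach" "$master_reach"
echo "ours (rebase ^master):"
exclude_reachable "$master_reach" "$rebase_reach"
```

The fork point f7943c3 is reachable from both sides, so it drops out of both lists, which is the "base is excluded" observation above.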

Analyzing the patches of the upstream commits
Code first. On first sight this snippet is thoroughly baffling.
for c in $inup
do
	git-diff-tree -p $c
done | git-patch-id |
while read id name
do
	echo $name >>$patch/$id
done

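First it loops over every commit on the upstream branch
and runs git diff-tree $commit -p | git patch-id.
What does that mean? The first half we know from the patch exercise: it produces a patch of the diff between the commit and its parent.
But what is git patch-id?
Let's run it and see.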

guhui@guhuideMacBook-Pro GitLearn % git diff-tree e9de478b6844242b83e0a770028434ba55446cc5 -p | git patch-id
642185fde883720d9dc4a8424648e1d9f19f0580 e9de478b6844242b83e0a770028434ba55446cc5
guhui@guhuideMacBook-Pro GitLearn %

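It prints two SHA-1 values. The second one I recognize: it is the commit ID. What is the first?
git-patch-id - Compute unique ID for a patch
Read a patch from standard input and compute the patch ID for it.

A "patch ID" is nothing but a sum of SHA-1 of the file diffs associated with a patch, with whitespace and line numbers ignored. As such, it is "reasonably stable", but at the same time also reasonably unique, i.e., two patches that have the same "patch ID" are almost guaranteed to be the same thing.

IOW, you can use this thing to look for likely duplicate commits.

git patch-id
So a patch ID depends only on the diff. That is what makes it so handy: it filters out commits that are "equal".
read id name assigns id=patch ID and name=commit, and the commit is then written to the "temporary file" $patch/$id. It has the flavor of a dictionary<string,string>.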
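The "dictionary" trick can be tried in isolation. The sketch below uses invented ids and commit names in place of real git-patch-id output, and has_patch is a hypothetical helper; the point is only that a directory of files keyed by patch ID gives a cheap existence test with test -f.

```shell
#!/bin/sh
# Sketch of the "one file per patch-id" lookup table.
# The ids and commit names are invented placeholders, not real patch-ids.
patch=$(mktemp -d) || exit 1
trap 'rm -rf "$patch"' EXIT

# Stand-in for the output of "git-diff-tree -p $c | git-patch-id" over upstream.
upstream_ids='id-aaa commit-u1
id-bbb commit-u2'

echo "$upstream_ids" |
while read id name
do
    echo "$name" >>"$patch/$id"    # key = patch-id (file name), value = commit (content)
done

# has_patch ID: succeed when some upstream commit produced this patch-id.
has_patch() {
    test -f "$patch/$1"
}

for id in id-bbb id-zzz
do
    if has_patch "$id"
    then echo "- $id (already upstream as $(cat "$patch/$id"))"
    else echo "+ $id"
    fi
done
```

Using >> rather than > means two upstream commits with the same patch ID simply end up listed in the same file instead of overwriting each other.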

LF='
'

O=
for c in $ours
do
	set x `git-diff-tree -p $c | git-patch-id`
	if test "$2" != ""
	then
		if test -f "$patch/$2"
		then
			sign=-
		else
			sign=+
		fi
		case "$O" in
		'')	O="$sign $c" ;;
		*)	O="$sign $c$LF$O" ;;
		esac
	fi
done
case "$O" in
'') ;;
*)  echo "$O" ;;
esac

The second loop is still "mystery code"; read cold, it might as well be hieroglyphics!
Let's cheat a little and work backwards from the result.

guhui@guhuideMacBook-Pro GitLearn % git cherry -v  origin/master rebase                                     
+ 4c853fa059275b82319f4a0824d91b85e1f5b2a8 add sh file
+ 0d0bdfd9f2966dfa9dcb601240bfed524b953c1f modify my sh
- 996261f8cbfc5a5345d050c98b0947f6947bff35 modify meetting.js
+ 5d36a9cb66a0a6aa678d9db36edde3d3b3cfa1b1 add file-a
guhui@guhuideMacBook-Pro GitLearn %

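These 4 commits are the downstream branch's commits, one of which has already been applied upstream as a patch.
The misread $2: I am not fluent in shell and tripped up here. I assumed $2 was the second argument passed to the script, and then the code stopped making sense.
Things had changed by that point: the set statement earlier replaces the positional parameters. I only understood this after writing a small sh script to experiment.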

#!/bin/sh
echo 'begin get patch id'
set x `git diff-tree e9de478b6844242b83e0a770028434ba55446cc5 -p | git patch-id`
echo "start my sh>>>"
echo "$1"
echo "$2"
echo "$3"
echo "sh is end..."

guhui@guhuideMacBook-Pro GitLearn % ./test.sh
begin get patch id
start my sh>>>
x
642185fde883720d9dc4a8424648e1d9f19f0580
e9de478b6844242b83e0a770028434ba55446cc5
sh is end...

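At first I could not see why the x "placeholder" is needed. It is a common shell idiom: with a fixed first word, set always receives at least one argument, so it never falls back to listing all variables when the command prints nothing, and output that starts with - is never mistaken for an option.
So the patch ID is picked out as $2: if it also occurs upstream the line gets a - prefix, otherwise a +.
Then the lines are joined together. Each new line is prepended, which undoes the newest-first order of git-rev-list, so the final output runs from oldest to newest, matching the order in the result above.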
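The effect of the leading x is easy to see with a stand-in command. In this sketch fake_patch_id and second_word are made-up names, and the "empty" case simply imitates a pipeline that prints nothing, for instance when there is no diff to hash.

```shell
#!/bin/sh
# Why "set x ..." instead of "set ...": the leading x keeps set well-behaved
# when the command substitution is empty or starts with a dash.
# fake_patch_id is a hypothetical stand-in for "git-diff-tree -p $c | git-patch-id".
fake_patch_id() {
    case "$1" in
    empty) ;;                              # prints nothing at all
    *) echo "642185f-patch-id $1" ;;
    esac
}

# second_word KIND: mimic the loop body and report what ends up in $2.
second_word() {
    set x `fake_patch_id "$1"`
    echo "argc=$# id=${2:-<none>}"
}

second_word e9de478     # normal case: $1 is x, $2 is the patch-id
second_word empty       # empty output: only x is left, so test "$2" != "" is false
```

That is why the loop can safely test "$2" right after the set: in the empty case there is exactly one positional parameter and $2 expands to nothing.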

With that, git cherry is fully understood.

rebase: the final chapter

while read sign commit
do
	case "$sign" in
	-) continue ;;
	esac
	S=`cat "$GIT_DIR/HEAD"` &&
        GIT_EXTERNAL_DIFF=git-apply-patch-script git-diff-tree -p $commit &&
	git-commit-script -m "$commit" || {
		echo $commit >>$fail
		git-read-tree --reset -u $S
	}
done

Leaving out the unimportant parts, there is not much core code left.
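The control flow of that loop can be dry-run without touching a repository. In the sketch below the input is the git cherry output from this article, while replay_one and replay_all are hypothetical names: replay_one merely prints what the real script would do with git-diff-tree and git-commit-script.

```shell
#!/bin/sh
# Dry run of the replay loop. replay_one is a made-up stand-in for
# "apply the commit's diff, then commit it again".
cherry_output='+ 4c853fa059275b82319f4a0824d91b85e1f5b2a8
+ 0d0bdfd9f2966dfa9dcb601240bfed524b953c1f
- 996261f8cbfc5a5345d050c98b0947f6947bff35
+ 5d36a9cb66a0a6aa678d9db36edde3d3b3cfa1b1'

replay_one() {
    echo "replay $1"
}

replay_all() {
    while read sign commit
    do
        case "$sign" in
        -) continue ;;             # an equivalent change is already upstream
        esac
        replay_one "$commit" || echo "failed $commit"
    done
}

echo "$cherry_output" | replay_all
```

Only the three + lines reach replay_one; the - line for 996261f is skipped before any work is done.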

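Let me ask myself a few questions first and see whether I can answer them after the analysis.

Is a commit after the rebase the same commit as before the rebase?
Does the SHA-1 change across a rebase?
If it stays the same, then the rebase achieved nothing and I might as well have copied the commit. If it differs, the change itself is still identical, so why produce a different SHA-1? A blob with the same content always has the same SHA-1, so isn't this inconsistent? (Three questions in a row.) Try answering that.

What does git-apply-patch-script do?
Unlike the default git apply we used earlier, this shell script not only applies the change to the working tree but also updates the cache at the same time.
It handles all three operations (add, delete, modify), covering both content changes and mode (permission) changes.
In short, it prepares everything for the commit.
What does git-commit-script do?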

git commit [-m existing-commit] [<path>...]
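Note that this git commit must not be confused with today's commit. Here -m is not a message; it names the existing commit, reported by git cherry, that has to be committed again!

Using git-cat-file commit, the script extracts GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL and GIT_AUTHOR_DATE from the original commit and uses them for the new one. The commit message is copied over as well.

Why are GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL and the committer date not copied?
This is the essence of rebasing. GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, GIT_AUTHOR_DATE and the commit message are all reused, and the diff is the same too, but the committer date and committer name must be those of whoever performs the operation. Otherwise the rebase would leave no trace of itself.
So the SHA-1 of a commit after rebasing is bound to differ!

Take any commit as an example: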

guhui@guhuideMacBook-Pro GitLearn % git cat-file -p 5d36a9cb66a0a6aa678d9db36edde3d3b3cfa1b1
tree 138f238493ad84456cdb5d14ee7c9cba69944fbf
parent 996261f8cbfc5a5345d050c98b0947f6947bff35
author soapgu <[email protected]> 1661334685 +0800
committer soapgu <[email protected]> 1661334685 +0800

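This is what a commit object stores. Even if the tree is identical before and after the rebase (and it most likely is not),
the parent and the committer date are certain to have changed, and the committer may have changed too.
The SHA-1 is a digest of that content, so it cannot stay the same!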
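This can be checked by hand, because a commit ID is the SHA-1 of the header "commit <size>" plus a NUL byte, followed by the commit body. The sketch below hashes two bodies that differ only in the committer timestamp. The tree and parent values come from the example above; the author identity, the helper names and the second timestamp are placeholders, so the printed hashes are not expected to match any real commit in this repository.

```shell
#!/bin/sh
# Sketch: a commit ID is the SHA-1 of "commit <size>\0<body>", so changing any
# header line changes the ID. sha1_of_stdin and commit_id are hypothetical helpers.
sha1_of_stdin() {
    if command -v sha1sum >/dev/null 2>&1
    then sha1sum | cut -d' ' -f1
    else shasum | cut -d' ' -f1       # fallback where sha1sum is not installed
    fi
}

# commit_id COMMITTER_TIME: hash a commit body that differs only in committer time.
commit_id() {
    body="tree 138f238493ad84456cdb5d14ee7c9cba69944fbf
parent 996261f8cbfc5a5345d050c98b0947f6947bff35
author someone <someone@example.com> 1661334685 +0800
committer someone <someone@example.com> $1 +0800

add file-a
"
    size=$(printf '%s' "$body" | wc -c | tr -d ' ')
    { printf 'commit %s\0' "$size"; printf '%s' "$body"; } | sha1_of_stdin
}

before=$(commit_id 1661334685)
after=$(commit_id 1661520977)
echo "before: $before"
echo "after:  $after"
```

A blob hashes only its file content, which is why identical files share an ID. A commit hashes its tree, parent, author and committer lines as well, so the same diff replayed on a new parent at a later time necessarily gets a new ID.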

Verifying the rebase

Let's return to the goal we set up.
The current state of the branches:

* 5d36a9c (HEAD -> rebase, origin/rebase) add file-a
* 996261f modify meetting.js
* 0d0bdfd modify my sh
* 4c853fa add sh file
| * df4f112 (origin/master, origin/HEAD, master) new line for readme
| * 70e5d66 apply patch for modify meeting.js
| * c700556 add 1111111
|/  
o f7943c3 add file

The three commits marked + are the ones that will change:

guhui@guhuideMacBook-Pro GitLearn % git cherry -v  origin/master rebase                                     
+ 4c853fa059275b82319f4a0824d91b85e1f5b2a8 add sh file
+ 0d0bdfd9f2966dfa9dcb601240bfed524b953c1f modify my sh
- 996261f8cbfc5a5345d050c98b0947f6947bff35 modify meetting.js
+ 5d36a9cb66a0a6aa678d9db36edde3d3b3cfa1b1 add file-a

Start the rebase:

guhui@guhuideMacBook-Pro GitLearn % git rebase master
warning: 跳过了先前已应用的提交 996261f
提示：使用 --reapply-cherry-picks 来包括跳过的提交
提示：Disable this message with "git config advice.skippedCherryPicks false"
成功变基并更新 refs/heads/rebase。

Look at the warning: it says the previously applied commit 996261f was skipped, which seems to match. Now look at the commits after the rebase.

guhui@guhuideMacBook-Pro GitLearn % git log --oneline
32a4e70 (HEAD -> rebase) add file-a
fce747c modify my sh
8f491e1 add sh file
df4f112 (origin/master, origin/HEAD, master) new line for readme
70e5d66 apply patch for modify meeting.js
c700556 add 1111111
f7943c3 add file
7063cdb add ++++
a2cf6ea add----
1bc2fa7 add ~~~line

OK, master now sits in the middle of the history, with three commits appended after it. Let's check whether they are the + commits from before.

+ 4c853fa059275b82319f4a0824d91b85e1f5b2a8 add sh file  ----------->   8f491e1 add sh file
+ 0d0bdfd9f2966dfa9dcb601240bfed524b953c1f modify my sh---------->   fce747c modify my sh
- 996261f8cbfc5a5345d050c98b0947f6947bff35 modify meetting.js----->   skipped (previously applied commit 996261f)
+ 5d36a9cb66a0a6aa678d9db36edde3d3b3cfa1b1 add file-a------------->  32a4e70 (HEAD -> rebase) add file-a

Finally, look at the committer, dates and author of the rebased commits.
git log does not show them by default; the --pretty=fuller option is needed.

guhui@guhuideMacBook-Pro GitLearn % git log -5 --pretty=fuller
commit 32a4e704b3d9b3c39f078564bd607fdf5956761a (HEAD -> rebase)
Author:     soapgu <[email protected]>
AuthorDate: Wed Aug 24 17:51:25 2022 +0800
Commit:     soapgu <[email protected]>
CommitDate: Fri Aug 26 21:36:17 2022 +0800

    add file-a

commit fce747c8e0235af216262686b949b43660396c63
Author:     soapgu <[email protected]>
AuthorDate: Wed Aug 24 15:37:12 2022 +0800
Commit:     soapgu <[email protected]>
CommitDate: Fri Aug 26 21:36:17 2022 +0800

    modify my sh

commit 8f491e129c3ea2b8e8c5038cdcfa6f508120cb78
Author:     soapgu <[email protected]>
AuthorDate: Wed Aug 24 15:24:23 2022 +0800
Commit:     soapgu <[email protected]>
CommitDate: Fri Aug 26 21:36:17 2022 +0800

    add sh file

commit df4f1128474b8fc002712de038b825bba0ee14ea (origin/master, origin/HEAD, master)
Author:     soapgu <[email protected]>
AuthorDate: Wed Aug 24 17:47:02 2022 +0800
Commit:     GitHub <[email protected]>
CommitDate: Wed Aug 24 17:47:02 2022 +0800

    new line for readme

commit 70e5d665cf4793f09b8d1c93f246868cf50eb228
Author:     soapgu <[email protected]>
AuthorDate: Wed Aug 24 16:41:49 2022 +0800
Commit:     soapgu <[email protected]>
CommitDate: Wed Aug 24 16:41:49 2022 +0800

    apply patch for modify meeting.js

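For the three rebased commits, Author, AuthorDate and the message were all copied over.
CommitDate is "new". And if someone else had done the rebase, the committer would have changed as well.
Verification passed!
Before the rebase

After the rebase

The GitHub Network graph evidently uses CommitDate for the horizontal axis of its timeline.

The truth about rebase

I will start with two borrowed diagrams that I think summarize it well.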


