Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR to run functional + perf tests on unpack_trees() optimization #16

Closed
wants to merge 4 commits into from
Closed

PR to run functional + perf tests on unpack_trees() optimization #16

wants to merge 4 commits into from

Conversation

benpeart
Copy link

@benpeart benpeart commented Aug 6, 2018

Do not merge, just created to get build for testing.

We're going to optimize unpack_trees() a bit in the following
patches. Let's add some tracing to measure how long it takes before
and after. This is the baseline ("git checkout -" on gcc.git, 80k
files on worktree)

    0.018239226 s: read cache .git/index
    0.052541655 s: preload index
    0.001537598 s: refresh index
    0.168167768 s: unpack trees
    0.002897186 s: update worktree after a merge
    0.131661745 s: repair cache-tree
    0.075389117 s: write index, changed mask = 2a
    0.111702023 s: unpack trees
    0.000023245 s: update worktree after a merge
    0.111793866 s: diff-index
    0.587933288 s: git command: /home/pclouds/w/git/git checkout -

Signed-off-by: Nguyễn Thái Ngọc Duy <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
In order to merge one or many trees with the index, unpack-trees code
walks multiple trees in parallel with the index and performs n-way
merge. If we find out at start of a directory that all trees are the
same (by comparing OID) and cache-tree happens to be available for
that directory as well, we could avoid walking the trees because we
already know what these trees contain: it's flattened in what's called
"the index".

The upside is of course a lot less I/O since we can potentially skip
lots of trees (think subtrees). We also save CPU because we don't have
to inflate and the apply deltas. The downside is of course more
fragile code since the logic in some functions are now duplicated
elsewhere.

"checkout -" with this patch on gcc.git:

    baseline      new
  --------------------------------------------------------------------
    0.018239226   0.019365414 s: read cache .git/index
    0.052541655   0.049605548 s: preload index
    0.001537598   0.001571695 s: refresh index
    0.168167768   0.049677212 s: unpack trees
    0.002897186   0.002845256 s: update worktree after a merge
    0.131661745   0.136597522 s: repair cache-tree
    0.075389117   0.075422517 s: write index, changed mask = 2a
    0.111702023   0.032813253 s: unpack trees
    0.000023245   0.000022002 s: update worktree after a merge
    0.111793866   0.032933140 s: diff-index
    0.587933288   0.398924370 s: git command: /home/pclouds/w/git/git

Another measurement from Ben's running "git checkout" with over 500k
trees (on the whole series):

    baseline        new
  ----------------------------------------------------------------------
    0.535510167     0.556558733     s: read cache .git/index
    0.3057373       0.3147105       s: initialize name hash
    0.0184082       0.023558433     s: preload index
    0.086910967     0.089085967     s: refresh index
    7.889590767     2.191554433     s: unpack trees
    0.120760833     0.131941267     s: update worktree after a merge
    2.2583504       2.572663167     s: repair cache-tree
    0.8916137       0.959495233     s: write index, changed mask = 28
    3.405199233     0.2710663       s: unpack trees
    0.000999667     0.0021554       s: update worktree after a merge
    3.4063306       0.273318333     s: diff-index
    16.9524923      9.462943133     s: git command: git.exe checkout

This command calls unpack_trees() twice, the first time on 2way merge
and the second 1way merge. In both times, "unpack trees" time is
reduced to one third. Overall time reduction is not that impressive of
course because index operations take a big chunk. And there's that
repair cache-tree line.

Signed-off-by: Nguyễn Thái Ngọc Duy <[email protected]>
This is a micro optimization that probably only shines on repos with
deep directory structure. Instead of allocating and freeing a new
cache_entry in every iteration, we reuse the last one and only update
the parts that are new each iteration.

Signed-off-by: Nguyễn Thái Ngọc Duy <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
With the new cache-tree, we could mostly avoid I/O (due to odb access)
the code mostly becomes a loop of "check this, check that, add the
entry to the index". We could skip a couple checks in this giant loop
to go faster:

- We know here that we're copying entries from the source index to the
  result one. All paths in the source index must have been validated
  at load time already (and we're not taking strange paths from tree
  objects) which means we can skip verify_path() without compromise.

- We also know that D/F conflicts can't happen for all these entries
  (since cache-tree and all the trees are the same) so we can skip
  that as well.

This gives rather nice speedups for "unpack trees" rows where "unpack
trees" time is now cut in half compared to when
traverse_by_cache_tree() is added, or 1/7 of the original "unpack
trees" time.

   baseline      cache-tree    this patch
 --------------------------------------------------------------------
   0.018239226   0.019365414   0.020519621 s: read cache .git/index
   0.052541655   0.049605548   0.048814384 s: preload index
   0.001537598   0.001571695   0.001575382 s: refresh index
   0.168167768   0.049677212   0.024719308 s: unpack trees
   0.002897186   0.002845256   0.002805555 s: update worktree after a merge
   0.131661745   0.136597522   0.134891617 s: repair cache-tree
   0.075389117   0.075422517   0.074832291 s: write index, changed mask = 2a
   0.111702023   0.032813253   0.008616479 s: unpack trees
   0.000023245   0.000022002   0.000026630 s: update worktree after a merge
   0.111793866   0.032933140   0.008714071 s: diff-index
   0.587933288   0.398924370   0.380452871 s: git command: /home/pclouds/w/git/git

Total saving of this new patch looks even less impressive, now that
time spent in unpacking trees is so small. Which is why the next
attempt should be on that "repair cache-tree" line.

Signed-off-by: Nguyễn Thái Ngọc Duy <[email protected]>
@kewillford kewillford closed this Aug 6, 2018
@kewillford kewillford reopened this Aug 6, 2018
@benpeart benpeart closed this Aug 7, 2018
derrickstolee pushed a commit that referenced this pull request Mar 29, 2021
…sponse

query_result can be be an empty strbuf (STRBUF_INIT) - in that case
trying to read 3 bytes triggers a buffer overflow read (as
query_result.buf = '\0').

Therefore we need to check query_result's length before trying to read 3
bytes.

This overflow was introduced in:
  940b94f (fsmonitor: log invocation of FSMonitor hook to trace2, 2021-02-03)
It was found when running the test-suite against ASAN, and can be most
easily reproduced with the following command:

make GIT_TEST_OPTS="-v" DEFAULT_TEST_TARGET="t7519-status-fsmonitor.sh" \
SANITIZE=address DEVELOPER=1 test

==2235==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0000019e6e5e at pc 0x00000043745c bp 0x7fffd382c520 sp 0x7fffd382bcc8
READ of size 3 at 0x0000019e6e5e thread T0
    #0 0x43745b in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long) /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:842:7
    #1 0x43786d in bcmp /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:887:10
    #2 0x80b146 in fsmonitor_is_trivial_response /home/ahunt/oss-fuzz/git/fsmonitor.c:192:10
    #3 0x80b146 in query_fsmonitor /home/ahunt/oss-fuzz/git/fsmonitor.c:175:7
    #4 0x80a749 in refresh_fsmonitor /home/ahunt/oss-fuzz/git/fsmonitor.c:267:21
    #5 0x80bad1 in tweak_fsmonitor /home/ahunt/oss-fuzz/git/fsmonitor.c:429:4
    #6 0x90f040 in read_index_from /home/ahunt/oss-fuzz/git/read-cache.c:2321:3
    #7 0x8e5d08 in repo_read_index_preload /home/ahunt/oss-fuzz/git/preload-index.c:164:15
    #8 0x52dd45 in prepare_index /home/ahunt/oss-fuzz/git/builtin/commit.c:363:6
    #9 0x52a188 in cmd_commit /home/ahunt/oss-fuzz/git/builtin/commit.c:1588:15
    #10 0x4ce77e in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
    #11 0x4ccb18 in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
    #12 0x4cb01c in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
    #13 0x4cb01c in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
    #14 0x6aca8d in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
    #15 0x7fb027bf5349 in __libc_start_main (/lib64/libc.so.6+0x24349)
    #16 0x4206b9 in _start /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/../sysdeps/x86_64/start.S:120

0x0000019e6e5e is located 2 bytes to the left of global variable 'strbuf_slopbuf' defined in 'strbuf.c:51:6' (0x19e6e60) of size 1
  'strbuf_slopbuf' is ascii string ''
0x0000019e6e5e is located 126 bytes to the right of global variable 'signals' defined in 'sigchain.c:11:31' (0x19e6be0) of size 512
SUMMARY: AddressSanitizer: global-buffer-overflow /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:842:7 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long)
Shadow bytes around the buggy address:
  0x000080334d70: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
  0x000080334d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080334d90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080334da0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080334db0: 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9
=>0x000080334dc0: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9[f9]01 f9 f9 f9
  0x000080334dd0: f9 f9 f9 f9 03 f9 f9 f9 f9 f9 f9 f9 02 f9 f9 f9
  0x000080334de0: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9
  0x000080334df0: f9 f9 f9 f9 01 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
  0x000080334e00: f9 f9 f9 f9 00 00 00 00 f9 f9 f9 f9 01 f9 f9 f9
  0x000080334e10: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc

Signed-off-by: Andrzej Hunt <[email protected]>
Acked-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
ldennington pushed a commit to ldennington/git that referenced this pull request Jun 2, 2021
The primary goal of this change is to stop leaking init_db_template_dir.
This leak can happen because:
 1. git_init_db_config() allocates new memory into init_db_template_dir
    without first freeing the existing value.
 2. init_db_template_dir might already contain data, either because:
  2.1 git_config() can be invoked twice with this callback in a single
      process - at least 2 allocations are likely.
  2.2 A single git_config() allocation can invoke the callback multiple
      times for a given key (see further explanation in the function
      docs) - each of those calls will trigger another leak.

The simplest fix for the leak would be to free(init_db_template_dir)
before overwriting it. Instead we choose to convert to fetching
init.templatedir via git_config_get_value() as that is more explicit,
more efficient, and avoids allocations (the returned result is owned by
the config cache, so we aren't responsible for freeing it).

If we remove init_db_template_dir, git_init_db_config() ends up being
responsible only for forwarding core.* config values to
platform_core_config(). However platform_core_config() already ignores
non-core.* config values, so we can safely remove git_init_db_config()
and invoke git_config() directly with platform_core_config() as the
callback.

The platform_core_config forwarding was originally added in:
  2878533 (mingw: respect core.hidedotfiles = false in git-init again, 2019-03-11
And I suspect the potential for a leak existed since the original
implementation of git_init_db_config in:
  90b4518 (Add `init.templatedir` configuration variable., 2010-02-17)

LSAN output from t0001:

Direct leak of 73 byte(s) in 1 object(s) allocated from:
    #0 0x49a859 in realloc /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    microsoft#1 0x9a7276 in xrealloc /home/ahunt/oss-fuzz/git/wrapper.c:126:8
    microsoft#2 0x9362ad in strbuf_grow /home/ahunt/oss-fuzz/git/strbuf.c:98:2
    microsoft#3 0x936eaa in strbuf_add /home/ahunt/oss-fuzz/git/strbuf.c:295:2
    microsoft#4 0x868112 in strbuf_addstr /home/ahunt/oss-fuzz/git/./strbuf.h:304:2
    microsoft#5 0x86a8ad in expand_user_path /home/ahunt/oss-fuzz/git/path.c:758:2
    microsoft#6 0x720bb1 in git_config_pathname /home/ahunt/oss-fuzz/git/config.c:1287:10
    microsoft#7 0x5960e2 in git_init_db_config /home/ahunt/oss-fuzz/git/builtin/init-db.c:161:11
    microsoft#8 0x7255b8 in configset_iter /home/ahunt/oss-fuzz/git/config.c:1982:7
    microsoft#9 0x7253fc in repo_config /home/ahunt/oss-fuzz/git/config.c:2311:2
    microsoft#10 0x725ca7 in git_config /home/ahunt/oss-fuzz/git/config.c:2399:2
    microsoft#11 0x593e8d in create_default_files /home/ahunt/oss-fuzz/git/builtin/init-db.c:225:2
    microsoft#12 0x5935c6 in init_db /home/ahunt/oss-fuzz/git/builtin/init-db.c:449:11
    microsoft#13 0x59588e in cmd_init_db /home/ahunt/oss-fuzz/git/builtin/init-db.c:714:9
    microsoft#14 0x4cd60d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
    microsoft#15 0x4cb2da in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
    microsoft#16 0x4ccc37 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
    microsoft#17 0x4cac29 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
    microsoft#18 0x69c4de in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
    microsoft#19 0x7f23552d6349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Signed-off-by: Andrzej Hunt <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
ldennington pushed a commit to ldennington/git that referenced this pull request Jun 2, 2021
versions could be an empty string_list. In that case, versions->items is
NULL, and we shouldn't be trying to perform pointer arithmetic with it (as
that results in undefined behaviour).

Moreover we only use the results of this calculation once when calling
QSORT. Therefore we choose to skip creating relevant_entries and call
QSORT directly with our manipulated pointers (but only if there's data
requiring sorting). This lets us avoid abusing the string_list API,
and saves us from having to explain why this abuse is OK.

Finally, an assertion is added to make sure that write_tree() is called
with a valid offset.

This issue has probably existed since:
  ee4012d (merge-ort: step 2 of tree writing -- function to create tree object, 2020-12-13)
But it only started occurring during tests since tests started using
merge-ort:
  f3b964a (Add testing with merge-ort merge strategy, 2021-03-20)

For reference - here's the original UBSAN commit that implemented this
check, it sounds like this behaviour isn't actually likely to cause any
issues (but we might as well fix it regardless):
https://reviews.llvm.org/D67122

UBSAN output from t3404 or t5601:

merge-ort.c:2669:43: runtime error: applying zero offset to null pointer
    #0 0x78bb53 in write_tree merge-ort.c:2669:43
    microsoft#1 0x7856c9 in process_entries merge-ort.c:3303:2
    microsoft#2 0x782317 in merge_ort_nonrecursive_internal merge-ort.c:3744:2
    microsoft#3 0x77feef in merge_incore_nonrecursive merge-ort.c:3853:2
    microsoft#4 0x6f6a5c in do_recursive_merge sequencer.c:640:3
    microsoft#5 0x6f6a5c in do_pick_commit sequencer.c:2221:9
    microsoft#6 0x6ef055 in single_pick sequencer.c:4814:9
    microsoft#7 0x6ef055 in sequencer_pick_revisions sequencer.c:4867:10
    microsoft#8 0x4fb392 in run_sequencer revert.c:225:9
    microsoft#9 0x4fa5b0 in cmd_revert revert.c:235:8
    microsoft#10 0x42abd7 in run_builtin git.c:453:11
    microsoft#11 0x429531 in handle_builtin git.c:704:3
    microsoft#12 0x4282fb in run_argv git.c:771:4
    microsoft#13 0x4282fb in cmd_main git.c:902:19
    microsoft#14 0x524b63 in main common-main.c:52:11
    microsoft#15 0x7fc2ca340349 in __libc_start_main (/lib64/libc.so.6+0x24349)
    microsoft#16 0x4072b9 in _start start.S:120

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior merge-ort.c:2669:43 in

Signed-off-by: Andrzej Hunt <[email protected]>
Reviewed-by: Elijah Newren <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
ldennington pushed a commit to ldennington/git that referenced this pull request Jun 2, 2021
limit_list() iterates over the original revs->commits list, and consumes
many of its entries via pop_commit. However we might stop iterating over
the list early (e.g. if we realise that the rest of the list is
uninteresting). If we do stop iterating early, list will be pointing to
the unconsumed portion of revs->commits - and we need to free this list
to avoid a leak. (revs->commits itself will be an invalid pointer: it
will have been free'd during the first pop_commit.)

However the list pointer is later reused to iterate over our new list,
but only for the limiting_can_increase_treesame() branch. We therefore
need to introduce a new variable for that branch - and while we're here
we can rename the original list to original_list as that makes its
purpose more obvious.

This leak was found while running t0090. It's not likely to be very
impactful, but it can happen quite early during some checkout
invocations, and hence seems to be worth fixing:

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x49a85d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
    microsoft#1 0x9ac084 in do_xmalloc wrapper.c:41:8
    microsoft#2 0x9ac05a in xmalloc wrapper.c:62:9
    microsoft#3 0x7175d6 in commit_list_insert commit.c:540:33
    microsoft#4 0x71800f in commit_list_insert_by_date commit.c:604:9
    microsoft#5 0x8f8d2e in process_parents revision.c:1128:5
    microsoft#6 0x8f2f2c in limit_list revision.c:1418:7
    microsoft#7 0x8f210e in prepare_revision_walk revision.c:3577:7
    microsoft#8 0x514170 in orphaned_commit_warning builtin/checkout.c:1185:6
    microsoft#9 0x512f05 in switch_branches builtin/checkout.c:1250:3
    microsoft#10 0x50f8de in checkout_branch builtin/checkout.c:1646:9
    microsoft#11 0x50ba12 in checkout_main builtin/checkout.c:2003:9
    microsoft#12 0x5086c0 in cmd_checkout builtin/checkout.c:2055:8
    microsoft#13 0x4cd91d in run_builtin git.c:467:11
    microsoft#14 0x4cb5f3 in handle_builtin git.c:719:3
    microsoft#15 0x4ccf47 in run_argv git.c:808:4
    microsoft#16 0x4caf49 in cmd_main git.c:939:19
    microsoft#17 0x69dc0e in main common-main.c:52:11
    microsoft#18 0x7faaabd0e349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Indirect leak of 48 byte(s) in 3 object(s) allocated from:
    #0 0x49a85d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
    microsoft#1 0x9ac084 in do_xmalloc wrapper.c:41:8
    microsoft#2 0x9ac05a in xmalloc wrapper.c:62:9
    microsoft#3 0x717de6 in commit_list_append commit.c:1609:35
    microsoft#4 0x8f1f9b in prepare_revision_walk revision.c:3554:12
    microsoft#5 0x514170 in orphaned_commit_warning builtin/checkout.c:1185:6
    microsoft#6 0x512f05 in switch_branches builtin/checkout.c:1250:3
    microsoft#7 0x50f8de in checkout_branch builtin/checkout.c:1646:9
    microsoft#8 0x50ba12 in checkout_main builtin/checkout.c:2003:9
    microsoft#9 0x5086c0 in cmd_checkout builtin/checkout.c:2055:8
    microsoft#10 0x4cd91d in run_builtin git.c:467:11
    microsoft#11 0x4cb5f3 in handle_builtin git.c:719:3
    microsoft#12 0x4ccf47 in run_argv git.c:808:4
    microsoft#13 0x4caf49 in cmd_main git.c:939:19
    microsoft#14 0x69dc0e in main common-main.c:52:11
    microsoft#15 0x7faaabd0e349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Signed-off-by: Andrzej Hunt <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
ldennington pushed a commit to ldennington/git that referenced this pull request Jun 2, 2021
real_ref was previously populated by dwim_ref(), which allocates new
memory. We need to make sure to free real_ref when discarding it.
(real_ref is already being freed at the end of create_branch() - but
if we discard it early then it will leak.)

This fixes the following leak found while running t0002-t0099:

Direct leak of 5 byte(s) in 1 object(s) allocated from:
    #0 0x486954 in strdup /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
    microsoft#1 0xdd6484 in xstrdup wrapper.c:29:14
    microsoft#2 0xc0f658 in expand_ref refs.c:671:12
    microsoft#3 0xc0ecf1 in repo_dwim_ref refs.c:644:22
    microsoft#4 0x8b1184 in dwim_ref ./refs.h:162:9
    microsoft#5 0x8b0b02 in create_branch branch.c:284:10
    microsoft#6 0x550cbb in update_refs_for_switch builtin/checkout.c:1046:4
    microsoft#7 0x54e275 in switch_branches builtin/checkout.c:1274:2
    microsoft#8 0x548828 in checkout_branch builtin/checkout.c:1668:9
    microsoft#9 0x541306 in checkout_main builtin/checkout.c:2025:9
    microsoft#10 0x5395fa in cmd_checkout builtin/checkout.c:2077:8
    microsoft#11 0x4d02a8 in run_builtin git.c:467:11
    microsoft#12 0x4cbfe9 in handle_builtin git.c:719:3
    microsoft#13 0x4cf04f in run_argv git.c:808:4
    microsoft#14 0x4cb85a in cmd_main git.c:939:19
    microsoft#15 0x820cf6 in main common-main.c:52:11
    microsoft#16 0x7f30bd9dd349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Signed-off-by: Andrzej Hunt <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
ldennington pushed a commit to ldennington/git that referenced this pull request Jun 2, 2021
parse_pathspec() allocates new memory into pathspec, therefore we need
to free it when we're done.

An UNLEAK would probably be just as good here - but clear_pathspec() is
not much more work so we might as well use it. check_ignore() is either
called once directly from cmd_check_ignore() (in which case the leak
really doesnt matter), or it can be called multiple times in a loop from
check_ignore_stdin_paths(), in which case we're potentially leaking
multiple times - but even in this scenario the leak is so small as to
have no real consequence.

Found while running t0008:

Direct leak of 112 byte(s) in 1 object(s) allocated from:
    #0 0x49a85d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
    microsoft#1 0x9aca44 in do_xmalloc wrapper.c:41:8
    microsoft#2 0x9aca1a in xmalloc wrapper.c:62:9
    microsoft#3 0x873c17 in parse_pathspec pathspec.c:582:2
    microsoft#4 0x503eb8 in check_ignore builtin/check-ignore.c:90:2
    microsoft#5 0x5038af in cmd_check_ignore builtin/check-ignore.c:190:17
    microsoft#6 0x4cd91d in run_builtin git.c:467:11
    microsoft#7 0x4cb5f3 in handle_builtin git.c:719:3
    microsoft#8 0x4ccf47 in run_argv git.c:808:4
    microsoft#9 0x4caf49 in cmd_main git.c:939:19
    microsoft#10 0x69e43e in main common-main.c:52:11
    microsoft#11 0x7f18bb0dd349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Indirect leak of 65 byte(s) in 1 object(s) allocated from:
    #0 0x49ab79 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    microsoft#1 0x9acc46 in xrealloc wrapper.c:126:8
    microsoft#2 0x93baed in strbuf_grow strbuf.c:98:2
    microsoft#3 0x93d696 in strbuf_vaddf strbuf.c:392:3
    microsoft#4 0x9400c6 in xstrvfmt strbuf.c:979:2
    microsoft#5 0x940253 in xstrfmt strbuf.c:989:8
    microsoft#6 0x92b72a in prefix_path_gently setup.c:115:15
    microsoft#7 0x87442d in init_pathspec_item pathspec.c:439:11
    microsoft#8 0x873cef in parse_pathspec pathspec.c:589:3
    microsoft#9 0x503eb8 in check_ignore builtin/check-ignore.c:90:2
    microsoft#10 0x5038af in cmd_check_ignore builtin/check-ignore.c:190:17
    microsoft#11 0x4cd91d in run_builtin git.c:467:11
    microsoft#12 0x4cb5f3 in handle_builtin git.c:719:3
    microsoft#13 0x4ccf47 in run_argv git.c:808:4
    microsoft#14 0x4caf49 in cmd_main git.c:939:19
    microsoft#15 0x69e43e in main common-main.c:52:11
    microsoft#16 0x7f18bb0dd349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Indirect leak of 2 byte(s) in 1 object(s) allocated from:
    #0 0x486834 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
    microsoft#1 0x9ac9e8 in xstrdup wrapper.c:29:14
    microsoft#2 0x874542 in init_pathspec_item pathspec.c:468:20
    microsoft#3 0x873cef in parse_pathspec pathspec.c:589:3
    microsoft#4 0x503eb8 in check_ignore builtin/check-ignore.c:90:2
    microsoft#5 0x5038af in cmd_check_ignore builtin/check-ignore.c:190:17
    microsoft#6 0x4cd91d in run_builtin git.c:467:11
    microsoft#7 0x4cb5f3 in handle_builtin git.c:719:3
    microsoft#8 0x4ccf47 in run_argv git.c:808:4
    microsoft#9 0x4caf49 in cmd_main git.c:939:19
    microsoft#10 0x69e43e in main common-main.c:52:11
    microsoft#11 0x7f18bb0dd349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 179 byte(s) leaked in 3 allocation(s).

Signed-off-by: Andrzej Hunt <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
ldennington pushed a commit to ldennington/git that referenced this pull request Jun 2, 2021
add_pending_object() populates rev.pending, we need to take care of
clearing it once we're done.

This code is run close to the end of a checkout, therefore this leak
seems like it would have very little impact. See also LSAN output
from t0020 below:

Direct leak of 2048 byte(s) in 1 object(s) allocated from:
    #0 0x49ab79 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    microsoft#1 0x9acc46 in xrealloc wrapper.c:126:8
    microsoft#2 0x83e3a3 in add_object_array_with_path object.c:337:3
    microsoft#3 0x8f672a in add_pending_object_with_path revision.c:329:2
    microsoft#4 0x8eaeab in add_pending_object_with_mode revision.c:336:2
    microsoft#5 0x8eae9d in add_pending_object revision.c:342:2
    microsoft#6 0x5154a0 in show_local_changes builtin/checkout.c:602:2
    microsoft#7 0x513b00 in merge_working_tree builtin/checkout.c:979:3
    microsoft#8 0x512cb3 in switch_branches builtin/checkout.c:1242:9
    microsoft#9 0x50f8de in checkout_branch builtin/checkout.c:1646:9
    microsoft#10 0x50ba12 in checkout_main builtin/checkout.c:2003:9
    microsoft#11 0x5086c0 in cmd_checkout builtin/checkout.c:2055:8
    microsoft#12 0x4cd91d in run_builtin git.c:467:11
    microsoft#13 0x4cb5f3 in handle_builtin git.c:719:3
    microsoft#14 0x4ccf47 in run_argv git.c:808:4
    microsoft#15 0x4caf49 in cmd_main git.c:939:19
    microsoft#16 0x69e43e in main common-main.c:52:11
    microsoft#17 0x7f5dd1d50349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 2048 byte(s) leaked in 1 allocation(s).
Signed-off-by: Andrzej Hunt <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
derrickstolee pushed a commit that referenced this pull request Oct 7, 2021
In a sparse index it is possible for the tree that is being verified
to be freed while it is being verified. This happens when the index is
sparse but the cache tree is not and index_name_pos() looks up a path
from the cache tree that is a descendant of a sparse index entry. That
triggers a call to ensure_full_index() which frees the cache tree that
is being verified.  Carrying on trying to verify the tree after this
results in a use-after-free bug. Instead restart the verification if a
sparse index is converted to a full index. This bug is triggered by a
call to reset_head() in "git rebase --apply". Thanks to René Scharfe
and Derrick Stolee for their help analyzing the problem.

==74345==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000001b20 at pc 0x557cbe82d3a2 bp 0x7ffdfee08090 sp 0x7ffdfee08080
READ of size 4 at 0x606000001b20 thread T0
    #0 0x557cbe82d3a1 in verify_one /home/phil/src/git/cache-tree.c:863
    #1 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #2 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #3 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #4 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
    #5 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
    #6 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
    #7 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    #8 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    #9 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    #10 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    #11 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    #12 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    #13 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
    #14 0x557cbe5bcb8d in _start (/home/phil/src/git/git+0x1b9b8d)

0x606000001b20 is located 0 bytes inside of 56-byte region [0x606000001b20,0x606000001b58)
freed by thread T0 here:
    #0 0x7fdd4bacff19 in __interceptor_free /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:127
    #1 0x557cbe82af60 in cache_tree_free /home/phil/src/git/cache-tree.c:35
    #2 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    #3 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    #4 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    #5 0x557cbeb2557a in ensure_full_index /home/phil/src/git/sparse-index.c:310
    #6 0x557cbea45c4a in index_name_stage_pos /home/phil/src/git/read-cache.c:588
    #7 0x557cbe82ce37 in verify_one /home/phil/src/git/cache-tree.c:850
    #8 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #9 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #10 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #11 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
    #12 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
    #13 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
    #14 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    #15 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    #16 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    #17 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    #18 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    #19 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    #20 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)

previously allocated by thread T0 here:
    #0 0x7fdd4bad0459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x557cbebc1807 in xcalloc /home/phil/src/git/wrapper.c:140
    #2 0x557cbe82b7d8 in cache_tree /home/phil/src/git/cache-tree.c:17
    #3 0x557cbe82b7d8 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:763
    #4 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
    #5 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
    #6 0x557cbe8304e1 in prime_cache_tree /home/phil/src/git/cache-tree.c:779
    #7 0x557cbeab7fa7 in reset_head /home/phil/src/git/reset.c:85
    #8 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    #9 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    #10 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    #11 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    #12 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    #13 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    #14 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)

Signed-off-by: Phillip Wood <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
derrickstolee added a commit that referenced this pull request Oct 26, 2021
When I was playing around with trace2 data and creating flamegraphs, I tried a `git fetch` call to see how the `git-remote-https` command would show up. What I didn't expect was an `ensure_full_index()` region!

It turns out that `git fetch` and `git pull` need to check the index for a `.gitmodules` file to see if it should recurse into any submodules. Here is the stack trace from a debugger:

```
#0  ensure_full_index (istate=0x555555ac1c80 <the_index>) at sparse-index.c:404
#1  0x000055555571a979 in do_read_index (istate=istate@entry=0x555555ac1c80 <the_index>, path=path@entry=0x555555ad7b90 ".git/index", must_exist=must_exist@entry=0) at read-cache.c:2386
#2  0x000055555571eb7d in do_read_index (must_exist=0, path=0x555555ad7b90 ".git/index", istate=0x555555ac1c80 <the_index>) at hash.h:244
#3  read_index_from (istate=0x555555ac1c80 <the_index>, path=0x555555ad7b90 ".git/index", gitdir=0x555555ad7b30 ".git") at read-cache.c:2426
#4  0x000055555573f4c2 in repo_read_index (repo=repo@entry=0x555555ac1da0 <the_repo>) at repository.c:286
#5  0x00005555556f14d0 in get_oid_with_context_1 (repo=repo@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", flags=flags@entry=0, prefix=prefix@entry=0x0, 
    oid=oid@entry=0x7fffffffdb00, oc=oc@entry=0x7fffffffda70) at object-name.c:1850
#6  0x00005555556f1f53 in get_oid_with_context (oc=0x7fffffffda70, oid=0x7fffffffdb00, flags=0, str=0x55555582c022 ":.gitmodules", repo=0x555555ac1da0 <the_repo>) at object-name.c:1947
#7  repo_get_oid (r=r@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", oid=oid@entry=0x7fffffffdb00) at object-name.c:1603
#8  0x000055555577330f in config_from_gitmodules (fn=fn@entry=0x555555773460 <gitmodules_fetch_config>, repo=0x555555ac1da0 <the_repo>, data=data@entry=0x7fffffffdb60) at submodule-config.c:650
#9  0x000055555577462d in config_from_gitmodules (data=0x7fffffffdb60, repo=<optimized out>, fn=0x555555773460 <gitmodules_fetch_config>) at submodule-config.c:638
#10 fetch_config_from_gitmodules (max_children=<optimized out>, recurse_submodules=<optimized out>) at submodule-config.c:800
#11 0x00005555555b9e41 in cmd_fetch (argc=1, argv=0x7fffffffe090, prefix=0x0) at builtin/fetch.c:1999
#12 0x0000555555573ff6 in run_builtin (argv=<optimized out>, argc=<optimized out>, p=<optimized out>) at git.c:528
#13 handle_builtin (argc=<optimized out>, argv=<optimized out>) at git.c:785
#14 0x000055555557528c in run_argv (argv=0x7fffffffddf0, argcp=0x7fffffffddfc) at git.c:857
#15 cmd_main (argc=<optimized out>, argv=<optimized out>) at git.c:993
#16 0x0000555555573ac8 in main (argc=3, argv=0x7fffffffe088) at common-main.c:52
```

The operations these commands use are guarded by items such as `index_name_pos()` and others. Since the `.gitmodules` file is always at root, we would not need to expand, anyway.
neerajsi-msft pushed a commit to neerajsi-msft/git that referenced this pull request Oct 26, 2021
In a sparse index it is possible for the tree that is being verified
to be freed while it is being verified. This happens when the index is
sparse but the cache tree is not and index_name_pos() looks up a path
from the cache tree that is a descendant of a sparse index entry. That
triggers a call to ensure_full_index() which frees the cache tree that
is being verified.  Carrying on trying to verify the tree after this
results in a use-after-free bug. Instead restart the verification if a
sparse index is converted to a full index. This bug is triggered by a
call to reset_head() in "git rebase --apply". Thanks to René Scharfe
and Derrick Stolee for their help analyzing the problem.

==74345==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000001b20 at pc 0x557cbe82d3a2 bp 0x7ffdfee08090 sp 0x7ffdfee08080
READ of size 4 at 0x606000001b20 thread T0
    #0 0x557cbe82d3a1 in verify_one /home/phil/src/git/cache-tree.c:863
    #1 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    microsoft#2 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    microsoft#3 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    microsoft#4 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
    microsoft#5 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
    microsoft#6 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
    microsoft#7 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    microsoft#8 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    microsoft#9 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    microsoft#10 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    microsoft#11 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    microsoft#12 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    microsoft#13 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
    microsoft#14 0x557cbe5bcb8d in _start (/home/phil/src/git/git+0x1b9b8d)

0x606000001b20 is located 0 bytes inside of 56-byte region [0x606000001b20,0x606000001b58)
freed by thread T0 here:
    #0 0x7fdd4bacff19 in __interceptor_free /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:127
    #1 0x557cbe82af60 in cache_tree_free /home/phil/src/git/cache-tree.c:35
    microsoft#2 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    microsoft#3 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    microsoft#4 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    microsoft#5 0x557cbeb2557a in ensure_full_index /home/phil/src/git/sparse-index.c:310
    microsoft#6 0x557cbea45c4a in index_name_stage_pos /home/phil/src/git/read-cache.c:588
    microsoft#7 0x557cbe82ce37 in verify_one /home/phil/src/git/cache-tree.c:850
    microsoft#8 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    microsoft#9 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    microsoft#10 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    microsoft#11 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
    microsoft#12 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
    microsoft#13 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
    microsoft#14 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    microsoft#15 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    microsoft#16 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    microsoft#17 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    microsoft#18 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    microsoft#19 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    microsoft#20 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)

previously allocated by thread T0 here:
    #0 0x7fdd4bad0459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x557cbebc1807 in xcalloc /home/phil/src/git/wrapper.c:140
    microsoft#2 0x557cbe82b7d8 in cache_tree /home/phil/src/git/cache-tree.c:17
    microsoft#3 0x557cbe82b7d8 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:763
    microsoft#4 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
    microsoft#5 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
    microsoft#6 0x557cbe8304e1 in prime_cache_tree /home/phil/src/git/cache-tree.c:779
    microsoft#7 0x557cbeab7fa7 in reset_head /home/phil/src/git/reset.c:85
    microsoft#8 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    microsoft#9 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    microsoft#10 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    microsoft#11 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    microsoft#12 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    microsoft#13 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    microsoft#14 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)

Signed-off-by: Phillip Wood <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
dscho pushed a commit that referenced this pull request Oct 30, 2021
In a sparse index it is possible for the tree that is being verified
to be freed while it is being verified. This happens when the index is
sparse but the cache tree is not and index_name_pos() looks up a path
from the cache tree that is a descendant of a sparse index entry. That
triggers a call to ensure_full_index() which frees the cache tree that
is being verified.  Carrying on trying to verify the tree after this
results in a use-after-free bug. Instead restart the verification if a
sparse index is converted to a full index. This bug is triggered by a
call to reset_head() in "git rebase --apply". Thanks to René Scharfe
and Derrick Stolee for their help analyzing the problem.

==74345==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000001b20 at pc 0x557cbe82d3a2 bp 0x7ffdfee08090 sp 0x7ffdfee08080
READ of size 4 at 0x606000001b20 thread T0
    #0 0x557cbe82d3a1 in verify_one /home/phil/src/git/cache-tree.c:863
    #1 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #2 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #3 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #4 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
    #5 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
    #6 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
    #7 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    #8 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    #9 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    #10 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    #11 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    #12 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    #13 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
    #14 0x557cbe5bcb8d in _start (/home/phil/src/git/git+0x1b9b8d)

0x606000001b20 is located 0 bytes inside of 56-byte region [0x606000001b20,0x606000001b58)
freed by thread T0 here:
    #0 0x7fdd4bacff19 in __interceptor_free /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:127
    #1 0x557cbe82af60 in cache_tree_free /home/phil/src/git/cache-tree.c:35
    #2 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    #3 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    #4 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    #5 0x557cbeb2557a in ensure_full_index /home/phil/src/git/sparse-index.c:310
    #6 0x557cbea45c4a in index_name_stage_pos /home/phil/src/git/read-cache.c:588
    #7 0x557cbe82ce37 in verify_one /home/phil/src/git/cache-tree.c:850
    #8 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #9 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #10 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #11 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
    #12 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
    #13 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
    #14 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    #15 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    #16 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    #17 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    #18 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    #19 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    #20 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)

previously allocated by thread T0 here:
    #0 0x7fdd4bad0459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x557cbebc1807 in xcalloc /home/phil/src/git/wrapper.c:140
    #2 0x557cbe82b7d8 in cache_tree /home/phil/src/git/cache-tree.c:17
    #3 0x557cbe82b7d8 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:763
    #4 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
    #5 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
    #6 0x557cbe8304e1 in prime_cache_tree /home/phil/src/git/cache-tree.c:779
    #7 0x557cbeab7fa7 in reset_head /home/phil/src/git/reset.c:85
    #8 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    #9 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    #10 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    #11 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    #12 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    #13 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    #14 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)

Signed-off-by: Phillip Wood <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
dscho pushed a commit that referenced this pull request Oct 30, 2021
When I was playing around with trace2 data and creating flamegraphs, I tried a `git fetch` call to see how the `git-remote-https` command would show up. What I didn't expect was an `ensure_full_index()` region!

It turns out that `git fetch` and `git pull` need to check the index for a `.gitmodules` file to see if it should recurse into any submodules. Here is the stack trace from a debugger:

```
#0  ensure_full_index (istate=0x555555ac1c80 <the_index>) at sparse-index.c:404
#1  0x000055555571a979 in do_read_index (istate=istate@entry=0x555555ac1c80 <the_index>, path=path@entry=0x555555ad7b90 ".git/index", must_exist=must_exist@entry=0) at read-cache.c:2386
#2  0x000055555571eb7d in do_read_index (must_exist=0, path=0x555555ad7b90 ".git/index", istate=0x555555ac1c80 <the_index>) at hash.h:244
#3  read_index_from (istate=0x555555ac1c80 <the_index>, path=0x555555ad7b90 ".git/index", gitdir=0x555555ad7b30 ".git") at read-cache.c:2426
#4  0x000055555573f4c2 in repo_read_index (repo=repo@entry=0x555555ac1da0 <the_repo>) at repository.c:286
#5  0x00005555556f14d0 in get_oid_with_context_1 (repo=repo@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", flags=flags@entry=0, prefix=prefix@entry=0x0, 
    oid=oid@entry=0x7fffffffdb00, oc=oc@entry=0x7fffffffda70) at object-name.c:1850
#6  0x00005555556f1f53 in get_oid_with_context (oc=0x7fffffffda70, oid=0x7fffffffdb00, flags=0, str=0x55555582c022 ":.gitmodules", repo=0x555555ac1da0 <the_repo>) at object-name.c:1947
#7  repo_get_oid (r=r@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", oid=oid@entry=0x7fffffffdb00) at object-name.c:1603
#8  0x000055555577330f in config_from_gitmodules (fn=fn@entry=0x555555773460 <gitmodules_fetch_config>, repo=0x555555ac1da0 <the_repo>, data=data@entry=0x7fffffffdb60) at submodule-config.c:650
#9  0x000055555577462d in config_from_gitmodules (data=0x7fffffffdb60, repo=<optimized out>, fn=0x555555773460 <gitmodules_fetch_config>) at submodule-config.c:638
#10 fetch_config_from_gitmodules (max_children=<optimized out>, recurse_submodules=<optimized out>) at submodule-config.c:800
#11 0x00005555555b9e41 in cmd_fetch (argc=1, argv=0x7fffffffe090, prefix=0x0) at builtin/fetch.c:1999
#12 0x0000555555573ff6 in run_builtin (argv=<optimized out>, argc=<optimized out>, p=<optimized out>) at git.c:528
#13 handle_builtin (argc=<optimized out>, argv=<optimized out>) at git.c:785
#14 0x000055555557528c in run_argv (argv=0x7fffffffddf0, argcp=0x7fffffffddfc) at git.c:857
#15 cmd_main (argc=<optimized out>, argv=<optimized out>) at git.c:993
#16 0x0000555555573ac8 in main (argc=3, argv=0x7fffffffe088) at common-main.c:52
```

The operations these commands use are guarded by items such as `index_name_pos()` and others. Since the `.gitmodules` file is always at root, we would not need to expand, anyway.
derrickstolee added a commit that referenced this pull request Oct 30, 2021
When I was playing around with trace2 data and creating flamegraphs, I tried a `git fetch` call to see how the `git-remote-https` command would show up. What I didn't expect was an `ensure_full_index()` region!

It turns out that `git fetch` and `git pull` need to check the index for a `.gitmodules` file to see if it should recurse into any submodules. Here is the stack trace from a debugger:

```
#0  ensure_full_index (istate=0x555555ac1c80 <the_index>) at sparse-index.c:404
#1  0x000055555571a979 in do_read_index (istate=istate@entry=0x555555ac1c80 <the_index>, path=path@entry=0x555555ad7b90 ".git/index", must_exist=must_exist@entry=0) at read-cache.c:2386
#2  0x000055555571eb7d in do_read_index (must_exist=0, path=0x555555ad7b90 ".git/index", istate=0x555555ac1c80 <the_index>) at hash.h:244
#3  read_index_from (istate=0x555555ac1c80 <the_index>, path=0x555555ad7b90 ".git/index", gitdir=0x555555ad7b30 ".git") at read-cache.c:2426
#4  0x000055555573f4c2 in repo_read_index (repo=repo@entry=0x555555ac1da0 <the_repo>) at repository.c:286
#5  0x00005555556f14d0 in get_oid_with_context_1 (repo=repo@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", flags=flags@entry=0, prefix=prefix@entry=0x0, 
    oid=oid@entry=0x7fffffffdb00, oc=oc@entry=0x7fffffffda70) at object-name.c:1850
#6  0x00005555556f1f53 in get_oid_with_context (oc=0x7fffffffda70, oid=0x7fffffffdb00, flags=0, str=0x55555582c022 ":.gitmodules", repo=0x555555ac1da0 <the_repo>) at object-name.c:1947
#7  repo_get_oid (r=r@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", oid=oid@entry=0x7fffffffdb00) at object-name.c:1603
#8  0x000055555577330f in config_from_gitmodules (fn=fn@entry=0x555555773460 <gitmodules_fetch_config>, repo=0x555555ac1da0 <the_repo>, data=data@entry=0x7fffffffdb60) at submodule-config.c:650
#9  0x000055555577462d in config_from_gitmodules (data=0x7fffffffdb60, repo=<optimized out>, fn=0x555555773460 <gitmodules_fetch_config>) at submodule-config.c:638
#10 fetch_config_from_gitmodules (max_children=<optimized out>, recurse_submodules=<optimized out>) at submodule-config.c:800
#11 0x00005555555b9e41 in cmd_fetch (argc=1, argv=0x7fffffffe090, prefix=0x0) at builtin/fetch.c:1999
#12 0x0000555555573ff6 in run_builtin (argv=<optimized out>, argc=<optimized out>, p=<optimized out>) at git.c:528
#13 handle_builtin (argc=<optimized out>, argv=<optimized out>) at git.c:785
#14 0x000055555557528c in run_argv (argv=0x7fffffffddf0, argcp=0x7fffffffddfc) at git.c:857
#15 cmd_main (argc=<optimized out>, argv=<optimized out>) at git.c:993
#16 0x0000555555573ac8 in main (argc=3, argv=0x7fffffffe088) at common-main.c:52
```

The operations these commands use are guarded by items such as `index_name_pos()` and others. Since the `.gitmodules` file is always at root, we would not need to expand, anyway.
derrickstolee added a commit that referenced this pull request Oct 31, 2021
When I was playing around with trace2 data and creating flamegraphs, I tried a `git fetch` call to see how the `git-remote-https` command would show up. What I didn't expect was an `ensure_full_index()` region!

It turns out that `git fetch` and `git pull` need to check the index for a `.gitmodules` file to see if it should recurse into any submodules. Here is the stack trace from a debugger:

```
#0  ensure_full_index (istate=0x555555ac1c80 <the_index>) at sparse-index.c:404
#1  0x000055555571a979 in do_read_index (istate=istate@entry=0x555555ac1c80 <the_index>, path=path@entry=0x555555ad7b90 ".git/index", must_exist=must_exist@entry=0) at read-cache.c:2386
#2  0x000055555571eb7d in do_read_index (must_exist=0, path=0x555555ad7b90 ".git/index", istate=0x555555ac1c80 <the_index>) at hash.h:244
#3  read_index_from (istate=0x555555ac1c80 <the_index>, path=0x555555ad7b90 ".git/index", gitdir=0x555555ad7b30 ".git") at read-cache.c:2426
#4  0x000055555573f4c2 in repo_read_index (repo=repo@entry=0x555555ac1da0 <the_repo>) at repository.c:286
#5  0x00005555556f14d0 in get_oid_with_context_1 (repo=repo@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", flags=flags@entry=0, prefix=prefix@entry=0x0, 
    oid=oid@entry=0x7fffffffdb00, oc=oc@entry=0x7fffffffda70) at object-name.c:1850
#6  0x00005555556f1f53 in get_oid_with_context (oc=0x7fffffffda70, oid=0x7fffffffdb00, flags=0, str=0x55555582c022 ":.gitmodules", repo=0x555555ac1da0 <the_repo>) at object-name.c:1947
#7  repo_get_oid (r=r@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", oid=oid@entry=0x7fffffffdb00) at object-name.c:1603
#8  0x000055555577330f in config_from_gitmodules (fn=fn@entry=0x555555773460 <gitmodules_fetch_config>, repo=0x555555ac1da0 <the_repo>, data=data@entry=0x7fffffffdb60) at submodule-config.c:650
#9  0x000055555577462d in config_from_gitmodules (data=0x7fffffffdb60, repo=<optimized out>, fn=0x555555773460 <gitmodules_fetch_config>) at submodule-config.c:638
#10 fetch_config_from_gitmodules (max_children=<optimized out>, recurse_submodules=<optimized out>) at submodule-config.c:800
#11 0x00005555555b9e41 in cmd_fetch (argc=1, argv=0x7fffffffe090, prefix=0x0) at builtin/fetch.c:1999
#12 0x0000555555573ff6 in run_builtin (argv=<optimized out>, argc=<optimized out>, p=<optimized out>) at git.c:528
#13 handle_builtin (argc=<optimized out>, argv=<optimized out>) at git.c:785
#14 0x000055555557528c in run_argv (argv=0x7fffffffddf0, argcp=0x7fffffffddfc) at git.c:857
#15 cmd_main (argc=<optimized out>, argv=<optimized out>) at git.c:993
#16 0x0000555555573ac8 in main (argc=3, argv=0x7fffffffe088) at common-main.c:52
```

The operations these commands use are guarded by items such as `index_name_pos()` and others. Since the `.gitmodules` file is always at root, we would not need to expand, anyway.
derrickstolee added a commit that referenced this pull request Nov 4, 2021
When I was playing around with trace2 data and creating flamegraphs, I tried a `git fetch` call to see how the `git-remote-https` command would show up. What I didn't expect was an `ensure_full_index()` region!

It turns out that `git fetch` and `git pull` need to check the index for a `.gitmodules` file to see if it should recurse into any submodules. Here is the stack trace from a debugger:

```
#0  ensure_full_index (istate=0x555555ac1c80 <the_index>) at sparse-index.c:404
#1  0x000055555571a979 in do_read_index (istate=istate@entry=0x555555ac1c80 <the_index>, path=path@entry=0x555555ad7b90 ".git/index", must_exist=must_exist@entry=0) at read-cache.c:2386
#2  0x000055555571eb7d in do_read_index (must_exist=0, path=0x555555ad7b90 ".git/index", istate=0x555555ac1c80 <the_index>) at hash.h:244
#3  read_index_from (istate=0x555555ac1c80 <the_index>, path=0x555555ad7b90 ".git/index", gitdir=0x555555ad7b30 ".git") at read-cache.c:2426
#4  0x000055555573f4c2 in repo_read_index (repo=repo@entry=0x555555ac1da0 <the_repo>) at repository.c:286
#5  0x00005555556f14d0 in get_oid_with_context_1 (repo=repo@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", flags=flags@entry=0, prefix=prefix@entry=0x0, 
    oid=oid@entry=0x7fffffffdb00, oc=oc@entry=0x7fffffffda70) at object-name.c:1850
#6  0x00005555556f1f53 in get_oid_with_context (oc=0x7fffffffda70, oid=0x7fffffffdb00, flags=0, str=0x55555582c022 ":.gitmodules", repo=0x555555ac1da0 <the_repo>) at object-name.c:1947
#7  repo_get_oid (r=r@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", oid=oid@entry=0x7fffffffdb00) at object-name.c:1603
#8  0x000055555577330f in config_from_gitmodules (fn=fn@entry=0x555555773460 <gitmodules_fetch_config>, repo=0x555555ac1da0 <the_repo>, data=data@entry=0x7fffffffdb60) at submodule-config.c:650
#9  0x000055555577462d in config_from_gitmodules (data=0x7fffffffdb60, repo=<optimized out>, fn=0x555555773460 <gitmodules_fetch_config>) at submodule-config.c:638
#10 fetch_config_from_gitmodules (max_children=<optimized out>, recurse_submodules=<optimized out>) at submodule-config.c:800
#11 0x00005555555b9e41 in cmd_fetch (argc=1, argv=0x7fffffffe090, prefix=0x0) at builtin/fetch.c:1999
#12 0x0000555555573ff6 in run_builtin (argv=<optimized out>, argc=<optimized out>, p=<optimized out>) at git.c:528
#13 handle_builtin (argc=<optimized out>, argv=<optimized out>) at git.c:785
#14 0x000055555557528c in run_argv (argv=0x7fffffffddf0, argcp=0x7fffffffddfc) at git.c:857
#15 cmd_main (argc=<optimized out>, argv=<optimized out>) at git.c:993
#16 0x0000555555573ac8 in main (argc=3, argv=0x7fffffffe088) at common-main.c:52
```

The operations these commands use are guarded by items such as `index_name_pos()` and others. Since the `.gitmodules` file is always at root, we would not need to expand, anyway.
derrickstolee added a commit that referenced this pull request Nov 4, 2021
When I was playing around with trace2 data and creating flamegraphs, I tried a `git fetch` call to see how the `git-remote-https` command would show up. What I didn't expect was an `ensure_full_index()` region!

It turns out that `git fetch` and `git pull` need to check the index for a `.gitmodules` file to see if it should recurse into any submodules. Here is the stack trace from a debugger:

```
#0  ensure_full_index (istate=0x555555ac1c80 <the_index>) at sparse-index.c:404
#1  0x000055555571a979 in do_read_index (istate=istate@entry=0x555555ac1c80 <the_index>, path=path@entry=0x555555ad7b90 ".git/index", must_exist=must_exist@entry=0) at read-cache.c:2386
#2  0x000055555571eb7d in do_read_index (must_exist=0, path=0x555555ad7b90 ".git/index", istate=0x555555ac1c80 <the_index>) at hash.h:244
#3  read_index_from (istate=0x555555ac1c80 <the_index>, path=0x555555ad7b90 ".git/index", gitdir=0x555555ad7b30 ".git") at read-cache.c:2426
#4  0x000055555573f4c2 in repo_read_index (repo=repo@entry=0x555555ac1da0 <the_repo>) at repository.c:286
#5  0x00005555556f14d0 in get_oid_with_context_1 (repo=repo@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", flags=flags@entry=0, prefix=prefix@entry=0x0, 
    oid=oid@entry=0x7fffffffdb00, oc=oc@entry=0x7fffffffda70) at object-name.c:1850
#6  0x00005555556f1f53 in get_oid_with_context (oc=0x7fffffffda70, oid=0x7fffffffdb00, flags=0, str=0x55555582c022 ":.gitmodules", repo=0x555555ac1da0 <the_repo>) at object-name.c:1947
#7  repo_get_oid (r=r@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", oid=oid@entry=0x7fffffffdb00) at object-name.c:1603
#8  0x000055555577330f in config_from_gitmodules (fn=fn@entry=0x555555773460 <gitmodules_fetch_config>, repo=0x555555ac1da0 <the_repo>, data=data@entry=0x7fffffffdb60) at submodule-config.c:650
#9  0x000055555577462d in config_from_gitmodules (data=0x7fffffffdb60, repo=<optimized out>, fn=0x555555773460 <gitmodules_fetch_config>) at submodule-config.c:638
#10 fetch_config_from_gitmodules (max_children=<optimized out>, recurse_submodules=<optimized out>) at submodule-config.c:800
#11 0x00005555555b9e41 in cmd_fetch (argc=1, argv=0x7fffffffe090, prefix=0x0) at builtin/fetch.c:1999
#12 0x0000555555573ff6 in run_builtin (argv=<optimized out>, argc=<optimized out>, p=<optimized out>) at git.c:528
#13 handle_builtin (argc=<optimized out>, argv=<optimized out>) at git.c:785
#14 0x000055555557528c in run_argv (argv=0x7fffffffddf0, argcp=0x7fffffffddfc) at git.c:857
#15 cmd_main (argc=<optimized out>, argv=<optimized out>) at git.c:993
#16 0x0000555555573ac8 in main (argc=3, argv=0x7fffffffe088) at common-main.c:52
```

The operations these commands use are guarded by items such as `index_name_pos()` and others. Since the `.gitmodules` file is always at root, we would not need to expand, anyway.
derrickstolee added a commit that referenced this pull request Nov 10, 2021
When I was playing around with trace2 data and creating flamegraphs, I tried a `git fetch` call to see how the `git-remote-https` command would show up. What I didn't expect was an `ensure_full_index()` region!

It turns out that `git fetch` and `git pull` need to check the index for a `.gitmodules` file to see if it should recurse into any submodules. Here is the stack trace from a debugger:

```
#0  ensure_full_index (istate=0x555555ac1c80 <the_index>) at sparse-index.c:404
#1  0x000055555571a979 in do_read_index (istate=istate@entry=0x555555ac1c80 <the_index>, path=path@entry=0x555555ad7b90 ".git/index", must_exist=must_exist@entry=0) at read-cache.c:2386
#2  0x000055555571eb7d in do_read_index (must_exist=0, path=0x555555ad7b90 ".git/index", istate=0x555555ac1c80 <the_index>) at hash.h:244
#3  read_index_from (istate=0x555555ac1c80 <the_index>, path=0x555555ad7b90 ".git/index", gitdir=0x555555ad7b30 ".git") at read-cache.c:2426
#4  0x000055555573f4c2 in repo_read_index (repo=repo@entry=0x555555ac1da0 <the_repo>) at repository.c:286
#5  0x00005555556f14d0 in get_oid_with_context_1 (repo=repo@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", flags=flags@entry=0, prefix=prefix@entry=0x0, 
    oid=oid@entry=0x7fffffffdb00, oc=oc@entry=0x7fffffffda70) at object-name.c:1850
#6  0x00005555556f1f53 in get_oid_with_context (oc=0x7fffffffda70, oid=0x7fffffffdb00, flags=0, str=0x55555582c022 ":.gitmodules", repo=0x555555ac1da0 <the_repo>) at object-name.c:1947
#7  repo_get_oid (r=r@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", oid=oid@entry=0x7fffffffdb00) at object-name.c:1603
#8  0x000055555577330f in config_from_gitmodules (fn=fn@entry=0x555555773460 <gitmodules_fetch_config>, repo=0x555555ac1da0 <the_repo>, data=data@entry=0x7fffffffdb60) at submodule-config.c:650
#9  0x000055555577462d in config_from_gitmodules (data=0x7fffffffdb60, repo=<optimized out>, fn=0x555555773460 <gitmodules_fetch_config>) at submodule-config.c:638
#10 fetch_config_from_gitmodules (max_children=<optimized out>, recurse_submodules=<optimized out>) at submodule-config.c:800
#11 0x00005555555b9e41 in cmd_fetch (argc=1, argv=0x7fffffffe090, prefix=0x0) at builtin/fetch.c:1999
#12 0x0000555555573ff6 in run_builtin (argv=<optimized out>, argc=<optimized out>, p=<optimized out>) at git.c:528
#13 handle_builtin (argc=<optimized out>, argv=<optimized out>) at git.c:785
#14 0x000055555557528c in run_argv (argv=0x7fffffffddf0, argcp=0x7fffffffddfc) at git.c:857
#15 cmd_main (argc=<optimized out>, argv=<optimized out>) at git.c:993
#16 0x0000555555573ac8 in main (argc=3, argv=0x7fffffffe088) at common-main.c:52
```

The operations these commands use are guarded by items such as `index_name_pos()` and others. Since the `.gitmodules` file is always at root, we would not need to expand, anyway.
derrickstolee added a commit that referenced this pull request Nov 15, 2021
When I was playing around with trace2 data and creating flamegraphs, I tried a `git fetch` call to see how the `git-remote-https` command would show up. What I didn't expect was an `ensure_full_index()` region!

It turns out that `git fetch` and `git pull` need to check the index for a `.gitmodules` file to see if it should recurse into any submodules. Here is the stack trace from a debugger:

```
#0  ensure_full_index (istate=0x555555ac1c80 <the_index>) at sparse-index.c:404
#1  0x000055555571a979 in do_read_index (istate=istate@entry=0x555555ac1c80 <the_index>, path=path@entry=0x555555ad7b90 ".git/index", must_exist=must_exist@entry=0) at read-cache.c:2386
#2  0x000055555571eb7d in do_read_index (must_exist=0, path=0x555555ad7b90 ".git/index", istate=0x555555ac1c80 <the_index>) at hash.h:244
#3  read_index_from (istate=0x555555ac1c80 <the_index>, path=0x555555ad7b90 ".git/index", gitdir=0x555555ad7b30 ".git") at read-cache.c:2426
#4  0x000055555573f4c2 in repo_read_index (repo=repo@entry=0x555555ac1da0 <the_repo>) at repository.c:286
#5  0x00005555556f14d0 in get_oid_with_context_1 (repo=repo@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", flags=flags@entry=0, prefix=prefix@entry=0x0, 
    oid=oid@entry=0x7fffffffdb00, oc=oc@entry=0x7fffffffda70) at object-name.c:1850
#6  0x00005555556f1f53 in get_oid_with_context (oc=0x7fffffffda70, oid=0x7fffffffdb00, flags=0, str=0x55555582c022 ":.gitmodules", repo=0x555555ac1da0 <the_repo>) at object-name.c:1947
#7  repo_get_oid (r=r@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", oid=oid@entry=0x7fffffffdb00) at object-name.c:1603
#8  0x000055555577330f in config_from_gitmodules (fn=fn@entry=0x555555773460 <gitmodules_fetch_config>, repo=0x555555ac1da0 <the_repo>, data=data@entry=0x7fffffffdb60) at submodule-config.c:650
#9  0x000055555577462d in config_from_gitmodules (data=0x7fffffffdb60, repo=<optimized out>, fn=0x555555773460 <gitmodules_fetch_config>) at submodule-config.c:638
#10 fetch_config_from_gitmodules (max_children=<optimized out>, recurse_submodules=<optimized out>) at submodule-config.c:800
#11 0x00005555555b9e41 in cmd_fetch (argc=1, argv=0x7fffffffe090, prefix=0x0) at builtin/fetch.c:1999
#12 0x0000555555573ff6 in run_builtin (argv=<optimized out>, argc=<optimized out>, p=<optimized out>) at git.c:528
#13 handle_builtin (argc=<optimized out>, argv=<optimized out>) at git.c:785
#14 0x000055555557528c in run_argv (argv=0x7fffffffddf0, argcp=0x7fffffffddfc) at git.c:857
#15 cmd_main (argc=<optimized out>, argv=<optimized out>) at git.c:993
#16 0x0000555555573ac8 in main (argc=3, argv=0x7fffffffe088) at common-main.c:52
```

The operations these commands use are guarded by items such as `index_name_pos()` and others. Since the `.gitmodules` file is always at root, we would not need to expand, anyway.
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 19, 2022
When I was playing around with trace2 data and creating flamegraphs, I tried a `git fetch` call to see how the `git-remote-https` command would show up. What I didn't expect was an `ensure_full_index()` region!

It turns out that `git fetch` and `git pull` need to check the index for a `.gitmodules` file to see if it should recurse into any submodules. Here is the stack trace from a debugger:

```
#0  ensure_full_index (istate=0x555555ac1c80 <the_index>) at sparse-index.c:404
microsoft#1  0x000055555571a979 in do_read_index (istate=istate@entry=0x555555ac1c80 <the_index>, path=path@entry=0x555555ad7b90 ".git/index", must_exist=must_exist@entry=0) at read-cache.c:2386
microsoft#2  0x000055555571eb7d in do_read_index (must_exist=0, path=0x555555ad7b90 ".git/index", istate=0x555555ac1c80 <the_index>) at hash.h:244
microsoft#3  read_index_from (istate=0x555555ac1c80 <the_index>, path=0x555555ad7b90 ".git/index", gitdir=0x555555ad7b30 ".git") at read-cache.c:2426
microsoft#4  0x000055555573f4c2 in repo_read_index (repo=repo@entry=0x555555ac1da0 <the_repo>) at repository.c:286
microsoft#5  0x00005555556f14d0 in get_oid_with_context_1 (repo=repo@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", flags=flags@entry=0, prefix=prefix@entry=0x0, 
    oid=oid@entry=0x7fffffffdb00, oc=oc@entry=0x7fffffffda70) at object-name.c:1850
microsoft#6  0x00005555556f1f53 in get_oid_with_context (oc=0x7fffffffda70, oid=0x7fffffffdb00, flags=0, str=0x55555582c022 ":.gitmodules", repo=0x555555ac1da0 <the_repo>) at object-name.c:1947
microsoft#7  repo_get_oid (r=r@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", oid=oid@entry=0x7fffffffdb00) at object-name.c:1603
microsoft#8  0x000055555577330f in config_from_gitmodules (fn=fn@entry=0x555555773460 <gitmodules_fetch_config>, repo=0x555555ac1da0 <the_repo>, data=data@entry=0x7fffffffdb60) at submodule-config.c:650
microsoft#9  0x000055555577462d in config_from_gitmodules (data=0x7fffffffdb60, repo=<optimized out>, fn=0x555555773460 <gitmodules_fetch_config>) at submodule-config.c:638
microsoft#10 fetch_config_from_gitmodules (max_children=<optimized out>, recurse_submodules=<optimized out>) at submodule-config.c:800
microsoft#11 0x00005555555b9e41 in cmd_fetch (argc=1, argv=0x7fffffffe090, prefix=0x0) at builtin/fetch.c:1999
microsoft#12 0x0000555555573ff6 in run_builtin (argv=<optimized out>, argc=<optimized out>, p=<optimized out>) at git.c:528
microsoft#13 handle_builtin (argc=<optimized out>, argv=<optimized out>) at git.c:785
microsoft#14 0x000055555557528c in run_argv (argv=0x7fffffffddf0, argcp=0x7fffffffddfc) at git.c:857
microsoft#15 cmd_main (argc=<optimized out>, argv=<optimized out>) at git.c:993
microsoft#16 0x0000555555573ac8 in main (argc=3, argv=0x7fffffffe088) at common-main.c:52
```

The operations these commands use are guarded by items such as `index_name_pos()` and others. Since the `.gitmodules` file is always at root, we would not need to expand, anyway.
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 20, 2022
When fetching packfiles, we write a bunch of lockfiles for the packfiles
we're writing into the repository. In order to not leave behind any
cruft in case we exit or receive a signal, we register both an exit
handler as well as signal handlers for common signals like SIGINT. These
handlers will then unlink the locks and free the data structure tracking
them. We have observed a deadlock in this logic though:

    (gdb) bt
    #0  __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
    microsoft#1  0x00007f4932bea2cd in _int_free (av=0x7f4932f2eb20 <main_arena>, p=0x3e3e4200, have_lock=0) at malloc.c:3969
    microsoft#2  0x00007f4932bee58c in __GI___libc_free (mem=<optimized out>) at malloc.c:2975
    microsoft#3  0x0000000000662ab1 in string_list_clear ()
    microsoft#4  0x000000000044f5bc in unlock_pack_on_signal ()
    microsoft#5  <signal handler called>
    microsoft#6  _int_free (av=0x7f4932f2eb20 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:4024
    microsoft#7  0x00007f4932bee58c in __GI___libc_free (mem=<optimized out>) at malloc.c:2975
    microsoft#8  0x000000000065afd5 in strbuf_release ()
    microsoft#9  0x000000000066ddb9 in delete_tempfile ()
    microsoft#10 0x0000000000610d0b in files_transaction_cleanup.isra ()
    microsoft#11 0x0000000000611718 in files_transaction_abort ()
    microsoft#12 0x000000000060d2ef in ref_transaction_abort ()
    microsoft#13 0x000000000060d441 in ref_transaction_prepare ()
    microsoft#14 0x000000000060e0b5 in ref_transaction_commit ()
    microsoft#15 0x00000000004511c2 in fetch_and_consume_refs ()
    microsoft#16 0x000000000045279a in cmd_fetch ()
    microsoft#17 0x0000000000407c48 in handle_builtin ()
    microsoft#18 0x0000000000408df2 in cmd_main ()
    microsoft#19 0x00000000004078b5 in main ()

The process was killed with a signal, which caused the signal handler to
kick in and try free the data structures after we have unlinked the
locks. It then deadlocks while calling free(3P).

The root cause of this is that it is not allowed to call certain
functions in async-signal handlers, as specified by signal-safety(7).
Next to most I/O functions, this list of disallowed functions also
includes memory-handling functions like malloc(3P) and free(3P) because
they may not be reentrant. As a result, if we execute such functions in
the signal handler, then they may operate on inconistent state and fail
in unexpected ways.

Fix this bug by not calling non-async-signal-safe functions when running
in the signal handler. We're about to re-raise the signal anyway and
will thus exit, so it's not much of a problem to keep the string list of
lockfiles untouched. Note that it's fine though to call unlink(2), so
we'll still clean up the lockfiles correctly.

Signed-off-by: Patrick Steinhardt <[email protected]>
Reviewed-by: brian m. carlson <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
dscho added a commit that referenced this pull request Nov 5, 2023
The `linux-leaks` job has become a lot more aggressive, failing the
following test cases:

- t0001.52 extensions.objectFormat is not allowed with repo version 0
- t1302.3 gitdir selection on unsupported repo
- t1302.4 gitdir not required mode
- t1302.5 gitdir required mode
- t1302.9 abort version=1 no-such-extension
- t1302.12 abort version=0 noop-v1

The reason is that the `commondir` strbuf _is_ sometimes initialized
_even if_ `discover_git_directory()` fails. The symptom:

    Direct leak of 24 byte(s) in 1 object(s) allocated from:
        #0 0x7f2e80ed8293 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
        #1 0x5603ff6ae674 in xrealloc wrapper.c:138
        #2 0x5603ff661801 in strbuf_grow strbuf.c:101
        #3 0x5603ff662417 in strbuf_add strbuf.c:300
        #4 0x5603ff653e2b in strbuf_addstr strbuf.h:310
        #5 0x5603ff65703b in setup_git_directory_gently_1 setup.c:1329
        #6 0x5603ff65731b in discover_git_directory_reason setup.c:1388
        #7 0x5603ff55b6c0 in discover_git_directory setup.h:79
        #8 0x5603ff55b7ce in hook_path_early hook.c:35
        #9 0x5603ff55b97d in find_hook hook.c:77
        #10 0x5603ff55bcd3 in run_hooks_opt hook.c:189
        #11 0x5603ff37ca05 in run_pre_command_hook git.c:457
        #12 0x5603ff37ce0a in run_builtin git.c:532
        #13 0x5603ff37d37e in handle_builtin git.c:798
        #14 0x5603ff37d65a in run_argv git.c:867
        #15 0x5603ff37dc8e in cmd_main git.c:1007
        #16 0x5603ff48f4ee in main common-main.c:62
        #17 0x7f2e80cabd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

The fix is easy: always release the `strbuf`s, even when the discovery
of the Git directory failed.

Signed-off-by: Johannes Schindelin <[email protected]>
dscho added a commit that referenced this pull request Oct 9, 2024
An internal customer reported a segfault when running `git
sparse-checkout set` with the `index.sparse` config enabled. I was
unable to reproduce it locally, but with their help we debugged into the
failing process and discovered the following stacktrace:

```
#0  0x00007ff6318fb7b0 in rehash (map=0x3dfb00d0440, newsize=1048576) at hashmap.c:125
#1  0x00007ff6318fbc66 in hashmap_add (map=0x3dfb00d0440, entry=0x3dfb5c58bc8) at hashmap.c:247
#2  0x00007ff631937a70 in hash_index_entry (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:122
#3  0x00007ff631938a2f in add_name_hash (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:638
#4  0x00007ff631a064de in set_index_entry (istate=0x3dfb00d0400, nr=8291, ce=0x3dfb5c58bc8) at sparse-index.c:255
#5  0x00007ff631a06692 in add_path_to_index (oid=0x5ff130, base=0x5ff580, path=0x3dfb4b725da "<redacted>", mode=33188, context=0x5ff570)    at sparse-index.c:307
#6  0x00007ff631a3b48c in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41f60, base=0x5ff580, depth=2, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:46
#7  0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41e80, base=0x5ff580, depth=1, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#8  0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41ac8, base=0x5ff580, depth=0, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#9  0x00007ff631a06a95 in expand_index (istate=0x3dfb00d0100, pl=0x0) at sparse-index.c:422
#10 0x00007ff631a06cbd in ensure_full_index (istate=0x3dfb00d0100) at sparse-index.c:456
#11 0x00007ff631990d08 in index_name_stage_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21, stage=0,    search_mode=EXPAND_SPARSE) at read-cache.c:556
#12 0x00007ff631990d6c in index_name_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21) at read-cache.c:566
#13 0x00007ff63180dbb5 in sanitize_paths (argc=185, argv=0x3dfb0030018, prefix=0x0, skip_checks=0) at builtin/sparse-checkout.c:756
#14 0x00007ff63180de50 in sparse_checkout_set (argc=185, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:860
#15 0x00007ff63180e6c5 in cmd_sparse_checkout (argc=186, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:1063
#16 0x00007ff6317234cb in run_builtin (p=0x7ff631ad9b38 <commands+2808>, argc=187, argv=0x3dfb0030018) at git.c:548
#17 0x00007ff6317239c0 in handle_builtin (argc=187, argv=0x3dfb0030018) at git.c:808
#18 0x00007ff631723c7d in run_argv (argcp=0x5ffdd0, argv=0x5ffd78) at git.c:877
#19 0x00007ff6317241d1 in cmd_main (argc=187, argv=0x3dfb0030018) at git.c:1017
#20 0x00007ff631838b60 in main (argc=190, argv=0x3dfb0030000) at common-main.c:64 
```

The very bottom of the stack being the `rehash()` method from
`hashmap.c` as called within the `name-hash` API made me look at where
these hashmaps were being used in the sparse index logic. These were
being copied across indexes, which seems dangerous. Indeed, clearing
these hashmaps and setting them as not initialized fixes the segfault.

The second commit is a response to a test failure that happens in
`t1092-sparse-checkout-compatibility.sh` where `git stash pop` starts to
fail because the underlying `git checkout-index` process fails due to
colliding files. Passing the `-f` flag appears to work, but it's unclear
why this name-hash change causes that change in behavior.
dscho added a commit that referenced this pull request Oct 9, 2024
An internal customer reported a segfault when running `git
sparse-checkout set` with the `index.sparse` config enabled. I was
unable to reproduce it locally, but with their help we debugged into the
failing process and discovered the following stacktrace:

```
#0  0x00007ff6318fb7b0 in rehash (map=0x3dfb00d0440, newsize=1048576) at hashmap.c:125
#1  0x00007ff6318fbc66 in hashmap_add (map=0x3dfb00d0440, entry=0x3dfb5c58bc8) at hashmap.c:247
#2  0x00007ff631937a70 in hash_index_entry (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:122
#3  0x00007ff631938a2f in add_name_hash (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:638
#4  0x00007ff631a064de in set_index_entry (istate=0x3dfb00d0400, nr=8291, ce=0x3dfb5c58bc8) at sparse-index.c:255
#5  0x00007ff631a06692 in add_path_to_index (oid=0x5ff130, base=0x5ff580, path=0x3dfb4b725da "<redacted>", mode=33188, context=0x5ff570)    at sparse-index.c:307
#6  0x00007ff631a3b48c in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41f60, base=0x5ff580, depth=2, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:46
#7  0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41e80, base=0x5ff580, depth=1, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#8  0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41ac8, base=0x5ff580, depth=0, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#9  0x00007ff631a06a95 in expand_index (istate=0x3dfb00d0100, pl=0x0) at sparse-index.c:422
#10 0x00007ff631a06cbd in ensure_full_index (istate=0x3dfb00d0100) at sparse-index.c:456
#11 0x00007ff631990d08 in index_name_stage_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21, stage=0,    search_mode=EXPAND_SPARSE) at read-cache.c:556
#12 0x00007ff631990d6c in index_name_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21) at read-cache.c:566
#13 0x00007ff63180dbb5 in sanitize_paths (argc=185, argv=0x3dfb0030018, prefix=0x0, skip_checks=0) at builtin/sparse-checkout.c:756
#14 0x00007ff63180de50 in sparse_checkout_set (argc=185, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:860
#15 0x00007ff63180e6c5 in cmd_sparse_checkout (argc=186, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:1063
#16 0x00007ff6317234cb in run_builtin (p=0x7ff631ad9b38 <commands+2808>, argc=187, argv=0x3dfb0030018) at git.c:548
#17 0x00007ff6317239c0 in handle_builtin (argc=187, argv=0x3dfb0030018) at git.c:808
#18 0x00007ff631723c7d in run_argv (argcp=0x5ffdd0, argv=0x5ffd78) at git.c:877
#19 0x00007ff6317241d1 in cmd_main (argc=187, argv=0x3dfb0030018) at git.c:1017
#20 0x00007ff631838b60 in main (argc=190, argv=0x3dfb0030000) at common-main.c:64 
```

The very bottom of the stack being the `rehash()` method from
`hashmap.c` as called within the `name-hash` API made me look at where
these hashmaps were being used in the sparse index logic. These were
being copied across indexes, which seems dangerous. Indeed, clearing
these hashmaps and setting them as not initialized fixes the segfault.

The second commit is a response to a test failure that happens in
`t1092-sparse-checkout-compatibility.sh` where `git stash pop` starts to
fail because the underlying `git checkout-index` process fails due to
colliding files. Passing the `-f` flag appears to work, but it's unclear
why this name-hash change causes that change in behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants