[Performance] pack-index-from-data #82
-
AFAIR git uses a hard-coded cutoff. Below the cutoff, git caches the object during the initial pass. Above the cutoff, the object data is decompressed into a fixed, pre-allocated buffer and that is used for sha1/crc32 updates.
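To make that concrete, here is a minimal Rust sketch of such a cutoff strategy. The cutoff value and all names (`CACHE_CUTOFF`, `FirstPassObject`, the closure parameters) are hypothetical and do not correspond to git's or gitoxide's actual code:

```rust
/// Hypothetical outcome of the first pass over one pack entry.
enum FirstPassObject {
    /// Small object: keep the decompressed bytes around for later delta resolution.
    Cached(Vec<u8>),
    /// Large object: the bytes were only streamed through the hashers.
    HashedOnly { size: usize },
}

/// Made-up threshold for illustration only.
const CACHE_CUTOFF: usize = 2 * 1024 * 1024;

/// `decompress_into` stands in for inflating one entry into `scratch` and returning its size;
/// `update_hashes` stands in for feeding the sha1/crc32 hashers.
fn first_pass_entry(
    decompress_into: impl Fn(&mut Vec<u8>) -> usize,
    update_hashes: impl Fn(&[u8]),
    scratch: &mut Vec<u8>, // fixed, pre-allocated buffer reused for every entry
) -> FirstPassObject {
    scratch.clear();
    let size = decompress_into(scratch);
    update_hashes(&scratch[..size]);
    if size < CACHE_CUTOFF {
        // Below the cutoff: keep a copy so deltas onto this base resolve without re-inflating.
        FirstPassObject::Cached(scratch[..size].to_vec())
    } else {
        // Above the cutoff: only the hashes saw the data; the scratch buffer gets reused.
        FirstPassObject::HashedOnly { size }
    }
}
```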
-
Something else to consider -- not every traversal cares about the contents of every object. For example, a traversal to build the commit-graph does not care about anything except commits. It may be worth augmenting the API to allow the caller to specify an allowlist of object types, and skip decoding anything that is not in the list.
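Roughly, such an allowlist could look like the following sketch; `ObjectKind`, `TraverseOptions` and `traverse_pack` are made-up names, not an existing gitoxide API:

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
enum ObjectKind {
    Commit,
    Tree,
    Blob,
    Tag,
}

struct TraverseOptions {
    /// Only objects of these kinds get decoded; everything else is skipped
    /// right after its entry header is read.
    decode_kinds: Vec<ObjectKind>,
}

/// `entries` yields (kind, pack offset) pairs from the cheap header scan;
/// `decode_and_visit` does the expensive decoding for entries that pass the filter.
fn traverse_pack(
    entries: impl Iterator<Item = (ObjectKind, u64)>,
    opts: &TraverseOptions,
    mut decode_and_visit: impl FnMut(u64),
) {
    for (kind, offset) in entries {
        if !opts.decode_kinds.contains(&kind) {
            continue; // e.g. a commit-graph build would pass only ObjectKind::Commit
        }
        decode_and_visit(offset);
    }
}
```

A commit-graph traversal would then pass `decode_kinds: vec![ObjectKind::Commit]` and never pay for decoding trees and blobs.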
-
Previously the main testing ground for 'the heaviest possible pack to index' was the linux kernel. Now there is a new contender: the Android base pack, which contains some big blobs, something that isn't the case for the kernel pack.
Android Base
Here is the gitoxide run. Note that we don't actually write the index, whereas git does.
And here is something similar with git.
It's clear that gitoxide uses much more memory when handling these large blobs. What comes to mind is that it keeps the base objects in memory to be able to process their direct deltas, breadth first, so if there are many deltas, a lot of decoded objects hang around in memory. Only when all deltas are computed does the base go out of memory, and the process repeats. Maybe there is a bug there?

Note that these measurements are on an M1 with unoptimized Sha1 computation, which slows down object decoding significantly. With a fast implementation like the one git uses, about 230 CPU seconds could be saved.
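For illustration, here is a rough sketch of that breadth-first scheme with hypothetical types and a fake `apply_delta`; the point is only to show that every decoded child of a base is alive at the same time before the base itself can be dropped:

```rust
/// One node in a delta tree: its delta data plus the deltas based on it.
struct DeltaNode {
    data: Vec<u8>,
    children: Vec<DeltaNode>,
}

/// Stand-in for real delta application.
fn apply_delta(base: &[u8], delta: &[u8]) -> Vec<u8> {
    let mut out = base.to_vec();
    out.extend_from_slice(delta);
    out
}

fn resolve_base(base: Vec<u8>, deltas: &[DeltaNode], on_object: &mut dyn FnMut(&[u8])) {
    // All direct deltas are decoded while `base` is still held in memory ...
    let decoded: Vec<Vec<u8>> = deltas.iter().map(|d| apply_delta(&base, &d.data)).collect();
    // ... and only now can the base be freed; with a huge blob and many deltas,
    // this is where memory peaks.
    drop(base);
    for (node, data) in deltas.iter().zip(decoded) {
        on_object(&data);
        // each decoded child now acts as the base for its own children
        resolve_base(data, &node.children, &mut *on_object);
    }
}
```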
The linux kernel
With a fast ASM-based Sha1 implementation, gitoxide could save about 100 CPU seconds. And here is the same with git. gitoxide performs better here not only in time but also in memory usage.