[Performance] pack-index-from-data #82
-
AFAIR git uses a hard-coded cutoff. Below the cutoff, git caches the object during the initial pass. Above the cutoff, the object data is decompressed into a fixed, pre-allocated buffer and that is used for sha1/crc32 updates.
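To make that concrete, here is a minimal Rust sketch of such a cutoff strategy. The cutoff value and all names (`CACHE_CUTOFF`, `FirstPassObject`, the closure parameters) are hypothetical and do not correspond to git's or gitoxide's actual code:

```rust
/// Hypothetical outcome of the first pass over one pack entry.
enum FirstPassObject {
    /// Small object: keep the decompressed bytes around for later delta resolution.
    Cached(Vec<u8>),
    /// Large object: the bytes were only streamed through the hashers.
    HashedOnly { size: usize },
}

/// Made-up threshold for illustration only.
const CACHE_CUTOFF: usize = 2 * 1024 * 1024;

/// `decompress_into` stands in for inflating one entry into `scratch` and returning its size;
/// `update_hashes` stands in for feeding the sha1/crc32 hashers.
fn first_pass_entry(
    decompress_into: impl Fn(&mut Vec<u8>) -> usize,
    update_hashes: impl Fn(&[u8]),
    scratch: &mut Vec<u8>, // fixed, pre-allocated buffer reused for every entry
) -> FirstPassObject {
    scratch.clear();
    let size = decompress_into(scratch);
    update_hashes(&scratch[..size]);
    if size < CACHE_CUTOFF {
        // Below the cutoff: keep a copy so deltas onto this base resolve without re-inflating.
        FirstPassObject::Cached(scratch[..size].to_vec())
    } else {
        // Above the cutoff: only the hashes saw the data; the scratch buffer gets reused.
        FirstPassObject::HashedOnly { size }
    }
}
```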
-
Something else to consider -- not every traversal cares about the contents of every object. For example, a traversal to build the commit-graph does not care about anything except commits. It may be worth augmenting the API to allow the caller to specify an allowlist of object types, and skip decoding anything that is not in the list.
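Roughly, such an allowlist could look like the following sketch; `ObjectKind`, `TraverseOptions` and `traverse_pack` are made-up names, not an existing gitoxide API:

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
enum ObjectKind {
    Commit,
    Tree,
    Blob,
    Tag,
}

struct TraverseOptions {
    /// Only objects of these kinds get decoded; everything else is skipped
    /// right after its entry header is read.
    decode_kinds: Vec<ObjectKind>,
}

/// `entries` yields (kind, pack offset) pairs from the cheap header scan;
/// `decode_and_visit` does the expensive decoding for entries that pass the filter.
fn traverse_pack(
    entries: impl Iterator<Item = (ObjectKind, u64)>,
    opts: &TraverseOptions,
    mut decode_and_visit: impl FnMut(u64),
) {
    for (kind, offset) in entries {
        if !opts.decode_kinds.contains(&kind) {
            continue; // e.g. a commit-graph build would pass only ObjectKind::Commit
        }
        decode_and_visit(offset);
    }
}
```

A commit-graph traversal would then pass `decode_kinds: vec![ObjectKind::Commit]` and never pay for decoding trees and blobs.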
-
Previously the main testing ground for 'the heaviest possible pack to index' was the linux kernel. Now there is a new contender: the Android base pack, which contains some big blobs, something that isn't the case for the kernel pack.
Android Base
Here is the gitoxide run. Note that we don't actually write the index, whereas git does.
And here is something similar with git.
It's clear that gitoxide uses much more memory when handling these large blobs. What comes to mind is that it keeps the base objects in memory to be able to process their direct deltas, breadth first, so if there are many deltas, a lot of decoded objects hang around in memory. Only when all deltas are computed does the base go out of memory, and the process repeats. Maybe there is a bug there?

Note that these measurements are on an M1 with unoptimized Sha1 computation, which slows down object decoding significantly. With a fast implementation like the one git uses, about 230 CPU seconds could be saved.
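For illustration, here is a rough sketch of that breadth-first scheme with hypothetical types and a fake `apply_delta`; the point is only to show that every decoded child of a base is alive at the same time before the base itself can be dropped:

```rust
/// One node in a delta tree: its delta data plus the deltas based on it.
struct DeltaNode {
    data: Vec<u8>,
    children: Vec<DeltaNode>,
}

/// Stand-in for real delta application.
fn apply_delta(base: &[u8], delta: &[u8]) -> Vec<u8> {
    let mut out = base.to_vec();
    out.extend_from_slice(delta);
    out
}

fn resolve_base(base: Vec<u8>, deltas: &[DeltaNode], on_object: &mut dyn FnMut(&[u8])) {
    // All direct deltas are decoded while `base` is still held in memory ...
    let decoded: Vec<Vec<u8>> = deltas.iter().map(|d| apply_delta(&base, &d.data)).collect();
    // ... and only now can the base be freed; with a huge blob and many deltas,
    // this is where memory peaks.
    drop(base);
    for (node, data) in deltas.iter().zip(decoded) {
        on_object(&data);
        // each decoded child now acts as the base for its own children
        resolve_base(data, &node.children, &mut *on_object);
    }
}
```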
The linux kernel
With a fast ASM-based Sha1 implementation, gitoxide could save about 100 CPU seconds. And here is the same with git. gitoxide performs better here not only in time but also in memory usage.