Diffing Performance (formerly included traversal numbers) #74
Replies: 4 comments 1 reply
-
And here is the results on the linux kernel:
A typical linux kernel tree has up to 75k files which seems to slow down |
Beta Was this translation helpful? Give feedback.
-
Another repository with a complex directory tree with thousands of folders and files, each of the few commits contains many changed paths. Thus it's a good approximation for the linux kernel tree and thus a decent test environment for finding better diff options.
Despite the apples vs oranges issue we might be having, the less scalable parallel version of libgit2 is 10% faster than a single thread of gitoxide. |
Beta Was this translation helpful? Give feedback.
-
When testing once again a repository with huge trees…
…it shows that a lot of time is actually spent parsing tree entries themselves. Even though |
Beta Was this translation helpful? Give feedback.
-
Re-running the tree-diffing on a repository with a single pack but massive trees yields the following results (but not the program was refactored to use memory constrained caches at all time, with a bigger cache nonetheless).
Here the multi-threaded version only gains a 2.5x speedup over the single-threaded version, which is still about 50% faster than the parallel libgit2 variant. It's hard to see where exactly all the time is spent but shows that using more advanced caches available to git should really be implemented eventually to speed this up. |
Beta Was this translation helpful? Give feedback.
-
Using the
traversal
program both commit graph iteration as well as tree diff performance can be tested on real repositories with various cache settings.Here is the results on the Rust repository.
Traversing commits is fast for both, and I have a feeling that it's mostly limited by object access performance.
gitoxide
is about 25% faster when traversing commits on a single thread.Producing tree deltas is even more intense when it comes to object access performance, and thus having a better cache is critical especially since many objects are reused-similar in two trees that are just a commit apart.
gitoxide
allows implementing any caching in that regard - it's transparent to the tree diff implementation. It appears that memory-capped LRU caches work similarly well here as they do for the pack cache. And sometimes, no pack cache is actually better as there are not enough hits to improve performance or amortize the cost.As usual,
gitoxide
has a near 4x performance gain in multithreaded mode, whereaslibgit2
gets a similar scaling factor as for object-access. Even with multithreadinglibgit2
is about 30% behindgitoxide
, which certainly is due to the fact that it's an apples and oranges comparison. Better tuning of the algorithm options may help in that regard.In smaller repositories commit iteration is relatively close with tree diffing once again being far behind which encourages playing with libgit2 diff options more, currently they use the
default()
settings.Beta Was this translation helpful? Give feedback.
All reactions