Cloning large repo in libgit2 slower than git clone #4674

Open
Arcterus opened this issue Jun 6, 2018 · 15 comments

@Arcterus

Arcterus commented Jun 6, 2018

I didn't see this when I briefly checked the list of issues, so sorry if it's already been reported. Both libgit2 and git clone spend roughly the same amount of time downloading (around 30 seconds or so), but the "Resolving deltas" stage is much slower using libgit2.

Reproduction steps

I just cloned git://sourceware.org/git/glibc.git using the example code in the repository.

Expected behavior

Both libgit2 and git clone take roughly the same amount of time.

Actual behavior

libgit2 takes around 4 minutes, whereas git clone takes about 1.5 minutes.

Version of libgit2 (release number or SHA1)

0.27.0 (I've also tested whatever the Rust crate git2 downloads by default, and the speed was about the same).

Operating system(s) tested

Linux alex-linux 4.16.13-2-ARCH #1 SMP PREEMPT Fri Jun 1 18:46:11 UTC 2018 x86_64 GNU/Linux

@mw-ding

mw-ding commented Sep 28, 2018

+1 on this

@eaigner
Contributor

eaigner commented Feb 6, 2019

+1, clone is very, very slow compared to the official git binary. After the CLI git client has received all objects, its Resolving deltas and Checkout phases are relatively quick. libgit2's transfer progress callback shows that indexed objects and deltas take about 1 second per 20.

@eaigner
Contributor

eaigner commented Feb 6, 2019

What I found helps a little is setting opts.checkout_opts.disable_filters = 1, but it is still orders of magnitude slower than a git clone.

@eaigner
Contributor

eaigner commented Feb 6, 2019

Disabling GIT_OPT_ENABLE_STRICT_HASH_VERIFICATION also helped a bit, but the clone time is still a long way from fast.

@eaigner
Contributor

eaigner commented Feb 6, 2019

What also helps is choosing the appropriate SHA1 backend when building. I found this rather undocumented flag, SHA1_BACKEND, in the CMake files.

If I compile with (on macOS/iOS)

-DSHA1_BACKEND=CommonCrypto

it's already quite a bit faster than the default SHA1 C implementation, so maybe set this to a value appropriate for your system.

@tiennou
Contributor

tiennou commented Feb 6, 2019

@eaigner This will disable the only SHAttered-enabled hashing backend we have (AFAIK CommonCrypto doesn't detect it), hence YMMV.

@eaigner
Contributor

eaigner commented Feb 6, 2019

So how else can I improve clone performance? Like I said, with default settings it's not really usable, and it is more than 10x or 100x slower than a git clone at resolving deltas.

@FranklinYu

FranklinYu commented Feb 27, 2020

More data points:

cgit2 clone git 24.70s user 0.98s system 79% cpu 32.178 total
git clone git 18.82s user 1.75s system 124% cpu 16.577 total
cgit2 clone linux 491.47s user 15.81s system 88% cpu 9:30.34 total
git clone linux 343.69s user 36.13s system 151% cpu 4:09.93 total

So it is about half the performance (on my machine). Note that I was testing 0.27.7 (the version on Debian Buster).
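
To make the "about half the performance" reading concrete, here is a minimal sketch that turns the elapsed-time column of the `time` output above into a slowdown factor. The function names `parse_elapsed` and `slowdown` are made up for this illustration and are not part of libgit2 or git; the input format is assumed to be either plain seconds or `M:SS.ss`, as in the four rows above.

```c
#include <stdio.h>

/* Convert an elapsed time such as "32.178" or "9:30.34" to seconds.
 * Returns a negative value if the string cannot be parsed.
 * (Hypothetical helper, written only for this illustration.) */
double parse_elapsed(const char *s)
{
    int minutes = 0;
    double seconds = 0.0;

    /* "M:SS.ss" form: both fields must match. */
    if (sscanf(s, "%d:%lf", &minutes, &seconds) == 2)
        return minutes * 60.0 + seconds;

    /* Plain seconds form. */
    if (sscanf(s, "%lf", &seconds) == 1)
        return seconds;

    return -1.0;
}

/* How many times longer the libgit2 clone took than the git clone,
 * measured in wall-clock time. */
double slowdown(const char *libgit2_total, const char *git_total)
{
    return parse_elapsed(libgit2_total) / parse_elapsed(git_total);
}
```

Calling `slowdown("32.178", "16.577")` for the git repository and `slowdown("9:30.34", "4:09.93")` for the linux repository gives a factor a little under 2 and a little over 2 respectively, which matches the "about half" summary.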

@greened

greened commented Oct 19, 2020

I'm assuming this has to do with the fact that native git uses multiple threads to resolve deltas while I can't find any info on whether libgit2 does the same, which leads me to believe it doesn't.

@neithernut
Contributor

Judging from the reported CPU-usage (>100% for git, <100% for cgit2), this is likely one contributing factor.

However, the "user" time (which I assume is the sum of the time spent in user space over all threads) is also higher for cgit2, implying some potential for speed-ups. That is, if the measurement covers all the processing done by git (e.g. does not omit some child process).
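
The two observations above can be checked with simple arithmetic on the posted numbers. The sketch below is only an illustration: the `timing` struct and both function names are hypothetical, and it assumes the CPU percentage is (user + system) divided by elapsed time, which is how the percentages in the posted output appear to be derived.

```c
/* One row of `time` output, all values in seconds.
 * (Hypothetical type, written only for this illustration.) */
struct timing {
    double user;  /* CPU time spent in user space, summed over threads */
    double sys;   /* CPU time spent in the kernel */
    double wall;  /* elapsed wall-clock time */
};

/* Average number of cores kept busy: CPU time divided by elapsed time.
 * A value above 1.0 implies more than one thread was doing work. */
double cores_busy(struct timing t)
{
    return (t.user + t.sys) / t.wall;
}

/* Ratio of user-space CPU time between two runs. This is independent
 * of threading, so a value above 1.0 means `a` did more total work. */
double user_time_ratio(struct timing a, struct timing b)
{
    return a.user / b.user;
}
```

With the linux rows (`{491.47, 15.81, 570.34}` for cgit2 and `{343.69, 36.13, 249.93}` for git, the totals converted to seconds), `cores_busy` comes out below 1.0 for cgit2 and above 1.5 for git, while `user_time_ratio` shows cgit2 spending over 40% more user time. So threading explains part of the gap but not all of it.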

@greened

greened commented Jan 4, 2021

Agreed, there is probably some lower-hanging fruit here before going multithreaded. I've run git clones with threading disabled and they don't take as long as libgit2.

@ryankopf

+1 this issue. I am also experiencing slowdowns due to this issue.

@hpe-ykoehler

Could it be that the "git clone" command supports parallel jobs, whereas libgit2 may not use them? On big repos I am seeing way more than a 1-minute delay, sometimes more like over 10 minutes slower, using libgit2.

@mommy742

I am also experiencing this issue.

@ethomson
Member

Yep, apologies but I've been making incremental progress on this in a branch.

Keno added a commit to JuliaLang/Pkg.jl that referenced this issue Nov 8, 2024
Libgit2's "Resolving Deltas" code is extremely slow (libgit2/libgit2#4674) on
larger repositories, so it is important to have an accurate progress bar to avoid users thinking
the download is stuck. We had this implemented. However, we were never actually switching to it,
because the progress meter thought the progress was jumping backwards and wouldn't actually update
because of it. Fix that by resetting it on the first switch to resolving deltas.
KristofferC pushed a commit to JuliaLang/Pkg.jl that referenced this issue Nov 15, 2024
Libgit2's "Resolving Deltas" code is extremely slow (libgit2/libgit2#4674) on
larger repositories, so it is important to have an accurate progress bar to avoid users thinking
the download is stuck. We had this implemented. However, we were never actually switching to it,
because the progress meter thought the progress was jumping backwards and wouldn't actually update
because of it. Fix that by resetting it on the first switch to resolving deltas.
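
The progress-meter problem described in the commit message can be sketched in a few lines. This is not the Pkg.jl code (which is written in Julia); it is a hypothetical C illustration of the same idea, and the `meter` type, `phase` enum, and `meter_update` function are invented for it. A meter that ignores backwards jumps will freeze when the counter switches from received objects to resolved deltas, unless it resets on the first update of the new phase.

```c
#include <stddef.h>

/* The two stages of a clone that report progress. */
enum phase { PHASE_RECEIVING, PHASE_RESOLVING };

/* Hypothetical progress meter that refuses to move backwards. */
struct meter {
    enum phase phase;  /* phase of the last accepted update */
    size_t current;    /* last value shown to the user */
    size_t total;      /* denominator for the current phase */
};

/* Feed one progress report into the meter.
 * Returns 1 if the displayed value was updated, 0 if it was ignored. */
int meter_update(struct meter *m, enum phase phase,
                 size_t current, size_t total)
{
    if (phase != m->phase) {
        /* First report of a new phase: start over, so that the
         * smaller delta count is not mistaken for a backwards jump. */
        m->phase = phase;
        m->current = 0;
        m->total = total;
    }

    /* Within one phase, still ignore regressions. */
    if (current < m->current)
        return 0;

    m->current = current;
    m->total = total;
    return 1;
}
```

Without the reset branch, a report such as 5 of 400 deltas arriving right after 1000 of 1000 objects would be rejected as a regression, and the display would appear stuck for the whole "Resolving deltas" stage.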