pull-mirror sync seems unnecessarily slow #18352
petergardfjall changed the title from "A mirror-sync on a mirror without releases is unnecessarily slow" to "pull-mirror sync seems unnecessarily slow" on Mar 18, 2022
6543 pushed a commit that referenced this issue on Mar 31, 2022:
This addresses #18352

It aims to improve performance (and resource use) of the `SyncReleasesWithTags` operation for pull-mirrors.

For large repositories with many tags, `SyncReleasesWithTags` can be a costly operation (taking several minutes to complete). The reason is two-fold:

1. on sync, every upstream repo tag is compared (for changes) against existing local entries in the release table to ensure that they are up-to-date.
2. the procedure for getting _each tag_ involves a series of git operations

   ```bash
   git show-ref --tags -- v8.2.4477
   git cat-file -t 29ab6ce9f36660cffaad3c8789e71162e5db5d2f
   git cat-file -p 29ab6ce9f36660cffaad3c8789e71162e5db5d2f
   git rev-list --count 29ab6ce9f36660cffaad3c8789e71162e5db5d2f
   ```

   of which the `git rev-list --count` can be particularly heavy.

This PR optimizes performance for pull-mirrors. We utilize the fact that a pull-mirror is always identical to its upstream and rebuild the entire release table on every sync and use a batch `git for-each-ref .. refs/tags` call to retrieve all tags in one go.

For large mirror repos, with hundreds of annotated tags, this brings down the duration of the sync operation from several minutes to a few seconds. A few unscientific examples run on my local machine:

- https://github.com/spring-projects/spring-boot (223 tags)
  - before: `0m28,673s`
  - after: `0m2,244s`
- https://github.com/kubernetes/kubernetes (890 tags)
  - before: `8m00s`
  - after: `0m8,520s`
- https://github.com/vim/vim (13954 tags)
  - before: `14m20,383s`
  - after: `0m35,467s`

I added a `foreachref` package which contains a flexible way of specifying which reference fields are of interest (`git-for-each-ref(1)`) and to produce a parser for the expected output. These could be reused in other places where `for-each-ref` is used. I'll add unit tests for those if the overall PR looks promising.
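The commit message describes the `foreachref` package only in outline: callers name the fields they want and get back a matching parser for the batch output. A minimal Go sketch of that idea follows. `formatArg`, `parseRefs` and the chosen field list are hypothetical names for illustration, not Gitea's actual `foreachref` API. The sketch assumes fields are joined with NUL via git's `%00` format escape and that every requested field fits on one line, and it parses canned output instead of invoking git.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// fieldNames is a hypothetical selection of git-for-each-ref(1) fields for tags.
// "*objectname" dereferences an annotated tag to the object it points at;
// it is empty for lightweight tags.
var fieldNames = []string{"refname:short", "objectname", "objecttype", "*objectname", "subject"}

// formatArg builds a --format argument that separates fields with NUL (%00),
// so values containing spaces or tabs cannot be confused with delimiters.
func formatArg(fields []string) string {
	parts := make([]string, len(fields))
	for i, f := range fields {
		parts[i] = "%(" + f + ")"
	}
	return "--format=" + strings.Join(parts, "%00")
}

// parseRefs turns for-each-ref output into one map per ref, keyed by field name.
// Assumption: every requested field is single-line (e.g. %(subject), not
// %(contents)), because git terminates each ref's output with a newline.
func parseRefs(output string, fields []string) ([]map[string]string, error) {
	var refs []map[string]string
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := sc.Text()
		if line == "" {
			continue
		}
		values := strings.Split(line, "\x00")
		if len(values) != len(fields) {
			return nil, fmt.Errorf("expected %d fields, got %d in %q", len(fields), len(values), line)
		}
		ref := make(map[string]string, len(fields))
		for i, name := range fields {
			ref[name] = values[i]
		}
		refs = append(refs, ref)
	}
	return refs, sc.Err()
}

func main() {
	// The argument that would be passed to: git for-each-ref <format> refs/tags
	fmt.Println(formatArg(fieldNames))

	// Canned output standing in for a real git invocation:
	// one annotated tag followed by one lightweight tag.
	out := "v1.0\x00aaa111\x00tag\x00bbb222\x00Release 1.0\n" +
		"v0.9\x00ccc333\x00commit\x00\x00Initial import\n"
	refs, err := parseRefs(out, fieldNames)
	if err != nil {
		panic(err)
	}
	for _, r := range refs {
		fmt.Println(r["refname:short"], r["objecttype"], r["subject"])
	}
}
```

With this shape, listing every tag costs a single `git for-each-ref` process, whereas the per-tag procedure spawns several git processes for each tag.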
Closing this issue since #19125 is now merged.
Gitea Version
6c7084c
Git Version
2.25.1
Operating System
Linux (Ubuntu 20.04)
How are you running Gitea?
Built from source, run locally on the host.
Database
PostgreSQL
Can you reproduce the bug on the Gitea demo site?
No
Log Gist
No response
Description
A mirror-sync operation not only runs `git remote update`, but also tries to sync any repo releases with the available repo tags. For large repositories with many tags, this can be a costly operation, both in time and computational resources.

I noticed that when doing a "plain git mirror" (which doesn't include any releases) of a big repo (such as Kubernetes, with about 900 tags), the synchronize operation (`SyncReleasesWithTags`) spent a lot of time (about six minutes) listing and syncing tags with releases.

This appears to be caused by a repetitive series of git calls, one series for each repo tag. In particular, `git rev-list --count` can be heavy for large repos with many commits.

It seems like there is an opportunity to improve performance and reduce resource use by making this procedure more efficient for pull-mirrors.
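The opportunity rests on a pull-mirror always being identical to its upstream: the tag-derived release rows can be regenerated wholesale from one batch tag listing instead of being reconciled tag by tag, which is the approach the commit message above takes. The Go sketch below shows only that rebuild step, under stated assumptions: `Tag`, `Release` and `rebuildTagReleases` are hypothetical simplifications rather than Gitea's models, and the database write is left out.

```go
package main

import "fmt"

// Tag is a hypothetical, simplified record produced by one batch tag listing.
type Tag struct {
	Name     string
	CommitID string // commit the tag ultimately points at
	Subject  string
}

// Release is a hypothetical, simplified release-table row for a tag.
type Release struct {
	RepoID   int64
	TagName  string
	CommitID string
	Title    string
	IsTag    bool // true: row exists only to mirror a git tag
}

// rebuildTagReleases regenerates every tag-derived release row for a
// pull-mirror from a single tag listing. The previous rows are never
// consulted: assuming the mirror is identical to upstream, the old rows can
// be dropped and replaced, which also removes tags deleted upstream.
func rebuildTagReleases(repoID int64, tags []Tag) []Release {
	rows := make([]Release, 0, len(tags))
	for _, t := range tags {
		rows = append(rows, Release{
			RepoID:   repoID,
			TagName:  t.Name,
			CommitID: t.CommitID,
			Title:    t.Subject,
			IsTag:    true,
		})
	}
	return rows
}

func main() {
	tags := []Tag{
		{Name: "v1.0", CommitID: "bbb222", Subject: "Release 1.0"},
		{Name: "v0.9", CommitID: "ccc333", Subject: "Initial import"},
	}
	// In a real sync these rows would replace the repository's existing
	// tag-derived rows, e.g. a delete plus a bulk insert in one transaction.
	for _, r := range rebuildTagReleases(42, tags) {
		fmt.Printf("%d %s %s\n", r.RepoID, r.TagName, r.CommitID)
	}
}
```

The trade-off is that any locally edited release data on a pull-mirror would be overwritten, which is acceptable only because the mirror is not meant to diverge from upstream.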