Image Update Automation stops working when git clone takes a long time
#296
Comments
Using shallow clones makes it difficult or impossible to switch branches when someone specifies a "push branch" -- see I'm surprised that it only needs latency of >8s to fail. That suggests there's something else going on here.
Failing how? Does it fail to clone the repo but continue running? Or crash, or stall? Or something else?
Similar situation here:
{
  "level": "error",
  "ts": "2022-01-19T13:01:34.563Z",
  "logger": "controller.imageupdateautomation",
  "msg": "Reconciler error",
  "reconciler group": "image.toolkit.fluxcd.io",
  "reconciler kind": "ImageUpdateAutomation",
  "name": "image-update-automation",
  "namespace": "demo",
  "error": "unable to clone: failed to connect to some.gitlab.com: Connection timed out"
}
the FYI, we use the default timeout settings everywhere in our Flux resources. Our GitLab instance does have some performance problems now and then, which could cause
Would one of you be able to try out the image from #297? This brings the timeout logic back to the shape it was in around the time of the If this resolves the issue, we need to have another look at how libgit2 reacts to the (cancelling) callbacks.
With the release of Flux
@martinzellner thank you very much for reporting this. Today we are releasing version v0.21.0 which introduces an experimental transport that should fix the issue in which the controller stops working in some scenarios. The experimental transport needs to be opted-in by setting the environment variable Can you test it again using the version
This should be fixed as part of Managed Transport being made default. Latest release candidates with these changes:
Closing for lack of activity - happy to reopen in case others report recurrence.
I believe that this issue is still present in v0.32.0 as part of the 2.0.0 release candidate. We have a large kustomize tree in a ~72MB git repository source (when zipped) containing a total of 24 image policies referred to by policy markers in a single kustomization.yaml. ImagePolicies are matching the correct tag to be applied. Whenever the ImageUpdateAutomation runs the controller downloads the source branch, unpacks these files into /tmp/somefolder/ (82MB on disk) and quickly deletes the files when the total usage within this folder is approaching ~100MB. Because my target branch is different to the source I think it then downloads the latest commit for comparison but then also deletes the files immediately. Finally the logs indicate that no changes were applied.
Typically the process outlined above takes between 13 and 15s for our repository, but no further log output is generated (note DEBUG is enabled above). If I define a different Kustomization tree from a smaller git repo for a subset of the ImagePolicy markers (28MB zip archive), then commits are made successfully to the target branch. Would you consider reopening this please @pjbgf?
Thank you for reopening the issue. In order to validate if it is the same problem I have tested this against a shallow clone of the original branch, removing some ~50MB of unnecessary files which reduced the processing time to only 6 seconds. I still see the same behaviour, namely
Just want to check in with users who reported here. Are there any users actively tracking this issue who can say if it persists in Flux 2.0.0, which has been released this week? @steviestainz The multiple imagepolicy markers in the same manifest are definitely supported. But one of your tags looks like a possibly invalid semver.
Describe the bug
For our git repo with a lot of commits, the image update automation stopped working.
As increasing the timeouts for the git clone [1] did not help, we decided to squash a large part of the git history, which dramatically reduced the time to clone the repo and also fixed the image update automation. Of course, this comes at the cost of losing the commit history.
Therefore we kindly ask if it would be possible to enable the use of git's shallow clone functionality [2], which would enable faster cloning without having to squash the git history.
[1] https://fluxcd.io/docs/components/source/api/#source.toolkit.fluxcd.io/v1beta1.GitRepository
[2] https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---depthltdepthgt
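For readers unfamiliar with the shallow clone option referenced in [2], here is a minimal sketch of what it looks like with the plain git CLI, driven from Python. It only illustrates the `--depth` and `--branch` flags of `git clone` together with a client-side timeout; it is not how the Flux controllers clone repositories. The function names `shallow_clone_cmd` and `shallow_clone`, and the default timeout value, are hypothetical and chosen for this example.

```python
import subprocess
from typing import List, Optional


def shallow_clone_cmd(url: str, dest: str, depth: int = 1,
                      branch: Optional[str] = None) -> List[str]:
    """Build the argv for a shallow `git clone` (hypothetical helper)."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    # --depth truncates history to the given number of commits.
    cmd = ["git", "clone", "--depth", str(depth)]
    if branch is not None:
        # --depth implies --single-branch; --branch selects which branch.
        cmd += ["--branch", branch]
    # "--" separates options from the repository URL and target directory.
    cmd += ["--", url, dest]
    return cmd


def shallow_clone(url: str, dest: str, depth: int = 1,
                  branch: Optional[str] = None,
                  timeout_s: float = 60.0) -> None:
    """Run the shallow clone, giving up after timeout_s seconds.

    Raises subprocess.TimeoutExpired on timeout and
    subprocess.CalledProcessError if git exits with a non-zero status.
    """
    subprocess.run(shallow_clone_cmd(url, dest, depth, branch),
                   check=True, timeout=timeout_s)
```

Calling `shallow_clone("https://example.com/repo.git", "/tmp/repo", branch="main")` would fetch only the latest commit of `main` (the URL and path are placeholders). Because a shallow clone of one branch does not fetch other branches, a setup that pushes to a different branch needs extra fetch steps.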
Steps to reproduce
Expected behavior
We would like Flux to use Git's shallow clone functionality [1].
[1] https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---depthltdepthgt
Screenshots and recordings
No response
OS / Distro
N/A
Flux version
N/A
Flux check
N/A
Git provider
Bitbucket
Container Registry provider
No response
Additional context
No response
Code of Conduct