Startup slow due to fsGroup with large data volume #14
Comments
…ent label (runatlantis#14) * Updating cluster-overprovisioner label * Bumb chart version * Update README
* Implement runWithUser/fsGroup within entrypoint Addresses #14 * Bump chart version
This change has reduced our startup duration from over 25 minutes to 60 seconds. Thank you for the implementation. However, we've had to temporarily turn it back on for deploying a brand new test instance. Has that edge case been tested? |
I thought I tested this when originally setting up the PR, but it was about a year and a half before it was merged, so it's possible something changed since then, or that I have a bad recollection. |
In fact this change broke our deployment because we enforce a non-root user for atlantis container, which cannot |
are we saying that this should be reverted? |
Hi @jamengual. I don't think we should revert it; I believe it works well now. This change moved a values.yaml setting of the Helm chart from runatlantis/helm-charts into the entrypoint defined in runatlantis/atlantis. Apparently the releases across the two repositories weren't sufficiently coordinated. The Helm chart release predated the Atlantis core release, so restarts of the Atlantis container sped up immediately, but the edge case of deploying a new version of the image resulted in an error. By the time of releases https://github.com/runatlantis/helm-charts/releases/tag/atlantis-4.0.3 and https://github.com/runatlantis/atlantis/releases/tag/v0.19.4 the changes were properly coordinated. Now deploying a new image triggers the chown and we incur the performance penalty, but subsequent restarts are lightning fast. Helm chart version 4.0.0 introduced this change on May 16, but Atlantis core did not release it until June 6 or so. They should have been released together. |
Yes, there were some issues with the releases. We now have a new process, and we will continue to improve it. |
The change in the helm chart removed default security context settings in IMHO:
|
After looking at the change
which removes the People that need to run it as root, can change the values on the chart since they are already exposed. |
Hmm, I see it running as atlantis inside the pod, and when I deployed a new version of the chart and image I saw file ownership updates indicating the contrary. I saved a screenshot to share with my coworkers showing that we now have both sides of the implementation. This is from inside the pod from Helm chart 4.0.3 using our own image, based on Atlantis 0.19.4:
|
I just merged and updated the chart. Did you use the new chart? |
That screenshot was taken Tuesday on June 14th using chart version 4.0.3. |
Use fsGroupChangePolicy: "OnRootMismatch" (stable in k8s 1.23) to eliminate unnecessary chmod that can be slow for large volumes. See runatlantis#14 See https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods
@nitrocode IIRC the improvement was rolled back for some reason. |
Closing due to merging of #158 |
Cross-posting from runatlantis/atlantis#342 (comment) for visibility. The Atlantis chart uses the securityContext.fsGroup option to initially set filesystem permissions. I presume this is because persistent volumes are often initialized to have a filesystem owned by root. However, fsGroup causes Kubernetes to run a recursive chown on the entire volume at every startup. For a large data volume with many clones of large repositories, this can take upwards of 10 minutes. Since Atlantis runs as a singleton, this equates to 10 extra minutes of downtime on every deploy.

In our private fork of Atlantis I have dealt with this problem by doing the recursive chown in the Docker entrypoint, if and only if the top-level directory is not owned by the Atlantis user. Kubernetes now provides this feature natively, but only in alpha: see fsGroupChangePolicy.
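For readers who want to picture the conditional chown described above, here is a minimal sketch of how such a check might look in a shell entrypoint. It is not the code from the private fork or from the Atlantis entrypoint. The function names needs_chown and fix_ownership, and the example path and user in the final comment, are hypothetical. It also assumes a stat that accepts -c '%U' (GNU coreutils or BusyBox).

```shell
#!/bin/sh
# Sketch only: function names, path, and user are hypothetical, not taken
# from the real Atlantis entrypoint.

# Succeeds (exit 0) when the top-level directory is NOT owned by the given user.
# Only the top-level directory is inspected, so the check stays cheap even
# on a very large volume.
needs_chown() {
    # stat -c '%U' prints the owner's user name (GNU coreutils / BusyBox syntax).
    [ "$(stat -c '%U' "$1")" != "$2" ]
}

# Runs the expensive recursive chown only when the cheap check says it is needed.
fix_ownership() {
    if needs_chown "$1" "$2"; then
        chown -R "$2" "$1"
    fi
}

# Hypothetical call from an entrypoint, before dropping privileges:
#   fix_ownership /atlantis-data atlantis
```

The trade-off is the one discussed in the thread: the first start on a root-owned volume still pays for the full recursive chown, while later restarts skip it because the top-level owner already matches.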