[stable/concourse] properly cleanup btrfs subvolume and children #12398
Conversation
When a `dind` (Docker in Docker) image is used with btrfs, e.g. to run integration tests as per https://hub.docker.com/r/amidos/dcind/, in some situations such as job errors or interruptions the btrfs subvolumes are left uncleaned. What happens then is that when `rm -rf /concourse-worker-dir` runs, it fails with an `Operation not permitted` error, which causes `Init:Error` and ends in `Init:CrashLoopBackOff`. The solution is to take that into account and properly delete all of the btrfs subvolumes. This can be achieved either with the mount option [user_subvol_rm_allowed](https://askubuntu.com/questions/509292/how-to-set-user-subvol-rm-allowed-capability), which is tricky to apply, or with the suggested delete script, which seems the better option. Signed-off-by: Radoslav Kirilov <[email protected]>
/assign @cirocosta
Hi @smoke, thanks for the PR! If I understood correctly, this is only needed in the specific use case where one uses `dind` with the btrfs driver. I'm concerned that this might be a very specific use case, which could be covered instead by allowing users of the Helm chart to supply their own `initContainers`. Going with the approach I mentioned, you could have in your worker:
initContainers:
- name: my-init-container
image: concourse/concourse
securityContext: { privileged: true }
command: [ /bin/sh ]
args:
- -ce
- |-
for v in $(btrfs subvolume list --sort=-ogen /your-work-dir | awk '{print $9}'); do
btrfs subvolume delete /your-work-dir/$v
done
volumeMounts:
- name: concourse-work-dir
mountPath: /your-work-dir

Wdyt? Thanks!
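The `awk '{print $9}'` in the snippet above relies on the layout of `btrfs subvolume list` output, where the ninth whitespace-separated field is the subvolume path. A minimal sketch with a fabricated sample of that output (the paths are illustrative) shows the extraction; sorting with `--sort=-ogen` lists newer subvolumes first, so nested children get deleted before their parents:

```shell
# Fabricated sample of `btrfs subvolume list --sort=-ogen /your-work-dir`
# output; field 9 is the subvolume path relative to the filesystem root.
list_output='ID 259 gen 12 top level 5 path volumes/live/child
ID 258 gen 10 top level 5 path volumes/live'

# Extract the path column, exactly as the cleanup loop above does.
printf '%s\n' "$list_output" | awk '{print $9}'
# On a real worker each printed path $v would then feed:
#   btrfs subvolume delete /your-work-dir/"$v"
```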
@cirocosta Yes, you have got the case correctly, and it is needed only if the btrfs driver is used. I was wondering if prepending the initContainer conditionally would make more sense, but I am not sure if there is a standard way to switch to btrfs. For instance, in my case we had to define a specific storage class besides switching the baggageclaim driver, resulting in the following:
concourse:
worker:
baggageclaim:
driver: btrfs
worker:
replicas: 2
persistence:
worker:
size: 150Gi
storageClass: gp2-btrfs
## Create 'gp2-btrfs' storage class
## `kubectl apply -f storage-btrfs.yaml`
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: gp2-btrfs
provisioner: kubernetes.io/aws-ebs
parameters:
type: gp2
fsType: btrfs
allowVolumeExpansion: true

and then, in the worker template:

{{- if eq .Values.worker.baggageclaim.driver "btrfs" }}
initContainers:
- name: {{ template "concourse.worker.fullname" . }}-init-btrfs-subvolumes-remove
{{- if .Values.imageDigest }}
image: "{{ .Values.image }}@{{ .Values.imageDigest }}"
{{- else }}
image: "{{ .Values.image }}:{{ .Values.imageTag }}"
{{- end }}
imagePullPolicy: {{ .Values.imagePullPolicy | quote }}
securityContext:
privileged: true
command:
- /bin/sh
args:
- -ce
- |-
for v in $(btrfs subvolume list --sort=-ogen {{ .Values.concourse.worker.workDir }} | awk '{print $9}'); do
btrfs subvolume delete {{ .Values.concourse.worker.workDir }}/$v
done
volumeMounts:
- name: concourse-work-dir
mountPath: {{ .Values.concourse.worker.workDir | quote }}
{{- end }}

Wdyt?
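Since the `gp2-btrfs` storage class only helps if the volume actually mounts as btrfs, a quick sanity check on a worker node is to look the work dir up in `/proc/mounts`. The device, path, and sample line below are illustrative:

```shell
# Fabricated /proc/mounts line for the worker's work dir; on a real node,
# read /proc/mounts itself (or use `findmnt -no FSTYPE <dir>`).
mounts='/dev/xvdb /concourse-work-dir btrfs rw,relatime,space_cache 0 0'

# Field 2 is the mount point, field 3 the filesystem type.
fstype=$(printf '%s\n' "$mounts" | awk '$2 == "/concourse-work-dir" {print $3}')
echo "$fstype"
```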
@cirocosta Sorry to bother, but any thoughts on my previous comment?
Hi @smoke, thanks for the detailed reply!

Oh, sorry! I didn't mean to imply that it's something possible to do right now; that's indeed something that is not there.

Yeah, that'd be much better indeed, as someone using a different driver wouldn't need it. It'd also be interesting to have this extra initContainer applied conditionally.

Wdyt?

Some more context: looking at a not-so-far-away future, we should really not rely too much on having those init containers. Right now, one can use that through a combination of charts/stable/concourse/values.yaml (lines 890 to 904 in ececcab) and `worker.cleanUpWorkDirOnStart=false`, which are not being set as defaults at the moment, as we are not yet covering all of the edge cases that might happen for particular drivers.

Once we get that lifecycle better covered, we can definitely move more towards that 🙌
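For reference, the flag mentioned above would look roughly like this in a values override; this is a sketch against the chart revision referenced in the thread (`ececcab`), assuming `true` opts into cleaning the work dir on start:

```yaml
## Hypothetical values override; the flag exists in the chart but is not
## set as a default at the time of this thread.
worker:
  cleanUpWorkDirOnStart: true
```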
Yes, it all makes sense, and maybe by then we can merge my PR as the quick and easy fix, and refactor accordingly in the near future with the init containers.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
This issue is being automatically closed due to inactivity. |
Oh damn, this one should definitely not be closed - sorry for the long time! We've been quite busy these weeks, but we'll get back to this one shortly. Sorry! |
Hi, @smoke! Sorry for the super super super slow turnaround here 🐢, but we're finally getting this in 😁 Would you mind opening another PR with the original content? @taylorsilva and @pivotal-bin-ju added tests around this (see concourse/concourse#3997) after setting up a reproducible environment, and this would really help 👍 I was a bit hesitant in the beginning about having this. Thanks!!
@cirocosta I am on it! |
@cirocosta thanks for taking care of it, I really appreciate it! Please move it forward as you see fit. Thanks again!
Hey @smoke, Actively reviewing the other PR, should hopefully get it in by EOD today. |
What this PR does / why we need it:

When a `dind` (Docker in Docker) image is used with btrfs, e.g. to run integration tests as per https://hub.docker.com/r/amidos/dcind/, in some situations such as job errors or interruptions the btrfs subvolumes are left uncleaned.

What happens then is that when `rm -rf /concourse-worker-dir` runs, it fails with an `Operation not permitted` error, which causes `Init:Error` and ends in `Init:CrashLoopBackOff`.

The solution is to take that into account and properly delete all of the btrfs subvolumes. This can be achieved either with the mount option `user_subvol_rm_allowed`, which is tricky to apply, or with the suggested delete script, which seems the better option.

Signed-off-by: Radoslav Kirilov [email protected]
Which issue this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged):

Special notes for your reviewer:
@cirocosta @william-tran Please check this PR. I have tested this in our Concourse CI and it fixes the problem; however, I do not have a setup that uses non-btrfs storage.