Skip to content
This repository has been archived by the owner on Feb 22, 2022. It is now read-only.

[stable/concourse] properly cleanup btrfs subvolume and children #12398

Closed
wants to merge 1 commit into from
Closed

[stable/concourse] properly cleanup btrfs subvolume and children #12398

wants to merge 1 commit into from

Conversation

smoke
Copy link
Contributor

@smoke smoke commented Mar 20, 2019

What this PR does / why we need it:

When a dind (Docker in Docker) image is used with btrfs to e.g. run integration tests as per https://hub.docker.com/r/amidos/dcind/
in some occasions like job errors or interruptions the btrfs subvolumes are left not cleaned.
So what happens then is that when the rm -rf /concourse-worker-dir runs it fails with Operation not permitted error
which then causes Init:Error and ends in Init:CrashLoopBackOff.
The solution is to take that into account and properly delete all of the btrfs subvolumes.
This can be achieved either with the suggested script or with the mount option user_subvol_rm_allowed
that is tricky to apply or with that delete script that seems as a better option.

Signed-off-by: Radoslav Kirilov [email protected]

Which issue this PR fixes

(optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged)

  • fixes N/A

Special notes for your reviewer:

@cirocosta @william-tran Please check this PR. I have tested this in our Concourse CI and it fixes the problem, however I do not have a setup that uses non btrfs storage.

Checklist

[Place an '[x]' (no spaces) in all applicable fields. Please remove unrelated fields.]

  • DCO signed
  • Chart Version bumped
  • Variables are documented in the README.md

When a `dind` (Docker in Docker) image is used with btrfs to e.g. run integration tests as per https://hub.docker.com/r/amidos/dcind/
in some occasions like job errors or interruptions the btrfs subvolumes are left not cleaned.
So what happens then is that when the `rm -rf /concourse-worker-dir` runs it fails with `Operation not permitted` error
which then causes `Init:Error` and ends in `Init:CrashLoopBackOff`.
The solution is to take that into account and properly delete all of the btrfs subvolumes.
This can be achieved either with the suggested script or with the mount option [user_subvol_rm_allowed](https://askubuntu.com/questions/509292/how-to-set-user-subvol-rm-allowed-capability)
that is tricky to apply or with that delete script that seems as a better option.

Signed-off-by: Radoslav Kirilov <[email protected]>
@helm-bot helm-bot added Contribution Allowed If the contributor has signed the DCO or the CNCF CLA (prior to the move to a DCO). size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 20, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: smoke
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: cirocosta

If they are not already assigned, you can assign the PR to them by writing /assign @cirocosta in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 20, 2019
@k8s-ci-robot
Copy link
Contributor

Hi @smoke. Thanks for your PR.

I'm waiting for a helm member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@smoke
Copy link
Contributor Author

smoke commented Mar 20, 2019

/assign @cirocosta

@smoke smoke changed the title fix(stable/concourse): properly cleanup btrfs subvolume and children [stable/concourse] properly cleanup btrfs subvolume and children Mar 20, 2019
@cirocosta
Copy link
Collaborator

Hi @smoke,

Thanks for the PR!


If I understood correctly, this is only needed in the specific use case where one uses btrfs within tasks; is that right?

I'm concerned that this might be a very specific use case, which could be covered instead with the possibility of allowing users of the Helm chart to supply their own initContainers.

Going with the approach I mentioned, you could have in your values.yaml:

worker:
  initContainers:
  - name: my-init-container
    image: concourse/concourse
    securityContext: { privileged: true }
    command: [ /bin/sh ]
    args:
      - -ce
      - |-
        for v in $(btrfs subvolume list --sort=-ogen /your-work-dir | awk '{print $9}'); do
          btrfs subvolume delete /your-work-dir/$v
        done
    volumeMounts:
      - name: concourse-work-dir
        mountPath: /your-work-dir

Wdyt?

Thanks!

@smoke
Copy link
Contributor Author

smoke commented Mar 20, 2019

@cirocosta Yes, you have got the case correctly and only if the values.yaml supported the option of having additional initContainers running prior the {{ template "concourse.worker.fullname" . }}-init-rm this would have worked, however this is not the case right now.
I have even tested with your suggested addition in values.yaml and it does not work.

I was wondering if prepending the initContainer conditionally would make more sense, but I am not sure if there is like a standard way to switch to btrfs.

For instance in my case we had to define a specific storage class, beside switching the baggageclaim driver resulting in the following:

values.yaml

concourse:
  worker:
    baggageclaim:
      driver: btrfs
worker:
  replicas: 2
persistence:
  worker:
    size: 150Gi
    storageClass: gp2-btrfs

storage-btrfs.yaml

## Create 'gp2-btrfs' storage class
## `kubectl apply -f storage-btrfs.yaml`
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: gp2-btrfs
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  fsType: btrfs
allowVolumeExpansion: true

and then in worker-stateful.yaml may be something like

      {{- if eq .Values.worker.baggageclaim.driver "btrfs" }}
      initContainers:
        - name: {{ template "concourse.worker.fullname" . }}-init-btrfs-subvolumes-remove
          {{- if .Values.imageDigest }}
          image: "{{ .Values.image }}@{{ .Values.imageDigest }}"
          {{- else }}
          image: "{{ .Values.image }}:{{ .Values.imageTag }}"
          {{- end }}
          imagePullPolicy: {{ .Values.imagePullPolicy | quote }}
          securityContext:
            privileged: true
          command:
            - /bin/sh
          args:
            - -ce
            - |-
              for v in $(btrfs subvolume list --sort=-ogen {{ .Values.concourse.worker.workDir }} | awk '{print $9}'); do
                btrfs subvolume delete {{ .Values.concourse.worker.workDir }}/$v
              done
          volumeMounts:
            - name: concourse-work-dir
              mountPath: {{ .Values.concourse.worker.workDir | quote }}
      {{- end }}

Wdyt?

@smoke
Copy link
Contributor Author

smoke commented Mar 27, 2019

@cirocosta Sorry to bother, but any thoughts on my previous comment?

@cirocosta
Copy link
Collaborator

Hi @smoke, thanks for the detailed reply!


however this is not the case right now.
I have even tested with your suggested addition in values.yaml and it does not work.

oh, sorry! I didn't mean to imply that it's something possible to do right now, that's indeed something that is not there.

I was wondering if prepending the initContainer conditionally would make more sense,

Yeah, that'd be much better indeed, as someone using overlay or even naive wouldn't need such initContainer 👍

It'd also be interesting to have this extra initContainer under the condition that checks if the rm -rf init container should even be used (

{{- if .Values.worker.cleanUpWorkDirOnStart }}
).

Wdyt?


Some more context - looking at a not so far away future, we should really not rely too much on having those initContainers for removing all of the content in the disk as Concourse itself supports the concept of landing workers.

Right now, one can use that through a combination of concourse.worker.shutdownSignal=SIGUSR1 (

## Signal to send to the worker container when shutting down.
## Possible values:
##
## - SIGUSR1: land the worker, and
## - SIGUSR2: retire the worker.
##
## Note.: using SIGUSR2 with persistence enabled implies the use of an
## initContainer that removes any data the existed previously under
## `concourse.worker.workDir` as the action of `retire`ing a worker implies
## that no state comes back with it when re-registering.
##
## Ref: https://concourse-ci.org/concourse-worker.html
## Ref: https://concourse-ci.org/worker-internals.html
##
shutdownSignal: SIGUSR2
) and worker.cleanUpWorkDirOnStart=false, which are not being set as defaults at the moment as we're not yet with all of the coverage of the edge cases that might happen for particular drivers.

Once we get that lifecycle better covered, we can definitely move more towards that 🙌

@smoke
Copy link
Contributor Author

smoke commented Mar 28, 2019

Yes it all makes sense and may be by then we can merge my PR as quick and easy and refactor accordingly in the bear future with the init containers.

@stale
Copy link

stale bot commented Apr 27, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 27, 2019
@stale
Copy link

stale bot commented May 11, 2019

This issue is being automatically closed due to inactivity.

@stale stale bot closed this May 11, 2019
@cirocosta
Copy link
Collaborator

Oh damn, this one should definitely not be closed - sorry for the long time! We've been quite busy these weeks, but we'll get back to this one shortly.

Sorry!

@cirocosta
Copy link
Collaborator

Hi, @smoke!

Sorry for the super super super slow turn over here 🐢, but we're finally getting this in 😁

Would you mind opening another PR with the original content? @taylorsilva and @pivotal-bin-ju added tests around this (see concourse/concourse#3997) after setting up a reproducible, and this would really help 👍

I was a bit hesitant in the beginning of having the btrfs subvolume list going on regardless of the driver, but now having the tests covering that and indeed not being a problem, I'd say we can leave it there as is (specially since one could be coming from btrfs to overlay; or simply using detect, where an if based on the driver would make the deletion not happen).

Thanks!!

@smoke
Copy link
Contributor Author

smoke commented Jun 12, 2019

@cirocosta I am on it!

@cirocosta
Copy link
Collaborator

Hi @smoke ,

I hope you don't mind, but I cherry-picked the commit and submitted a PR with it: #14941

It contains your git signature there - if you'd like me to remove or change it somehow, please let me know!

Thanks!

@smoke
Copy link
Contributor Author

smoke commented Jun 21, 2019

@cirocosta thanks for taking care of, I really appreciate it! Please move it forward as you see fit.
Sorry I didn't do it on time, I got stuck on some tough work emergencies.

Thanks again,
I am actively waiting for the fix to land to spin the next Concourse with btrfs!

@YoussB
Copy link
Collaborator

YoussB commented Jun 21, 2019

Hey @smoke, Actively reviewing the other PR, should hopefully get it in by EOD today.

@smoke smoke deleted the concourse-btrfs-init-rm branch April 22, 2020 08:57
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Contribution Allowed If the contributor has signed the DCO or the CNCF CLA (prior to the move to a DCO). lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants