
If Kubelet Tries To Chown Files Any Time It Sees fsGroup Then That Can Catastrophically Impact Shared File Systems #93802

Closed
Mr-Howard-Roark opened this issue Aug 7, 2020 · 12 comments
Labels
kind/support Categorizes issue or PR as a support question. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@Mr-Howard-Roark

What happened:
Based on the references below, I am concerned about the implications of including fsGroup in a pod that mounts shared file system volumes. Many companies have multiple workflows that use the same shared file system, whether on-prem, in the cloud, or in containers. If someone creates a pod with a Volume for that shared file system and unwittingly includes fsGroup (for example, because they saw that it is required to access IAM projected web identity tokens), and Kubelet then chowns all of the files, that could break processes across the company that use that share.

The purpose of this issue is to confirm that the references below are correct that, if fsGroup is included, Kubelet will try to chown all of the files on the volume. I am unsure whether that is true because when I use hostPath to mount CIFS and NFS shares, I don't see that happen. When I mount NFS Persistent Volumes I also don't see it happen (i.e., the Pods start quickly and the file permissions don't change). When I mount CIFS Persistent Volumes, I do think I see it happen: the volumes time out and cannot be mounted.

Overall, I have tried to understand this via the Kubernetes Discussion forum, StackOverflow, and calls with AWS container representatives, and no one seems to know how this works. If it really is the case that Kubelet tries to chown all of the files, I think that is very dangerous. It would affect the platform I run for my company, where I would presumably need to use Gatekeeper to prevent people from including fsGroup when shared volumes are mounted.
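
For reference, here is a minimal sketch of the kind of pod I am worried about (all names, the image, and the claim name are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: report-job
spec:
  securityContext:
    fsGroup: 2000                 # kubelet may recursively chown/chmod the volume contents to this GID
  containers:
  - name: app
    image: registry.example.com/report-job:latest
    volumeMounts:
    - name: shared-data
      mountPath: /mnt/shared
  volumes:
  - name: shared-data
    persistentVolumeClaim:
      claimName: corp-nfs-share   # hypothetical PVC backed by an organization-wide NFS share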

Kubernetes Chowning Files References

https://docs.microsoft.com/en-us/azure/aks/troubleshooting

Since gid and uid are mounted as root or 0 by default. If gid or uid are set as non-root, for example 1000, Kubernetes will use chown to change all directories and files under that disk. This operation can be time consuming and may make mounting the disk very slow.

https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods

By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod's securityContext when that volume is mounted. For large volumes, checking and changing ownership and permissions can take a lot of time, slowing Pod startup.

Environment:

  • Kubernetes version (use kubectl version): 1.15.11
  • Cloud provider or hardware configuration: AWS/EKS
  • OS (e.g: cat /etc/os-release): Amazon Linux 2
@Mr-Howard-Roark Mr-Howard-Roark added the kind/bug Categorizes issue or PR as related to a bug. label Aug 7, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 7, 2020
@Mr-Howard-Roark
Author

/sig architecture

@k8s-ci-robot k8s-ci-robot added sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 7, 2020
@liggitt
Member

liggitt commented Aug 8, 2020

I think this is covered by https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/695-skip-permission-change, which is a feature in progress to allow controlling whether a permission change is attempted.

/remove-sig architecture
/sig storage
/remove-kind bug
/kind support
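
Once that is available in a cluster (it surfaces as fsGroupChangePolicy, behind a feature gate depending on the Kubernetes version), a pod can ask kubelet to skip the recursive walk when the volume root already matches. A minimal sketch, with placeholder names and claim:

apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-consumer
spec:
  securityContext:
    fsGroup: 2000
    fsGroupChangePolicy: "OnRootMismatch"   # only chown/chmod when the volume root does not already match fsGroup
  containers:
  - name: app
    image: registry.example.com/app:latest
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: example-claim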

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Aug 8, 2020
@k8s-ci-robot
Contributor

@liggitt: The label(s) kind/support cannot be applied, because the repository doesn't have them

In response to this:

I think this is covered by https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/695-skip-permission-change, which is a feature in progress to allow controlling whether a permission change is attempted.

/remove-sig architecture
/sig storage
/remove-kind bug
/kind support

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot removed sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. kind/bug Categorizes issue or PR as related to a bug. labels Aug 8, 2020
@liggitt liggitt added the kind/support Categorizes issue or PR as a support question. label Aug 8, 2020
@Mr-Howard-Roark
Author

It seems like most people are focusing on the performance issues, and occasionally, as in the Motivation section of the issue you link to, they mention that one pod that mounts a shared file system volume with fsGroup could impact another pod that uses that volume. But it doesn't seem like people are thinking about the very bad consequences for companies that rely on shared file systems throughout their organization. In those cases, if a developer mounts an existing shared filesystem used by their entire organization into a container via a PersistentVolume and includes fsGroup, seemingly the consequence will be that all of the processes throughout the organization that use that share will be impacted.

There are many GitHub issues and docs that advise using fsGroup for various scenarios, and they don't mention how dangerous setting it is for shared filesystem volumes. The AWS docs themselves advise setting fsGroup to the nobody user so that non-root containers can read their projected service account token for pod IAM roles. In a case like that, a developer could correctly follow the doc while also mounting an important NFS share, and the result would be that they unwittingly change the permissions on that share for the entire organization.
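
For example, the kind of securityContext those docs recommend looks roughly like this (a sketch from memory; 65534 is the nobody user, and the image and names are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: app-using-pod-iam-role
spec:
  serviceAccountName: app-sa        # service account annotated with an IAM role
  securityContext:
    fsGroup: 65534                  # "nobody", so a non-root container can read the projected web identity token
  containers:
  - name: app
    image: registry.example.com/app:latest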

I am not sure why this sort of thing is not considered to be more dangerous if that's really how this works--one developer can include fsGroup and impact an entire shared filesystem for an organization.

@Mr-Howard-Roark
Author

To make one last comment, if the below two items are true then why isn't this considered to be a more critical problem (mistake?):

  1. Kubelet tries to chown all of the files on a volume to reflect fsGroup
  2. If a process (i.e., Kubelet) changes the permissions on files in an NFS mount, those changes will impact everything in the company that uses those shares

So, one developer could come across a tutorial that says to use fsGroup (e.g., to read a projected service account token), include it, and the consequence may be that the entire organization's filesystem becomes unreadable by many of their processes and people.

It seems like I must be misunderstanding something if the community doesn't consider this to be a critical problem. So, if I am, please let me know before I try to design a way to prevent developers from including fsGroup when they are also mounting nfs volumes.

@msau42
Member

msau42 commented Oct 28, 2020

Right now, fsGroup support is implemented and decided per plugin. So, exactly for the reasons you specify, fsGroup is ignored for shared filesystems like NFS but enabled for RWO volume types like cloud block disks.

Questions about your scenario:

  • Do all the containers across the different teams and environments need shared access to the entire share, or can access be constrained to subdirectories?
  • If they need access to the whole share, do they need write access, or can it be read-only?
  • If they need writable, shared access for all pods, what permissions are the containers running as today so that they can all read and write the same files?
  • Can you use something like PSP to restrict the uids/gids/fsGroup of the pods? (See the sketch at the end of this comment.)

@kubernetes/sig-storage-misc
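
On the last question, a minimal sketch of the kind of PSP restriction meant here (the policy name and GID range are placeholders, and the other rules would need to match your existing settings):

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restrict-fsgroup
spec:
  privileged: false
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: MayRunAs
    ranges:
    - min: 1000
      max: 1000
  fsGroup:
    rule: MayRunAs        # pods may omit fsGroup; if they set it, it must fall in the range below
    ranges:
    - min: 1000
      max: 1000
  volumes:
  - '*'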

@Mr-Howard-Roark
Author

Mr-Howard-Roark commented Oct 28, 2020

Thanks for your response! It is such a relief that fsGroup is ignored for NFS. Based on the behavior I see, I think it may not be ignored for the smb flex volume plugin we use to mount CIFS shares. Whenever a developer includes fsGroup in conjunction with a CIFS volume mounted via this plugin, the volume fails to mount due to a timeout, and I think the timeout is because Kubelet is trying to chown the volume's files. I am less nervous about this because I can't picture what it would mean to chown files on a Windows file share. However, it is still an issue, because developers can't use pod IAM roles if they are also mounting CIFS shares (i.e., they must include fsGroup to use pod IAM roles, and that breaks the CIFS mount).

To answer your questions:

  • In the CIFS share world, each team mounts shares via their Service Accounts and then accesses whatever files those Service Accounts can access (i.e., the flex volume plugin we're using has a field for a secret that contains Service Account credentials). I do restrict them to subpaths in some cases and should probably come up with a way to enforce that via Gatekeeper. I think that teams mount entire shares largely because that's what they're used to doing on-prem.
  • In the NFS share world, there is a mix of use cases. I do restrict mounts to be ReadOnly where possible.
  • I can use something like PSP or Gatekeeper. I think that developers use fsGroup without understanding it, e.g., for their pod IAM roles, because a community Helm chart uses it, or because they hit an error and see that adding fsGroup fixes it. So, I need to come up with a plan that doesn't break current workflows or impede development. In the end though, it will never be OK for permissions on these shared file systems to be changed dynamically by a process running in Kubernetes, so the Gatekeeper rule would probably be as simple as "if fsGroup and xyz-pvc, reject" (see the sketch below).
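
The Gatekeeper policy I have in mind would look roughly like the following. It is only a sketch and untested; the template name and parameter shape are placeholders, and xyz-pvc stands in for a shared claim:

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8sdenyfsgroupwithsharedclaim
spec:
  crd:
    spec:
      names:
        kind: K8sDenyFsGroupWithSharedClaim
      validation:
        openAPIV3Schema:
          properties:
            restrictedClaims:
              type: array
              items:
                type: string
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8sdenyfsgroupwithsharedclaim

      # Reject pods that set spec.securityContext.fsGroup while also
      # mounting one of the restricted PersistentVolumeClaims.
      violation[{"msg": msg}] {
        fsg := input.review.object.spec.securityContext.fsGroup
        vol := input.review.object.spec.volumes[_]
        claim := vol.persistentVolumeClaim.claimName
        claim == input.parameters.restrictedClaims[_]
        msg := sprintf("fsGroup %v is not allowed with shared volume claim %v", [fsg, claim])
      }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDenyFsGroupWithSharedClaim
metadata:
  name: no-fsgroup-on-shared-claims
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
  parameters:
    restrictedClaims: ["xyz-pvc"]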

@gnufied
Member

gnufied commented Oct 28, 2020

@Mr-Howard-Roark for flexvolume plugins, a plugin can opt out of fsGroup-based permission changes by returning FSGroup: false in its capabilities. We have a similar mechanism to opt out of recursive permission changes for CSI plugins.
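
For CSI plugins, that opt-out is expressed through the fsGroupPolicy field on the CSIDriver object. A minimal sketch (the driver name is a placeholder, and the field is gated on cluster version / feature gate):

apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: smb.csi.example.com        # placeholder driver name
spec:
  fsGroupPolicy: None              # kubelet will not change ownership or permissions of this driver's volumes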

@Mr-Howard-Roark
Author

Mr-Howard-Roark commented Oct 29, 2020

Ok thanks for sharing @gnufied. I was not aware of that and will try to make the adjustment to the flexvolume plugin we're using.

@gnufied
Member

gnufied commented Oct 29, 2020

I am marking this as closed, hoping that k8s has enough hooks in place to ensure shared filesystems don't accidentally get chowned.

/close

@k8s-ci-robot
Contributor

@gnufied: Closing this issue.

In response to this:

I am marking this as closed, hoping that k8s has enough hooks in place to ensure shared filesystems don't accidentally get chowned.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@nerzhul

nerzhul commented Nov 24, 2021

I confirm we have the same issue on our side on the 1.21 series. On a very, very large fileshare (millions of files) it chowns for minutes; strace on the kubelet process showed it:

newfstatat(AT_FDCWD, "/var/lib/kubelet/pods/2bde3207-157d-42cb-bdf0-152a1097330f/volumes/fstab~cifs/mnt-ventes-en-cours/VENTES/2-Apr-Jun/xxx/ADV/Zone_Echange_ADV/Folder Settings/background.jpg", {st_mode=S_IFREG|0777, st_size=68446, ...}, AT_SYMLINK_NOFOLLOW) = 0
fchownat(AT_FDCWD, "/var/lib/kubelet/pods/2bde3207-157d-42cb-bdf0-152a1097330f/volumes/fstab~cifs/mnt-ventes-en-cours/VENTES/2-Apr-Jun/xxx/ADV/Zone_Echange_ADV/Folder Settings/background.jpg", -1, 1000, AT_SYMLINK_NOFOLLOW) = 0
fchmodat(AT_FDCWD, "/var/lib/kubelet/pods/2bde3207-157d-42cb-bdf0-152a1097330f/volumes/fstab~cifs/mnt-ventes-en-cours/VENTES/2-Apr-Jun/xxx/ADV/Zone_Echange_ADV/Folder Settings/background.jpg", 0777) = 0
newfstatat(AT_FDCWD, "/var/lib/kubelet/pods/2bde3207-157d-42cb-bdf0-152a1097330f/volumes/fstab~cifs/mnt-ventes-en-cours/VENTES/2-Apr-Jun/xxx/ADV/Zone_Echange_ADV/Folder Settings/thumbs.db", {st_mode=S_IFREG|0777, st_size=4608, ...}, AT_SYMLINK_NOFOLLOW) = 0

Patching our CIFS driver to return FSGroup: false solved the issue.
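
For reference, after the patch the driver's init call returns capabilities roughly like this (a sketch; assuming the JSON capability key is fsGroup):

{
  "status": "Success",
  "capabilities": {
    "attach": false,
    "fsGroup": false
  }
}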
