
If Kubelet Tries To Chown Files Any Time It Sees fsGroup Then That Can Catastrophically Impact Shared File Systems #93802

Closed
Mr-Howard-Roark opened this issue Aug 7, 2020 · 12 comments
Labels
kind/support Categorizes issue or PR as a support question. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@Mr-Howard-Roark

What happened:
Based on the references below, I am concerned about the implications of including fsGroup in a pod that mounts shared file system volumes. Many companies have multiple workflows that use the same shared file system, whether on-prem, in the cloud, or in containers. If someone creates a pod with a Volume for that shared file system and unwittingly includes fsGroup (for example, because they saw that it is required to access IAM projected web identity tokens), and Kubelet then chowns all of the files, that could break processes across the company that use that share.

The purpose of this issue is to confirm that the references below are correct that, if fsGroup is included, Kubelet will try to chown all of the files on the volume. I am unsure whether that is true because when I use hostPath to mount CIFS and NFS shares, I don't see that happen. When I mount NFS Persistent Volumes I also don't see it happen (i.e., the Pods start quickly and the file permissions don't change). When I mount CIFS Persistent Volumes, I do think I see it happen: the volumes time out and cannot be mounted.

Overall, I have tried to understand this via the Kubernetes Discussion forum, StackOverflow, and calls with AWS container representatives, and no one seems to know how this works. If it really is the case that Kubelet tries to chown all of the files, I think that is very dangerous. It would affect the platform I run for my company, where I would presumably need to use Gatekeeper to prevent people from including fsGroup when shared volumes are mounted.
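
For reference, here is a minimal sketch of the kind of pod I am worried about (all names, the image, and the claim name are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: report-job
spec:
  securityContext:
    fsGroup: 2000                 # kubelet may recursively chown/chmod the volume contents to this GID
  containers:
  - name: app
    image: registry.example.com/report-job:latest
    volumeMounts:
    - name: shared-data
      mountPath: /mnt/shared
  volumes:
  - name: shared-data
    persistentVolumeClaim:
      claimName: corp-nfs-share   # hypothetical PVC backed by an organization-wide NFS share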

Kubernetes Chowning Files References

https://docs.microsoft.com/en-us/azure/aks/troubleshooting

Since gid and uid are mounted as root or 0 by default. If gid or uid are set as non-root, for example 1000, Kubernetes will use chown to change all directories and files under that disk. This operation can be time consuming and may make mounting the disk very slow.

https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods

By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod's securityContext when that volume is mounted. For large volumes, checking and changing ownership and permissions can take a lot of time, slowing Pod startup.

Environment:

  • Kubernetes version (use kubectl version): 1.15.11
  • Cloud provider or hardware configuration: AWS/EKS
  • OS (e.g: cat /etc/os-release): Amazon Linux 2
@Mr-Howard-Roark Mr-Howard-Roark added the kind/bug Categorizes issue or PR as related to a bug. label Aug 7, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 7, 2020
@Mr-Howard-Roark
Author

/sig architecture

@k8s-ci-robot k8s-ci-robot added sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 7, 2020
@liggitt
Member

liggitt commented Aug 8, 2020

I think this is covered by https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/695-skip-permission-change, which is a feature in progress to allow controlling whether a permission change is attempted.

/remove-sig architecture
/sig storage
/remove-kind bug
/kind support
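
Once that is available in a cluster (it surfaces as fsGroupChangePolicy, behind a feature gate depending on the Kubernetes version), a pod can ask kubelet to skip the recursive walk when the volume root already matches. A minimal sketch, with placeholder names and claim:

apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-consumer
spec:
  securityContext:
    fsGroup: 2000
    fsGroupChangePolicy: "OnRootMismatch"   # only chown/chmod when the volume root does not already match fsGroup
  containers:
  - name: app
    image: registry.example.com/app:latest
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: example-claim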

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Aug 8, 2020
@k8s-ci-robot
Contributor

@liggitt: The label(s) kind/support cannot be applied, because the repository doesn't have them

In response to this:

I think this is covered by https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/695-skip-permission-change, which is a feature in progress to allow controlling whether a permission change is attempted.

/remove-sig architecture
/sig storage
/remove-kind bug
/kind support

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot removed sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. kind/bug Categorizes issue or PR as related to a bug. labels Aug 8, 2020
@liggitt liggitt added the kind/support Categorizes issue or PR as a support question. label Aug 8, 2020
@Mr-Howard-Roark
Author

It seems like most people are focusing on the performance issues, and occasionally, as in the Motivation section of the issue you link to, they mention that one pod that mounts a shared file system volume with fsGroup could impact another pod that uses that volume. But it doesn't seem like people are thinking about the very bad consequences for companies that rely on shared file systems throughout their organization. In those cases, if a developer mounts an existing shared filesystem used by their entire organization into a container via a PersistentVolume and includes fsGroup, seemingly the consequence will be that all of the processes throughout the organization that use that share will be impacted.

There are many GitHub issues and docs that advise using fsGroup for various scenarios, and they don't mention how dangerous setting it is for shared filesystem volumes. The AWS docs themselves advise setting fsGroup to the nobody user so that non-root containers can read their projected service account token for pod IAM roles. In a case like that, a developer could correctly follow the doc while also mounting an important NFS share, and the result would be that they unwittingly change the permissions on that share for the entire organization.
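
For example, the kind of securityContext those docs recommend looks roughly like this (a sketch from memory; 65534 is the nobody user, and the image and names are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: app-using-pod-iam-role
spec:
  serviceAccountName: app-sa        # service account annotated with an IAM role
  securityContext:
    fsGroup: 65534                  # "nobody", so a non-root container can read the projected web identity token
  containers:
  - name: app
    image: registry.example.com/app:latest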

I am not sure why this sort of thing is not considered to be more dangerous if that's really how this works--one developer can include fsGroup and impact an entire shared filesystem for an organization.

@Mr-Howard-Roark
Author

To make one last comment, if the below two items are true then why isn't this considered to be a more critical problem (mistake?):

  1. Kubelet tries to chown all of the files on a volume to reflect fsGroup
  2. If a process (i.e., Kubelet) changes the permissions on files in an NFS mount, those changes will impact everything in the company that uses those shares

So, one developer could come across a tutorial that says to use fsGroup (e.g., to read a projected service account token), include it, and the consequence may be that the entire organization's filesystem becomes unreadable by many of their processes and people.

It seems like I must be misunderstanding something if the community doesn't consider this to be a critical problem. So, if I am, please let me know before I try to design a way to prevent developers from including fsGroup when they are also mounting nfs volumes.

@msau42
Member

msau42 commented Oct 28, 2020

Right now, fsGroup support is implemented and decided per plugin. So, exactly for the reasons you specify, fsGroup is ignored for shared filesystems like NFS but enabled for RWO volume types like cloud block disks.

Questions about your scenario:

  • Do all the containers across the different teams and environments need shared access to the entire share, or can access be constrained to subdirectories?
  • If they need access to the whole share, do they need write access, or can it be read-only?
  • If they need writable, shared access for all pods, what permissions are the containers running as today so that they can all read and write the same files?
  • Can you use something like PSP to restrict the uids/gids/fsGroup of the pods? (See the sketch at the end of this comment.)

@kubernetes/sig-storage-misc
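
On the last question, a minimal sketch of the kind of PSP restriction meant here (the policy name and GID range are placeholders, and the other rules would need to match your existing settings):

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restrict-fsgroup
spec:
  privileged: false
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: MayRunAs
    ranges:
    - min: 1000
      max: 1000
  fsGroup:
    rule: MayRunAs        # pods may omit fsGroup; if they set it, it must fall in the range below
    ranges:
    - min: 1000
      max: 1000
  volumes:
  - '*'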

@Mr-Howard-Roark
Author

Mr-Howard-Roark commented Oct 28, 2020

Thanks for your response! It is such a relief that fsGroup is ignored for NFS. Based on the behavior I see, I think it may not be ignored for the smb flex volume plugin we use to mount CIFS shares. Whenever a developer includes fsGroup in conjunction with a CIFS volume mounted via this plugin, the volume fails to mount due to a timeout, and I think the timeout is because Kubelet is trying to chown the volume's files. I am less nervous about this because I can't picture what it would mean to chown files on a Windows file share. However, it is still an issue, because developers can't use pod IAM roles if they are also mounting CIFS shares (i.e., they must include fsGroup to use pod IAM roles, and that breaks the CIFS mount).

To answer your questions:

  • In the CIFS share world, each team mounts shares via their Service Accounts and then accesses whatever files those Service Accounts can access (i.e., the flex volume plugin we're using has a field for a secret that contains Service Account credentials). I do restrict them to subpaths in some cases and should probably come up with a way to enforce that via Gatekeeper. I think that teams mount entire shares largely because that's what they're used to doing on-prem.
  • In the NFS share world, there is a mix of use cases. I do restrict mounts to be ReadOnly where possible.
  • I can use something like PSP or Gatekeeper. I think that developers use fsGroup without understanding it, e.g., for their pod IAM roles, because a community Helm chart uses it, or because they hit an error and see that adding fsGroup fixes it. So, I need to come up with a plan that doesn't break current workflows or impede development. In the end though, it will never be OK for permissions on these shared file systems to be changed dynamically by a process running in Kubernetes, so the Gatekeeper rule would probably be as simple as "if fsGroup and xyz-pvc, reject" (see the sketch below).
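
The Gatekeeper policy I have in mind would look roughly like the following. It is only a sketch and untested; the template name and parameter shape are placeholders, and xyz-pvc stands in for a shared claim:

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8sdenyfsgroupwithsharedclaim
spec:
  crd:
    spec:
      names:
        kind: K8sDenyFsGroupWithSharedClaim
      validation:
        openAPIV3Schema:
          properties:
            restrictedClaims:
              type: array
              items:
                type: string
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8sdenyfsgroupwithsharedclaim

      # Reject pods that set spec.securityContext.fsGroup while also
      # mounting one of the restricted PersistentVolumeClaims.
      violation[{"msg": msg}] {
        fsg := input.review.object.spec.securityContext.fsGroup
        vol := input.review.object.spec.volumes[_]
        claim := vol.persistentVolumeClaim.claimName
        claim == input.parameters.restrictedClaims[_]
        msg := sprintf("fsGroup %v is not allowed with shared volume claim %v", [fsg, claim])
      }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDenyFsGroupWithSharedClaim
metadata:
  name: no-fsgroup-on-shared-claims
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
  parameters:
    restrictedClaims: ["xyz-pvc"]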

@gnufied
Member

gnufied commented Oct 28, 2020

@Mr-Howard-Roark for flexvolume plugins, a plugin can opt out of fsGroup-based permission changes by returning FSGroup: false in its capabilities. We have a similar mechanism to opt out of recursive permission changes for CSI plugins.
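
For CSI plugins, that opt-out is expressed through the fsGroupPolicy field on the CSIDriver object. A minimal sketch (the driver name is a placeholder, and the field is gated on cluster version / feature gate):

apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: smb.csi.example.com        # placeholder driver name
spec:
  fsGroupPolicy: None              # kubelet will not change ownership or permissions of this driver's volumes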

@Mr-Howard-Roark
Author

Mr-Howard-Roark commented Oct 29, 2020

Ok thanks for sharing @gnufied. I was not aware of that and will try to make the adjustment to the flexvolume plugin we're using.

@gnufied
Member

gnufied commented Oct 29, 2020

I am marking this as closed, hoping that k8s has enough hooks in place to ensure shared filesystems don't accidentally get chowned.

/close

@k8s-ci-robot
Contributor

@gnufied: Closing this issue.

In response to this:

I am marking this as closed, hoping that k8s has enough hooks in place to ensure shared filesystems don't accidentally get chowned.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@nerzhul

nerzhul commented Nov 24, 2021

I confirm we have the same issue on our side on the 1.21 series. On a very, very large fileshare (millions of files) it chowns for minutes; strace on the kubelet process showed it:

newfstatat(AT_FDCWD, "/var/lib/kubelet/pods/2bde3207-157d-42cb-bdf0-152a1097330f/volumes/fstab~cifs/mnt-ventes-en-cours/VENTES/2-Apr-Jun/xxx/ADV/Zone_Echange_ADV/Folder Settings/background.jpg", {st_mode=S_IFREG|0777, st_size=68446, ...}, AT_SYMLINK_NOFOLLOW) = 0
fchownat(AT_FDCWD, "/var/lib/kubelet/pods/2bde3207-157d-42cb-bdf0-152a1097330f/volumes/fstab~cifs/mnt-ventes-en-cours/VENTES/2-Apr-Jun/xxx/ADV/Zone_Echange_ADV/Folder Settings/background.jpg", -1, 1000, AT_SYMLINK_NOFOLLOW) = 0
fchmodat(AT_FDCWD, "/var/lib/kubelet/pods/2bde3207-157d-42cb-bdf0-152a1097330f/volumes/fstab~cifs/mnt-ventes-en-cours/VENTES/2-Apr-Jun/xxx/ADV/Zone_Echange_ADV/Folder Settings/background.jpg", 0777) = 0
newfstatat(AT_FDCWD, "/var/lib/kubelet/pods/2bde3207-157d-42cb-bdf0-152a1097330f/volumes/fstab~cifs/mnt-ventes-en-cours/VENTES/2-Apr-Jun/xxx/ADV/Zone_Echange_ADV/Folder Settings/thumbs.db", {st_mode=S_IFREG|0777, st_size=4608, ...}, AT_SYMLINK_NOFOLLOW) = 0

Patching our CIFS driver to return FSGroup: false solved the issue.
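
For reference, after the patch the driver's init call returns capabilities roughly like this (a sketch; assuming the JSON capability key is fsGroup):

{
  "status": "Success",
  "capabilities": {
    "attach": false,
    "fsGroup": false
  }
}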
