NFSv4 mounts not working on Flatcar version 3139.2.0 #711

Open
peters5 opened this issue Apr 8, 2022 · 6 comments
Labels
kind/bug Something isn't working

Comments

@peters5

peters5 commented Apr 8, 2022

Description

In the latest Flatcar stable release, 3139.2.0, NFSv4 shares cannot be mounted. We first noticed this when our Kubernetes cluster stopped working after the Flatcar upgrade: we use NFS for persistent volumes, and none of the volumes could be mounted. On closer analysis, the issue turned out to be unrelated to Kubernetes; the underlying host system (Flatcar) cannot run the mount command successfully. It just hangs forever without showing any output.

Impact

NFSv4.1/4.2 shares can't be mounted, which takes down the Kubernetes cluster because its NFS-backed persistent volumes never come up.

Environment and steps to reproduce

  1. Setup: Have a machine running Flatcar stable 3139.2.0 (platform x86_64, Linux kernel 5.15.32)
  2. Task: Mount an NFS share with the mount command
  3. Action(s):
    a. Execute sudo mount -t nfs 10.10.10.2:/store1-k8s /opt/test-mount
  4. Error: The command never finishes; it hangs forever without any output

Expected behavior

The mount command should finish successfully, and the files of the NFS share should be visible under /opt/test-mount.

Additional information

It seems that this issue only affects NFS v4.1 and v4.2. v4.2 is the default when running mount -t nfs, and forcing v4.1 with mount -t nfs -o vers=4.1 fails the same way. However, if I specify vers=4.0, the share mounts as expected. NFSv3 also works as expected. If Flatcar is downgraded to the previous stable version, 3033.2.4, NFS shares can be mounted with v4.2 successfully.

It's also worth noting that I found this bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006518
This report describes exactly the same behavior. I could not determine whether Flatcar 3139.2.0 ships the package versions listed in that report. I'm surprised there isn't more information about this bug, since NFS shares are quite commonly used.

I would really appreciate it if someone could reproduce this error or even find a solution. The current workarounds are to either downgrade Flatcar to the previous stable version or force NFS v4.0.
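For Kubernetes persistent volumes, the NFS v4.0 workaround can be applied per volume through the PersistentVolume `mountOptions` field instead of changing each host. This is a minimal sketch, assuming a statically provisioned PV: the server and export are taken from the reproduction steps above, while the PV name, capacity and access mode are placeholders. Kubernetes passes `mountOptions` through to mount without validating them, so the same `vers=4.0` syntax used on the command line applies.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: store1-k8s-pv          # hypothetical name
spec:
  capacity:
    storage: 10Gi              # placeholder size
  accessModes:
    - ReadWriteMany            # placeholder; match your workloads
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - vers=4.0                 # pin NFSv4.0 to avoid the 4.1/4.2 hang described above
  nfs:
    server: 10.10.10.2         # server from the reproduction steps
    path: /store1-k8s          # export from the reproduction steps
```

The option presumably only takes effect the next time the volume is mounted, so pods already stuck on a hanging mount would need to be rescheduled.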

@peters5 peters5 added the kind/bug Something isn't working label Apr 8, 2022
@comphilip

comphilip commented Apr 9, 2022

@peters5 I have the same issue after upgrading to 3139.2. Here is some more information.

We have several NFS servers. One is a QNAP device running QTS 5.0.0.1891 build 20211221 (kernel version 5.10.60). The others run Ubuntu 20.10 LTS.

Kubernetes pods work fine with vers=4.2 against the Ubuntu servers, but get stuck against the QNAP. The QNAP works well with vers=4.0 and vers=3, so it may be an NFS server issue. (I turned off NFS 4 on the QNAP.)

@jepio
Member

jepio commented Apr 9, 2022

Thanks for the QNAP suggestion. I found the following reports of a potential upstream kernel issue (I'm not sure whether it's considered a regression yet).

For the time being I would recommend explicitly forcing nfs 4.0 if possible, while upstream figures out a way forward.

@peters5
Author

peters5 commented Apr 10, 2022

The QNAP finding is indeed a very good point, thanks @comphilip. We are also using a QNAP as our NFS server; it runs QTS 4.5.2.1594, which is pretty old.

Also, thanks a lot for the links to the kernel lists, @jepio; I had not found these before. I guess forcing NFS 4.0 is the way to go until this is fixed in the kernel, or maybe even on the QNAP side.

@cpswan
Contributor

cpswan commented Apr 12, 2022

I'm finding NFS with GCP NetApp Cloud Volumes is as unreliable as usual (random mount failures at reboot) after upgrading to 3139.2.0. My options string is Options=rw,hard,rsize=65536,wsize=65536,vers=4.1,tcp

I'm presently trying to test .automount + .mount to see if that's any more reliable, but I'm being thwarted by login issues caused by #714.

@bitfisher

Same problem here :(
Thanks for all the info you provided!

@cpswan
Contributor

cpswan commented Apr 20, 2022

In case it helps anyone else here's how I'm using .mount and .automount together for more reliable NFS mounting:

    - name: nfs.mount
      contents: |
        [Unit]
        Description=NFS mount
        [Mount]
        What=my.nfs.server:/my-nfs-mountpoint
        Where=/nfs
        Type=nfs
        Options=rw,hard,rsize=65536,wsize=65536,vers=4.1,tcp
    - name: nfs.automount
      enable: true
      contents: |
        [Unit]
        Description=NFS automount
        Requires=network-online.target
        [Automount]
        Where=/nfs
        TimeoutIdleSec=5m
        [Install]
        WantedBy=multi-user.target
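For hosts talking to an affected QNAP server, the same .mount/.automount pair can be pinned to NFS v4.0 instead of 4.1. This is a sketch that reuses the server and export from the original report with /opt/test-mount as the mount point (stand-ins for your own values). It assumes the same Container Linux Config layout as the snippet above, including `enable:` (Butane configs spell this `enabled:`). Because the mount point contains a dash, systemd expects the escaped unit name that `systemd-escape -p --suffix=mount /opt/test-mount` prints.

```yaml
systemd:
  units:
    # Unit names must match the escaped mount path (/opt/test-mount -> opt-test\x2dmount)
    - name: opt-test\x2dmount.mount
      contents: |
        [Unit]
        Description=NFS mount pinned to NFSv4.0
        [Mount]
        What=10.10.10.2:/store1-k8s
        Where=/opt/test-mount
        Type=nfs
        Options=rw,hard,vers=4.0
    - name: opt-test\x2dmount.automount
      enable: true
      contents: |
        [Unit]
        Description=NFS automount pinned to NFSv4.0
        Requires=network-online.target
        [Automount]
        Where=/opt/test-mount
        TimeoutIdleSec=5m
        [Install]
        WantedBy=multi-user.target
```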

5 participants