ship nfs-ganesha V6 in main and squid #2246

ktdreyer · 2024-10-02T14:20:02Z

nfs-ganesha V6.0 had issues in the squid container; see #2244.

Would you please ship the latest V6 version that has the fixes we need?

ktdreyer · 2024-10-02T14:33:50Z

CC @kalebskeithley

ffilz · 2024-10-02T20:19:19Z

So what exactly is the failure? Log messages?

dmick · 2024-10-02T21:00:04Z

I see the log message in rook/rook#14771

dmick · 2024-10-02T21:03:05Z

The suggested fix, as I understand it, was to add a configuration setting, available to users of nfs-ganesha, that ignores failure of the system call. Again, as I understand it, this means adding something to the nfs-ganesha deploy for ceph.

I proposed another possible fix for the issue, which was to add parameters to the invocation of the nfs-ganesha container to allow the system call to penetrate the seccomp jail.

Both of these seem to require further changes to Ceph nfs-ganesha deployment before we can integrate v6.x. This leads me to further believe that there's been no testing of v6.x on containerized Ceph (by far the majority deployment strategy for Ceph upstream and the only one for downstream).

ffilz · 2024-10-02T22:09:36Z

It has been years since Ganesha received meaningful testing from its various adopters prior to a release tag, so it is not at all surprising that V6.0 has problems.

This issue has been identified. One fix went into downstream that was accepted for upstream, but before it was merged upstream, another adopter also hit the problem and proposed a different fix.

What has ultimately gone upstream into V6.1 is a fix agreed on by participants in a Ganesha community call last week. This fix does require a config change when operating in a non-privileged container. That config change allows Ganesha to try the prctl, and if it fails with EPERM, to issue a warning message but proceed to start.

The prctl call was added in Ganesha V6-dev.19.

The prctl call is necessary to prevent a kernel deadlock if there is a local NFS mount using Ganesha as the server. The deadlock can occur if pages need to be flushed via NFS to allow Ganesha to allocate pages to proceed. Since Ganesha is blocked waiting for free pages, it cannot serve the request, via the NFS client, to flush the pages necessary to be able to allocate pages.

One thing that isn't clear is whether Ganesha running in a container could also trigger the problem when some other container on the host has an NFS mount, or even when there is a non-containerized NFS mount. That may well be the case.

If the prctl is required for containerized Ganesha, then either prctl needs to be allowed for that container, or maybe there's some way to do the prctl for the container as a whole.

In any case, setting the parameter to true will allow Ganesha to proceed without prctl success, and it will be in the same boat as it was pre-V6-dev.19.
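For illustration, the try-then-tolerate behaviour described above can be sketched as follows. This is a minimal Python sketch, not Ganesha's actual implementation (Ganesha is written in C). The function name `set_io_flusher`, the `allow_fail` argument, and the injectable `prctl` callable are all hypothetical names introduced here; only `PR_SET_IO_FLUSHER` (value 57 in `<linux/prctl.h>`) and the EPERM failure mode come from the kernel interface.

```python
import ctypes
import errno
import os

# prctl option from <linux/prctl.h> (Linux 5.6 and later).
PR_SET_IO_FLUSHER = 57


def set_io_flusher(allow_fail, prctl=None):
    """Try to mark this process as an IO flusher.

    Returns True if the flag was set, False if the call failed with
    EPERM and allow_fail is true. Raises OSError in every other case.
    `prctl` is a hypothetical hook taking (option, arg2) and returning
    (return_code, errno); it exists only to make the sketch testable.
    """
    if prctl is None:
        libc = ctypes.CDLL(None, use_errno=True)

        def prctl(option, arg2):
            # The three trailing arguments are unused by this option.
            rc = libc.prctl(
                option,
                ctypes.c_ulong(arg2),
                ctypes.c_ulong(0),
                ctypes.c_ulong(0),
                ctypes.c_ulong(0),
            )
            return rc, ctypes.get_errno()

    rc, err = prctl(PR_SET_IO_FLUSHER, 1)
    if rc == 0:
        return True
    if err == errno.EPERM and allow_fail:
        # Same situation as before the prctl call was introduced:
        # warn and keep starting up.
        print("warning: prctl(PR_SET_IO_FLUSHER) failed with EPERM; continuing")
        return False
    raise OSError(err, os.strerror(err))
```

With `allow_fail=False` an EPERM result is fatal, which corresponds to the startup failure seen in the unprivileged container. The kernel requires `CAP_SYS_RESOURCE` for this prctl option, so a container lacking that capability would be expected to take the warning path when `allow_fail=True`.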

dmick · 2024-10-02T22:33:14Z

Yep. So the new version will require a change in ceph deployment code before it can even run, and it's untested-in-Ceph code beyond that deployment change. (To be clear I'm talking about at least cephadm/ceph orchestrator; I don't even know if we currently support deploying on non-cephadm deploys.) This PR cannot be accepted until those changes, and perhaps at least some kind of testing, are complete.

github-actions · 2024-10-18T20:01:54Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

ktdreyer · 2024-10-25T13:11:35Z

Hi @guits

Would you please tell us if you've confirmed that nfs-ganesha V6.1 has the fixes we need in order to ship it in main and squid?
We have https://github.com/ceph/ceph/blob/squid/container/Containerfile now, so we'll need to make this change in that repo and this one. Is that right @dmick ?

dmick · 2024-10-25T20:15:12Z

@ktdreyer As far as I know, we still have to change something to add allow_set_io_flusher_fail=true to the nfs-ganesha config, /etc/ganesha/ganesha.conf. I'm not exactly sure which agent will have to set this; probably cephadm, which already knows about ganesha.conf. Without it, the service will fail to start in a container because it can't do a prctl(PR_SET_IO_FLUSHER).
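As a rough illustration, the setting might look like this in /etc/ganesha/ganesha.conf. The parameter name comes from the comment above; placing it in the `NFS_CORE_PARAM` block is an assumption and should be checked against the V6.1 configuration documentation before use.

```
# Sketch only: the enclosing block name is an assumption.
NFS_CORE_PARAM {
    # Warn and continue if prctl(PR_SET_IO_FLUSHER) fails with EPERM,
    # as happens in a non-privileged container.
    allow_set_io_flusher_fail = true;
}
```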

dmick · 2024-10-25T20:17:56Z

...and yes, of course, for CI builds we'll also need to change ceph.git's container/Containerfile

ffilz · 2024-10-26T00:04:21Z

Yes, nothing is changed there.


ktdreyer · 2024-10-28T13:42:51Z

@ffilz , would you please open a ticket at https://tracker.ceph.com/ so that the cephadm developers understand what changes to make in Ceph's installer?

(Here's an example ticket you can use for inspiration: https://tracker.ceph.com/issues/65144 , it was under "Project: Orchestrator")

github-actions · 2024-11-12T20:02:07Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

ktdreyer assigned guits Oct 2, 2024

github-actions bot added the wontfix label Oct 18, 2024

ktdreyer removed the wontfix label Oct 25, 2024

github-actions bot added the wontfix label Nov 12, 2024

ktdreyer mentioned this issue Nov 13, 2024

add ceph-iscsi for ibm's 7 (reef) product version #2251

Open

ktdreyer removed the wontfix label Nov 13, 2024
