EFS /home in parallel cluster #2344

Closed
andrei-xdlab opened this issue Dec 30, 2020 · 6 comments

Comments

@andrei-xdlab

We need a large, persistent /home on EFS, backed up with AWS Backup and shared across cluster nodes. I created a custom AMI (2.10.1) and am able to mount the /home access points, but the pcluster deployment fails with an "nfs_export failure" (see the cfn-init.log snippet below). I believe pcluster is trying to re-export the EFS-backed NFS /home, and the build fails as a result. How do I disable the /home NFS export in the template so that the build succeeds?

cfn-init.log

Error executing action create on resource 'nfs_export[/home]'

Mixlib::ShellOut::ShellCommandFailed

execute[exportfs] (/etc/chef/local-mode-cache/cache/cookbooks/nfs/providers/export.rb line 43) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of exportfs -ar ----
STDOUT:
STDERR: exportfs: /etc/exports [1]: Neither 'subtree_check' or 'no_subtree_check' specified for export "172.31.0.0/16:/shared".
Assuming default behaviour ('no_subtree_check').
NOTE: this default has changed since nfs-utils version 1.0.x

exportfs: /etc/exports [2]: Neither 'subtree_check' or 'no_subtree_check' specified for export "172.31.0.0/16:/home".
Assuming default behaviour ('no_subtree_check').
NOTE: this default has changed since nfs-utils version 1.0.x

exportfs: /home requires fsid= for NFS export
---- End output of exportfs -ar ----
Ran exportfs -ar returned 1

Cookbook Trace:

/etc/chef/local-mode-cache/cache/cookbooks/nfs/providers/export.rb:73:in `block in class_from_file'

Resource Declaration:

In /etc/chef/local-mode-cache/cache/cookbooks/aws-parallelcluster/recipes/head_node_base_config.rb

130: nfs_export "/home" do
131: network node['cfncluster']['ec2-metadata']['vpc-ipv4-cidr-blocks']
132: writeable true
133: options ['no_root_squash']
134: end
135:
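The `requires fsid=` message is, as far as I know, what `exportfs` prints when the exported directory is not backed by a local filesystem with a stable identifier, which fits the theory that /home is already an NFS (EFS) mount. A minimal sketch for confirming that on the head node, assuming the standard `/proc/mounts` field layout (device, mount point, type, options); the sample text and file system ID below are made up for illustration:

```python
# Sketch: check whether a mount point is NFS-backed, given /proc/mounts-style text.
# SAMPLE_MOUNTS is invented for illustration; on a real node, read /proc/mounts.

SAMPLE_MOUNTS = """\
/dev/nvme0n1p1 / xfs rw,noatime 0 0
fs-12345678.efs.us-east-1.amazonaws.com:/ /home nfs4 rw,relatime,vers=4.1 0 0
"""


def fs_type_of(mount_point, mounts_text):
    """Return the filesystem type mounted at mount_point, or None if not mounted."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            # Keep the last match: later entries shadow earlier ones.
            fstype = fields[2]
    return fstype


def is_nfs_backed(mount_point, mounts_text):
    """True if mount_point is mounted with an NFS filesystem type."""
    return fs_type_of(mount_point, mounts_text) in {"nfs", "nfs4"}


print(is_nfs_backed("/home", SAMPLE_MOUNTS))
```

On a real node the second argument would be the contents of `/proc/mounts`, for example `open("/proc/mounts").read()`.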

@tilne
Contributor

tilne commented Dec 31, 2020

Hi @andrei-xdlab,

Why not use the cluster config to mount the EFS drive on all the cluster nodes instead of doing it via fstab in the AMI? This would likely require mounting the EFS drive on a directory other than /home, but it would avoid the need for any customization.
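For reference, the config-based approach could look roughly like the fragment rendered below. This is a sketch, not a tested configuration: the key names (`efs_settings`, `shared_dir`, `efs_fs_id`) are as I recall them from the ParallelCluster 2.x documentation and should be checked against the version in use, and the section label `customfs`, the file system ID, and the directory name are placeholders.

```python
# Sketch: render an EFS section for a ParallelCluster 2.x config file.
# All values passed in below are placeholders, not taken from this issue.
import configparser
import io


def render_efs_config(fs_id, shared_dir, label="customfs", cluster="default"):
    """Return config text that attaches an existing EFS file system to a cluster."""
    cfg = configparser.ConfigParser()
    # The cluster section points at the EFS section by its label.
    cfg[f"cluster {cluster}"] = {"efs_settings": label}
    cfg[f"efs {label}"] = {"shared_dir": shared_dir, "efs_fs_id": fs_id}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(render_efs_config("fs-12345678", "efs"))
```

The printed sections would be merged into the existing cluster config rather than used on their own.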

In theory, you could accomplish this without the cluster config by using a pre-install script to modify the relevant portions of the ParallelCluster cookbook recipes in place, so that the head node doesn't attempt to export the home directory and the compute nodes don't attempt to mount it.

@andrei-xdlab
Author

Hi @tilne

Currently, ParallelCluster doesn't support EFS access points natively, so we use fstab in the AMI to mount the /home access point and set default user permissions. I am able to use the cluster config to mount the EFS filesystem on a directory other than /home. But we have a customer who requires a large /home that persists after pcluster delete operations. We also have to back up /home, and EFS is the best option for that. Can you suggest which ParallelCluster cookbook recipes have to be modified via a pre-install script to accomplish our goal?

Do you plan to introduce an EFS /home option for ParallelCluster in the future? We are trying to completely replace EBS and NFS running on the master node with AWS shared filesystem solutions (EFS and FSx) to avoid a performance bottleneck.

Happy New Year!

@tilne
Contributor

tilne commented Dec 31, 2020

> Can you suggest which ParallelCluster cookbook recipes have to be modified via a pre-install script to accomplish our goal?

In the current version, the recipes that need to be modified are compute_base_config.rb and head_node_base_config.rb in the directory /etc/chef/cookbooks/aws-parallelcluster/recipes. You'd want to comment out the resource that exports /home in the head node recipe and the resource that mounts it in the compute node recipe. To be clear: I haven't tested this. I can't think of anything off the top of my head that this will break, but it's definitely possible that something will. (Actually, one consequence is that the home directory containing the public SSH key specified via the config file's key_name will be covered up by the EFS mount, but I'm assuming you're aware of that.)
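The in-place edit described above could be sketched as follows. This is untested against a real cluster and rests on assumptions: that the resource is written as a `do ... end` block whose closing `end` sits at the same indentation as its first line (as in the `nfs_export "/home"` declaration quoted in the log), and that the pre-install script is allowed to rewrite the recipe file. The helper name `comment_out_resource` and the neighbouring resources in the sample are hypothetical.

```python
# Sketch: comment out one Chef resource block in recipe text.
# Assumes a `do ... end` block whose `end` matches the header's indentation.

SAMPLE_RECIPE = '''\
directory "/shared" do
  mode "1777"
end

nfs_export "/home" do
  network node['cfncluster']['ec2-metadata']['vpc-ipv4-cidr-blocks']
  writeable true
  options ['no_root_squash']
end

service "nfs-server" do
  action :restart
end
'''


def comment_out_resource(recipe_text, resource_header):
    """Prefix every line of the matching resource block with '# '.

    Raises ValueError if no uncommented line starts with resource_header,
    so a silent no-op (for example after a cookbook update) is noticed.
    """
    out = []
    block_indent = None  # indentation of the block currently being commented
    found = False
    for line in recipe_text.splitlines():
        stripped = line.lstrip()
        if block_indent is None and stripped.startswith(resource_header):
            block_indent = line[: len(line) - len(stripped)]
            found = True
            out.append("# " + line)
        elif block_indent is not None:
            out.append("# " + line)
            if line.rstrip() == block_indent + "end":
                block_indent = None  # reached the end of the block
        else:
            out.append(line)
    if not found:
        raise ValueError(f"resource not found: {resource_header!r}")
    trailing = "\n" if recipe_text.endswith("\n") else ""
    return "\n".join(out) + trailing


print(comment_out_resource(SAMPLE_RECIPE, 'nfs_export "/home"'))
```

In a pre-install script, the same function would be applied to the contents of head_node_base_config.rb and written back. The header of the compute node resource has to be looked up in compute_base_config.rb for the installed version; it is not shown in this thread.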

> Do you plan to introduce an EFS /home option for ParallelCluster in the future? We are trying to completely replace EBS and NFS running on the master node with AWS shared filesystem solutions (EFS and FSx) to avoid a performance bottleneck.

I'm not aware of any definitive plans, but I believe it's been requested before and we're tracking that internally.

> Happy New Year!

Same to you! 🥂

@rkarnik-kymeratx

I use this kind of EFS-backed home directory, but have it mounted at /users instead.

@enrico-usai
Contributor

I'm going to close this ticket in favour of:
