-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[bug:1421649] When using a fuse mount for client, EC volumes do not mount. #924
Comments
Time: 20170213T10:58:51 |
Time: 20170213T13:58:58 |
Time: 20170213T14:40:51 This new /run/gluster/lib directory should be added to extras/run-gluster.tmpfiles.in so that it is created upon boot. There also needs to be a selinux-policy update (most likely) to enforce the correct context on the directory. |
Time: 20170213T14:55:42 I think the location of the generated files needs to be detected during runtime:
For non-root, the directory needs to be created on demand, and have the correct SELinux context set after that (type:bin_t)? |
Time: 20170216T07:23:16 There's already a patch to avoid failures when mmap() or anything else related to the dynamic code generation fails (https://bugzilla.redhat.com/show_bug.cgi?id=1421955). It simply falls back to the precompiled code instead of causing fatal failures. This is not the best solution but allows disperse to be used, at least until a proper solution for the selinux problem is found and implemented. It's also a good solution for any possible future change or OS problem that could cause this to fail again. Another workaround is to manually disable dynamic code generation by using option disperse.cpu-extensions = none. |
Time: 20170216T07:35:04
Additionally we also need to make sure that these are clearly documented in the release notes for 3.10. |
Time: 20170216T11:31:30
|
Time: 20170216T14:45:35
If https://bugzilla.redhat.com/show_bug.cgi?id=1421955 is taken in, what additional release notes are required? I need them ASAP so that the same can be updated. Further, removing this from the 3.10 blocker list as per the discussion here. |
Time: 20170217T16:48:07
I thought it would be better to add a note about this fallback mechanism which we do internally instead of explicitly setting the specified volume set option. As you know I am not an expert in EC. I will leave it for Xavi to have a final word. Xavi, |
Time: 20170220T07:20:18 Not sure how to word it or how many details to tell. |
Time: 20170220T10:54:53
Basically because the amount of possible combinations that may be needed to encode/decode the data is too big (in the order of tens of thousands). On normal circumstances, only a bunch of combinations are really needed, but in relatively big configurations, things can fail in multiple ways and all of them need to be supported though not at the same time (as an example, in a 16+4 configuration, there are 4845 different ways in which 4 of the 20 bricks can fail, and each combination needs a different matrix to be computed). It can be done without dynamic code (it has been working this way till recently) using more generic functions, but the performance is significantly worse. The addition of SSE and AVX support makes this problem even worse because it would need to build specific functions for each of the possible extensions for each combination.
I don't think so. The code doesn't really need to be stored anywhere. In a previous patch it was created only on memory but, because of selinux, it needed to be stored into a file. The file is volatile though: the code creates the file and then immediately deletes it, keeping only an open fd. The generated code only needs to be there while the process is running. Since creating all possible combinations would be impractical, only the needed combinations are created as they are needed on run-time, so we cannot create a file with all the code and then revoke privileges.
Well, solution in comment 5 basically disables dynamic code generation if something fails, so we are at the starting point again. The performance benefits are lost, though it's true that it works in all cases. The detailed problem is this: First solution: An mmap() call was used to create an anomymous memory area where dynamic code was generated. It had read/write/execute privileges and it was not backed by any file. The problem here is that selinux forbids this. Second solution: Looking at how some internet browsers solve this problem (they generate dynamic code for javascript), I tried to use the double mmap() approach, that also seems to be the recommended solution to support selinux. With this approach I need a backing file mmapped to two distinct memory regions, one with read/write access and another one with read/execute access. This works fine and selinux really allows this, but the problem here is that the backing file must have the bin_t selinux's type. Otherwise selinux doesn't allow the file to be mmaped to an executable region. With the help of Anoop, we found that /usr/libexec/glusterfs has this type set, so we used it and everything worked fine with selinux activated. The problem here (I didn't know that) is that this directory can be read-only in some cases, so it cannot be used. Also, it requires root privileges, while some gfapi applications may not run as root. So the questions are:
Thanks |
Time: 20170314T15:48:26 |
Time: 20170314T19:36:12
If that is acceptable, we should make that happen. Siddarth, what is your take on this? |
Time: 20170315T04:29:47
looking at other places, there doesnt seem to be a place which is recommended for such type of files. So if this file does not remain for long time I am OK with tweaking selinux policy to do so. |
Time: 20170315T09:36:25 If it's ok to create an selinux policy that allows the process to create that file in an already existing directory (this directory should be writable by the owner of the process, not necessarily root. Maybe /tmp ?), that's fine to me. If the best place to put that file is inside /run/usr/$gid (is it really gid or uid ?), I can write the necessary code. In this case, would we need to create a /run/usr/$gid/glusterfs ? or we can directly use /run/usr/$gid ? Does /run/usr/$gid exist in all distributions ? |
Time: 20170315T11:05:12
/run/usr/$gid is created by pam, and it is not persistent. so the problem with writing such file with predictable filename to /tmp is that it will become vulnerable to symlink attack. So I am not in favor of it being written to /tmp. |
Time: 20170315T11:28:51
This would be /run/user/$uid. Note that it is "user", not "usr" and "$uid" not "$gid". Modern distributions (Fedora, CentOS 7, all systemd ones?) have this directory. It is not available on CentOS 6 though, and that is also a target we aim to support as a Gluster community. We can do some ./configure stuff to select different directories, but it still requires the correct SELinux context (bin_t) to be set too. Depending on the directory that gets picked for a distribution, the SELinux policy needs those modifications. Suggestions for a non-systemd distribution directory welcome. |
Time: 20170317T15:48:48 |
Time: 20170317T16:38:37
bin_t is used to identify a file as an ordinary program type. If this is shell script then it should be shell_exec_t. |
Time: 20170320T07:27:54
The dynamic code here is used because the number of combinations that would be needed grows easily to thousands. However in normal circumstances only a small subset is used and tends to remain stable until some event happens (brick failure or similar). The program mmaps regions in blocks of 64KB to store multiple fragments of code. However the fragments are generated as they are needed. At the same time some other fragments might be executing, so the mprotect() solution is not practical here. Using an independent mmap() for each fragment of code seems a waste of memory to me (normally each fragment of code requires few hundreds of bytes). (In reply to Simon Sekidde from comment #19)
The dynamic code generated and stored into the temporary backend file is machine code, not shell code. The bin_t type works fine here. |
Time: 20181031T13:10:45 Is it ok to use /run/gluster/bin (for example) to store dynamic code for root user and /run/user/$uid/gluster for other users ? If so, do I need to also update https://github.com/gluster/glusterfs-selinux ? If that's the correct option, I'll implement it. |
Time: 20181112T13:26:17
Siddarth or Simon, could you answer this question? |
Time: 20181120T08:22:24 |
Time: 20191118T07:00:38 Are we going to address this any time soon? I remember that we have the fall back mechanism implemented but wanted to fix the real problem by putting dynamic code satisfying SELinux needs. |
Time: 20191122T14:04:42 The change is quite simple though. Only select a different path depending on the user (and maybe creating some directory), so if someone wants to take it, I'll be happy to help. |
Thank you for your contributions. |
Closing this issue as there was no update since my last update on issue. If this is an issue which is still valid, feel free to open it. |
URL: https://bugzilla.redhat.com/1421649
Creator: nigelb at redhat
Time: 20170213T10:54:56
We recently fixed bug 1402661 by using the /usr/libexec/glusterfs folder for creating the executable files. This folder is created when you have the glusterfs-server package installed.
However, when you only have the client installed, it doesn't create the /usr/libexec/glusterfs folder. Therefore, disperse volumes cannot be mounted correctly. This is from the mount logs on the client.
[2017-02-13 09:53:14.583515] I [MSGID: 100030] [glusterfsd.c:2460:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.11dev (args: /usr/sbin/glusterfs --volfile-server=172.19.2.81 --volfile-id=/testvol_dispersed /mnt/testvol_dispersed_glusterfs)
[2017-02-13 09:53:14.593506] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-02-13 09:53:14.597531] I [MSGID: 122067] [ec-code.c:1025:ec_code_detect] 0-testvol_dispersed-disperse-0: Using 'sse' CPU extensions
[2017-02-13 09:53:14.597580] E [MSGID: 122074] [ec-code.c:425:ec_code_space_create] 0-testvol_dispersed-disperse-0: Unable to create a temporary file for the ec dynamic code [No such file or directory]
[2017-02-13 09:53:14.597598] E [MSGID: 122073] [ec.c:654:init] 0-testvol_dispersed-disperse-0: Failed to initialize matrix management [No such file or directory]
[2017-02-13 09:53:16.597684] E [MSGID: 101019] [xlator.c:503:xlator_init] 0-testvol_dispersed-disperse-0: Initialization of volume 'testvol_dispersed-disperse-0' failed, review your volfile again
[2017-02-13 09:53:16.597723] E [MSGID: 101066] [graph.c:324:glusterfs_graph_init] 0-testvol_dispersed-disperse-0: initializing translator failed
[2017-02-13 09:53:16.597770] E [MSGID: 101176] [graph.c:680:glusterfs_graph_activate] 0-graph: init failed
[2017-02-13 09:53:16.598251] W [glusterfsd.c:1329:cleanup_and_exit] (-->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x3c1) [0x7fbf765006f1] -->/usr/sbin/glusterfs(glusterfs_process_volfp+0x1b1) [0x7fbf764fa8c1] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7fbf764f9dfb] ) 0-: received signum (1), shutting down
[2017-02-13 09:53:16.598281] I [fuse-bridge.c:5802:fini] 0-fuse: Unmounting '/mnt/testvol_dispersed_glusterfs'.
[2017-02-13 09:53:16.598575] W [glusterfsd.c:1329:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7fbf74e61dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7fbf764f9f85] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7fbf764f9dfb] ) 0-: received signum (15), shutting down
The text was updated successfully, but these errors were encountered: