Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shorten temporary files lifetime, various fixes/refactoring #155

Draft
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

Siborgium
Copy link

@Siborgium Siborgium commented Jan 3, 2023

This PR is a mess I have to apologize for. Unfortunately, rebasing with proper history rewrite is not a viable option, because a lot of changes are intertwined across multiple commits. I suggest you squash it instead of simply merging/rebasing.

The problem I initally wanted to solve is wl-copy leaking clipboard contents when terminated. This occurs e.g. when doing C-c on hung wl-copy spawned from Kakoune text editor (see Issues). On my way I discovered a lot more ways to break stuff here and there, so it ended up like this.

Now the temporary file life is much shorter: the only reason we need temporary file present in filesystem is xdg-mime mimetype detection, so the file is unlinked immediately once it's of no use. I've put some effort into protecting it against signals, but it's just bare minimum as I'm convinced we could replace xdg-mime with gmime or another mimetype library (there are too few admittedly), eliminating the need to create in-filesystem temporary files.

Other than that it's mostly refactoring to make the code easier to understand & reason about.

@bugaevc
Copy link
Owner

bugaevc commented Jan 4, 2023

Hi!

This PR is a mess I have to apologize for. Unfortunately, rebasing with proper history rewrite is not a viable option, because a lot of changes are intertwined across multiple commits.

So this indeed looks rather complicated...

Now the temporary file life is much shorter: the only reason we need temporary file present in filesystem is xdg-mime mimetype detection, so the file is unlinked immediately once it's of no use.

Indeed, that sounds like a good idea. But it would seem that this could be done much simpler, by unlinking the file immediately after figuring out its MIME type and only storing the fd.

Here's a sketch of what it could look like, what do you think?

Patch
diff --git a/src/types/copy-action.h b/src/types/copy-action.h
index c0a1290..babef84 100644
--- a/src/types/copy-action.h
+++ b/src/types/copy-action.h
@@ -40,8 +40,9 @@ struct copy_action {
 
     /* Exactly one of these fields must be non-null if the source
      * is non-null, otherwise all these fields must be null.
+     * The null value for fd_to_copy_from is -1.
      */
-    const char *file_to_copy;
+    int fd_to_copy_from;
     argv_t argv_to_copy;
     struct {
         const char *ptr;
diff --git a/src/types/copy-action.c b/src/types/copy-action.c
index f661fc3..1704636 100644
--- a/src/types/copy-action.c
+++ b/src/types/copy-action.c
@@ -71,7 +71,7 @@ static void do_send(struct source *source, const char *mime_type, int fd) {
      /* Unset O_NONBLOCK */
     fcntl(fd, F_SETFL, 0);
 
-    if (self->file_to_copy != NULL) {
+    if (self->fd_to_copy_from != -1) {
         /* Copy the file to the given file descriptor
          * by spawning an appropriate cat process.
          */
@@ -82,9 +82,11 @@ static void do_send(struct source *source, const char *mime_type, int fd) {
             return;
         }
         if (pid == 0) {
+            dup2(self->fd_to_copy_from, STDIN_FILENO);
+            close(self->fd_to_copy_from);
             dup2(fd, STDOUT_FILENO);
             close(fd);
-            execlp("cat", "cat", self->file_to_copy, NULL);
+            execlp("cat", "cat", NULL);
             perror("exec cat");
             exit(1);
         }
@@ -97,6 +99,10 @@ static void do_send(struct source *source, const char *mime_type, int fd) {
          * instead.
          */
         wait(NULL);
+        off_t rc = lseek(self->fd_to_copy_from, 0, SEEK_SET);
+        if (rc < 0) {
+            perror("lseek");
+        }
     } else {
         /* We'll perform the copy ourselves */
         FILE *f = fdopen(fd, "w");
diff --git a/src/wl-copy.c b/src/wl-copy.c
index f08cf17..af8cda8 100644
--- a/src/wl-copy.c
+++ b/src/wl-copy.c
@@ -81,28 +81,13 @@ static void did_set_selection_callback(struct copy_action *copy_action) {
     }
 }
 
-static void cleanup_and_exit(struct copy_action *copy_action, int code) {
-    /* We're done copying!
-     * All that's left to do now is to
-     * clean up after ourselves and exit.*/
-    char *temp_file = (char *) copy_action->file_to_copy;
-    if (temp_file != NULL) {
-        /* Clean up our temporary file */
-        execlp("rm", "rm", "-r", dirname(temp_file), NULL);
-        perror("exec rm");
-        exit(1);
-    } else {
-        exit(code);
-    }
-}
-
 static void cancelled_callback(struct copy_action *copy_action) {
-    cleanup_and_exit(copy_action, 0);
+    exit(0);
 }
 
 static void pasted_callback(struct copy_action *copy_action) {
     if (options.paste_once) {
-        cleanup_and_exit(copy_action, 0);
+        exit(0);
     }
 }
 
@@ -235,6 +220,7 @@ int main(int argc, argv_t argv) {
 
     /* Create and initialize the copy action */
     struct copy_action *copy_action = calloc(1, sizeof(struct copy_action));
+    copy_action->fd_to_copy_from = -1;
     copy_action->device = device;
     copy_action->primary = options.primary;
 
@@ -257,7 +243,26 @@ int main(int argc, argv_t argv) {
             if (options.mime_type == NULL) {
                 options.mime_type = infer_mime_type_from_contents(temp_file);
             }
-            copy_action->file_to_copy = temp_file;
+            copy_action->fd_to_copy_from = open(temp_file, O_RDONLY | O_CLOEXEC);
+            if (copy_action->fd_to_copy_from == -1) {
+                perror("Failed to open temp file");
+                return 1;
+            }
+            /* Now, remove the temp file and its
+             * containing directory. We still keep
+             * access to the file through our open
+             * file descriptor.
+             */
+            int rc = unlink(temp_file);
+            if (rc < 0) {
+                perror("Failed to unlink the temp file");
+            } else {
+                rc = rmdir(dirname(temp_file));
+                if (rc < 0) {
+                    perror("Failed to remove temp file directory");
+                }
+            }
+            free(temp_file);
         }
 
         /* Create the source */
@@ -291,6 +296,5 @@ int main(int argc, argv_t argv) {
     while (wl_display_dispatch(wl_display) >= 0);
 
     perror("wl_display_dispatch");
-    cleanup_and_exit(copy_action, 1);
     return 1;
 }

I've put some effort into protecting it against signals

Right; being robust in case of signals makes this a little bit more complicated.

@bugaevc
Copy link
Owner

bugaevc commented Jan 5, 2023

I'm convinced we could replace xdg-mime with gmime or another mimetype library (there are too few admittedly), eliminating the need to create in-filesystem temporary files.

In #97, there was an idea to use file(1) for this. It should be possible to invoke it as file -i -, in which case it reads its stdin, so we don't have to have a file on the filesystem for it. Then we could event attempt to use create_anonymous_file() (i.e. memfd) for this temporary file too.

@Siborgium
Copy link
Author

So this indeed looks rather complicated...

Changes themselves are not very hard to grasp.

  • Refactor to have a taste of incapsulation, each code path is handled separately
  • Some code paths are merged (see below)
  • Try to avoid forks where possible without extra effort (e.g. use sendfile over cat if possible)

unlinking the file immediately after figuring out its MIME type and only storing the fd

Yes, this is what my PR does, except we don't even need the fd if we mmap the file. mmap greatly simplifies fd bookkeeping. Using a mmap/buffer to copy from instead of fd also allows to merge to code paths for buffer and fd.

what do you think?

This is roughly what I came up with initially. However, the current architecture was too hard to tweak, extend and reason about -- e.g. when clearing resource leaks, hence the incapsulation effort.

there was an idea to use file(1) for this

xdg-mime already does this and a lot more. It's quite complicated admittedly, which is why it can't be replaced with libraries or file alone without losing quality. If we're ok with losing quality, mimetype libraries are superior in my opinion.

If we're running on Linux, and user_namespaces feature is enabled, one could mount tmpfs over /tmp in their own namespace, dumping the file there. The file would be invisible but to the child processes (xdg-mime). This is nice enough, but too many "if"s, and it's out of the scope of this PR.

@bugaevc
Copy link
Owner

bugaevc commented Jan 5, 2023

Try to avoid forks where possible without extra effort (e.g. use sendfile over cat if possible)

The idea with using cat was exactly to rely on someone else having figured out how to determine the availability and applicability of various ways of copying (copy_file_range, splice, sendfile, whatever other non-Linux systems may have) and just defer to their implementation.

Have you actually found a case where calling sendfile directly (over forking off a cat) makes wl-copy run noticably faster, or is this a theoretical speedup?

xdg-mime already does this and a lot more. It's quite complicated admittedly

Indeed, I can see that it calls file in the fallback handler (info_generic).

It does seem to be complicated, though not complex: it essentially determines whether it's running under KDE (in which case it wraps kmimetypefinder) or GNOME (or Cinnamon, or LXDE, or MATE, or Xfce), in which case it wraps gio info, or wraps file as a fallback. It wouldn't be so hard to reimplement the same logic in wl-clipboard if we have to, though of course using a ready-made xdg-mime script is very handy.

But, there's a need to know if the file is textual or not (and if it is, make a guess at its encoding), to conditionally add text/plain to the list of offered formats. As discussed in #97, this is easy with file, and does not seem to be available with xdg-mime.

which is why it can't be replaced with libraries or file alone without losing quality. If we're ok with losing quality, mimetype libraries are superior in my opinion.

Do you actually have an example of where xdg-mime fares better then file?

I do have a reverse example: on SSH public keys, xdg-mime (at least with its gio open backend) gets confused into thinking they're MS Publisher documents due to the .pub extension, while file just says it's plain text:

$ file --mime --brief .ssh/id_rsa.pub
text/plain; charset=us-ascii
$ xdg-mime query filetype .ssh/id_rsa.pub
application/vnd.ms-publisher

That all being said, I don't actually think that it's that important to infer MIME type in complex cases correctly. Really, the data types that typically get placed into a clipboard are:

  • various flavor of plain text, in which case the exact MIME type (be it application/x-yaml or application/pgp-keys or text/x-python3 or text/x-kotlin or text/x-csrc) does not really matter, what matters is that it's just text;
  • various image formats — in this case specifying just the right type does matter, but I'd expect all MIME type tools to handle this case well;
  • rich text formats (HTML, RTF), in which case again all the tools should do an equally good job;
  • and finally application- or toolkit-internal formats (e.g. application/x-libreoffice-internal-id-147670) that we would not find in a file anyway.

So while it's surely preferable to handle exotic formats, like say an ISO image or a Blender .blend file, correctly, these are not the types you'd normally find in a clipboard, and none of the apps are really going to know how to paste them. Note that copying/pasting "a file" in a file manager is done very differently, by basically placing its path/URL (and not its actual contents) into the clipboard.

As for using libraries, I'd rather avoid that if possible. wl-clipboard tries to be very light on dependencies — it doesn't really require anything other than libwayland-client (and a libc). While it wouldn't be hard to make a library an optional dependency, just relying on an external tool to do the right thing is nicer still.

If we're running on Linux, and user_namespaces feature is enabled, one could mount tmpfs over /tmp in their own namespace, dumping the file there.

That sounds like a huge overkill for what we're trying to achieve.

A huge lot of simple command-line programs use tmpfile/mk*tmp for storing temporary files, but I don't think I've ever heard of any that uses namespaces to hide that file from everyone else. This may be beneficial to do for system daemons (and quite easy to set up with systemd's PrivateTmp option), but not for small utilities. That being said, nothing prevents you from making a little wrapper for wl-copy that would unshare CLONE_NEWUSER|CLONE_NEWNS, remount /tmp, and then exec wl-copy.

However, the current architecture was too hard to tweak, extend and reason about -- e.g. when clearing resource leaks, hence the incapsulation effort.

I'm not at all asserting that the current architecture is perfect (although it sure took a massive refactoring effort to get there from the wl-clipboard 1.0 codebase), but it wasn't hard at all to make the above tweak (copy_action->file_to_copycopy_action->fd_to_copy_from). That would probably get somewhat messier with signal handling (which we would not need should we implement the fd-only model), but that still looks quite doable from where I'm standing.

Changes themselves are not very hard to grasp.

I understand this is not trivial, but maybe if you do split your work up into proper incremental commits it would be more approachable...

Plus, some things just cannot go in as is. To name a few things:

  • you're doing includes wrong, internal includes are never supposed to be done with the <>s, even if you add an internal directory to the include path — and for a simple enough project like wl-clipboard, you should not have to do that either;
  • the project style is to use post-increment (foo++) in for loops;
  • format pointers as void *ptr (with the asterisk attached to the pointer name);
  • we don't use... whatever doc-comment convention this is:
    /// @brief Begin "sensitive" section
    /// @param[in] handler Emergency handler, invoked on signal
    /// @return 0 on success, negative value otherwise
  • we don't use C++ style comments (//) at all, other than sometimes to specify what an include is needed for, e.g.
    #include <fcntl.h> // open
    #include <sys/stat.h> // open
    #include <sys/types.h> // open
    #include <stdlib.h> // exit
    #include <libgen.h> // basename

    In fact there are rather specific rules of how comments are to be formatted; please take a look at the existing codebase for examples.

Anyway, please don't take this the wrong way, I'm grateful for your contribution and for this conversation we're having 🙂

bugaevc added a commit that referenced this pull request Mar 4, 2023
We only need the temp file to be present on the file system to run
xdg-mime on it. Once we have done that, we can unlink it and simply use
an fd to refer to it, seeking to the beginning each time.

This makes us more robust against wl-copy crashing (or getting killed),
in which case the file would be kept on the file system; this should no
longer be the case with this patch.

Related to #155
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants