Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

how to save an upload? #428

Closed
scottw-finao opened this issue Oct 15, 2021 · 7 comments
Closed

how to save an upload? #428

scottw-finao opened this issue Oct 15, 2021 · 7 comments
Labels

Comments

@scottw-finao
Copy link

Scouring through the documents and videos, I can't seem to find a simple example of how to save a file upload. I've tried mulitple examples I see and none worked. I took some guesses and got one to save using getContents() but it doesn't seem to work on larger files.

I see other sources that say to save stream data you use stream_copy_to_stream but when i take UploadedFile()->getStream() as the input, it says it wants a resource, not an instance.

I just need to do a simple file upload as part of a multipart form data rest interface. I can the array of UploadedFile() instances with getUploadedFiles() but I can't seem to find any basic instructions for turning that into a file.

I tried yet another example that uses a promise, but then I'm not sure what the proper way is to get the UploadedFile() data into the sub-function. Can someone please give me a simple example on how I'm supposed to save the file data please?!

@clue
Copy link
Member

clue commented Feb 5, 2022

@scottw-finao Thank you for this excellent question! This project does indeed support file uploads just fine (but there are a number of caveats, see below).

Here's the gist how to save uploaded files:

$http = new React\Http\HttpServer(function (Psr\Http\Message\ServerRequestInterface $request) {
    $files = $request->getUploadedFiles();
    if (!isset($files['avatar'])) {
        return React\Http\Message\Response::plaintext("No file given\n");
    }
    
    $avatar = $files['avatar'];
    assert($avatar instanceof Psr\Http\Message\UploadedFileInterface);
    
    if ($avatar->getError() !== UPLOAD_ERR_OK) {
        return React\Http\Message\Response::plaintext("Upload error\n");
    }
    
    // unsafe, for demonstration purpose only: no filename checks, may overwrite existing file
    $name = $avatar->getClientFilename();
    
    // blocking, for demonstration purpose only: filesystem may block, loads entire file into memory
    file_put_contents(__DIR__ . '/' . $name, (string) $avatar->getStream());
    
    return React\Http\Message\Response::plaintext("Uploaded $name\n");
});

This example should work just fine, but loads the entire file into memory before writing it to the filesystem, see also https://github.com/reactphp/http#request-body and #252. As per the linked documentation, we can use the getStream() method, but not the moveTo() method for reasons given below.

In particular, note that the filesystem is inherently blocking, so you may wish to use a different approach if you have high concurrency. For example, you may instead send this off to some external service (think S3, FTP, SSH, etc.) or use our (experimental) filesystem adapter: https://github.com/reactphp/filesystem

On top of this, note that the HttpServer has a default request size limit of just 64 KiB. This means that if you're trying to upload larger files, you have to configure the server to support larger files. See https://github.com/reactphp/http#httpserver for details and also #412 for motivation.

The problem we're seeing here is that the filesystem is inherently blocking, but ReactPHP is entirely non-blocking. This means we try hard to avoid hitting the filesystem and as a consequence have to store everything in memory. This works perfectly fine for smaller requests, but it's easy to see why you wouldn't want to store multiple gigabytes of data in memory for large uploads. To avoid this, ReactPHP provides the option to use streaming request bodies as per https://github.com/reactphp/http#streaming-incoming-request. While this would allow you to accept arbitrary file uploads, it's arguably also somewhat harder to use.

Together, this means handling file uploads is surprisingly hard.

I've looked into this a couple of times already and hope to be able to make this easier in the future, but the main problem that files have to be stored "somewhere" remains for now. I hope to provide some kind of configuration option for this in the future, but no promises at this point.

I believe this has been answered, so I'm closing this for now. Please come back with more details if this problem persists and we can always reopen and/or keep discussing this 👍

@bartvanhoutte
Copy link

bartvanhoutte commented May 20, 2022

@clue I've also been looking into handling file uploads through streaming for the last couple of days but I must admit it is a very hard problem to solve.

Mainly the biggest issue is handling the stream flow while doing any asynchronous I/O. At a certain point while handling the request you might want to check the filesystem or database to see if the file already exists, which causes issues with the stream. I've tried buffering the data emitted by the stream and handling the buffer + remaining data emitted by the stream after doing non-blocking I/O but this is really hard and probably unreliable. The size of the buffer is correlated to the duration of any asynchronous I/O you'd like to do. Hence in the worst case scenario you'll end up with having the complete file in memory at which point streaming becomes completely useless.

The least worse solution I've come up with is to store the file on a temporary filesystem that supports ReadableStreamInterface before doing any processing. When you've done the necessary I/O you can either move (stream) the file from the temporary filesystem to the final destination or delete the file when you shouldn't have saved it in the first place.

Bear in mind that this solution:

  • requires some form of middleware to be executed as early as possible in the stack since 3rd party middleware might do I/O while data is being emmitted.
  • requires you to manage the temporary filesystem.

@clue
Copy link
Member

clue commented May 20, 2022

Agree with your analysis and the fact that this is both surprisingly complex and that it most likely requires some kind of filesystem integration for temporary files. I've started looking into this for some private projects a while ago, a more generic solution is definitely on the roadmap, but requires non-trivial effort.

@bartvanhoutte
Copy link

I have been looking into this a bit more, there's still some hurdles left but I think it should be doable. Just a couple of things to think about, YMMV.

The main issue is avoiding any I/O while receiving incoming data:

  • A temporary filesystem can be created on startup of the HttpServer and destroyed on shutdown, avoiding some I/O while invoking the request handler. A fallback for when the server wasn't shut down properly should be in place, simply destroying the previous temporary filesystem should be sufficiant.
  • Async file_exist calls should be avoided by leaving the generation of filenames in the temporary filesystem to the implementation of the filesystem. Avoiding filename collisions is going to require some effort.
  • Probably the hardest one is dealing with (async) fopen calls when invoking the request handler. I haven't tested it but I can think of a situation where you have a filesystem with high latency taking a long time to return a file descriptor, missing out on the first couple of chunks emitted by the incoming stream. Could this be solved by not adding the stream to the loop until a FD is returned? If not maybe use a pool of FD's? If not maybe use one file/FD and use offset and length to write/read?

@clue
Copy link
Member

clue commented Jun 3, 2022

The filesystem is inherently blocking while ReactPHP is entirely non-blocking, so any solution we're going to come up with is going to involve some non-trivial effort. Most commonly, blocking filesystem calls can be avoided by using thread pools (not available in PHP without custom extensions and a ZTS build which is rare) or process pools (available out of the box, but incurs significant overhead).

Technically, any filesystem call is blocking. But in practice, today's filesystem implementation and underlying hardware devices tend to be fast. Realistically, this means we may get away with using blocking filesystem calls. This can be especially true if we're only dealing with files in smaller chunks, for instance for our socket communication we usually read a few kilobytes at once only and most filesystems would outperform any network connection.

PHP's tmpfile() or tempnam() can be used to create temporary files with unique names, so no need to perform costly stat calls with file_exists() and friends. Filesystem calls may indeed be blocking, but from a practical perspective they might be fast enough to be considered non-blocking for many use cases.

@bartvanhoutte
Copy link

I should have added in my previous reply that for starters I'd got for an implementation using libuv for now 😅

The filesystem is inherently blocking while ReactPHP is entirely non-blocking, so any solution we're going to come up with is going to involve some non-trivial effort. Most commonly, blocking filesystem calls can be avoided by using thread pools (not available in PHP without custom extensions and a ZTS build which is rare) or process pools (available out of the box, but incurs significant overhead).

Does this include filesystem access using libuv from the perspective of the event loop? I'm not sure what to make of the following statement in the libuv documentation found at http://docs.libuv.org/en/v1.x/design.html#file-i-o. The prototyping I've done described in the previous comment was with libuv.

Am I correct in saying we either use threads/process pools in native PHP or use I/O functions with libuv? Like it's done in react/filesystem 0.2.x?

Unlike network I/O, there are no platform-specific file I/O primitives libuv could rely on, so the current approach is to run blocking file I/O operations in a thread pool.

... and most filesystems would outperform any network connection.

Makes sense but i guess YMMV when buffering requests on an IO throttled drive.

PHP's tmpfile() or tempnam() can be used to create temporary files with unique names, so no need to perform costly stat calls with file_exists() and friends. Filesystem calls may indeed be blocking, but from a practical perspective they might be fast enough to be considered non-blocking for many use cases.

Would definitely work with a thread/process pool and would also makes things less complicated 👍.

@bartvanhoutte
Copy link

The filesystem is inherently blocking while ReactPHP is entirely non-blocking, so any solution we're going to come up with is going to involve some non-trivial effort. Most commonly, blocking filesystem calls can be avoided by using thread pools (not available in PHP without custom extensions and a ZTS build which is rare) or process pools (available out of the box, but incurs significant overhead).

Technically, any filesystem call is blocking. But in practice, today's filesystem implementation and underlying hardware devices tend to be fast. Realistically, this means we may get away with using blocking filesystem calls. This can be especially true if we're only dealing with files in smaller chunks, for instance for our socket communication we usually read a few kilobytes at once only and most filesystems would outperform any network connection.

PHP's tmpfile() or tempnam() can be used to create temporary files with unique names, so no need to perform costly stat calls with file_exists() and friends. Filesystem calls may indeed be blocking, but from a practical perspective they might be fast enough to be considered non-blocking for many use cases.

As libuv is now able to use io_uring (libuv/libuv@d2c31f4) this may become a lot more viable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants