OC_FileChunking on external file systems #4997
Comments
👍 as already mentioned in bug #2947, which was unfortunately killed by a close-bugs-no-matter-what wave ;-)
This is a good idea; we only need to add an offset header so the server knows where in the file to write each chunk. That way the order of chunks is not important.
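A minimal sketch of that idea (hypothetical helper, not the actual ownCloud API): each chunk request carries its byte offset, and the server seeks to that position before writing, so chunks can arrive in any order.

```python
import os
import tempfile

def write_chunk(path, offset, data):
    """Write one chunk at its byte offset; call order does not matter."""
    # create the file on first use, keep existing bytes afterwards
    mode = "r+b" if os.path.exists(path) else "wb"
    with open(path, mode) as f:
        f.seek(offset)
        f.write(data)

# Chunks arriving out of order still assemble correctly:
part = os.path.join(tempfile.mkdtemp(), "upload.part")
write_chunk(part, 6, b"world!")   # gap before offset 6 is zero-filled
write_chunk(part, 0, b"hello ")
assert open(part, "rb").read() == b"hello world!"
```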
We could use the HTTP Range header for this.
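As an illustration, the standard `Content-Range` header (`bytes start-end/total`, per RFC 7233) already carries the position a chunk belongs at; a small parser might look like this (hypothetical helper name):

```python
import re

def parse_content_range(header):
    """Parse an HTTP Content-Range value like 'bytes 0-511/1024'.
    Returns (start, end, total); total is None for 'bytes 0-511/*'."""
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", header)
    if m is None:
        raise ValueError("malformed Content-Range: %r" % header)
    start, end = int(m.group(1)), int(m.group(2))
    total = None if m.group(3) == "*" else int(m.group(3))
    return start, end, total

assert parse_content_range("bytes 512-1023/2048") == (512, 1023, 2048)
```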
@karlitschek oc7 I guess 😉
Assembling the chunks on the external file system will be a problem, though: most backends don't support partial writes or appends, so the only reliable way to append two uploaded chunks on an external file system would be to download both chunks and upload the result.
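That download-and-reupload fallback can be sketched with a toy in-memory backend standing in for an external storage that only supports whole-file operations (all names here are hypothetical, not the ownCloud Storage API):

```python
class DictStorage:
    """Toy in-memory stand-in for an external storage backend that
    only supports whole-file reads and writes."""
    def __init__(self):
        self.files = {}
    def exists(self, path):
        return path in self.files
    def download(self, path):
        return self.files[path]
    def upload(self, path, data):
        self.files[path] = data
    def delete(self, path):
        del self.files[path]

def append_remote(storage, target, chunk_name):
    """Append an uploaded chunk to a remote file when the backend has no
    partial-write support: download both, concatenate, re-upload."""
    existing = storage.download(target) if storage.exists(target) else b""
    chunk = storage.download(chunk_name)
    storage.upload(target, existing + chunk)  # whole-file rewrite each time
    storage.delete(chunk_name)

s = DictStorage()
s.upload("chunk0", b"hello ")
s.upload("chunk1", b"world")
append_remote(s, "file.txt", "chunk0")
append_remote(s, "file.txt", "chunk1")
assert s.download("file.txt") == b"hello world"
```

Note how every append re-transfers the whole file so far, which is exactly the overhead discussed here.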
In such a scenario we need to stick with the approach we have today. What about moving this to the storage API and letting the storage implementation handle it?
Yes. ownCloud 7
Moving it to the storage API might help, but I only know of one backend (FTP) that might be able to provide a better-than-default implementation.
It will help with iRODS as well - not the most common storage, but still... 😉
Any more ideas on this? After reading the thread, which raises some relevant issues about the proposed approaches, I can't think of any others at the moment.
I heard that 1.6.0 has configurable chunk sizes. If we can increase the chunk size to match the max upload size, it might help when uploading files to external storage as a single chunk, at least for bigger files.
@PVince81 remember that we also introduced chunking to avoid getting hit by the request timeout.
We might want to update the protocol in this case - I can think of scenarios where the chunk size becomes necessary information on the server. As of today the chunk size is not communicated. @dragotin another topic for Wednesday?
Yes, but I do not really see a big benefit in changing/configuring the chunk size.
This is part of a larger conversation to be had around external storage. For example, if you upload a file to the server in chunks, reassemble it, and then put it on a backend server... you are uploading twice: once to the ownCloud server, once to the backend. This is part of a larger oC 8 concept we need to look at that allows direct access where possible, streaming if the backend supports it, and then this as a last resort - as performance is slowest here.
This is not an Enhancement but a bug; the mirall issue tracker is full of bug reports about timed-out uploads of big files (example: owncloud/client#2074). Please re-tag as bug. A solution proposal is #12097.
@dragotin do you face these timeouts only on external storage, or on local storage as well?
Also on local storage. It's a matrix of computer speed, load, size of the file to process, and timeout value, so it will always be possible to time out.
okay - we can for now at least get rid of the reassembly step... I'll take care of that - NOW
@DeepDiver1975 if I understand correctly, you will store the part file directly on the external storage? Note that the other approach - simply putting the chunks onto the external storage server - will be even slower, because the workflow would be as follows:
Putting the chunks directly with this method adds overhead for each chunk, plus the overhead of having to re-download the chunks from SMB for assembly.
The first step will be to not save the chunks individually on the server and then reassemble them, but to write the chunks into a temp file; once done, this file can be pushed to the final destination. Regarding external storage this will not help much, because the reassembled file has to be pushed forward anyhow.
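The workflow described here - chunks go into one local temp file, and the assembled file is pushed to its destination exactly once - could look roughly like this (hypothetical helper names, a sketch only):

```python
import os
import tempfile

def assemble_and_push(chunk_iter, push):
    """Write (offset, data) chunks into one local temp file, then hand
    the finished content to `push` (e.g. an external-storage upload)
    exactly once - no separate reassembly pass over stored chunks."""
    fd, tmp = tempfile.mkstemp(prefix="upload-")
    try:
        with os.fdopen(fd, "r+b") as f:
            for offset, data in chunk_iter:
                f.seek(offset)
                f.write(data)
        with open(tmp, "rb") as f:
            push(f.read())
    finally:
        os.remove(tmp)  # temp file never lingers

received = []
assemble_and_push([(6, b"world!"), (0, b"hello ")], received.append)
assert received == [b"hello world!"]
```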
OK, so if I understand correctly, the following will happen:
When the chunks do not arrive in order, it will simply write each one at its offset in the temp file. Is that correct? 😄
Exactly.
@dragotin understood, and we will get into this discussion as well - but this has to be done anyway, because we are shifting bytes around too many times.
@DeepDiver1975 yes, I understood this now as well ;-) thanks...
The interface defs look good. Makes sense!
In order to bring this in we need to resolve the encryption issue as discussed in #12006 (comment). @icewind1991 @schiesbn @PVince81 @karlitschek @craigpg we need to make a decision - otherwise any piece of code written to solve chunked upload will be pointless.
@icewind1991 proposed to port the file proxies to be called inside a storage wrapper. That would be a quick workaround that saves us from porting the whole encryption app to a storage wrapper in the short term (but we'll have to do that eventually!). Here it is: #12701. Another possible alternative, if that doesn't work, is on the Storage side. We'll see how the experiments go.
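The storage-wrapper idea can be illustrated with a toy decorator around a backend; XOR stands in for the encryption app's real transformation, and none of these names match actual ownCloud classes:

```python
class MemoryStorage:
    """Toy backend (hypothetical interface, not the real Storage API)."""
    def __init__(self):
        self.files = {}
    def put(self, path, data):
        self.files[path] = data
    def get(self, path):
        return self.files[path]

class EncryptionWrapper:
    """Wraps another storage and transforms data on the way in and out,
    so callers and the inner backend stay unaware of each other.
    XOR is purely for illustration, not real encryption."""
    def __init__(self, inner, key=0x5A):
        self.inner = inner
        self.key = key
    def _xor(self, data):
        return bytes(b ^ self.key for b in data)
    def put(self, path, data):
        self.inner.put(path, self._xor(data))
    def get(self, path):
        return self._xor(self.inner.get(path))

backend = MemoryStorage()
storage = EncryptionWrapper(backend)
storage.put("a.txt", b"secret")
assert backend.get("a.txt") != b"secret"  # stored transformed
assert storage.get("a.txt") == b"secret"  # round-trips transparently
```

The point of the pattern is that chunked-upload code can talk to the wrapper exactly as it would to any other storage.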
The chunk handler experiment is here: #12160
Some additional ideas coming from #13157. I suspect that Dropbox might not need part files either, if its upload mechanism is atomic as well. I don't think it would directly overwrite the current file the way FTP or SMB would. This could save the part-file step.
@PVince81 I have OC server 7.0.4 running on 5 GB of space at a webhost. The connection is very slow, upload speed 20-60 kB/s. When uploading these large files, I get a "Connection closed" and "Operation cancelled" error message about every 6 minutes.
@VincentvgNn it mostly looks like the timeout makes it unable to finish the chunks? Are you using php-fpm, which auto-kills PHP processes when the network connection is broken? (mod_php doesn't kill the PHP process; it would continue working.)
@PVince81 Shouldn't a killed process be able to recover by using the available "chunking" files?
@VincentvgNn it's not a setting, it's your setup; phpinfo might tell you. If the chunk files aren't complete, there is no way to recover them. The client should then resend the chunk with a different transaction id (@dragotin can you confirm?). This would leave cancelled chunks lying around, which is a known issue.
@PVince81
@PVince81
The chunk cleanup will be a background job starting with 8.1: #14500
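The essence of such a cleanup job - scan the chunk directory and drop anything older than a cutoff - might be sketched like this (the real ownCloud background job works on its own chunk-cache layout, so names here are illustrative only):

```python
import os
import tempfile
import time

def cleanup_stale_chunks(chunk_dir, max_age_seconds=86400):
    """Delete chunk files whose mtime is older than the cutoff and
    return their names, oldest-abandoned uploads first by name."""
    cutoff = time.time() - max_age_seconds
    removed = []
    for name in sorted(os.listdir(chunk_dir)):
        path = os.path.join(chunk_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed

# Demo: one two-day-old chunk gets removed, a fresh one survives.
chunk_dir = tempfile.mkdtemp()
stale = os.path.join(chunk_dir, "chunk-stale")
fresh = os.path.join(chunk_dir, "chunk-fresh")
for p in (stale, fresh):
    open(p, "wb").close()
os.utime(stale, (time.time() - 172800,) * 2)  # backdate two days
assert cleanup_stale_chunks(chunk_dir) == ["chunk-stale"]
assert os.path.exists(fresh)
```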
We're past feature freeze => 9.0.
closing this in favor of #20118 |
The current implementation of OC_FileChunking stores the chunks in the server's file cache (a hidden/private folder on the server).
After all chunks have been received the file is reassembled and moved to the final target location.
While this approach seems valid for local filesystems, it can cause issues with external filesystems and big files, because the final move of the assembled file will consume some time.
In terms of overall system behavior, it might be interesting to move the chunks to the external filesystem right away.
My suggestion would be to store the chunks directly in the target file but name it differently: append .part, and even make it hidden if possible.
This approach would also speed up the process on the local filesystem, as there is no reassembly step, which as of today adds additional execution time.
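The proposed flow - write straight into a `.part` file next to the target, then rename it into place once the upload completes - could be sketched like this (hypothetical helper; actual ownCloud part-file handling differs in detail):

```python
import os
import tempfile

def upload_via_part_file(chunks, target):
    """Stream chunks straight into '<target>.part', then rename it into
    place when complete - no separate reassembly pass, and readers never
    see a half-written target file."""
    part = target + ".part"
    with open(part, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
    os.replace(part, target)  # atomic rename on POSIX filesystems

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "report.bin")
upload_via_part_file([b"chunk1-", b"chunk2"], target)
assert open(target, "rb").read() == b"chunk1-chunk2"
assert not os.path.exists(target + ".part")
```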
@karlitschek @bartv2 @icewind1991 @danimo @dragotin
Please add your comments - THX