Don't re-download/re-upload manually copied files #3422
There is definitely an open feature request for this already, but I can't find it at the moment. Will see if I can find it tomorrow.
This would be a use case for checksums. @dragotin
This would be really nice to have for slow connections when you want to "seed" a new client install. E.g. if I have n GB in ownCloud on my laptop client, and now I want my wife's laptop to have ownCloud with the same files. If I could either: The client would find that the files on the server and the files on the local client already match - a little time spent sending messages back and forth between client and server, but no need to transfer any real file data. I suspect that any code that succeeded in doing this sort of thing would work for both sequences (a) or (b) anyway.
Bump :0)... definitely need this feature. I have over 5 TB of files I'd like to have in ownCloud. It would probably take over a year for the client to upload all of them!
This issue also makes it extremely inconvenient to migrate from other sync tools (unison, rsync, Dropbox, ...) to ownCloud. I have over 100 GB of files stored in 4 locations (Linux + unison + rsync) and decided to make ownCloud available for my co-workers... If we all have to re-download all files over again, our network will be clogged for weeks. I know very little of ownCloud's internals, but let me try brainstorming a possible solution... For the server side:
P.S.: I do realize that hashing on download is CPU- and disk-expensive on the server side, but pointless downloading is network-intensive and uses just as much disk I/O...
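To illustrate the general idea being discussed here (not ownCloud's actual code): a checksum manifest built independently on the client and on the server lets both sides confirm that manually copied files already match, without transferring any file contents. Paths, hash algorithm, and buffer size below are illustrative assumptions.

```python
#!/usr/bin/env python3
"""Rough sketch: build a checksum manifest for a local tree so client and
server could compare files without transferring their contents."""

import hashlib
import os


def file_checksum(path, algo="sha1", bufsize=1 << 20):
    """Hash a file in 1 MiB reads so large files never load into RAM."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each relative path to (size, checksum)."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            manifest[rel] = (os.path.getsize(full), file_checksum(full))
    return manifest


if __name__ == "__main__":
    # Hypothetical sync folder; a server-side manifest built the same way
    # could then be compared: matching (size, checksum) means no transfer.
    local = build_manifest("/home/user/ownCloud")
```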
This would indeed make ownCloud a lot more useful. I have around 600 GB of data. If I were to reinstall one computer and needed to resync all this data (instead of first backing it up to an external drive and simply copying it back after reinstallation), that would take a lot of time.
Gotta agree with @lhartmann - he seems to have thought it through quite thoroughly.
My suggestion is flawed for large files. Imagine you have a 50 GB VirtualBox disk image to sync, where only 10 kB were actually modified. You would have to rehash all of the 50 GB, which is terrible, but so would be transferring 50 GB instead of 10 kB. Recent clients (I believe) can upload fragments of a file, say 4 MB, per HTTP connection. I don't think this is a fixed value, which complicates things a lot... However, if a standard fragment size can be defined, at least for ownCloud clients, then there is another potential solution:
Phew... took me 2 hours to write this, but at least this feels even better than the previous idea. :-)
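A sketch of the per-fragment hashing idea above: treat a big file as fixed-size segments and hash each one independently, so a small edit only invalidates the segment that contains it. The 4 MB segment size is taken from the comment; nothing here is actual client code.

```python
import hashlib

SEGMENT_SIZE = 4 * 1024 * 1024  # 4 MB, the fragment size mentioned above


def segment_hashes(path, algo="sha256"):
    """Return one digest per fixed-size segment of the file.

    A 50 GB image yields roughly 12,800 digests; editing 10 kB changes
    only the digest of the segment that contains those bytes."""
    digests = []
    with open(path, "rb") as f:
        while True:
            segment = f.read(SEGMENT_SIZE)
            if not segment:
                break
            digests.append(hashlib.new(algo, segment).hexdigest())
    return digests
```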
> My suggestion is flawed for large files.

Not quite.

> Imagine you have a 50GB virtualbox disk image to sync, where only 10kB

You WILL need to rehash it, it is unavoidable. There's just no other way to

> HASH information MUST NOT be included in replies to regular propfind

It doesn't need to be, but there are options.

> That would make responses big: 500kB+ in hash entries for a 50GB file

That is assuming an MD5 hash, which is weak. A SHA-256 hash is ~100 octets per chunk.
@lhartmann @AnrDaemon: there are already topics about differential sync: but this topic is not about that. Don't get me wrong, it's a great feature, and both issues could be dealt with at the same time. The problem here should be easier to implement and would already help a lot of users. Picture, music, or film archives are very static.
Unless the filesystem supports keeping track of partial file changes, you will need to rehash the 50 GB file on the client side every time a single byte is changed. Annoying, but still better than transferring all that data.

On the server side, hashing is required only after a server update, but it would be gradual and on demand. Once all files are hashed, the client can send modified hashes along with modified segments, so normal operation would not need the server to hash anything. Exceptions are shared files, where trusting an ill-intended client is unsafe, and old/alternate clients, which would not provide the new hash as segments are uploaded.

Hashing is supposed to consider every segment independently, as if there were 12k 4 MB files instead of a single 50 GB file, so there would be no edge issues. Note that full-file hashing is discouraged on the server, as it would require 50 GB of disk I/O for every segment updated.

@tflidd Sorry about the crossed-topic post. I will move the misplaced posts away as soon as I have the time.
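Continuing the sketch from above: once per-segment digests from the previous sync are stored, the client only has to upload the segments whose digest changed, together with the new digests. Again purely an illustration of the idea, not the actual sync algorithm.

```python
def changed_segments(old_digests, new_digests):
    """Indices of segments whose digest differs from the stored one.

    Segments are compared independently, as if they were separate small
    files, so there are no cross-segment edge effects; segments appended
    beyond the old length count as changed."""
    return [i for i, d in enumerate(new_digests)
            if i >= len(old_digests) or old_digests[i] != d]


# Example flow (hypothetical): only the ~4 MB of changed segments plus
# their new digests would need to go over the wire.
# old = segment_hashes("disk.vdi")   # digests stored at the last sync
# ... the file is modified locally ...
# new = segment_hashes("disk.vdi")
# to_upload = changed_segments(old, new)
```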
@tflidd, apologies for hijacking the thread, but these two questions are so tightly linked that it isn't really possible to point a finger and say "here stops the one and here starts the other."
Is there any workaround to this issue? Perhaps a way to manually edit metadata on the remote files, or to trick the ownCloud sync client into thinking it has already synced the files? I have a new ownCloud installation with terabytes of data that is exactly the same in two locations, but ownCloud tries to re-sync everything, which would take forever and use a ton of bandwidth.
@theminor sadly I haven't found one yet. There needs to be a way to seed the server, but as of now there is none. I also have over 5 TB of data with more than 1M files, which would take years to complete.
Now that we have checksums in the database, we could try to detect copies. I don't know if the server supports the WebDAV
Same problem. I have about 500 GB which would be re-downloaded over Wi-Fi by the client, even if the complete data has already been copied beforehand.
Any progress yet? My data is growing relatively fast, with at least two users having more than 500 GB. Syncing this again on a reinstall of their laptops (which will inevitably happen someday) will be problematic.
It is scheduled for the 2.2 version:
When will 2.2 be released? I just reinstalled a machine and can share the frustrating experience of the ownCloud client wanting to download everything again although it's already there.
Hi, it would be a greeeeeat feature!
When moving files from e.g. PC A to PC B, make sure to copy the hidden file .csync_journal.db to the same place/folder on PC B, and make sure the file mtime stays the same. The sync client keeps its state in there and won't re-download the files.

@brunodegoyrans and others: please use the emoticon on the first post of this issue to say that you would like to see this feature. This keeps the issue from filling up with no-content posts like "me too".
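A hedged sketch of that workaround: copy the sync folder from PC A to PC B with modification times preserved, including the hidden .csync_journal.db. The paths are made up, and newer clients may use a differently named journal file, so check what your client actually writes into the sync folder.

```python
import shutil

# Hypothetical paths; adjust to the real sync folders on both machines.
SRC = "/media/usb/ownCloud"     # copy taken from PC A
DST = "/home/wife/ownCloud"     # sync folder to seed on PC B

# shutil.copy2 preserves modification times, which is what the client
# checks; copytree with copy_function=shutil.copy2 applies that to every
# file, including the hidden .csync_journal.db that holds the sync state.
shutil.copytree(SRC, DST, copy_function=shutil.copy2)
```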
The server is already able to store checksums as sent by the client. |
The OC server supports the COPY method, but it doesn't seem to copy the checksum (it doesn't copy any other metadata either). This could be improved so that it copies at least the checksum field. Raised here: owncloud/core#26584
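For reference, a minimal server-side copy via WebDAV might look like the sketch below (Python with the requests library). The server URL and credentials are placeholders, and the checksum property name in the PROPFIND is an assumption about ownCloud's "oc" namespace; whether the checksum survives the COPY is exactly what owncloud/core#26584 is about.

```python
import requests

BASE = "https://cloud.example.com/remote.php/webdav"  # hypothetical server
AUTH = ("alice", "app-password")                       # hypothetical credentials

# WebDAV COPY: server-side copy, no file data travels through the client.
resp = requests.request(
    "COPY",
    f"{BASE}/Photos/holiday.jpg",
    headers={"Destination": f"{BASE}/Backup/holiday.jpg"},
    auth=AUTH,
)
print(resp.status_code)  # 201 Created on success

# Ask for the checksum property on the copy to see whether it carried over
# (property name assumed, not confirmed by this thread).
propfind_body = """<?xml version="1.0"?>
<d:propfind xmlns:d="DAV:" xmlns:oc="http://owncloud.org/ns">
  <d:prop><oc:checksums/></d:prop>
</d:propfind>"""
resp = requests.request(
    "PROPFIND",
    f"{BASE}/Backup/holiday.jpg",
    headers={"Depth": "0"},
    data=propfind_body,
    auth=AUTH,
)
print(resp.text)
```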
The original problem should be solved for client 2.4 by #5838, if the size and mtime of the server and client file are identical. @PVince81 @ogoffart Your discussion of
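The fix referenced above hinges on a simple comparison; a rough sketch of that heuristic (not the actual client code, and the remote metadata names are illustrative) is:

```python
import os


def can_skip_download(local_path, remote_size, remote_mtime):
    """Skip the transfer when a locally copied file already matches what
    the server reports: same size and same modification time.

    remote_size / remote_mtime stand for the metadata the client receives
    from the server; the names are assumptions, not the client's API."""
    try:
        st = os.stat(local_path)
    except FileNotFoundError:
        return False
    return st.st_size == remote_size and int(st.st_mtime) == int(remote_mtime)
```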
For large files, it often makes sense to copy them directly between two computers, instead of waiting for an upload/download cycle. The client should detect such directly copied files and not re-download them from the server.
Expected behaviour
The client should not download a file, which has been copied manually to the ownCloud folder from another client
Actual behaviour
The client downloads an already existing file again from the server, if the file has been copied manually into the ownCloud folder
Steps to reproduce
Two clients: A, B
Server configuration
Operating system: Ubuntu 12.04
Web server: Apache 2.4.12
Database: MySQL
PHP version: 5.5.23
ownCloud version: 8.0.2
Storage backend:
Client configuration
Client version: 1.8.3
Operating system: Ubuntu 15.04
OS language: DE
Installation path of client: /usr/bin