Use content Checksums to detect copy #4609

Closed
dragotin opened this issue Mar 24, 2016 · 4 comments

Comments

@dragotin
Contributor

dragotin commented Mar 24, 2016

Expected behaviour

If the user copies a file on the client, currently the new file is uploaded. However, it is exactly the same as the source file which is already in sync.

Instead of uploading again, the client should detect the copy and issue a WebDAV COPY as described in http://www.webdav.org/specs/rfc4918.html#METHOD_COPY

For that, the following steps are needed:

  1. If there is a new file, the sync algorithm should calculate its content checksum and query the journal for another file with the same checksum.
  2. If so, the client should send a WebDAV COPY with the ETag of the source file that is supposed to be copied and the target file name.
  3. On successful copy, the server returns the new fileID and ETag of the copy target file.
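
The three steps above can be sketched roughly as follows. This is only an illustration, not the client's actual code: `JournalEntry`, `find_copy_source` and `plan_new_file` are hypothetical names, the journal is modelled as a plain list, SHA1 is an arbitrary checksum choice, and sending the source ETag in an `If-Match` header is an assumption about how the ETag would be transmitted. The `Destination` header is the one RFC 4918 defines for COPY.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JournalEntry:
    """Hypothetical stand-in for one row of the client's sync journal."""
    path: str      # server path of a file that is already in sync
    etag: str      # ETag last seen for that file
    checksum: str  # content checksum in "TYPE:hex" form


def content_checksum(data: bytes) -> str:
    # SHA1 is an assumption; any checksum stored in the journal would do.
    return "SHA1:" + hashlib.sha1(data).hexdigest()


def find_copy_source(journal: List[JournalEntry], checksum: str) -> Optional[JournalEntry]:
    """Step 1: look for an already-synced file with the same checksum."""
    for entry in journal:
        if entry.checksum == checksum:
            return entry
    return None


def plan_new_file(journal: List[JournalEntry], new_path: str, data: bytes) -> dict:
    """Describe the request the client would send for a newly found file."""
    source = find_copy_source(journal, content_checksum(data))
    if source is None:
        # No known file with this content: fall back to a normal upload.
        return {"method": "PUT", "path": new_path}
    # Step 2: ask the server to copy instead of uploading the bytes again.
    return {
        "method": "COPY",
        "path": source.path,
        "headers": {
            "Destination": new_path,   # a real request needs an absolute URI or path
            "If-Match": source.etag,   # assumption: guard against a changed source
        },
    }


journal = [JournalEntry("/docs/a.txt", '"etag-1"', content_checksum(b"hello"))]
print(plan_new_file(journal, "/docs/a-copy.txt", b"hello")["method"])
print(plan_new_file(journal, "/docs/b.txt", b"something else")["method"])
```

Step 3 (reading the new fileID and ETag from the server's response) is left out, since the response format is not specified here.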

Dependency: The server needs to implement WebDAV COPY. @DeepDiver1975 wouldn't that be cool?

Alternatively (attention, brainstorm): With the new chunking, if the client announces the file as described in https://dragotin.wordpress.com/2015/07/10/owncloud-chunking-ng-part-2-announcing-an-upload/ , it could send the checksum with the announcement. The server could check whether a file with that checksum is available anywhere on the server and, if so, just copy it and return a 201 Created to the client, which then does not need to upload anything. This is called deduplication and fits nicely into the concept 💯
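
A toy model of that server-side idea, to make the flow concrete. Everything here is hypothetical: `DedupStore`, `announce` and `upload` are made-up names, the store is an in-memory dict, and returning `None` for "no match, please upload" is just a placeholder for whatever the real announce response would be.

```python
import hashlib
from typing import Dict, Optional


def checksum_of(data: bytes) -> str:
    # SHA1 is an arbitrary choice for this sketch.
    return "SHA1:" + hashlib.sha1(data).hexdigest()


class DedupStore:
    """In-memory stand-in for the server's storage plus a checksum lookup."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}      # path -> content
        self.by_checksum: Dict[str, str] = {}  # checksum -> one path with that content

    def upload(self, path: str, data: bytes) -> int:
        """Regular upload: store the bytes and remember their checksum."""
        self.files[path] = data
        self.by_checksum.setdefault(checksum_of(data), path)
        return 201

    def announce(self, path: str, checksum: str) -> Optional[int]:
        """Client announces an upload together with the content checksum."""
        source = self.by_checksum.get(checksum)
        # Re-verify the candidate, because the lookup may be stale if the
        # source file was overwritten or removed in the meantime.
        if source is None or checksum_of(self.files.get(source, b"")) != checksum:
            return None  # no match: the client goes on to upload the data
        self.files[path] = self.files[source]  # server-side copy
        return 201  # created without any upload


store = DedupStore()
store.upload("/alice/report.pdf", b"same bytes")
print(store.announce("/bob/report.pdf", checksum_of(b"same bytes")))
print(store.announce("/bob/other.pdf", checksum_of(b"different bytes")))
```

The sketch ignores permissions entirely; a real server would have to decide whether a match in another user's storage may be used at all.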

@hodyroff FYI

@guruz
Contributor

guruz commented Apr 19, 2017

FYI @IljaN @ogoffart @ckamm

@phil-davis
Contributor

👍
And think about the file move/rename cases as well. E.g. if the client app has been stopped for a while, I think it can't tell that a file has been renamed, so it does a delete+copy to the server.
If the attempted copy is done before the delete, then (with the logic proposed above) the server will find the existing same file and copy it locally on the server. Then the delete can happen without pain.

If the server has a cache of recently-deleted files/file versions... then the copy-delete time ordering would not matter - the client app could delete first, then copy, and the server would find the file in the recently-deleted files and "get it back".
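
A small sketch of the ordering argument, under the assumption that the server keeps recently deleted content reachable by checksum. `StoreWithTrash` and its methods are hypothetical names, and the "trash" is just a dict with no expiry.

```python
import hashlib
from typing import Dict


def checksum_of(data: bytes) -> str:
    # SHA1 is an arbitrary choice for this sketch.
    return "SHA1:" + hashlib.sha1(data).hexdigest()


class StoreWithTrash:
    """Toy server store that remembers recently deleted content."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}  # path -> content
        self.trash: Dict[str, bytes] = {}  # checksum -> recently deleted content

    def delete(self, path: str) -> None:
        data = self.files.pop(path)
        self.trash[checksum_of(data)] = data

    def create_by_checksum(self, path: str, checksum: str) -> bool:
        """Create `path` from existing content; False means an upload is needed."""
        # First look at live files, then fall back to recently deleted ones.
        match = next(
            (d for d in self.files.values() if checksum_of(d) == checksum), None
        )
        if match is None:
            match = self.trash.get(checksum)
        if match is None:
            return False
        self.files[path] = match
        return True


store = StoreWithTrash()
store.files["/old-name.txt"] = b"renamed while the client was stopped"
wanted = checksum_of(store.files["/old-name.txt"])

# Delete first, then "copy": the content is still found via the trash.
store.delete("/old-name.txt")
print(store.create_by_checksum("/new-name.txt", wanted))
```

Without the `trash` fallback, the same delete-then-copy order would return `False` and force a full upload, which is the case the comment above is worried about.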

@IljaN
Member

IljaN commented Apr 20, 2017

> The server could check if the file with the checksum is available anywhere on the server, and if so, just copy it and return a 201 created back to the client which does not need to upload anything

That would require an index on the checksum column in oc_filecache. The problem I see is that we currently store checksums denormalized in a single column in a format like SHA1:abc;MD5:def;ADLER32:hij, which makes an efficient index lookup by checksum impossible.

The JSON indexing capabilities provided by PostgreSQL and MySQL 5.7 are not compatible with each other at the database level, so they don't help here.

So the only ugly solution I see is to split the current checksum column into three separate ones (checksum_sha1, checksum_md5, checksum_adler32). Or even uglier: The client always needs to send all three checksums in exactly the format and order described above, but I think I would refuse to implement that one :-p
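
To illustrate the separate-columns variant, here is a minimal sketch using SQLite from the Python standard library as a stand-in for the server database. The table name `filecache_demo`, the index name and the helper functions are made up for the example; only the combined format and the three column names come from the comment above.

```python
import sqlite3
from typing import Dict, List


def split_checksums(combined: str) -> Dict[str, str]:
    """Split 'SHA1:abc;MD5:def;ADLER32:hij' into {'SHA1': 'abc', ...}."""
    result = {}
    for part in combined.split(";"):
        if part:
            kind, _, value = part.partition(":")
            result[kind.upper()] = value
    return result


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE filecache_demo ("
    " path TEXT PRIMARY KEY,"
    " checksum_sha1 TEXT, checksum_md5 TEXT, checksum_adler32 TEXT)"
)
# With one column per checksum type, an ordinary index supports equality lookups.
conn.execute("CREATE INDEX idx_demo_sha1 ON filecache_demo (checksum_sha1)")


def insert(path: str, combined: str) -> None:
    c = split_checksums(combined)
    conn.execute(
        "INSERT INTO filecache_demo VALUES (?, ?, ?, ?)",
        (path, c.get("SHA1"), c.get("MD5"), c.get("ADLER32")),
    )


def find_by_sha1(sha1: str) -> List[str]:
    rows = conn.execute(
        "SELECT path FROM filecache_demo WHERE checksum_sha1 = ? ORDER BY path",
        (sha1,),
    ).fetchall()
    return [row[0] for row in rows]


insert("a.txt", "SHA1:abc;MD5:def;ADLER32:hij")
insert("copy-of-a.txt", "SHA1:abc;MD5:def;ADLER32:hij")
insert("b.txt", "SHA1:zzz")
print(find_by_sha1("abc"))
```

With the combined column, the equivalent query would need something like a substring match over every row, which is what makes the current format awkward for this feature.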

@ogoffart
Contributor

ogoffart commented Dec 5, 2017

Moved to #5867

@ogoffart ogoffart closed this as completed Dec 5, 2017