
Locally copying a file should cause a COPY not a PUT #5867

Open
ckamm opened this issue Jun 28, 2017 · 7 comments

ckamm (Contributor) commented Jun 28, 2017

This is a follow-up to the discussion in #3422.

Currently, when you make a copy of a file that is synchronized, the copy is uploaded from scratch. Instead, the client should send a COPY command to the server. For large files this would save a lot of bandwidth and time.
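
For illustration, such a WebDAV COPY could look roughly like the request below; the host and paths are placeholders, not what the client sends today:

    COPY /remote.php/dav/files/user/Documents/big.iso HTTP/1.1
    Host: cloud.example.com
    Destination: https://cloud.example.com/remote.php/dav/files/user/Documents/big-copy.iso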

Duplicate detection should be possible by checking size, hashes, and possibly mtime.

This server issue is relevant: owncloud/core#26584

ckamm changed the title from "Locally copying a file should cause a COPY not an upload" to "Locally copying a file should cause a COPY not a PUT" Jun 28, 2017
guruz (Contributor) commented Jul 3, 2017

  • Make sure COPY honors the If-Match header so that we only copy the file if the source still has the expected ETag (see the sketch below).
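
A minimal sketch of issuing such a conditional COPY with Qt's QNetworkAccessManager; the URLs and the ETag value are placeholders, and this is not the client's actual propagation code:

    #include <QNetworkAccessManager>
    #include <QNetworkReply>
    #include <QNetworkRequest>
    #include <QUrl>

    // Hypothetical sketch: URLs and the ETag are placeholders.
    QNetworkReply *issueConditionalCopy(QNetworkAccessManager *nam)
    {
        QNetworkRequest req(QUrl("https://cloud.example.com/remote.php/dav/files/user/src.bin"));
        // WebDAV COPY takes the target URL in the Destination header.
        req.setRawHeader("Destination",
            "https://cloud.example.com/remote.php/dav/files/user/copy.bin");
        // Only copy if the source still has the ETag we recorded at the last sync.
        req.setRawHeader("If-Match", "\"5d41402abc4b2a76b9719d911017c592\"");
        return nam->sendCustomRequest(req, "COPY");
    }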

guruz added this to the 2.5.0 milestone Oct 7, 2017
ogoffart (Contributor) commented Nov 21, 2017

  • Create an index on contentChecksum in the DB (see the SQL sketch after this list)
  • Compute the checksum in _csync_detect_update for any INSTRUCTION_NEW on new local files
  • Create a new INSTRUCTION_COPY that can already be set in detect_update and confirmed in reconcile
  • Make a PropagateRemoteCopy job.
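
A sketch of the index from the first bullet, assuming the sync journal is the client's SQLite database and that the table and column are called metadata and contentChecksum; the names are taken from the list above and are not verified against the actual schema:

    -- Assumed table/column names; adjust to the real sync journal schema.
    CREATE INDEX IF NOT EXISTS metadata_contentchecksum
        ON metadata (contentChecksum);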

Also, for server-side copy detection:

  • If the server provides a "sufficiently good" checksum in the metadata, we can also do copy detection to avoid downloading the same content multiple times.

But this algorithm, which queries the database, only works if the file was already there from a previous sync. It would need some tweaking if two identical files are added at the same time. (When should we make the decision to copy? It has to be done in the right order.)
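
For illustration, the journal lookup that such copy detection would perform could look roughly like this, again with assumed table and column names, and matching on size as well so that a checksum hit alone is not trusted blindly:

    -- Assumed schema: find a previously synced file with identical content.
    SELECT path
      FROM metadata
     WHERE contentChecksum = :checksum
       AND filesize = :size
     LIMIT 1;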

For which files do we compute the checksum in the update phase? All NEW files? All files whose mtime has changed (currently we do that only for .eml files)?

Finally, which hash can we use for that? Is it OK to do it with MD5 or SHA-1?

AnrDaemon commented

You can't ensure file content equality on hash alone.
Especially not with a known-weak hash such as MD5.

ckamm (Contributor, Author) commented Nov 22, 2017

@ogoffart What if the decision to COPY instead of PUT were made in the PropagateUpload job? We compute the content checksum of all files we upload in that job anyway.
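
A rough sketch of that idea; the JournalLookup, startCopy and startUpload callbacks below are hypothetical stand-ins for whatever the sync journal and the propagator jobs actually provide:

    #include <QByteArray>
    #include <QString>
    #include <functional>
    #include <optional>

    // Hypothetical sketch of choosing COPY vs. PUT once the content checksum
    // of the file to upload is known. All callbacks are placeholders.
    struct CopyCandidate {
        QString remotePath; // already-synced file with identical content
    };

    using JournalLookup =
        std::function<std::optional<CopyCandidate>(const QByteArray &, qint64)>;

    void decideCopyOrPut(const QString &localPath, const QString &remoteTargetPath,
                         const QByteArray &contentChecksum, qint64 size,
                         const JournalLookup &lookup,
                         const std::function<void(QString, QString)> &startCopy,
                         const std::function<void(QString)> &startUpload)
    {
        // The checksum is computed for every upload anyway; reuse it here.
        if (const auto match = lookup(contentChecksum, size)) {
            // Identical content already exists on the server: COPY it server-side.
            startCopy(match->remotePath, remoteTargetPath);
        } else {
            // No match: fall back to the normal PUT upload.
            startUpload(localPath);
        }
    }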

ckamm (Contributor, Author) commented Nov 22, 2017

About hashes: We currently use csync_is_collision_safe_hash to determine whether assuming content equality based on hash equality has acceptable risk.

guruz (Contributor) commented Nov 23, 2017

> You can't ensure file content equality on hash alone.

Is SHA1+filesize good enough?

ogoffart (Contributor) commented

> Is SHA1+filesize good enough?

It's been about 10 years since SHA-1 stopped being considered good enough as a cryptographic hash. And a collision was even found earlier this year.

https://en.wikipedia.org/wiki/Hash_function_security_summary
http://valerieaurora.org/hash.html

Ideally we should use SHA-256 or SHA-3, but we don't currently implement these functions.

Now, maybe we do not need to be cryptographically secure. However, if we implement COPY, we increase the risk.
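
For what it's worth, Qt's QCryptographicHash already provides SHA-256 (and SHA-3 variants in newer Qt versions), so computing such a checksum could look like the illustration below; this is not the client's existing checksum code:

    #include <QCryptographicHash>
    #include <QFile>

    // Illustration only: stream a file through SHA-256 and return the hex digest.
    QByteArray sha256OfFile(const QString &path)
    {
        QFile file(path);
        if (!file.open(QIODevice::ReadOnly))
            return QByteArray();

        QCryptographicHash hash(QCryptographicHash::Sha256);
        hash.addData(&file);          // reads the whole file through the hash
        return hash.result().toHex();
    }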

guruz modified the milestones: 2.5.0, 2.6.0 Mar 26, 2018
ckamm modified the milestones: 2.6.0, 2.7.0 Mar 25, 2019
TheOneRing modified the milestones: 2.7.0, 2.9.0 Mar 17, 2021
TheOneRing modified the milestones: 2.11, 2.12 May 4, 2022
TheOneRing modified the milestones: 4.0, 5.0 Feb 2, 2023