Compare hashes for files where mtime and size match #5838

ckamm · 2017-06-14T11:36:13Z

Addresses #5589

The client is very picky about the date strings it accepts. If dates are formatted with a non-C locale (for example with localized weekday names), it fails to parse them and tests fail in subtle ways.
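One way to keep test dates independent of the locale is to avoid locale-aware formatting altogether. The following is only a sketch under that assumption, not the fix applied in this PR; `formatHttpDate` is a hypothetical helper name, and the output layout is the usual RFC 1123 style HTTP date.

```cpp
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical helper (not part of the PR): formats a UTC timestamp as an
// RFC 1123 style HTTP date. Day and month names come from fixed English
// tables, so the result does not depend on the process locale.
std::string formatHttpDate(std::time_t t)
{
    static const char *const days[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
    static const char *const months[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };

    const std::tm *utc = std::gmtime(&t);
    if (!utc)
        return std::string();

    char buf[64];
    std::snprintf(buf, sizeof(buf), "%s, %02d %s %04d %02d:%02d:%02d GMT",
        days[utc->tm_wday], utc->tm_mday, months[utc->tm_mon], utc->tm_year + 1900,
        utc->tm_hour, utc->tm_min, utc->tm_sec);
    return buf;
}
```

Calling `formatHttpDate(0)` yields the Unix epoch as an English HTTP date regardless of the locale the test runs in.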

ogoffart

Good work.
I made a few comments, but overall it looks good.

ogoffart · 2017-06-14T12:18:36Z

test/syncenginetestutils.h

@@ -706,17 +706,26 @@ class FakeErrorReply : public QNetworkReply

 class FakeQNAM : public QNetworkAccessManager
 {
+public:
+    using Observer = std::function<void(QByteArray verb, QString fileName)>;


Would it not be more generic to have the whole QNetworkRequest as a parameter?
Then we could also test "Does the call contain the header FooBar?"

And in fact, it could return a QNetworkReply *, which would be used as the reply if it is not nullptr.

That way some tests could inject custom errors.

ckamm · 2017-06-15

Makes sense, will do.
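The shape of the suggested callback can be sketched without Qt. Everything here is an assumption for illustration: `Request` and `Reply` are stand-ins for QNetworkRequest and QNetworkReply, and `FakeNetwork`, `Override`, `setOverride` and `send` are hypothetical names, not the API that ended up in syncenginetestutils.h.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-ins for QNetworkRequest / QNetworkReply so the sketch builds without Qt.
struct Request {
    std::string path;
    std::map<std::string, std::string> headers;
};

struct Reply {
    int status;
};

// Hypothetical fake network layer with an optional per-request override.
class FakeNetwork
{
public:
    // The override sees the verb and the whole request. It returns a custom
    // reply, or nullptr to fall through to the default handling.
    using Override = std::function<std::unique_ptr<Reply>(const std::string &verb, const Request &request)>;

    void setOverride(Override override) { _override = std::move(override); }

    std::unique_ptr<Reply> send(const std::string &verb, const Request &request)
    {
        if (_override) {
            if (auto custom = _override(verb, request))
                return custom;
        }
        // Default behaviour of the fake server: plain success.
        return std::unique_ptr<Reply>(new Reply{ 200 });
    }

private:
    Override _override;
};
```

A test can then count PUT requests inside the override to answer "did this sync cause an upload?", inspect `request.headers` for a specific header, or return a reply with an error status to simulate a failing server.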

ogoffart · 2017-06-14T12:31:34Z

src/libsync/checksums.cpp

@@ -120,6 +132,12 @@ QByteArray contentChecksumType()
    return type;
 }

+QList<QByteArray> collisionSafeHashes()
+{
+    static QList<QByteArray> list = { "SHA1" };


Well well well... haven't you heard the news about SHA1?
But yeah, since we don't support anything better anyway...

ckamm · 2017-06-15

SHA1 may not be safe against deliberate attacks, but accidental collisions are still overwhelmingly unlikely. Even MD5 should be okay.
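To put a rough number on "overwhelmingly unlikely", the birthday bound can be used as a back-of-the-envelope estimate. This is a sketch under the assumption that hash outputs behave like uniformly random values, which says nothing about deliberately constructed collisions; `log2AccidentalCollision` is a hypothetical name.

```cpp
#include <cmath>

// Approximates log2 of the probability that at least two out of n hashes of
// the given bit width collide by accident, using the birthday bound
// p ~= n^2 / 2^(bits + 1). The estimate is only meaningful while p is small,
// and it assumes the hash output is uniformly random.
double log2AccidentalCollision(double n, int hashBits)
{
    return 2.0 * std::log2(n) - (hashBits + 1);
}
```

For about a million files (n = 2^20) this gives 2^-121 for a 160-bit hash such as SHA1 and 2^-89 for a 128-bit hash such as MD5.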

ogoffart · 2017-06-14T12:45:59Z

csync/src/csync_reconcile.c

+                    // It could be a conflict even if size and mtime match!
+                    // When we have the remote checksum available we can detect
+                    // it without downloading the file.
+                    is_conflict |= (ctx->current == REMOTE_REPLICA ? cur->checksumHeader : other->checksumHeader) != 0;


This logic duplicates the logic in PropagateDownloadFile::start, and differs from it, in particular in the check for collisionSafeHashes.
Have you considered always creating the download job (in all cases) and having the logic only there?

(But it's true that if is_conflict is strictly more "true" than the logic in PropagateDownloadFile::start, it's fine for them to differ, as long as PropagateDownloadFile does not actually fetch the file.)

ckamm · 2017-06-15 (edited)

Yes, this is an "early out" optimization to avoid creating unnecessary jobs. However, it won't gain us much in practice when servers generally have hashes for all files. It may be worth removing this logic and just keeping the is_conflict=true case to simplify the code. Do you agree?

This would lead to a strong behavior change though: previously, in the absence of hashes, we'd treat NEW/NEW with the same mtime/size as no conflict; now we would treat it as a conflict.

So I think it's a good idea to switch on the discerning behavior only if we have a hash.

ogoffart · 2017-06-15

Yes, I agree, that's what I had in mind.

> This would lead to a strong behavior change though: previously, in the absence of hashes, we'd treat NEW/NEW with the same mtime/size as no conflict; now we would treat it as a conflict.

I was thinking the download job logic would be changed to not download the file if mtime and size are the same, even if there is no hash.

ogoffart · 2017-06-14T12:47:58Z

src/libsync/propagatedownload.cpp

+    if (_item->_instruction == CSYNC_INSTRUCTION_CONFLICT
+        && _item->_size == _item->log._other_size
+        && _item->_modtime == _item->log._other_modtime
+        && collisionSafeHashes().contains(checksumType)) {


How about doing it even if it's not a collision-safe hash? It would still be better than before, when we were not comparing anything.

ckamm · 2017-06-15

I'd rather not allow it for Adler32. It's not designed as a hash function, so collisions are even more likely than its short 32-bit output would suggest, in particular for short strings.

ogoffart · 2017-06-15

So you accept a collision without a hash (from the reconcile phase), but not if Adler32 is used?

guruz · 2017-11-15

Adler32 is not good enough.

ogoffart · 2017-11-16

You're missing the point, it goes the other way.
We were considering the files to be equal if they have the same size and mtime.
When there is a checksum, we now add a security check by comparing the checksum.
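The remark about Adler32 and short strings can be illustrated with a straightforward implementation of the checksum. This is a sketch for illustration, not code from the client. Adler-32 keeps two running sums modulo 65521, and for short inputs neither sum gets anywhere near the modulus, so most of the 32 output bits stay at zero.

```cpp
#include <cstdint>
#include <string>

// Plain Adler-32: `a` is 1 plus the sum of all bytes, `b` is the sum of all
// intermediate values of `a`, both modulo 65521. The checksum is (b << 16) | a.
std::uint32_t adler32(const std::string &data)
{
    const std::uint32_t mod = 65521; // largest prime below 2^16
    std::uint32_t a = 1;
    std::uint32_t b = 0;
    for (unsigned char c : data) {
        a = (a + c) % mod;
        b = (b + a) % mod;
    }
    return (b << 16) | a;
}
```

For the four-byte input "abcd" the result is 0x03D8018B: both 16-bit halves are below 1024, so only a small part of the 32-bit range is reachable for inputs of that length.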


* For conflicts where mtime and size are identical:
  a) If there's no remote checksum, skip (unchanged).
  b) If there's a remote checksum, create a PropagateDownload and compute the local checksum. If the checksums are identical, don't download the file and just update metadata.
* Avoid exposing the existence of checksumTypeId beyond the database layer. This makes handling checksums easier in general because they can usually be treated as a single blob. This change was prompted by the difficulty of producing file_stat_t entries uniformly from PROPFINDs and the database.
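The rules in the first bullet can be sketched as a small decision function. The names `ConflictAction` and `resolveSameSizeAndMtime` are hypothetical, and the final `Download` branch is an assumption about what happens when the checksums differ, which the description only implies.

```cpp
#include <string>

// Hypothetical outcome for a conflict candidate whose size and mtime match.
enum class ConflictAction { Skip, UpdateMetadataOnly, Download };

// remoteChecksum is empty when the server sent no checksum.
// computeLocalChecksum is only invoked when a remote checksum exists, so the
// potentially expensive local hash is skipped in case a).
template <typename ComputeLocalChecksum>
ConflictAction resolveSameSizeAndMtime(const std::string &remoteChecksum,
    ComputeLocalChecksum computeLocalChecksum)
{
    if (remoteChecksum.empty())
        return ConflictAction::Skip; // a) no remote checksum: unchanged behaviour
    if (computeLocalChecksum() == remoteChecksum)
        return ConflictAction::UpdateMetadataOnly; // b) same content: no download
    return ConflictAction::Download; // assumed: differing checksums mean a real conflict
}
```

Passing the local checksum as a callable rather than a value mirrors the idea that the local file is only hashed once a remote checksum is known to exist.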

ckamm · 2017-06-15T08:58:49Z

@ogoffart I've fixed up the network observer addition, but left my other changes in a separate commit so they're easier to review. Let me merge when you're done, so I can wipe away that fixup commit.

ogoffart · 2017-06-15T09:20:42Z

👍

ckamm · 2017-06-15T11:56:39Z

merged manually

ckamm added 2 commits June 14, 2017 12:01

SyncEngineTest: Fix date locale related bug

9b606d0

The client is very picky about the date strings it accepts. If dates are formatted with a non-C locale (for example with localized weekday names), it fails to parse them and tests fail in subtle ways.

SyncEngineTest: Send 'checksums' in FakePropfind

2caaaed

ckamm added this to the 2.4.0 milestone Jun 14, 2017

ckamm self-assigned this Jun 14, 2017

ckamm requested a review from ogoffart June 14, 2017 11:36

ckamm mentioned this pull request Jun 14, 2017

csync: mtime diff < 1s => conflict not detected #5589

Closed

ogoffart approved these changes Jun 14, 2017


ckamm added 3 commits June 15, 2017 10:30

SyncEngineTest: Add network override

7c09fb3

This is useful for monitoring what kind of network requests are sent to the fake server, such as "did this sync cause an upload?" and "was there a propfind for this path?". It can also inject custom replies.

fixup: review fixes

716b99d

ckamm force-pushed the downloadchecksum branch from 97b0ece to 716b99d Compare June 15, 2017 08:58

ckamm closed this Jun 15, 2017

ckamm deleted the downloadchecksum branch June 15, 2017 11:56

ckamm mentioned this pull request Jun 22, 2017

Don't re-download/re-upload manually copied files #3422

Closed

ckamm mentioned this pull request Nov 10, 2017

[Enhancement] Check file hash before starting network transfer #6153

Closed

codeling mentioned this pull request Sep 3, 2024

Avoid redownload by client of locally existing files (eg copied by rsync) nextcloud/desktop#1383

Open

