-
Notifications
You must be signed in to change notification settings - Fork 668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compare hashes for files where mtime and size match #5838
Conversation
The client is very picky about date strings it accepts. If dates are formatted with a non-C locale (such as localized weekday names), it fails to parse it and tests fail in subtle ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good work.
I made a few comment, but overall looks good
test/syncenginetestutils.h
Outdated
@@ -706,17 +706,26 @@ class FakeErrorReply : public QNetworkReply | |||
|
|||
class FakeQNAM : public QNetworkAccessManager | |||
{ | |||
public: | |||
using Observer = std::function<void(QByteArray verb, QString fileName)>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be not more generic to have the whole QNetworkRequest as a parameter?
So we could also test "Does the call contains the header FooBar"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And in fact, it could return a QNetworkReply *, which would be returned if it is not nullptr
So some tests could throw custom error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, will do.
src/libsync/checksums.cpp
Outdated
@@ -120,6 +132,12 @@ QByteArray contentChecksumType() | |||
return type; | |||
} | |||
|
|||
QList<QByteArray> collisionSafeHashes() | |||
{ | |||
static QList<QByteArray> list = { "SHA1" }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well well well.... haven't you heard the news about sha1?
But yeah, since we don't support anything better anyway...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SHA1 may not be safe against attacks, but collisions are still overwhelmingly unlikely. Even MD5 should be okay.
csync/src/csync_reconcile.c
Outdated
// It could be a conflict even if size and mtime match! | ||
// When we have the remote checksum available we can detect | ||
// it without downloading the file. | ||
is_conflict |= (ctx->current == REMOTE_REPLICA ? cur->checksumHeader : other->checksumHeader) != 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is duplicated (and different, in particular the check for collisionSafeHashes) from the logic in PropagateDownloadFile::start
Have you considered always creating the Download job (in any cases) and have the logic only there.
(But it's true that if it's is_conflict is strictly more "true" than the logic in PropagateDownloadFile::start, that's fine if they are different. as long as PropagateDownloadFile does not actually fetch the file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is an "early out" optimization to avoid creating unnecessary jobs. However, it won't gain us much in practice when servers generally have hashes for all files. It may be worth it to remove this logic and just keep the is_conflict=true
case to simplify the code. Do you agree?
This will lead to a strong behavior change though: Previously, in the absence of hashes, we'd treat NEW/NEW with same mtime/size as no conflict - now we would.
So I think it's a good idea to switch on the discerning behavior only if we have a hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree that's what i had in mind.
This will lead to a strong behavior change though: Previously, in the absence of hashes, we'd treat NEW/NEW with same mtime/size as no conflict - now we would.
I was thinking the Download job logic would be changed not to download the file if mtime and size are the same, even if there is no hash.
src/libsync/propagatedownload.cpp
Outdated
if (_item->_instruction == CSYNC_INSTRUCTION_CONFLICT | ||
&& _item->_size == _item->log._other_size | ||
&& _item->_modtime == _item->log._other_modtime | ||
&& collisionSafeHashes().contains(checksumType)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about doing it even if it's not a collisionsafe hash, it would still be better than before where we were not comparing anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not allow it for Adler32 - it's not designed as a hash function, so collisions are even more likely than its short 32 bit output would suggest, in particular for short strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you accept collision without a hash (from the reconsile phase), but not if adler32 is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adler32 is not good enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're missing the point, it goes the other way.
We were considering the file to be equal if they have the same size and mtime.
When there is a checksum, we now add a security check by comparing the checksum.
This is useful for monitoring what kind of network requests are sent to the fake server. Such as "did this sync cause an upload?" and "was there a propfind for this path?". It can also inject custom replies.
* For conflicts where mtime and size are identical: a) If there's no remote checksum, skip (unchanged) b) If there's a remote checksum, create a PropagateDownload and compute the local checksum. If the checksums are identical, don't download the file and just update metadata. * Avoid exposing the existence of checksumTypeId beyond the database layer. This makes handling checksums easier in general because they can usually be treated as a single blob. This change was prompted by the difficulty of producing file_stat_t entries uniformly from PROPFINDs and the database.
@ogoffart I've fixed up the network observer addition, but left my other changes in a separate commit so they're easier to review. Let me merge when you're done, so I can wipe away that fixup commit. |
👍 |
merged manually |
Addresses #5589
@moscicki