Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store checksum in journal and verifiy upload for eml-files and others #3235

Closed
dragotin opened this issue May 13, 2015 · 26 comments
Closed

Store checksum in journal and verifiy upload for eml-files and others #3235

dragotin opened this issue May 13, 2015 · 26 comments
Assignees
Labels
blue-ticket Discussion p2-high Escalation, on top of current planning, release blocker
Milestone

Comments

@dragotin
Copy link
Contributor

Once #2542 is implement, the checksum should be stored in local sync journal. We could use it to verify if an upload is needed, or if just the files metadata (mtime specifically) has changed. This would solve trouble we have with eml files on windows, and this could be a nice first usage case.

Problems: If we detect that only the mtime has changed, we could propagate the new mtime to the server, using PROPPATCH (needs to be verified). But that will probably not change the ETag of the file and also not the ETags of the parent directories. If it would, a client seeing a changed ETag would not understand that only the mtime has changed, and not the content of the file.

Note to self: If the server could do file checksums as well, it should also maintain directory checksums. The checksums only would change if the file contents or the contents of files within a directory changes. The ETag would just be an indication that something changed, either content or metadata. The client would need to consider the checksum to find that out.

@guruz
Copy link
Contributor

guruz commented May 15, 2015

It would be good in a way to have it support multiple checksum algos at the same time.
For example Adler32 is only good for error or change detection but not for deduplication (e.g. for #230)

@dragotin dragotin self-assigned this May 19, 2015
@MTRichards MTRichards modified the milestones: 2.1-next, 2.0 - Multi-account Jun 19, 2015
@MTRichards
Copy link

Moving to 2.1 to keep 2.0 reasonable size.

@cdamken
Copy link
Contributor

cdamken commented Jun 26, 2015

@MorrisJobke

00002839

@dragotin
Copy link
Contributor Author

@cdamken please elaborate. Pasting numbers and adding people is not enough.

@MorrisJobke
Copy link
Contributor

@cdamken please elaborate. Pasting numbers and adding people is not enough.

I think this is just for reference. I was mentioned, because it became a blue ticket and the number is the backlink to their system ;)

@dragotin
Copy link
Contributor Author

@MorrisJobke yes, I know, but I would like to know the content of the blue-ticket as well.

@cdamken
Copy link
Contributor

cdamken commented Jun 29, 2015

@dragotin The content of the ticket: *.EML files can't be deleted, even they are deleted in one client will reappear again and again, additionally generates a lot of network traffic.

@guruz
Copy link
Contributor

guruz commented Oct 2, 2015

@dragotin @ogoffart @ckamm @moscicki I was in general wondering if we should have a csync_journal table
phash, mtime, checksumtype, checksum that is only lazily filled.
Else we risk taking a lot of time to checksum everything..

@ckamm
Copy link
Contributor

ckamm commented Oct 14, 2015

Storing the checksums in the DB will be part of #3735, this ticket can then be about using them to avoid unnecessary uploads.

@ckamm
Copy link
Contributor

ckamm commented Nov 11, 2015

This has a couple of steps

  • Calculate checksums on upload even if the server doesn't support them, store them in the local db
  • If the new checksum is the same as the old one, assume the content is the same.
    • This could either be done while reconciling the local and db trees or in the upload propagation job.
  • Do a remote metadata update instead of a full upload.
    • Check that PROPPATCH works

@moscicki
Copy link
Contributor

I would guess you'd want md5 for that. The interesting bit is to see if it
is possible to marry it with adler32 in the server used to check the
transfers.

On Wed, Nov 11, 2015 at 4:40 PM, ckamm [email protected] wrote:

This has a couple of steps

  • Calculate checksums on upload even if the server doesn't support
    them, store them in the local db
  • If the new checksum is the same as the old one, assume the content
    is the same.
    • This could either be done while reconciling the local and db
      trees or in the upload propagation job.
  • Do a remote metadata update instead of a full upload.
    • Check that PROPPATCH works


Reply to this email directly or view it on GitHub
#3235 (comment).


Best regards,
Kuba

@ckamm
Copy link
Contributor

ckamm commented Nov 12, 2015

I've tested PROPPATCH for updating file mtimes on the server, but it currently makes the file etag change (owncloud/core#20474).

@dragotin and @ogoffart agree that in that case we are better off not telling the server about the new mtime, as it'd cause all other clients to re-download the unchanged file.

ckamm added a commit to ckamm/owncloud-client that referenced this issue Nov 23, 2015
ckamm added a commit to ckamm/owncloud-client that referenced this issue Nov 23, 2015
@ckamm
Copy link
Contributor

ckamm commented Nov 23, 2015

Current pull request is #4180

ckamm added a commit that referenced this issue Nov 23, 2015
ckamm added a commit that referenced this issue Nov 23, 2015
* Compute the content checksum (in addition to the optional
  transmission checksum) during upload (.eml files only)

* Add hook to compute and compare the checksum in csync_update

* Add content checksum to database, remove transmission checksum
@ckamm ckamm added the ReadyToTest QA, please validate the fix/enhancement label Nov 23, 2015
@Dianafg76
Copy link

@ckamm what are the steps to reproduce this issue?

@ckamm
Copy link
Contributor

ckamm commented Nov 25, 2015

@Dianafg76

  • Create a new .eml file locally, wait for it to upload
  • Update the mtime of the file without changing the contents (touch command)
  • Verify that the client does not re-upload the file. (log file or activity log)
  • Verify that the client still uploads the file if the content changes.
  • Verify that the client still uploads other files. :)

@ckamm ckamm removed the Needs info label Nov 25, 2015
@Dianafg76 Dianafg76 added approved by QA and removed ReadyToTest QA, please validate the fix/enhancement labels Nov 27, 2015
@Dianafg76
Copy link

I tested this issue and is working OK
Desktop v ownCloud-2.1.0.5655-nightly20151126-setup.exe
Server v {"installed":true,"maintenance":false,"version":"8.2.1.4","versionstring":"8.2.1","edition":"Enterprise"}

Closed

@mocolini
Copy link

It seems that this bug affects owncloud-client yet.
If I don't remove ".eml" in "HKEY_LOCAL_MACHINESOFTWAREMicrosoftWindowsCurrentVersionPropertySystemPropertyHandlers" the client keep uploading (unchanged) *.eml files.
So, every time windows update itself (and so the ".eml" registry key is set back to default), hundred of .eml files (and MB) are re-uploaded to the server.

EDIT:
O.S.: windows 10 Home
Client version: 2.1.0 (build 5683)

@guruz
Copy link
Contributor

guruz commented Jan 19, 2016

@mocolini Are you saying the content of those .eml files does not change but they are still reuploaded?

@mocolini
Copy link

@guruz exactly.
Today windows 10 upgrade itserlf and re-set the ".eml" key in the register (the one I've removed days ago), after that owncloud-client has begun to re-upload all the (unchanged) *.eml files.

@ogoffart
Copy link
Contributor

I think it needs to be uploaded at least once after upgrading before the checksum are stored in the databse and are compared.

Does it re-upload several time the same file?

@mocolini
Copy link

To be onest I don't know, 'cause I suddenly remove the ".eml" key .
So the upload is performed just one time to "populate" the db? Am 'I right?
And this happen for each windows PC, right?

@ogoffart
Copy link
Contributor

yes, previous version of the client did not store the checksum in the database. and we only store it after we complete an upload or a download.

@mocolini
Copy link

Ok, perfect, I misunderstood the client behavior.
I'll check if the problem persist after complete upload/download, in that case I'll bother you again!
:-)

Cheers
Michelino

@guruz
Copy link
Contributor

guruz commented Jan 19, 2016

Sounds good :)
If you re-added the registry key, you can also try to remove it..

@mocolini
Copy link

Sure, I'll let you know

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blue-ticket Discussion p2-high Escalation, on top of current planning, release blocker
Projects
None yet
Development

No branches or pull requests

10 participants