Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checksums to Verify Up- and Download #12417

Closed
dragotin opened this issue Nov 25, 2014 · 15 comments
Closed

Checksums to Verify Up- and Download #12417

dragotin opened this issue Nov 25, 2014 · 15 comments
Assignees
Milestone

Comments

@dragotin
Copy link
Contributor

Some storage backends support hash sums of files. These could be used to ensure the files integrity when up- or downloaded between sync client and server. This adds another level of security.

The basic idea is to add a hashsum as HTTP header to GET or PUT and compare on the other communication side. If the numbers do not match, the transmission has to be an error.

@moscicki has drafted a specification: https://github.com/cernbox/smashbox/blob/master/protocol/checksum.md as discussed on a meeting with @DeepDiver1975 and @dragotin.

@karlitschek
Copy link
Contributor

Makes sense from my point of view if the checksum is provided by the storage. It not than it is my opinion that it is out of scope of ownCloud to maintain checksums to ensure file system consistency. This is seriously something the OS or file system has to do.
A challenge with storage provided checksums might be that they could be provided in different formats.

@DeepDiver1975
Copy link
Member

Regarding the serverside implementation this falls into the area of #12052 (comment)

We need interception points on the upload mechanism - see above

@moscicki
Copy link

@karlitschek: yes, in our case we are happy to provide the checksums for you to use directly by the sync client or by the owncloud server via object storage plugins we developed.

You'd still have to think about your customers using owncloud on top of other storage, for example NFS built of commodity hardware -- what if the NFS server is rebooted while the changes are not commited to the disk? It's is not the owncloud job to fix storage but I think it is the owncloud job not to propagate corruptions to the client if the storage is able to provide checksums (and possibly help the service managers to recover from such a situation using the client's local sync folder which may contain the uncorrupted file). I think nobody of your big customers asked for it yet because simply they are unaware how many corrupted files they have (also due to software bugs which we discovered lately). So this waits for the users to discover there is something wrong with their files. I think this is the bottom line of this discussion. I think especially sync should come with this extra protection layer if offered by the backend server.

@bantu
Copy link

bantu commented Feb 28, 2015

I am not sure whether this is useful in this context, but for the record: https://github.com/bantuXorg/php-stream-filter-hash allows checksumming data written to a PHP stream.

@PVince81
Copy link
Contributor

Once again random data corruption has been found with up/download which could be avoided by checksums: owncloud/client#2893

@PVince81
Copy link
Contributor

Can we get this scheduled ?

@DeepDiver1975 DeepDiver1975 self-assigned this Mar 19, 2015
@DeepDiver1975 DeepDiver1975 added this to the 8.1-current milestone Mar 19, 2015
@DeepDiver1975
Copy link
Member

Let's start to add the checksums on the transport layer when putting a file from the client to the server.
In a second step we can add a getChecksum() method to the storage interface to transmit the checksum to the client on PUT.

@PVince81
Copy link
Contributor

The way I see, this is what needs to be done:

Part 1

  • client sends checksum along with the PUT operation (possibly on this branch https://github.com/owncloud/client/commits/checksum_verify)
  • server part that writes the part file (either from chunks or directly) must compute the checksum on the fly (rolling checksum), then compare it with the one submitted with PUT. If mismatch: return HTTP code for error

Part 2

  • server stores received checksum in oc_filecache (needs new column)
  • on PROPFIND, return checksum in a new attribute (but only if one is available in DB)

@DeepDiver1975
Copy link
Member

server stores received checksum in oc_filecache (needs new column)

I'd use the checksum on filecache as fall back if the storage does not provide the checksum

@DeepDiver1975
Copy link
Member

on PROPFIND, return checksum in a new attribute (but only if one is available in DB)

header on GET is required as well - https://github.com/cernbox/smashbox/blob/master/protocol/checksum.md#get

@karlitschek
Copy link
Contributor

Good plan. 👍

@PVince81
Copy link
Contributor

Defer to 8.2 as we're post feature freeze ?

@PVince81 PVince81 modified the milestones: 8.2-next, 8.1-current Apr 23, 2015
@dragotin
Copy link
Contributor Author

We had a discussion about this on the ownCloud contrib conf today again. It was agreed that as a first step, the server should be enhanced to store the checksums the client is sending along with a PUT request. The server will return the checksums in the attributes list that is returned by a PROPFIND on a certain resource by default. Also, the server will add the checksum as a custom header to GET requests.

Even though it is completely clear that checksums will not always be there, we can already benefit from this first step in a couple of scenarious.

@karlitschek
Copy link
Contributor

i created a new issue for that #18716

@dragotin
Copy link
Contributor Author

ok, lets track it there.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants