Allow remote-delta on sync files #16162
Hi,
I want to get this improvement fixed as soon as possible:
owncloud/client#179
It relates to the remote-delta implementation on the client, but I suppose it must be supported on both sides. I saw that the latest version of csync includes a module for ownCloud, but it differs from the current implementation of the client on GitHub.
In fact, I saw that it uses a special module called httpbf that is not in the normal csync library. It seems to me that it sends the whole file as an HTTP PUT?
So my question is: if you were going to implement rsync, how would you implement it?
I was thinking about cloning a module in csync and making it transfer a remote delta (presumably using librsync) instead of sending the whole file in chunks.
What do you think? Any suggestions?
Using httpbf I could detect what has changed and send only the changed blocks, but that would mean extra work for nothing: librsync already implements this kind of processing, and we are not going to do it twice.
Best regards,

Comments
Hi @gadLinux, thank you for trying to help on this issue. The client is no longer using httpbf, and is using only very few parts of csync. You will need to contribute to the client repository (and to the core). You will find the code that uploads files in … Good luck :-)
Some brainstorming: there needs to be a list of blocks with their corresponding checksums stored on the server for every file, accessible through a special route in the server API. The client will need to fetch the list once it detects that an existing file has changed and is to be uploaded. Once the blocklist has arrived, the client recalculates the blocklist on the new, changed file. After that, the client can compare the lists and identify the blocks whose checksum has changed; these are the blocks that need to be uploaded. To upload only parts of files, maybe the HTTP PATCH command (RFC 5789) helps. The server needs to be able to handle this command and reassemble the whole file. For files that appear new on the client, the client will have to calculate the blocklist and send it along with the initial upload, to avoid having the server calculate the list. Also, for each uploaded block, the client will send the new checksum along. The server will either recalculate or invalidate the blocklist for files that were changed on third-party storage. The client needs to be able to deal with the fact that the server cannot provide a blocklist for a file, and will transparently fall back to uploading the entire file. The kind of checksum is configurable, and will be adjustable via a server configuration option later.
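To make the comparison step concrete, here is a minimal C++ sketch of how a client could diff the server's blocklist against a freshly computed local one. The `BlockInfo` layout and the fixed-block-size assumption are illustrative, not the actual protocol:

```cpp
// Sketch only: assumes a fixed block size, so the two lists align by index.
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

struct BlockInfo {
    uint64_t offset;      // byte offset of the block in the file
    std::string checksum; // checksum as delivered by the server API
};

// Returns the offsets of blocks whose checksum differs locally,
// i.e. the blocks that would have to be uploaded.
std::vector<uint64_t> changedBlocks(const std::vector<BlockInfo> &serverList,
                                    const std::vector<BlockInfo> &localList)
{
    std::vector<uint64_t> changed;
    const size_t n = std::min(serverList.size(), localList.size());
    for (size_t i = 0; i < n; ++i) {
        if (serverList[i].checksum != localList[i].checksum)
            changed.push_back(localList[i].offset);
    }
    // Blocks past the end of the server's list are new and must be sent too.
    for (size_t i = n; i < localList.size(); ++i)
        changed.push_back(localList[i].offset);
    return changed;
}
```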
A good starting point for an rsync-like approach over HTTP is zsync, aka client-side rsync. I don't think using librsync is feasible, as we want to stick to HTTP(S) as the transport protocol.
Hi Olivier, thank you for the feedback. I will see which parts of csync are in use. I suppose that the first way is easier and better for now. Let me check the code. What's the most recent branch of development in both projects? I found … Best regards.
On 22/05/15 at 12:20, Olivier Goffart wrote:
@gadLinux Please stick with HTTP. Other protocols are not usable in certain scenarios (ports 80 and 443 are always open, others often are not). On top of that, another advantage of the HTTP-based zsync approach is that the server can remain mostly passive, which shifts a lot of load to the client. This improves scalability.
Hi, so it's important to implement the solution over HTTP. Okay, I hope I can … Did you think about using Thrift? Best regards.
On 25/05/15 at 11:40, Daniel Molkentin wrote:
It does not have to be HTTP, but it would be better (the configuration of an ownCloud server has to remain simple). Anyway, if it doesn't work we can still fall back to the normal method. And as I said, we are slowly moving away from csync, and new code on the client should be written in C++ in the libsync directory. A BitTorrent-like protocol would be something different, and opens its own can of worms (how do you do authentication or security if you are peer-to-peer?)
Do not try to solve two problems at once. The basic idea behind ownCloud is that the server stays in control of the data, so p2p would involve getting a server authorization first. This requires OAuth, which is scheduled for one of the next ownCloud server versions. The p2p implementation would then be purely client-side and not involve the server at all, except for the authorization request.
> and the second one is to use something like BitTorrent, so it can transfer …
You are right, I agree. Let me take a look at the partial file sync problem.
@gadLinux perhaps it makes sense to have a Skype call about this, to avoid a lot of work going in a direction that won't work ;-)
OK, so I have been looking into this and reading up on zsync (and by extension rsync), and I think this is something that can be done. Luckily the zsync library is available, so I'm modifying that to get a POC working, since at least the upload use case is not what it was designed for. I'll hopefully soon put that work on GitHub, so stay tuned. But let me (quickly) write down here how I think this should work.
Assumptions: …
Setup
Server: The server has only two tasks: store the .zsync files, and generate the new file. This is to keep things scalable and not have the server do all the checksumming. We need to come up with a protocol to tell the server how to assemble the new file; I'll be thinking about that. We do require extra space for this to work, since you need to copy the original file and only copy it back once everything is done. Locking is probably also a good idea.
Client: In the normal use case of zsync, the client figures out which parts it needs from the server. Now we need to find the parts in common with the server, which parts need to go, and which parts need to be added. This requires some new code, but the info should all be available. Since the server is not doing much here, all the computational load is shifted to the client (the core trick that makes this cheap is sketched below). Also, in the current setup we trust the clients to generate the zsync files (doing this in PHP would be hell) and send them. From my point of view this is fine, since any client can just do a PUT request now anyway. I hope to have a POC (in C, so easily portable to the client) soonish. CC: @danimo
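For reference, the core of the rsync/zsync matching is a weak rolling checksum that can be slid over the changed file one byte at a time in O(1) per shift. A minimal Adler-32-flavoured sketch follows; the struct is illustrative, not zsync's actual API, and real zsync confirms weak hits with a strong checksum (MD4) before treating a block as a match:

```cpp
#include <cstddef>
#include <cstdint>

struct RollingSum {
    uint32_t a = 0, b = 0;
    size_t len = 0;

    // Checksum an initial window of n bytes.
    void init(const unsigned char *p, size_t n) {
        a = b = 0; len = n;
        for (size_t i = 0; i < n; ++i) {
            a += p[i];
            b += a;
        }
    }
    // Slide the window one byte: drop `out`, append `in`.
    void roll(unsigned char out, unsigned char in) {
        a += in - out;
        b += a - uint32_t(len) * out;
    }
    uint32_t digest() const { return (b << 16) | (a & 0xffff); }
};

int main() {
    unsigned char data[] = "hello rolling checksum";
    RollingSum s;
    s.init(data, 8);          // checksum of bytes [0, 8)
    s.roll(data[0], data[8]); // now equals the checksum of bytes [1, 9)
    return s.digest() == 0;   // placeholder use of the result
}
```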
I think it should work this way both for the upload and the download, because we already know which file is the "newer" one. The client calculates the parts which have to be transferred, and the direction then depends on the sync status. For the upload there should be something like an HTTP PATCH request.
Hi, looks great! In the end I didn't have time to dig in deep, but if you fail … Best regards.
On 30/06/15 at 09:37, Roeland Douma wrote:
Just curious: does this play into this, and might it be interesting to make sure it can work with it? https://dragotin.wordpress.com/2015/06/22/owncloud-chunking-ng/
@dragotin and I have been talking about delta-sync as well; there are two more blog posts to be expected ...
@powerpaul17 Sure, it should work like this both ways, but the download case is easier: the zsync file tells you what has to be done, you can download using HTTP range requests, and all is well. Some form of HTTP PATCH would indeed be best for the upload case. @jospoortvliet This should be independent of that, although it could (and should) still make use of chunking if there are a lot of changes. @DeepDiver1975 Awesome, looking forward to that. So my code now "works"... at least on the limited set of tests I've thrown at it. It is inefficient, but works. And hopefully it will be documented well enough soonish to be available for discussion.
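The download side needs nothing exotic, since byte-range requests are standard HTTP (RFC 7233). A minimal libcurl sketch, with a placeholder URL, range, and credentials (the PATCH-style upload would still need a server-defined body format, so it is not shown):

```cpp
// Sketch: fetch only bytes 4096..8191 of a file via a standard HTTP
// range request. URL and credentials are placeholders.
#include <curl/curl.h>

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl) return 1;
    curl_easy_setopt(curl, CURLOPT_URL,
                     "https://example.org/remote.php/dav/files/admin/big.bin");
    curl_easy_setopt(curl, CURLOPT_USERPWD, "admin:secret");
    curl_easy_setopt(curl, CURLOPT_RANGE, "4096-8191"); // only this block
    CURLcode rc = curl_easy_perform(curl); // body is written to stdout
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return rc == CURLE_OK ? 0 : 1;
}
```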
Hello, I'm really interested in testing your code @rullzer
Sadly no reply yet from the zsync devs (regarding the licence). But here is some code, a very simple POC ownCloud app: https://github.com/rullzer/deltasync If I have some time soonish I'll update it with steps on how to test.
Hi, how do you avoid the deltasync file itself being synchronized? Best regards.
On 30/07/15 at 10:27, Roeland Douma wrote:
Oh, you don't, currently.
@rullzer Very nice. @dragotin @DeepDiver1975 FYI
Wondering if RFC 3253 (the WebDAV versioning extensions, aka DeltaV) is related to this problem.
For large amounts of files I believe that something along the lines of https://en.wikipedia.org/wiki/Merkle_tree might be a good idea. If such a tree structure were used, it would suffice to compare two hashes to know that nothing changed. If a single chunk changed, it could be found in logarithmic time. This kind of structure would increase the number of round-trips, so it might need a bit of tweaking; maybe sending 1k nodes instead of just one would improve it.
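A minimal sketch of that idea, assuming both sides chunk the data identically (so the trees have the same shape) and with `std::hash` standing in for a real cryptographic hash:

```cpp
#include <functional>
#include <string>
#include <vector>

using Hash = size_t;

// Placeholder parent hash; a real implementation would use e.g. SHA-256.
static Hash combine(Hash l, Hash r) {
    return std::hash<std::string>{}(std::to_string(l) + ":" + std::to_string(r));
}

// Build the tree bottom-up: levels[0] = chunk hashes, levels.back() = { root }.
// Assumes a non-empty leaf list; an odd tail node is paired with itself.
std::vector<std::vector<Hash>> buildTree(const std::vector<Hash> &leaves) {
    std::vector<std::vector<Hash>> levels{leaves};
    while (levels.back().size() > 1) {
        const auto &cur = levels.back();
        std::vector<Hash> next;
        for (size_t i = 0; i < cur.size(); i += 2) {
            Hash r = (i + 1 < cur.size()) ? cur[i + 1] : cur[i];
            next.push_back(combine(cur[i], r));
        }
        levels.push_back(std::move(next));
    }
    return levels;
}

// Compare two same-shaped trees: O(1) answers "anything changed?" at the
// root, then descending into the mismatching child finds one changed
// chunk in O(log n). Returns the leaf index, or -1 if nothing changed.
long firstChangedLeaf(const std::vector<std::vector<Hash>> &a,
                      const std::vector<std::vector<Hash>> &b) {
    if (a.back()[0] == b.back()[0]) return -1;
    size_t idx = 0;
    for (size_t lvl = a.size() - 1; lvl > 0; --lvl) {
        size_t left = idx * 2, right = left + 1;
        const auto &la = a[lvl - 1], &lb = b[lvl - 1];
        if (la[left] != lb[left]) idx = left;
        else if (right < la.size()) idx = right;
        else idx = left;
    }
    return long(idx);
}
```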
Hello again. @jospoortvliet I left this because it seemed that @rullzer was already implementing something. I'm still interested in it. What's the status of this issue? @lavrentivs About the Merkle tree: we are using this in one of our platforms, and maybe it can apply here. I don't know if this would work for a large amount of files or just for big files; I will ask the person who implemented it here. Anyway, I saw a lot of improvements in the server and client since my last post, but this seems to be still stalled. Should I go ahead, or do I wait for @rullzer's final solution?
@gadLinux My approach has a licensing issue which I'm trying to figure out. Also, implementing this as-is is not so trivial, since we need proper server-side support to do delta uploads/downloads, which is currently not there. And I prefer to eventually have that in a nice way instead of hacking it in. That being said, if you think you have a nice approach, please go ahead. :)
@rullzer I'm a total newbie on ownCloud-specific stuff, so no, I don't have a nice approach. But when I have something cool to implement there are no barriers... :-) I took a look at your implementation and got this:
./uploadclient README.md.zsync README.md http://owncloud-dev /files/readme admin admin
Started delta sync
Maybe I'm doing something wrong... The licensing issue could be a problem, but I don't understand much about this kind of stuff. Can I ask why zsync? Why not rsync directly, or Unison? I'm still reviewing the architecture, and honestly I thought you already had the solution for it, since it looks nice. But I will tell you if I work on something.
Question: why enable delta-sync for larger files only? Meaning: what is the reason behind this configuration parameter?
@hodyroff It's client-configurable because there is inherent CPU overhead to doing a zsync upload/download, so it might not make sense for small files. Also, currently …
The basic approach is to store zsync metadata files in a folder called `files_zsync/`, which stores them based on fileid. These metadata files can be requested by the client via a new route `dav/files/$user/$path?zsync`; they can also be deleted using the same route. This is implemented using a new `ServerPlugin` called `ZsyncPlugin`. Filesystem hooks are used to mirror any `copy/delete` operation on the base file or containing folders onto the metadata files, to ensure that server-side changes will not produce out-of-sync metadata.
The upload path is implemented by creating a new plugin, `ChunkingPluginZsync`. The chunk file ids are now assumed to be named as the offsets into the original file. Special handling is done when a chunk named `.zsync` is found, which is the generated client-side metadata; its contents are copied to the `files_zsync/` folder. The core reason behind this is to ensure that both the metadata and the file are put in place atomically, as part of the final `MOVE` request. The implementation adds a new class, `AssemblyStreamZsync`, which extends `AssemblyStream` with additional support to fill in the data between chunk offsets from a `backingFile`.
A new `zsync` capability is added to the dav app, which can be checked by the client to know whether delta-sync is supported. A zsync dav property is also returned for files which have metadata on the server. This commit closes owncloud#16162.
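As an illustration of the route described above, a client could fetch or drop a file's metadata roughly like this (the host, `remote.php` prefix, path, and credentials are placeholder assumptions; only the `?zsync` query string comes from the description):

```cpp
#include <curl/curl.h>

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl) return 1;
    curl_easy_setopt(curl, CURLOPT_URL,
        "https://example.org/remote.php/dav/files/admin/docs/big.bin?zsync");
    curl_easy_setopt(curl, CURLOPT_USERPWD, "admin:secret");
    curl_easy_perform(curl); // GET: the zsync metadata body goes to stdout

    // The same route accepts DELETE to drop the stored metadata.
    curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, "DELETE");
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```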
Is this finally merged? And the bounty claimed?
The server-side code was merged, at least.
This feature will be in the upcoming 2.6 alpha release.