Avoid redownload by client of locally existing files (e.g. copied by rsync) #1383
I have the same issue: the client re-downloads already existing files.
I would still gladly welcome a feature to add existing data into the Nextcloud sync process. For my use case I eventually chose the indirect path: removing my Nextcloud account and adding it again, telling Nextcloud to keep existing files. This works, but it's definitely a little cumbersome.
@rainer042 this is interesting. Do you know whether those existing files will be synced again if they're changed later on?
@FriendFX I think if you have installed your new OS, you can create a new Nextcloud account connection, point its data directory at the directory holding the backup copy of your Nextcloud root folder, and tell Nextcloud to keep existing data. In that case ideally no file should be synced initially, and if you change any of the files on the client or the server side afterwards, a sync is triggered by the client. For me at least this works perfectly.
I recently updated the desktop client to 3.0.1, and it disabled sync of three of my folders due to their size, even though they were being synced before I upgraded the client (which is a separate bug). When I re-enabled sync of those three folders, the Nextcloud client re-downloaded all 8 GB of data that was already on my local disk, logging "File has changed since discovery" messages. (Bandwidth isn't free, by the way.) So this is a sync defect that isn't only triggered by adding files with external software.
+1. I have 2 TB of local data to sync up to a server folder that is 99% identical; re-downloading it all is exactly what I'm trying to avoid. I think I'm going to switch to Syncthing, which has a number of other advantages.
Checksum support is tracked here: nextcloud/server#11138. That would make it possible for the client to detect that the files are identical and skip syncing. Until then, maybe an expert option somewhere could tell the sync client to assume that all files in a folder are identical and just retrieve the etag/metadata and update it locally.
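For reference, fetching just the etag the server holds for a file is already possible over WebDAV; a minimal sketch, assuming a hypothetical server URL, user and app password:

```bash
# Ask the server for the etag of a single file via WebDAV PROPFIND
# (hypothetical server, user and path; an app password works with -u).
curl -s -u alice:app-password -X PROPFIND -H "Depth: 0" \
  --data '<?xml version="1.0"?>
<d:propfind xmlns:d="DAV:"><d:prop><d:getetag/></d:prop></d:propfind>' \
  https://cloud.example.com/remote.php/dav/files/alice/Documents/report.odt
```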
I'm also struggling with this. Rsync tells me everything is identical, even timestamps, but the client still tried to redownload everything.
Yes, it looks like it's not the file dates/checksums; it's whether they are in the local db, and there doesn't appear to be any officially supported way of getting them there. You could possibly do that insert yourself, but I suspect that by the time you manage it, you could almost have created a PR ;-) Anyway, I don't have time for that, so I am using Syncthing, which seems much better suited for a one-way sync and appears to be more stable and mature.
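For anyone curious what "in the local db" means: the client keeps an SQLite journal, a hidden `.sync_XXXXXXXX.db` file in the sync folder root. A read-only peek, assuming the csync-era schema where the `metadata` table's `md5` column actually stores the etag; the schema is not a stable public interface, so treat this as exploration only:

```bash
# List a few records from the client's sync journal (path, inode, mtime, etag).
# The journal file name varies; do this only while the client is not running.
sqlite3 "$(ls ~/Nextcloud/.sync_*.db | head -n1)" \
  "SELECT path, inode, modtime, md5 FROM metadata LIMIT 10;"
```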
I actually managed to make it recognize existing files by deleting the local sync folder config while keeping the files. Then I added the sync folder back and pointed the config at the existing data. It seems to have recognized existing files by mtime/size, as it didn't resync all of them. So that at least is a possible workaround.
For the record, it checks mtime, etag and inode. So if the inode changes, it'll assume something changed locally.
The inode is a bad metric for that. I am experiencing the same issue after I rsync'd my files from my old PC to my new one: now the Nextcloud client (v3.1.3) re-downloads all files again although they are 100% identical. Well, at least I only have 32 GB of data, but it's still not necessary.
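To see why an rsync'd copy fails the check even when the content is identical: rsync recreates each file on the target, so the inode number is new. A quick look at the three values in question, with a hypothetical path:

```bash
# rsync -a preserves mtime and size, but the recreated file gets a fresh inode,
# which is enough for the client to assume a local change.
stat -c 'inode=%i mtime=%Y size=%s  %n' ~/Nextcloud/Documents/report.odt
```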
Welcome to another absurd issue. Reinventing the wheel, but as a square: inode + assumption = genius. Prompt the user if you are not sure. This issue still exists after almost 2 years.
Another plug for Syncthing. I have been using it for nearly 6 months now and it's been fantastic. You can control it (e.g. timed pause/resume) from the command line via REST, you can set up one- or two-way shares, and you only need to do port redirection at one end.
This problem also happens when I add stuff to a synced folder on Windows and then reboot into Linux: the client on Linux re-downloads what I uploaded from Windows. This is extremely annoying, as my internet isn't very fast either. I tried deleting the database created by the app after exiting it, but it would still re-download after checking the files. I'm getting tired of having to deal with this client. I've heard some scary things about Syncthing, but I think I'll try it instead anyway; there is no way I'll keep re-downloading everything after each reboot.
Wow. I just came across this issue when looking to avoid unnecessarily downloading data that is already on a new Windows client. This is a major flaw and will likely force me to look at Syncthing as well.
I started using Syncthing instead and it works much better than the Nextcloud client does. I recommend it.
How did you go about doing this, with issues like permissions and scanning?
I just set the setgid bit (2xxx) along with a chmod of 775 on the main folder, and I also changed the umask of the syncthing user (to 002, so new files come out as 775/664). You could do the same for the webserver user (www-data in my case), but I'm not a fan of that, so the only way I found is a cron task that re-applies chmod 775 to the whole folder and its subfolders every once in a while...
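For context, a sketch of that permission setup under assumed paths and group names (the 775 result comes from a umask of 002, and the cron line is the periodic fallback mentioned above):

```bash
# Hypothetical layout: shared data in /srv/sync/data, shared group "syncusers".
chgrp -R syncusers /srv/sync/data
find /srv/sync/data -type d -exec chmod 2775 {} +   # setgid: new entries inherit the group
find /srv/sync/data -type f -exec chmod 0664 {} +

# umask 002 makes new files come out as 664 and new directories as 775:
echo 'umask 002' >> /home/syncthing/.profile

# Cron fallback that re-applies the modes hourly:
# 0 * * * * find /srv/sync/data -type d -exec chmod 2775 {} + ; find /srv/sync/data -type f -exec chmod 0664 {} +
```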
I'm having the same problem as well. My Nextcloud is 1 TB large and is my offsite backup. I always have a local backup of my laptop (Veeam for Windows) and copied all files of my Nextcloud user to my new notebook, but the Nextcloud Desktop Client for Windows (version 3.2.2) wants to redownload everything, although everything is already there. What is the status here? How can you transfer all user files in the Nextcloud user directory from one computer to another without redownloading? Thank you!
@unpairedliabilitylibrarian At the moment there is no other way than to download everything again.
This bug report did not receive an update in the last 4 weeks. Please take a look again and update the issue with new details, otherwise the issue will be automatically closed in 2 weeks. Thank you!
According to my test, the latest ownCloud desktop client has implemented this feature, and it works with a Nextcloud server. @gpillay
Maybe that works if you copy the whole folder including the hidden Nextcloud synchronization state files and use the same Nextcloud instance. I had to switch to another Nextcloud instance and it did not work.
In my test, I removed the sync connection in the NC client after it had synced with the server and then added the sync connection using OC's client. OC skips the existing files while NC does not.
Ah sorry, I missed that you are using the ownCloud client too.
Hello Nextcloud team, I would really appreciate this feature, since I am planning to sync over 500 GB of data between my desktop and laptop. All the data is already present on both computers, and I'd very, very much prefer not to download all 500 GB to my laptop again after setting up NC on my desktop when the local data is already identical. Is anybody working on this issue? Is a fix planned?
In the meantime, I have a question: would a workaround be to copy the .sync_########.db files from PC A (already synced) to PC B (to be connected to NC, but already holding all the data locally)? Are there any dangers associated with this?

EDIT: Tried it. This seems to work.

WORKAROUND. Scenario: you have fully synced a few folders on computer A to a fresh NC instance (in my case, each of those folders had its own sync connection, screenshot: https://paste.pics/DL1EM). Now you have all the same data already locally on computer B and you want to connect it to the same NC server without it redownloading all that data. The gist, starting from a brand new NC server: sync computer A completely, quit the client on both machines, copy the data and the hidden .sync_########.db journal files from computer A into the matching local folders on computer B, then set up the client on B pointing at those folders (see the sketch below). BUT: again, am I missing any hidden dangers here?
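A condensed sketch of the above with hypothetical paths (quit the client on both machines first; -a preserves mtimes, and the trailing slash makes rsync copy the folder contents, including the hidden journal files):

```bash
# Copy the synced data AND the hidden .sync_XXXXXXXX.db journals from A to B.
rsync -a ~/Nextcloud/ userB@computerB:~/Nextcloud/
# Then, on computer B, add the folder sync connection pointing at ~/Nextcloud.
```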
There might be an easier way; this is what worked for me a few months ago: #1383 (comment). Basically, on the target machine start with an empty config, then add an existing folder connection that points at the remote NC instance and at the local sync folder that already has the data in it. I believe there is some special logic when initially adding a sync folder, so it makes the effort to recognize existing data. That logic does not seem to be active during a regular sync with existing configs, so the key is to re-add the sync folder to the config.
Interesting! So what you're proposing is to wipe the client config, then re-add the folder connections pointing at the existing local data, and let the client re-discover everything? Am I understanding that correctly?
No, as far as I remember it was even easier:
1. Remove the existing folder sync connection (keeping the local files).
2. Add the folder sync connection back, pointing at the remote folder and at the local folder that already contains the data.
3. When asked, tell the client to keep the existing local data.
In my experience back then, with about 1 TB of files it still synced about 20 GB of data for whatever reason, but left everything else alone. I don't think you need to delete the dot files, because you didn't have any config anyway.
Ahhh, gotcha. I think step 3 is only possible, though, when setting up the folders during the initial setup, not after the client has been configured once. If I open Settings and add a new folder connection, it doesn't ask me whether I want to keep local data (I'm on Linux; this might be different on other OS clients). My setup is kind of specific: I only want to sync three folders in the root of my home folder, and I make three individual folder sync connections for them, because I've burned my fingers before by selecting the whole home folder to sync and de-selecting the folders I didn't want; one wrong checkmark and you might end up deleting local stuff. So it's a catch-22: if I create the folder connections during initial setup, I can tell the client to keep local data, but I can only set up one folder sync there. If I want to add more folder syncs, I have to do it later, where I don't get the option to keep data...
How do I verify this? I guess by checking in oc_filecache to see if there's a hash in the column?
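One way to check, assuming a MySQL/MariaDB backend and the default oc_ table prefix (the filecache table gained a checksum column along with checksum support):

```bash
# List a few entries that have a checksum stored (empty column = no checksum).
mysql -u nextcloud -p nextcloud_db -e \
  "SELECT path, checksum FROM oc_filecache
    WHERE checksum IS NOT NULL AND checksum <> '' LIMIT 10;"
```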
We have a slightly different situation but the exact same problem. We are using GitHub Desktop + LFS to sync our Unreal project across multiple locations. To save on bandwidth we copy the project over to a new machine and then link the repo on that machine; however, when we do this it seems to download every single file again, as described. Our project is usually 50-100 GB and sometimes needs to be shared across 5-10 machines, so the bandwidth stacks up pretty fast. It still seems like there isn't actually a solution to this? Any help would be greatly appreciated.
My solution from last August (the workaround above) has been working for me ever since.
I had an OS partition crash last week, so I had to reinstall the OS and lost the Nextcloud client config along the way. When I reconfigured the Nextcloud desktop client from scratch and pointed it at the existing data (1 TB+), it did not redownload everything. The way I did it was to skip the wizard and then configure the sync folders manually later on. I did see some downloads, but I believe that's because I forgot to exclude some folders from selective sync that I had excluded before the crash.
This didn't work for me; it scans for changes and then says it needs to download all the files again.
I'm also having this issue, except it appeared out of absolutely nowhere: 3.4 TB all synced up, then the NC client randomly decided it needed to re-download it all over again.
A quick follow-up: I wasn't a fan of Syncthing; it also tried to re-download hundreds of GB of identical files. Resilio Sync, however, is working very well so far. My only gripe is that it's quite limited in configuration options, but hopefully the default settings you're stuck with continue to work OK.
Still working workaround, verified today with NC client 3.6.1. Hey folks, I needed to do this again today as I moved to a different managed NC provider (Hetzner, from IONOS). So I used the NC client on my desktop PC (computer A) to upload all my stuff, about 500 GB, to the new server. Now I'm hooking up the NC client on my laptop PC (computer B), where all the data is already present. From this point of departure, the following steps still work reliably for me:
1. Make sure computer A is fully synced, then quit the client there.
2. Copy the hidden .sync_########.db file(s) from the local sync folder(s) on computer A somewhere you can reach from computer B.
3. Set up the NC client on computer B with the same folder sync connections; it will start downloading.
4. Give it a few seconds to create its own journal files.
5. Close the client.
6. Replace the freshly created .sync_########.db files in the local sync folders on computer B with the copies from computer A.
7. Start the client again.
It'll re-sync the few megabytes it downloaded before you closed it in step 5, then you'll be fine. EDIT: video: https://youtu.be/nS8XpbTS928
Since checksum comparison is now implemented for specific cases, couldn't it be extended to cover this use case too? When two files have the exact same size, even if their mtime differs, do a hash comparison and only re-sync if the hashes differ. Effectively, inode and mtime data would then only be used for change detection, not for difference detection.
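In shell terms, the proposed rule would look something like this (a toy sketch, not client code; $remote_size and $remote_sum stand in for metadata the server would provide):

```bash
local_file=~/Nextcloud/big.iso   # hypothetical file
if [ "$(stat -c %s "$local_file")" = "$remote_size" ] &&
   [ "$(sha256sum "$local_file" | awk '{print $1}')" = "$remote_sum" ]; then
  echo "content identical: update local metadata only, skip the transfer"
else
  echo "content differs: re-sync"
fi
```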
This is especially important with external storage, because the client re-downloads the entire external storage after I uncheck and re-check it in the client. With many S3 providers, egress is not free. Although I guess it would be financial suicide anyway to enable external storage with Nextcloud without unlimited egress, since who knows how many billable "transactions" it produces even when no real data transfer occurs.
Expected behaviour
Files that have been copied via rsync into a client's local Nextcloud folder should not be redownloaded by the Nextcloud client; the existing local copies should be used.
Actual behaviour
I copied a test file into a subfolder of the Nextcloud client's folder that I had previously disabled for syncing in the client. When I re-enabled syncing for this subfolder, the file was downloaded again from the server instead of the local, identical copy being used.
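For completeness, the kind of copy that triggers this, with hypothetical paths; -a preserves mtimes, yet the files still arrive with new inodes on the client:

```bash
rsync -av /backup/nextcloud-data/Photos/ ~/Nextcloud/Photos/
```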
Steps to reproduce
1. In the client, disable syncing for a subfolder.
2. Copy a file that also exists on the server into that subfolder locally (e.g. via rsync).
3. Re-enable syncing for the subfolder.
4. The file is downloaded again from the server instead of the local copy being used.
Client configuration
Client version: 2.5.1
Operating system:
Linux, openSUSE Leap 15.1
OS language: German
Qt version used by client package (Linux only, see also Settings dialog):
Built from Git revision b37cbe using Qt 5.9.7, OpenSSL 1.1.0i-fips 14 Aug 2018
Client package (From Nextcloud or distro) (Linux only):
From distro
Installation path of client:
/usr/bin/nextcloud
Server configuration
Nextcloud version: 16.0.4
Storage backend (external storage):
Disk
Logs
I found issue #3422 from 2017, where this problem was already discussed; however, I have not found a solution so far, if there is one.
Personally, what I want to do is what I described here:
avoid-complete-nextcloud-resync-to-client-if-data-are-already-on-the-client-via-rsync
Thanks in advance
Rainer