[OneDrive] corrupted file transfer #399

Closed
Darkvater opened this issue Mar 22, 2016 · 17 comments

Comments

@Darkvater

Using rclone 1.28 on an Ubuntu ARM machine (Raspberry Pi).
I am trying to sync my OneDrive contents to Google Drive, i.e. cloud to cloud. Out of the 24k or so files, 465 are consistently corrupted and thus not transferred. I know that those files are not corrupted on OneDrive or on my local drive (which uses Microsoft's OneDrive client to sync files), yet I am simply unable to sync them. I started out with an empty Google Drive folder, so there are no duplicates.

See attached log files. The only correlation I can see is that these files are all small, mostly not more than 100 KB. On average they are 56 KB (median 57 KB), with a minimum size of 67 bytes and a maximum of 190 KB. No matter how many times I retry, these files are consistently corrupted.

full log file here (10mb, result of -v) http://darkvater.homenet.org/rclone_server_6.txt
rclone_server_5.txt

Update: I am doing a test now to just sync onedrive to local and although it hasn't finished yet, I can already see corrupted files. I'll update in the morning, but it could be a onedrive issue instead as these files are correct in the cloud.

@Darkvater Darkvater changed the title [Google Drive] corrupted file transfer [Google Drive/OneDrive] corrupted file transfer Mar 22, 2016
@Darkvater Darkvater changed the title [Google Drive/OneDrive] corrupted file transfer [OneDrive] corrupted file transfer Mar 22, 2016
@ncw
Member

ncw commented Mar 22, 2016

Is there any chance you could send me one of those files that does get corrupted?

Can you also post the command line you are using just for completeness?

I've uploaded 1,000 random files to drive with mean size 64k, then copied them to onedrive but I didn't manage to reproduce the problem :-( I'm using linux on ubuntu/amd64.

@Darkvater
Author

Sure, no problem, here's a link to a onedrive folder (Public/openttd/goofs) with corrupted files in it: https://onedrive.live.com/redir?resid=8A5D53472BD1F293!586&authkey=!ANUn7GClDXvuwSQ&ithint=folder%2cpng

Command executed: tomi@alexandria:/mnt/sdb2$ work/rclone-v1.28-linux-arm/rclone --delete-excluded --filter-from ~/filter_file --dump-filters -v sync onedrive: local:stuff: 2>rclone_server_8.log

Local folder contents:

tomi@alexandria:/mnt/sdb2/stuff:/Public/openttd/goofs$ ls -l
total 664
-rw-r--r-- 1 tomi users 324590 Jan 22  2008 595x395.png         
-rw-r--r-- 1 tomi users 169251 Jan 22  2008 surrealistic.png
-rw-r--r-- 1 tomi users  42497 Jan 22  2008 trees-1.png        
-rw-r--r-- 1 tomi users 132091 Jan 22  2008 trees-alot.png

List of corrupted files in that folder: http://darkvater.homenet.org/corrupted_goofs.txt

@Darkvater
Author

I ran a test on an ubuntu 13.19 64bit vm I have at home and did a copy from onedrive with it. Same, or at least very similar, results. Many of the corrupted files are the same, with the same corrupted sizes.

@ncw
Member

ncw commented Mar 24, 2016

I've managed to replicate the bug with your corrupted goofs - thank you very much for those.

Just downloading the files from onedrive with rclone is enough to cause the problem, so we can rule drive out of the equation.

I downloaded the files using the web interface and they were correct.

I haven't worked out what is going on yet but I will :-)

@ncw ncw added the bug label Mar 24, 2016
@ncw ncw added this to the v1.29 milestone Mar 24, 2016
@ncw
Member

ncw commented Mar 24, 2016

Take one file as an example. On disk it looks this big:

-rw-rw-r-- 1 ncw ncw  97258 Feb 15  2014 HIGH-bridges.png

However, according to rclone it is this big:

$ rclone ls onedrive:goofs
   205029 HIGH-bridges.png

The web interface agrees that it is 200k.

[screenshot: onedrive]

So somehow onedrive has its metadata in a twist...

Which gives me an idea...

When I try uploading the files with --no-gzip-compression they appear properly, so somehow onedrive has decompressed and recompressed the file on the fly.

Can you see if re-uploading the bad files with --no-gzip-compression to onedrive and downloading them with the same fixes the problem for you?

I think this is probably a bug in onedrive, but possibly one that can be worked around!

@ncw
Member

ncw commented Mar 24, 2016

I uploaded the same file 100 times to onedrive and it is 200k half the time and 97k half the time!

https://onedrive.live.com/redir?resid=71A96798E7B1D253!10473&authkey=!ALWELG3BUcvK-gM&ithint=folder%2cpng

@ncw
Member

ncw commented Mar 25, 2016

After a lot of investigation, I've discovered that it is updating the modification time, which we do after the file is uploaded, that triggers the problem. If I stop doing that, the file no longer has the wrong size when uploaded. I'm reasonably convinced this is some sort of race condition in onedrive. I've been trying to reproduce it with the python SDK so I can report it as a bug.

@ncw
Member

ncw commented Mar 25, 2016

After a few more hours of experimentation, I've discovered that

  • the bug doesn't depend on gzip compression
  • the bug doesn't depend on the setting of modtime (PATCH)
  • the bug can be reproduced with the official python SDK

It seems to be very sensitive to something that I haven't worked out yet.

Since I managed to reproduce the bug using the official SDK I reported it as a bug.

OneDrive/onedrive-sdk-python/issues/27

Hopefully someone from Microsoft will escalate the problem to the right person.

@klauspost
Collaborator

Same bug: https://onedrive.uservoice.com/forums/262982-onedrive/suggestions/6711029-fix-size-metadata-bug. A dev responded here 1½ years ago: http://stackoverflow.com/a/27031491/681490, so they don't seem to be in a hurry to fix it.

@ncw
Member

ncw commented Mar 26, 2016

Thanks for finding that, Klaus. I'm not quite sure what we should do for rclone. Setting the size in the object returned by Put would allow the upload to complete without the corruption report, but the file would then be re-uploaded on every sync, which may be OK since it doesn't happen to many files.

@Darkvater
Author

Hey ncw, thanks a lot for the investigation! Really appreciate it. I'm currently in India but if you need anything I'll happily help in two weeks when I'm back

@ncw
Member

ncw commented Apr 4, 2016

I have received an official response from Microsoft on OneDrive/onedrive-sdk-python#27

It is a known bug that Microsoft haven't fixed and if you would like it fixed then vote on Uservoice - I did!

I can't think of a sensible work-around for this - rclone needs to make sure the size of the file is correct and if it can't rely on the size of the file then it will assume that it has been corrupted.
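
To illustrate why mismatched size metadata ends up reported as corruption, here is a minimal sketch of a size check. It is not rclone's actual code: the function name `size_matches` is hypothetical, and the two sizes are the ones quoted earlier in this thread for HIGH-bridges.png.

```python
def size_matches(reported_size: int, data: bytes) -> bool:
    """Return True if the number of bytes received equals the size the remote reported."""
    return len(data) == reported_size


# Sizes quoted earlier in this thread for HIGH-bridges.png.
REPORTED_BY_ONEDRIVE = 205029  # size shown in the OneDrive listing
ACTUALLY_DOWNLOADED = 97258    # size of the file on disk

# Stand-in content of the right length (all zero bytes), for illustration only.
downloaded = bytes(ACTUALLY_DOWNLOADED)

if not size_matches(REPORTED_BY_ONEDRIVE, downloaded):
    print("size mismatch: transfer would be treated as corrupted")
```

With a check like this, a wrong size in the listing is indistinguishable from a truncated or damaged transfer.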

@ncw ncw modified the milestones: v1.30, v1.29 Apr 4, 2016
@Darkvater
Author

Darkvater commented Apr 16, 2016

There is a response by the python sdk developer saying, I believe rightly, that file size should not be used to check file integrity. I see that rclone has a --checksum option. Does this still verify file size as well, as the documentation suggests? Maybe we can adapt the behaviour to check only the checksum, and not the file size, if this option is set.

@ncw
Member

ncw commented Apr 16, 2016

Checksum will still check file size. I'd need to add a --no-check-size option, which you would use with --checksum. That might be worth a try.

@Darkvater
Author

Hi ncw, I'm not sure how the program is structured internally, but if we are making changes I think --checksum should not check file size. A checksum will fail anyway if the file size differs, so a file size check is nothing more than a quick failure shortcut. I would change the checksum option to ignore file size. Since file size is the default mechanism, an option such as --no-check-size could be added for the really brave who wish to disable all checks.
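
The argument that a size check is only a shortcut in front of a checksum comparison can be sketched as follows. This is an illustration, not rclone's implementation: the function names and the choice of SHA-1 are assumptions made for the example.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest of a byte string."""
    return hashlib.sha1(data).hexdigest()


def same_content(src: bytes, dst: bytes, check_size: bool = True) -> bool:
    """Compare two payloads by checksum, optionally failing fast on size."""
    if check_size and len(src) != len(dst):
        return False  # quick shortcut: lengths differ, no need to hash
    return sha1_hex(src) == sha1_hex(dst)


original = b"example payload"
truncated = original[:-1]

# A payload of a different length fails with or without the size shortcut.
print(same_content(original, truncated, check_size=True))   # False
print(same_content(original, truncated, check_size=False))  # False
print(same_content(original, original, check_size=False))   # True
```

Dropping the size shortcut only costs the extra hashing work; the comparison result for the actual content stays the same.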

@ncw
Member

ncw commented Jun 17, 2016

I've implemented an --ignore-size flag which you can use to work-around this issue. I've verified it does the right thing.

Here is a beta with that fix in for you to try.

http://pub.rclone.org/v1.29-1-gbb75d80-62-g46135d8%CE%B2/

Please re-open the issue if you have any problems with it.
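
As a rough sketch of how the `--ignore-size` flag mentioned above might be used from a script, the snippet below builds an rclone command line. The helper `build_sync_command` is hypothetical, and the remote names `onedrive:` and `drive:` are assumptions; substitute whatever remotes are configured locally.

```python
import shlex
from typing import List


def build_sync_command(source: str, dest: str, use_checksum: bool = False) -> List[str]:
    """Build an rclone sync argv that skips the size comparison."""
    cmd = ["rclone", "sync", "--ignore-size"]
    if use_checksum:
        # Optional: compare checksums instead of modification times.
        cmd.append("--checksum")
    cmd += [source, dest]
    return cmd


# "onedrive:" and "drive:" are assumed remote names, not taken from a real config.
cmd = build_sync_command("onedrive:", "drive:")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Whether adding `--checksum` helps depends on the remotes involved, since both sides need a hash type in common.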

@ncw ncw closed this as completed in 46135d8 Jun 17, 2016
@Darkvater
Author

Hi ncw, thanks a lot for the patch! I tried it and can verify that the corruption I experienced is now gone. Good stuff!
I see a new version has already been released with these changes. I would just like to add that the changelog description only talks about "corrupted images". This is incorrect, as the corruption also happens for other files such as Word or PDF documents, I believe anything that onedrive has previews for.
