HttpError 400 when pushing to Google Drive #3098
@Maxris, any ideas off the top of your head? At the least, how should we debug this? Add more logging?
It's strange that the URL in the HttpError doesn't contain any query params. @RomanVeretenov, could you tell us what URL format you have in the config for the gdrive remote? For example, it should be like
Btw, @RomanVeretenov, check this doc https://github.com/pared/dvc.org/edit/2473/public/static/docs/user-guide/troubleshooting.md?pr=%2Fiterative%2Fdvc.org%2Fpull%2F875 (not merged yet). I think it's better to increase ulimit. Also, could you confirm whether it always takes the same amount of time before it breaks? And is there any chance that someone else is using the same DVC remote at the same moment?
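The ulimit suggestion in this thread concerns the per-process limit on open file descriptors, which many parallel uploads can exhaust. From a shell, running ulimit -n 4096 before dvc push raises it for that session, provided the hard limit allows it. The sketch below does the same from inside a Python process with the standard resource module (Unix only). The target of 4096 and the helper name raise_nofile_limit are illustrative assumptions, and the change only affects the current process and the children it spawns afterwards.

```python
import resource  # Unix-only standard-library module


def raise_nofile_limit(target: int = 4096) -> int:
    """Raise this process's soft open-file limit towards `target`.

    Hypothetical helper. The soft limit is never lowered and never pushed
    above the hard limit, so no extra privileges are needed.
    Returns the resulting soft limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited; nothing to do
    # Respect the hard limit unless it is unlimited.
    cap = target if hard == resource.RLIM_INFINITY else min(target, hard)
    new_soft = max(soft, cap)
    if new_soft != soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft


if __name__ == "__main__":
    print("soft RLIMIT_NOFILE is now", raise_nofile_limit())
```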
gdrive://root/_dvc/myrepo
On the first run, DVC added some files to the gdrive folder, so I don't think the error was caused by a race.
I don't remember whether I tried increasing ulimit before limiting --jobs, but I will look into it.
On the first run, it added some of the files to gdrive. On the second and later runs, it takes almost the same amount of time before failing. I can run the command with -v again, redirect the output to a file, and compare the logs from two attempts.
Absolutely no chance.
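For reference, a remote URL such as gdrive://root/_dvc/myrepo splits into a first component ("root", Drive's alias for My Drive) and a path below it. The failing request in this issue hits the Drive v2 files endpoint with "Invalid query", which presumably refers to the search expression passed in the q parameter. The sketch below is illustrative only: split_gdrive_url and children_query are hypothetical helpers, not DVC's code. It shows how the URL decomposes and what a well-formed query for listing a folder's non-trashed children looks like, with single quotes in the ID escaped by a backslash as Drive's query syntax requires.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_gdrive_url(url: str) -> Tuple[str, str]:
    """Split a gdrive:// URL into (first component, remaining path).

    Hypothetical helper: the first component is "root" or a folder ID.
    """
    parsed = urlparse(url)
    if parsed.scheme != "gdrive":
        raise ValueError(f"not a gdrive URL: {url!r}")
    return parsed.netloc, parsed.path.strip("/")


def children_query(folder_id: str) -> str:
    """Build a Drive search expression for non-trashed children of a folder."""
    if not folder_id:
        # Guard against building a malformed "'' in parents" expression.
        raise ValueError("folder_id must be non-empty")
    # Escape backslashes first, then single quotes, inside the quoted ID.
    escaped = folder_id.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}' in parents and trashed=false"
```

For example, split_gdrive_url("gdrive://root/_dvc/myrepo") gives ("root", "_dvc/myrepo"), and children_query("root") gives 'root' in parents and trashed=false.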
It looks like it fails while collecting the existing files on the remote. I think it uses some pagination mechanism, and I wonder if the "cursor" just expires or something. @Maxris, any ideas? Maybe we can add more debug logs to catch the error?
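The pagination idea can be made concrete. Drive's file listing returns results in pages, and each response carries a token for the next page. A minimal sketch of draining such a listing follows. fetch_page is a hypothetical callable standing in for the real API call, and make_fake_fetcher is an in-memory stand-in for testing. Whether tokens actually expire during a long listing is only the hypothesis raised above, not something the thread confirmed.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A page fetcher takes the previous page token (None for the first page) and
# returns (items, next_page_token); next_page_token is None on the last page.
FetchPage = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def iter_all(fetch_page: FetchPage) -> Iterator[str]:
    """Yield every item from a token-paginated listing, page by page."""
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        yield from items
        if not token:
            break  # no next-page token: this was the last page


def make_fake_fetcher(pages: List[List[str]]) -> FetchPage:
    """In-memory stand-in for a remote listing API (hypothetical, for tests)."""
    def fetch(token: Optional[str]) -> Tuple[List[str], Optional[str]]:
        index = 0 if token is None else int(token)
        next_token = str(index + 1) if index + 1 < len(pages) else None
        return pages[index], next_token
    return fetch
```

The requests are strictly sequential. With about 2 million files and an assumed page size of 1,000, one full listing takes about 2,000 round trips, so any error tied to a token or a long-running listing would surface partway through rather than at the start.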
I have implemented a workaround on my side: I zip all the small files at the last level of the folder hierarchy and store them in DVC as archives. I have also set up hooks that unpack them after git pull/checkout. I did this partly to speed up syncing with the local remote, since syncing a lot of small files is really slow. I recreated the gdrive remote and the error is gone, but the gdrive remote still behaves strangely. I'll open another issue.
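The workaround described above can be sketched as follows. Each bottom-level directory (one with files but no subdirectories) is packed into a sibling .zip, and the archives are unpacked again, for example from git post-checkout and post-merge hooks. The function names and the choice to write archives next to their directories are assumptions for illustration, not the reporter's actual scripts.

```python
import os
import zipfile
from typing import List


def zip_leaf_dirs(root: str) -> List[str]:
    """Pack every bottom-level directory under `root` into a sibling .zip.

    Hypothetical helper. The original directories are left in place;
    exclude them from tracking (e.g. via .dvcignore) as appropriate.
    Returns the archive paths created.
    """
    archives: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirnames or not filenames:
            continue  # only leaf directories that actually contain files
        archive = dirpath.rstrip(os.sep) + ".zip"
        with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for name in sorted(filenames):
                # Store paths relative to the leaf directory.
                zf.write(os.path.join(dirpath, name), arcname=name)
        archives.append(archive)
    return archives


def unzip_archives(archives: List[str]) -> None:
    """Restore each archive into the directory it was created from."""
    for archive in archives:
        target = archive[: -len(".zip")]
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
```

Trading millions of tiny objects for a few thousand archives reduces the number of files the remote has to list and transfer, which is what made syncing faster here. The cost is that changing one small file means re-uploading its whole archive.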
@RomanVeretenov, how many files are in the folder, btw? I'll try to reproduce this 400 error on my end.
Before I migrated to the new storage structure (zipping all files in the bottom-level directories), it was about 2 million.
@RomanVeretenov, do you still have the remote storage you were initially pushing these files to? It would help to try reproducing this with the latest DVC release. I tried to push ~300K files and then run
Sorry, I'm totally out of time right now. I'll try to check it, but most of the small files are now zipped (there can be several thousand files in one zip) and I keep these zips under DVC, so I would have to reorganize the storage to test. I think you can close this issue for now; I'll let you know if something goes wrong.
OK, closing for now. Please feel free to reopen if the issue persists. Thanks for the feedback! 🙏
DVC version: 0.80.0, installed via pip
Ubuntu 18.04.2 LTS
I'm getting the following error when pushing to a Google Drive remote:
ERROR: unexpected error - <HttpError 400 when requesting https://www.googleapis.com/drive/v2/files returned "Invalid query">
I have a repo with a lot of files (several million). My GDrive storage has unlimited space.
Pushing to a local remote works fine. But after adding a gdrive remote, I first got a "ulimit error" (fixed by adding the --jobs 8 option to dvc push), and now I always get HttpError 400. I've also run the push command with -v. I can't paste the full log here because it's huge and reproducing this issue with -v takes several hours, but the tail is as follows:
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
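The --jobs 8 workaround mentioned in the report caps how many transfers run at once, and with it how many files and sockets are open at the same time. A minimal sketch of that idea with a bounded thread pool follows. run_bounded and the task callable are hypothetical; this is not DVC's actual transfer code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_bounded(task: Callable[[str], None], items: Iterable[str], jobs: int = 8) -> List[str]:
    """Run `task` over `items` with at most `jobs` calls executing at once.

    Returns the items whose call raised, in submission order, so that one
    failure does not abort the remaining work.
    """
    failed: List[str] = []
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Every item is queued up front, but only `jobs` worker threads run.
        futures = {pool.submit(task, item): item for item in items}
        for future, item in futures.items():
            if future.exception() is not None:  # waits for this call to finish
                failed.append(item)
    return failed
```

Lowering jobs trades throughput for fewer simultaneous open handles, which matches the report: the "ulimit error" went away at 8 jobs, while the later HttpError 400 was unaffected.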