ERROR: failed to pull data from the cloud - Checkout failed for following targets: #9653
Looks like it is complaining that some cache files are corrupted, deleting them, and then failing because of it. Is there anything strange about your setup? Could you show your …
Thanks for the response, @efiop.
Maybe the two gdrive remotes (one a service account, one not)? I'm not sure if that's standard practice for open source projects that need to enable CI. Nothing else that I can think of seems strange.
(I'm not sure what here is sensitive, so I'll be overly conservative)
I don't believe so. It's using whatever container GitHub Actions provides, which I believe is not a shared machine (or it may be shared, but I assume things run in containers). Is there a way to determine whether the dvc cache would be shared more broadly? I was able to SSH into the machine, and I manually deleted …
@thekevinscott Since this just came up in #9640, is there any chance the failures are because your GitHub runner ran out of space?
That's a pretty interesting thought.
Running …
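For reference, a minimal sketch of checking whether a GitHub-hosted runner is out of disk space, assuming the repo uses DVC's default local cache location of `.dvc/cache`:

```sh
# Overall disk usage on the runner
df -h

# Size of DVC's local cache (default location inside the repo)
du -sh .dvc/cache
```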
@thekevinscott Do you mix 2.x and 3.x? Original log is from gdrive, right?
That doesn't sound good; it seems like dvc thinks that the files it is downloading from your gdrive remote are corrupted. We are currently having some problems with gdrive remotes on 3.x that might be related (maybe it thinks that these files are using the new hash while they don't): iterative/dvc-gdrive#29. Could you try pinning to 2.x and see if that fixes it?
Original log is pulling from gdrive, yes. For your first question: you're asking if I'm using different DVC versions? I'm using whichever version is installed from pip, which appears to be 3.x.
Sure thing, I'll give this a shot tomorrow and report back. Any particular 2.x version, or is the latest good?
Hi, can you try installing …
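The exact version suggested here was lost in formatting; a sketch of pinning DVC to the 2.x line via pip, assuming the gdrive extra is needed for this repo's remotes:

```sh
# Pin to the latest 2.x release; the gdrive extra pulls in the
# Google Drive backend dependencies
pip install "dvc[gdrive]>=2,<3"

# Confirm what actually got installed
dvc version
```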
Thanks all! It is almost certainly a versioning issue. The files were added (via …). I ran two experiments: …
So, presumably, because the models were added locally with a … I assume the next step is to follow the instructions here on upgrading from 2.x to 3.x and re-add the models using the correct file hashing format?
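One way to check which hashing format a tracked file uses is to inspect its `.dvc` file; this sketch assumes (it is not confirmed in this thread) that 3.x-formatted entries carry a `hash: md5` field that 2.x entries lack, and the file path is hypothetical:

```sh
# 2.x-era outputs list only an `md5:` checksum; 3.x-era outputs
# also include a `hash: md5` field per entry
grep -A2 "outs:" models/model.dvc
```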
@thekevinscott, can you try with `dvc==3.2.1`?
Sure, I can try that. Should I also pin pydrive2?
No, that should not be needed. dvc 3.2.1 requires pydrive2>=1.16.
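A sketch of the suggested pin, with a check that the pydrive2 requirement was satisfied transitively:

```sh
pip install "dvc[gdrive]==3.2.1"

# dvc 3.2.1 requires pydrive2>=1.16, so no separate pin is needed
pip show pydrive2 | grep Version
```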
I see that those files are now downloaded, but it says some files are corrupted (and they get deleted as a result, which most likely fails the checkout).
Yes, although the error message appears slightly different (it appears to complain about the whole model folder now, and not just the individual corrupted model pieces).
Should I hold off on following the steps in the upgrade guide for troubleshooting purposes? (I assume once I upgrade, I won't be able to reproduce the issue.) This is an open source project, so I'm not under any time pressure to fix it if it's helpful to you all to leave it in a broken state, but I assume the fix for me is following the upgrade guide to get things to 3.x.
I had this issue. On my side there was a problem with a cache folder. I deleted … I suspect this occurred while switching between branches and between dvc v2 and v3.
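The exact folder deleted in this comment was lost in formatting; a sketch of the equivalent reset, assuming DVC's default local cache location of `.dvc/cache`:

```sh
# Remove the local cache; dvc pull re-downloads everything from the
# remote, so this is safe as long as the data still exists there
rm -rf .dvc/cache
dvc pull
```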
@thekevinscott Does the latest dvc version work for you? We've changed some things so that the internal cache is not shared between major versions.
I'm seeing the same issue, persisting on 3.6.0 (at least on a machine that I just upgraded to that version after seeing it come up on 3.5.1), and I had intermittently seen it back when all our machines were on v2 as well.

We use an S3 remote, but have a sync task set up to duplicate that entire bucket to a read-only bucket on GCP (to avoid repeatedly paying egress costs when pulling data to servers on GCP). The root cause seems to be pulling from the secondary remote before the sync task has had a chance to run. On the first attempt I get:

WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files: …

But on all subsequent attempts (even after the buckets have replicated successfully) it fails with:

ERROR: failed to pull data from the cloud - Checkout failed for following targets: …

Running on a freshly cloned copy of the repo seems to work fine, so I'm assuming something in the state of the original copy of the repo is getting messed up by trying to pull while the files are missing from the cloud, and then it isn't successfully re-checking whether they've appeared on subsequent runs?
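For context, a sketch of the two-remote setup described above; the remote names and bucket paths are hypothetical:

```sh
# Primary remote (writable, on S3)
dvc remote add -d storage s3://primary-bucket/dvc

# Read-only mirror on GCP, populated by an external bucket-sync task
dvc remote add mirror gs://mirror-bucket/dvc

# Servers on GCP pull from the mirror to avoid S3 egress costs
dvc pull -r mirror
```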
Could be related to #9826.
Closing this as a duplicate of #9651. This should be resolved in the latest DVC release (3.14.0 or later). In GitHub Actions CI you should not need to do anything other than updating DVC. On non-CI machines (for @gtebbutt and @WilliamHarvey97) you will need to remove the …

If you still see this issue after updating and clearing the site cache, feel free to re-open this ticket.
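The path to remove was truncated above; a sketch assuming DVC 3.x's default site cache location on Linux (`/var/tmp/dvc`) — the location is configurable via `core.site_cache_dir`, so check your config if it has been moved:

```sh
# Remove DVC's machine-wide site cache (internal state, not your data)
rm -rf /var/tmp/dvc

# Alternatively, relocate the site cache for a given repo
dvc config core.site_cache_dir /path/to/site-cache
```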
Still seeing this error for …

Pinning to … works around it. I don't necessarily want to keep this ticket open, as I've found a workaround and will be upgrading locally soon, but I'm happy to reopen it if you'd like to debug it further.
I realize I forgot to clear the cache in CI - let me do that and I'll report back with the results.
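What "clearing the cache in CI" might look like as a job step is sketched below, assuming the Linux default site cache path from the earlier sketch and DVC's default local cache location:

```sh
# Run before dvc pull in the CI job to start from a clean slate
rm -rf /var/tmp/dvc .dvc/cache
dvc pull -v
```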
Same issue as above with the cache cleared: https://github.com/thekevinscott/UpscalerJS/actions/runs/5831901238/job/15816243365?pr=842
@thekevinscott Seems related to #9733. Could you try dvc 3.15.0, please?
|
Bug Report
Description
On GitHub Actions, I'm receiving the following error message during a `dvc pull` (the full log is here):

…

It appears that a subset of files fails to be pulled.
Things I've tried:

- … running `dvc pull` there without any issues. All models get pulled successfully.

Reproduce
I'm not quite sure how to reproduce, as this is only happening on GitHub Actions. Here is a sample run where it happens. I cannot reproduce it locally.
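Since the failure only reproduces in CI, one low-effort debugging approach is to capture more detail in the job logs themselves; a minimal sketch:

```sh
# Print environment, version, and config details into the CI log
dvc doctor

# Verbose output shows which files are flagged as corrupted and why
dvc pull -v
```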
Environment information

Output of `dvc doctor`:

…

I'd be happy to provide any other information to help with debugging. I'm not sure of the best way to troubleshoot, as the issue only seems to appear in GitHub Actions.
UPDATE
One additional thing that might be useful: there are three remotes associated with this repo:

- `gdrive` (a regular Google Drive)
- `gdrive-service-account` (the same account as above, but set up to work with a service account)
- `s3` (Amazon S3, mirrored)

The reason for the two gdrive remotes is so that users can clone the repo and easily pull models (it's an open source library and `gdrive` is the default remote), but also to enable CI integration (which afaik requires a service account).

That said, I've confirmed locally that pulling from the service account works successfully. Just not in the GitHub Actions session, for some reason.
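For reference, a sketch of how a second gdrive remote backed by a service account is typically configured with dvc-gdrive; the folder ID and key path are hypothetical:

```sh
# Default remote for contributors (interactive OAuth)
dvc remote add -d gdrive gdrive://<folder-id>

# CI remote pointing at the same folder, authenticated via a service account
dvc remote add gdrive-service-account gdrive://<folder-id>
dvc remote modify gdrive-service-account gdrive_use_service_account true
dvc remote modify --local gdrive-service-account \
    gdrive_service_account_json_file_path .secrets/sa-key.json
```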