DVC New Cloud-supported Google Drive Remote #466

Merged: 26 commits into MouseLand:dev from dvc-gdrive3, Jul 26, 2020

nickdelgrosso
Contributor

@nickdelgrosso nickdelgrosso commented Jul 23, 2020

This pull request adds monitoring and usage controls to the DVC remote so that we don't hit the same "usage limit" error we encountered yesterday. Following the directions in the DVC documentation, the DVC database has been migrated to a new remote. Extra benefits:

  • A special read-only "service account" has been created for Travis that doesn't collide with the normal developer login (we had read-only access before, but through a different mechanism).
  • No fiddling with Travis encryption.
  • Pull requests across repos should work (we can run tests on community submissions now).
  • Usage monitoring, with the option to raise the limits if needed. It shouldn't be a problem, but if we start hitting limits we'll understand why.

To use it, I recommend everyone update DVC to the latest version (pip install --upgrade dvc; they had a 1.0 release recently) and delete the json file so that Google asks to re-authenticate on the next dvc pull. From there it should work as before.

@chriski777
Collaborator

This is awesome! Will community developers still be able to call dvc pull without authenticating?

@nickdelgrosso
Contributor Author

Yes, if they pull from the travis repo: dvc pull -r gdrive-travis
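For anyone following along, the unauthenticated pull could be wrapped roughly like this. This is only a sketch: `gdrive-travis` is the remote name given above, while the `pull_testdata` helper and the `DRY_RUN` switch are hypothetical names added for illustration and are not part of this repo or of DVC.

```shell
# Hypothetical helper: pull the test data from the read-only remote.
# Set DRY_RUN to any non-empty value to print the command instead of running it.
pull_testdata() {
    remote="${1:-gdrive-travis}"   # remote name taken from the comment above
    run="${DRY_RUN:+echo}"         # expands to "echo" only when DRY_RUN is set
    $run dvc pull -r "$remote"
}
```

Calling `pull_testdata` with no arguments runs `dvc pull -r gdrive-travis`; passing another remote name as the first argument pulls from that remote instead.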

@nickdelgrosso
Contributor Author

nickdelgrosso commented Jul 24, 2020

Two more commits to fix some quality-of-life issues:

  • I did a little more digging through the monitor and found the source of the "usage exceeded" error: DVC was making far too many calls to Google Drive at once. The DVC devs recommend tracking a whole folder when big data directories are involved, so that there are fewer calls and downloads and uploads are more efficient.
  • Adding new files for new regression tests involves git-adding a bunch of dvc files at once.
  • Also, every time we changed branches, pulled, pushed, or committed, DVC would add a "/tmp" line to its .gitignore file. Very annoying to stash it every time we want to make a change! This is a well-known issue for certain repos; the devs weren't able to track down its source, but have somehow fixed it for new repos. They recommend re-initializing the repo with the latest DVC version.

To fix this, I re-initialized the DVC repo. You'll notice that there's now only one .dvc file (which tracks the data/testdata folder as a whole), and hopefully the .gitignore problem is gone!
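Tracking the folder as a whole could look something like the sketch below. It assumes DVC's usual behavior of writing `<dir>.dvc` next to the tracked directory and adding an entry to the `.gitignore` in its parent folder; the `track_dir` helper and the `DRY_RUN` switch are hypothetical names used only for illustration, not the exact commands run for this PR.

```shell
# Hypothetical helper: track one directory with a single .dvc file,
# then stage the files DVC generates so git records them.
# Set DRY_RUN to any non-empty value to print the commands instead of running them.
track_dir() {
    dir="${1:-data/testdata}"      # folder named in the comment above
    run="${DRY_RUN:+echo}"         # expands to "echo" only when DRY_RUN is set
    $run dvc add "$dir"
    # Assumption: dvc add wrote "$dir.dvc" and updated the parent .gitignore
    $run git add "$dir.dvc" "$(dirname "$dir")/.gitignore"
}
```

With one .dvc file for the whole folder, adding regression test data means re-running `dvc add` on the folder and committing a single updated file, rather than git-adding many .dvc files.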

@nickdelgrosso
Contributor Author

Okay, now the .gitignore:/tmp issue is fixed. It seems to have been caused by a credentials file that was committed to the .dvc/tmp directory. Moving it to .dvc/cred stopped the .gitignore:/tmp behavior.

@chriski777 chriski777 self-requested a review July 24, 2020 18:51
@carsen-stringer carsen-stringer merged commit 62c35c7 into MouseLand:dev Jul 26, 2020
@carsen-stringer
Member

lol probably my file :D I was being lazy with the stashing of the dvc folder -- thanks nick this is merged! I will make the dvc pull -r gdrive-travis the default instructions in the readme

@nickdelgrosso nickdelgrosso deleted the dvc-gdrive3 branch July 27, 2020 11:43