Continuous deployment with DVC #288
Comments
So it turns out that using "Self hosted runners" is not recommended for public repositories.
I am not sure if we want to use GitHub servers to automatically train or evaluate our models.
See below a script that could be turned into a GitHub action.
What is the goal?
Replace the manual process that we need to go through when reviewing PRs (heavily inspired by #265). Namely
In a way, it is like a unit test that makes sure that all potential changes to our models and data have been correctly tracked.
What are the challenges
What are the benefits
Suggested script (WIP)
Before we run the script, a deterministic git revision will be checked out (e.g. the most recent commit of the branch from which we triggered the action).
set -e  # if any command exits with a nonzero code the entire script exits too
set -x
pip install -r requirements.txt
dvc pull # also checks that everything listed in dvc.lock is on remote
# NER
pushd data_and_models/pipelines/ner/
dvc repro
test -z "$(dvc diff)" # exits with nonzero code if there are any changes
popd
# Sentence embeddings
pushd data_and_models/pipelines/sentence_embedding/
dvc repro
test -z "$(dvc diff)" # exits with nonzero code if there are any changes
popd
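The two per-pipeline blocks in the script above repeat the same steps. A minimal sketch of how they might be folded into one helper follows. It assumes, as the original script does, that `dvc diff` prints nothing when there are no changes; the function name `check_pipeline` and the `DVC` override variable are hypothetical and only here for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: check_pipeline and the DVC variable are illustrative names,
# not part of the original script.
DVC="${DVC:-dvc}"   # executable to call; defaults to the dvc CLI on PATH

# Reproduce one pipeline and fail if anything differs from what is committed.
check_pipeline() {
    local dir="$1"
    (
        # Run in a subshell so the caller's working directory is untouched.
        cd "$dir" || exit 1
        "$DVC" repro || exit 1
        # Non-empty `dvc diff` output means some change was not tracked.
        test -z "$("$DVC" diff)"
    )
}
```

With this in place, each pipeline check would reduce to one call, for example `check_pipeline data_and_models/pipelines/ner/` followed by `check_pipeline data_and_models/pipelines/sentence_embedding/`, still under `set -e` so that the first failure stops the run.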
This is a must-have! One comment:
Why wouldn't we want this to run inside a Docker container? Indeed, not running inside a Docker container is:
In my opinion, GitHub actions already run inside a "container" of some sort, so there is no need to introduce yet another level of nesting.
I also agree that this kind of test needs to be automated. Among all the points you mentioned above, I'm worried about the following two:
While dealing with the latest DVC tests I had the following issues / annoyances:
If what @jankrepl suggests above turns out to be infeasible, then we can think about writing something automated on our servers.
The GitHub container might also change, and the reproduction could then fail because of that change.
Concerning GitHub Actions: we cannot have GitHub servers (1) set up a VPN connection with BBP or (2) pull/push data from BBP servers. We are waiting for GitLab actions to become available so that we can do this on BBP premises.
Scope
We need to make sure that we know when the changes in our source code influence our models / datasets. Without any manual procedures!
Current problems
Proposed solution
GitHub action triggered on each push
Notes
The most attainable/reasonable setup would be to use/replicate https://github.com/iterative/cml and just trigger some process on our server with pushes to a branch.
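As a rough illustration of the "trigger some process on our server" part, here is a minimal sketch of the bookkeeping a periodic job (e.g. a cron entry) might use to decide whether a branch has a commit that has not been checked yet. This is not how cml works internally; the function names `commit_is_new` and `mark_done` and the state-file approach are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the state-file mechanism are illustrative.

# Succeeds (exit 0) if the given commit differs from the one recorded in the
# state file, i.e. the checks have not been run for this commit yet.
commit_is_new() {
    local sha="$1" state_file="$2"
    [ ! -f "$state_file" ] || [ "$(cat "$state_file")" != "$sha" ]
}

# Record a commit as processed so that it is not checked twice.
mark_done() {
    local sha="$1" state_file="$2"
    printf '%s\n' "$sha" > "$state_file"
}
```

A job on the server could then run `git fetch origin`, read the branch head with `git rev-parse` on the remote-tracking ref, and start the reproduction script only when `commit_is_new` succeeds, calling `mark_done` once the run has finished.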