Let's say there is a three-stage workflow, A -> B -> C:
A runs a.py and produces a.json,
B runs b.py, which takes a.json as input and produces b.json,
C runs c.py, which takes b.json as input and produces c.json.
It would be very handy to have a flag in dvc repro that downloads only what is necessary to rerun the stages that changed. For example, when working on a fresh clone of the repo:
if I change nothing and run dvc repro --new-flag -> dvc downloads nothing and runs nothing.
if I change c.py and run dvc repro --new-flag -> dvc downloads only b.json and runs stage C.
if I change b.py and run dvc repro --new-flag -> dvc downloads only a.json and runs stages B and C.
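
To make the expected behavior concrete, here is a minimal Python sketch of the selection logic I have in mind. It is not DVC code and does not use any DVC API: the `PIPELINE` dictionary and the `plan` function are hypothetical names made up for illustration, and the sketch assumes that scripts such as a.py come from git while the .json outputs live in the remote cache.

```python
# Hypothetical description of the A -> B -> C workflow from this issue.
# This is an illustration only, not how DVC stores or reads stages.
PIPELINE = {
    "A": {"deps": ["a.py"], "outs": ["a.json"]},
    "B": {"deps": ["b.py", "a.json"], "outs": ["b.json"]},
    "C": {"deps": ["c.py", "b.json"], "outs": ["c.json"]},
}


def plan(pipeline, changed_files):
    """Return (stages_to_run, files_to_download) for a fresh clone."""
    # Map each output file to the stage that produces it.
    producer = {
        out: name for name, stage in pipeline.items() for out in stage["outs"]
    }

    # Propagate changes downstream until no new stage becomes stale.
    changed = set(changed_files)
    stale = set()
    progress = True
    while progress:
        progress = False
        for name, stage in pipeline.items():
            if name in stale:
                continue
            if any(dep in changed for dep in stage["deps"]):
                stale.add(name)
                changed.update(stage["outs"])
                progress = True

    # Download only the inputs of stale stages that are produced by
    # stages we are NOT going to rerun.
    downloads = sorted(
        dep
        for name in stale
        for dep in pipeline[name]["deps"]
        if dep in producer and producer[dep] not in stale
    )
    to_run = [name for name in pipeline if name in stale]
    return to_run, downloads


if __name__ == "__main__":
    for changed in ([], ["c.py"], ["b.py"]):
        print(changed, "->", plan(PIPELINE, changed))
```

With this sketch, changing c.py selects stage C and only b.json for download, while changing b.py selects stages B and C and only a.json, which matches the three cases listed above.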
Such a flag becomes particularly useful when working with big pipelines that train multiple models. Downloading all the training data for all the models can take a lot of time and a lot of disk space.