This is comment #1462 (comment), extracted as a separate feature request. Also, @kskyten opened the submodels issue #301.
Users should be able to create a "library" of reconfigurable pipelines (#1462) and reuse them across projects. Pipeline import can work through a plain copy, Git submodules, or git clone https://my-dvc-repo.
# Clone a repository and pull its data (for reconfigurable modules that ship with data)
$ dvc clone https://github.com/iterative/so-dataset-posts-25K
$ dvc run -d prepare.py -d so-dataset-posts-25K/data.xml \
-o data.tsv -o data-test.tsv \
python prepare.py so-dataset-posts-25K/data.xml
$ dvc clone https://github.com/iterative/text-to-bag-of-words
# Run the cloned module instead of:
#   dvc run -d featurization.py -d data.tsv -o matrix.pkl \
#       python featurization.py data.tsv matrix.pkl
# -d1 - pass a file as the first module input/dependency (since it can have several)
# -o1 - instantiate (create a hardlink to) the first module output as a data file
$ dvc sub text-to-bag-of-words -d1 data.tsv -o1 matrix.pkl \
-p columns=1,2 -p lowercase=true -p max_features=9000
# Just a regular run
$ dvc run -d train.py -d matrix.pkl \
-o model.pkl \
python train.py matrix.pkl model.pkl
Details
dvc sub should not execute the module from its own directory (text-to-bag-of-words), since a single run might not be enough: the same module may be run several times with different inputs or params. Instead, a separate module instance should be created in some directory (say, .dvc/inst/) for each run, with a unique suffix (for example, .dvc/inst/text-to-bag-of-words_8bf3cfe).
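To make the per-run instance idea concrete, here is a minimal Python sketch of how an instance directory could be materialized. It assumes the simplest strategy (copying the module tree); `create_instance`, its arguments, and the copy approach are hypothetical illustrations, not existing DVC code.

```python
import shutil
from pathlib import Path


def create_instance(module_dir, suffix, inst_root=".dvc/inst"):
    """Copy a cloned module into its own instance directory.

    Hypothetical helper: the name, arguments, and copy-based strategy
    are assumptions made for illustration only.
    """
    src = Path(module_dir)
    # e.g. .dvc/inst/text-to-bag-of-words_8bf3cfe
    dst = Path(inst_root) / f"{src.name}_{suffix}"
    if not dst.exists():
        # copytree also creates missing parent directories such as .dvc/inst
        shutil.copytree(src, dst)
    return dst
```

With this layout, two runs of the same module with different suffixes get separate working directories, so neither overwrites the other's outputs, and a repeated call with the same suffix finds the existing instance instead of copying again.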
Connection to build cache issue
The module's unique suffix can be derived from the module instance config file (not shown in the example above) and the set of params. That way DVC can easily identify similar runs, and the same mechanism can be reused as a build cache (#1234) for regular runs, not only for modules.
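One way such a suffix could be computed is sketched below in Python. The hash function (SHA-1), the 7-character length, the JSON serialization, and the names `module_suffix` and `instance_dir` are all assumptions for illustration; the issue does not specify any of them.

```python
import hashlib
import json


def module_suffix(config_text, params, length=7):
    """Return a short, deterministic hex suffix for a module instance.

    Hypothetical helper: hashes the instance config contents together
    with the params passed via -p.
    """
    # Sort keys so the order of -p flags on the command line does not matter.
    payload = json.dumps({"config": config_text, "params": params}, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:length]


def instance_dir(module_name, config_text, params, root=".dvc/inst"):
    """Build the instance path, e.g. .dvc/inst/<module_name>_<suffix>."""
    return f"{root}/{module_name}_{module_suffix(config_text, params)}"
```

Because identical config and params always yield the same suffix, finding an existing instance directory with that name signals that an equivalent run has already happened, which is the property a build cache needs.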
An analogy with programming:
UPDATE: Added a link to @kskyten issue.