Add a way to test dvc repro on sample / test dataset #176

nsorros · 2021-12-10T09:19:18Z

At the moment dvc repro is not tested and not guaranteed to run after changes. This is mitigated to a certain extent by tests, but not fully. It would be ideal to have a way to run dvc repro on a test dataset to validate that it works before kicking off a full run. This is intended to be used by the person developing a PR as a sanity check, similar to tests. Later on it could be added as a GitHub check, although a minor problem there is that dvc pull needs to run, which fetches >100GB.

nsorros · 2021-12-10T09:21:06Z

One option we were discussing with @pdan93 is to add an environment variable, TEST=1, that dvc reads and uses as a param or flag in preprocess to create a small dataset of, say, 1K examples or fewer. This can be enabled or disabled locally and in GitHub, and it will not be enabled for the final run in the cloud (or whenever it's run).
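A rough sketch of how the preprocess side of this could look, assuming the script reads TEST itself rather than relying on any dvc feature. The names `is_test_run` and `limit_examples` and the default cap of 1000 are hypothetical and not taken from the repo:

```python
import os
from itertools import islice


def is_test_run(environ=None):
    """Return True when the (assumed) TEST=1 environment variable is set."""
    env = os.environ if environ is None else environ
    return env.get("TEST", "0") == "1"


def limit_examples(examples, test_run, max_examples=1000):
    """Keep only the first max_examples items when running in test mode.

    Hypothetical helper: the 1000 default mirrors the "1K examples or less"
    suggestion and would need tuning per pipeline stage.
    """
    if test_run:
        return list(islice(examples, max_examples))
    return list(examples)
```

With something like this in preprocess, `TEST=1 dvc repro` would run the pipeline on the truncated dataset, while a plain `dvc repro` would leave the full run unchanged.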

nsorros · 2021-12-20T07:24:52Z

One way that works well for neural nets, or any model trained with SGD variants where training can be stopped early, is to use a --dry-run flag which stops training after the first batch. We have effectively used this in #160.
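For illustration, a minimal sketch of such a flag using argparse. This is not the code from #160. The `train` and `parse_args` functions and the `train_step` callback are hypothetical stand-ins for the real training script:

```python
import argparse


def parse_args(argv=None):
    """Parse command line arguments, including the --dry-run flag."""
    parser = argparse.ArgumentParser(description="Train a model")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="stop training after the first batch",
    )
    return parser.parse_args(argv)


def train(batches, train_step, dry_run=False):
    """Run train_step on each batch and return the number of steps taken.

    When dry_run is True the loop exits after the first batch, which is
    enough to check that data loading and one update step work end to end.
    """
    steps = 0
    for batch in batches:
        train_step(batch)
        steps += 1
        if dry_run:
            break
    return steps
```

A script would call `train(batches, train_step, dry_run=parse_args().dry_run)`, so the same stage can be smoke-tested quickly before kicking off a full run.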

nsorros added discussion test labels Dec 10, 2021

nsorros assigned ivyleavedtoadflax, nsorros, aCampello and pdan93 Dec 10, 2021

nsorros mentioned this issue Dec 10, 2021

Xmas shortlist 🎄 #169

Closed

9 tasks

nsorros removed their assignment Feb 21, 2023
