Add a way to test dvc repro on sample / test dataset #176

Open
Tracked by #169
nsorros opened this issue Dec 10, 2021 · 2 comments

Comments

@nsorros
Contributor

nsorros commented Dec 10, 2021

At the moment, `dvc repro` is neither tested nor guaranteed to run after changes. Tests mitigate this to some extent, but not fully. Ideally there would be a way to run `dvc repro` on a test dataset to validate that the pipeline works before kicking off a full run. This is intended to be used by the person developing a PR as a sanity check, similar to tests. Later on it could be added as a GitHub check, although a minor problem there is that `dvc pull` needs to run first, which fetches more than 100 GB.

@nsorros
Contributor Author

nsorros commented Dec 10, 2021

One option we were discussing with @pdan93 is to add an environment variable, `TEST=1`, that DVC reads and passes as a param or flag to preprocess, so that it creates a small dataset of, say, 1K examples or fewer. This can be enabled or disabled locally and in GitHub, and it will not be enabled for the final run in the cloud (or wherever it is run).
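A minimal sketch of how a preprocess step could honour such a flag, assuming the stage command is a Python script that inherits the shell environment. The names `is_test_mode`, `maybe_sample`, `preprocess` and `TEST_SAMPLE_SIZE` are hypothetical and not taken from the project; only the `TEST=1` variable and the ~1K size come from the discussion above.

```python
import os
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical cap on the number of examples kept in test mode (~1K, per the issue).
TEST_SAMPLE_SIZE = 1000


def is_test_mode() -> bool:
    """Return True when the TEST environment variable is set to "1"."""
    return os.environ.get("TEST") == "1"


def maybe_sample(examples: Sequence[T], limit: int = TEST_SAMPLE_SIZE) -> List[T]:
    """Keep only the first `limit` examples in test mode; otherwise keep all of them."""
    if is_test_mode():
        return list(examples[:limit])
    return list(examples)


def preprocess(examples: Sequence[T]) -> List[T]:
    """Hypothetical preprocess stage: subsample first, then do the real work."""
    data = maybe_sample(examples)
    # ... the project's actual preprocessing would go here ...
    return data
```

Taking the first N examples keeps test runs deterministic, so two sanity runs on the same commit see the same data; a seeded random sample would be an alternative if the head of the file is not representative. Locally this would be invoked as something like `TEST=1 dvc repro`. Since DVC decides what to rerun from file and params dependencies rather than environment variables, toggling `TEST` alone may not invalidate cached stages, so `dvc repro --force` (or exposing the flag as a DVC param instead) may be needed.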

nsorros mentioned this issue Dec 10, 2021
@nsorros
Contributor Author

nsorros commented Dec 20, 2021

One approach that works well for neural nets, or other models trained with an SGD variant where training can be stopped early, is a `--dry-run` flag that stops training after the first batch. We have used this effectively in #160.
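A minimal sketch of the `--dry-run` idea, assuming a Python training script parsed with `argparse`. The toy one-parameter model, `batches`, `train`, `main` and the `--epochs`/`--batch-size` options are hypothetical stand-ins for the project's real training code (the issue does not show #160's implementation); the point is only that the loop returns right after the first optimisation step when the flag is set.

```python
import argparse
from typing import Iterable, List, Optional, Sequence


def batches(data: Sequence[float], batch_size: int) -> Iterable[Sequence[float]]:
    """Yield consecutive mini-batches of `data`."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def train(data: Sequence[float], epochs: int, batch_size: int, dry_run: bool = False) -> int:
    """Toy SGD loop; returns the number of optimisation steps taken.

    With dry_run=True it stops after the first batch, which is enough to check
    that data loading and a training step run end to end.
    """
    weight, lr, steps = 0.0, 0.01, 0
    for _ in range(epochs):
        for batch in batches(data, batch_size):
            # Gradient of 0.5 * mean((weight - x)^2) with respect to weight.
            grad = sum(weight - x for x in batch) / len(batch)
            weight -= lr * grad
            steps += 1
            if dry_run:
                return steps  # sanity check done: stop after one batch
    return steps


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Hypothetical train stage")
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--dry-run", action="store_true",
                        help="stop after the first batch (sanity check)")
    args = parser.parse_args(argv)
    data = [float(i) for i in range(1000)]  # placeholder dataset
    return train(data, args.epochs, args.batch_size, dry_run=args.dry_run)
```

Calling `main(["--dry-run"])` takes a single step, while `main([])` runs the full three epochs (96 steps with the defaults). In a real script, `main()` would be called from the module's entry point so that a stage command such as `python train.py --dry-run` exercises the whole code path in seconds.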

nsorros removed their assignment Feb 21, 2023