Add MNIST dataset to the dataset-registry for the new example-get-started #27
Comments
@iesahin let's add the raw data to the existing data registry. Let's push processed data, models, etc. to the public (read) remote on S3, similar to the one we use for the
Is it related to this ticket?
Ah, right, moved to the other. Thanks. 😄
This is closed by iterative/dataset-registry#7
I started to write the experiments tutorial using the original MNIST from deepai.org:
Currently I'm using a Google Storage bucket as a remote:
https://console.cloud.google.com/storage/browser/dvc-example-data
After `prepare` and `preprocess`, the `data` dir is like:

There are 4 raw data files. I think these are better served with `dvc import-url` or `dvc get` than with a common remote. Also, instead of adding them one by one, I decided to add the `raw/` directory to DVC. The `prepared/` directory contains mixed, shuffled, and remixed data; `preprocess/` contains normalized and (optionally) noise-added data. These are all steps in the pipeline. I plan to track directories instead of individual files in these as well.

There may be multiple models (one MLP, one CNN, another deep CNN ...) that `train.py` selects according to `params.yaml`. I think it overcomplicates the pipeline. (Which files will `train.py` depend on?) However, it can show the experimentation features more clearly: "We select this model with this amount of salt and pepper noise and it gives us this result."

I plan to write individual model files for different models. Some of them may not run on Katacoda, but overall there will be more parameters for the users to try.
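To make the "select this model with this amount of noise" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the parameter names (`salt_pepper_amount`, `seed`, `model`), the `PARAMS` dict standing in for a parsed `params.yaml`, and the placeholder model builders are hypothetical, not the tutorial's actual code.

```python
import random

# Hypothetical stand-in for the parsed contents of params.yaml.
# In a real script this dict would be loaded from the file, e.g. with
# PyYAML's yaml.safe_load; the keys below are assumptions.
PARAMS = {
    "preprocess": {"salt_pepper_amount": 0.1, "seed": 42},
    "train": {"model": "cnn"},
}


def add_salt_and_pepper(pixels, amount, seed=None):
    """Return a copy of `pixels` (values 0..255) with noise added.

    Each entry is replaced, with probability `amount`, by either
    0 (pepper) or 255 (salt). The input list is left unchanged.
    """
    rng = random.Random(seed)
    noisy = list(pixels)
    for i in range(len(noisy)):
        if rng.random() < amount:
            noisy[i] = rng.choice((0, 255))
    return noisy


# Hypothetical model builders. In the plan above each model would live
# in its own file; here they are placeholders returning a label.
def build_mlp():
    return "mlp"


def build_cnn():
    return "cnn"


def build_deep_cnn():
    return "deep_cnn"


MODEL_BUILDERS = {
    "mlp": build_mlp,
    "cnn": build_cnn,
    "deep_cnn": build_deep_cnn,
}


def select_model(params):
    """Build the model named by params["train"]["model"]."""
    name = params["train"]["model"]
    if name not in MODEL_BUILDERS:
        known = ", ".join(sorted(MODEL_BUILDERS))
        raise ValueError(f"unknown model {name!r}; expected one of: {known}")
    return MODEL_BUILDERS[name]()


if __name__ == "__main__":
    image = [128] * 784  # a flat 28x28 MNIST-sized image of mid-gray pixels
    pre = PARAMS["preprocess"]
    noisy = add_salt_and_pepper(image, pre["salt_pepper_amount"], pre["seed"])
    changed = sum(1 for a, b in zip(image, noisy) if a != b)
    print(f"model={select_model(PARAMS)} changed_pixels={changed}")
```

With this layout, one possible answer to the dependency question is for the `train` stage to list `train.py` and all model files under `deps` and the model name under `params` in `dvc.yaml`, so that changing either a model file or the selected model invalidates the stage. That is only one option, and it reruns training even when an unselected model file changes.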
In summary, I need some way to put the data files in a public place. I can use the above URL, or you may want to keep the data at the current one: Which way would you recommend?
@shcheklein @dberenbaum
This is related to iterative/dvc.org#1400, iterative/katacoda-scenarios#60, and probably many others :)