Maintain code / data files in scenarios from a central repository and bucket #44
Comments
Yes, we have an AWS S3 bucket (used as the public remote for https://github.com/iterative/example-get-started, for example). But what's the benefit of the proposed approach? Is it to make scenario preparation faster?
What assets do we need to upload besides the raw data (already on S3), the prepared data, and the model? Everything else should already be in Git, I think. Thanks
You can see the difference in https://katacoda.com/dvc/courses/get-started/experiments and https://katacoda.com/dvc/courses/get-started/params-metrics-plots. If you can upload https://one.emresult.com/~iex/project-experiments.zip to a bucket and let me know the URL, I'll change it. There is a 1MB limit for asset files in Katacoda. The code can be put into the assets in the form of a DVC repository, which can then pull the data files. In that case, we only need to upload the data files. @shcheklein said the maintenance of these code/project files may become an issue. It's possible to use DVC itself to manage the data files, but this must be fast: DVC installation takes some time, and the time spent on preparation should be minimal. If we prefer to manage the data with DVC (I do), some structural changes are needed in the code files. At least they should use
All the pipeline data should be up on the example-get-started repo remote (see all the
You can check out the corresponding tag and use
But you have to install
Now that the scenarios are dockerized, such a facility is no longer required. The containers already include all the needed assets and code. Related: #49
Currently, scenario initialization scripts replay the previous steps to obtain a project: they initialize Git and DVC, download the data, split it, prepare it, and so on. Instead, we can package the end-of-scenario result in a single zip file, then download and extract it in one step. Because of the data, we cannot put everything into the assets, which have a 1MB size limit.
Is there a central location (in S3, perhaps) that I can use to upload / download these zip files? I'll put a zipped DVC repository there for each of the scenarios.
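For illustration, here is a minimal sketch of the single-step download and extract described above, assuming Python is available in the scenario environment. The function name `fetch_and_extract` is hypothetical and is not part of any existing scenario script; the archive URL would be whatever bucket location is agreed on.

```python
import io
import zipfile
from pathlib import Path
from typing import List
from urllib.request import urlopen


def fetch_and_extract(url: str, dest: str) -> List[str]:
    """Download a zip archive from `url` and unpack it into `dest`.

    Hypothetical helper: the archive is assumed to contain a ready-made
    project (for example, a zipped DVC repository for one scenario).
    Returns the list of member names found in the archive.
    """
    # Read the whole archive into memory; fine for small scenario bundles.
    with urlopen(url) as response:
        payload = response.read()

    # Create the destination directory if it does not exist yet.
    target = Path(dest)
    target.mkdir(parents=True, exist_ok=True)

    # Unpack everything in one step and report what was extracted.
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(target)
        return archive.namelist()
```

A scenario init script could then call something like `fetch_and_extract(archive_url, ".")`, where `archive_url` points at the uploaded zip. This replaces the replayed Git/DVC steps with one download, at the cost of having to rebuild the archive whenever the project files change.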
@jorgeorpinel @shcheklein