Add example data #279
Conversation
mypy was complaining about lists being invariant, whatever that means.
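For context, "lists being invariant" means mypy rejects a `list[int]` where a `list[float]` is expected, because a function receiving a mutable list could insert floats into the caller's list of ints. A minimal sketch (names are illustrative, not this PR's code):

```python
# Minimal illustration of list invariance; not code from this PR.
from collections.abc import Sequence


def total(values: list[float]) -> float:
    values.append(0.5)  # legal for list[float]: this is why mypy must refuse
    return sum(values)


def total_ok(values: Sequence[float]) -> float:
    # Sequence is read-only and covariant, so a list[int] is accepted.
    return sum(values)


ints: list[int] = [1, 2, 3]
total(ints)     # mypy error: "list[int]" is not "list[float]" (invariance)
total_ok(ints)  # OK
```

Typing the parameter as `Sequence` (or another read-only protocol) is the usual fix.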
I was thinking more of having a completely parallel package within the same repo, rather than embedding it within the `virtual_rainforest` package itself.
LGTM!
Just had a query: how much of a problem will updating those binary files be? I know we don't want to be changing them super frequently, but they will need to be altered every time model inputs are changed, which is still going to happen with reasonable (weekly?) frequency. Is the best approach just not to worry and to update them every time a new model input gets added (or removed or changed)?
We don't want to update them too much, because you end up with complete copies of each binary file in the repo history. The files aren't huge, so not a big deal, but it wouldn't take too many updates to bloat the repo. One thing here might be to develop the …
From time to time, you can purge binary files from the repo history if it becomes too big.
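As an aside, one way such a purge might look is with the third-party git-filter-repo tool (not bundled with git); the size threshold and path below are purely illustrative, and this rewrites history, so collaborators would need to re-clone afterwards:

$ git filter-repo --strip-blobs-bigger-than 1M
$ git filter-repo --invert-paths --path virtual_rainforest/example_data/old_grid.nc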
LGTM!
@davidorme I hear what you're saying, but I'm not sure that we want the data in a separate repo (for now), for a couple of reasons.
If VR had been around for years and there was a stable data format and you had, e.g., 1GB of example model data, then it might make sense to store that somewhere else (e.g. using git LFS). Even then, though, I think it's still nice to have a minimal example dataset to go along with the code, so that users can test it out.

Yes, you shouldn't be committing loads and loads of binary data to your git repo as a matter of course, but it's fine to have some small binary files in there, even if they change semi-regularly. People routinely use git to store websites, even where there are image files in there that change all the time. My copy of the VR repo is currently at ~5MB. Even if it ends up being >100MB a few years down the line, that's honestly totally fine too. Let's not make our lives unnecessarily difficult here.
I agree that the binary files here are not likely to be an issue - mostly responding to @jacobcook1995's query 😄 I take your point about keeping the data in sync while …
@davidorme Sounds good to me.
Description
This PR adds some example data as a subpackage inside `virtual_rainforest`, as suggested by @davidorme. This allows end users to try VR out without having to construct their own dataset or get access to the RDS. This does include some binary files, though they are small in size (< 100 kB total).

I based the dataset on the one at `/rds/project/virtual_rainforest/live/dummy_data`, with the paths in config files changed to be relative (now that #269 is merged).

I added a new flag to `vr_run` (`--example`) to allow users to run a simulation with the example data. I'm not entirely sure about this solution -- it would be nicer to have users just point `vr_run` at wherever the data is located -- but the data might not be easy to find if e.g. VR was installed via `pip`. Suggestions welcome.

There is also now an `example_data_path` member in the `virtual_rainforest` package for convenience, though if you import it then all the models will also be loaded (see #278).

Closes #265.
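As a sketch of the "hard to find after a `pip` install" concern, the standard library's `importlib.resources` can resolve data bundled in an installed package without hard-coded paths. The subpackage name `example_data` below is an assumption for illustration, not necessarily this PR's actual layout:

```python
# Hypothetical sketch: locating data bundled inside an installed package.
# The subpackage name "example_data" is illustrative only.
from importlib import resources


def example_data_dir():
    """Return a traversable handle on the bundled example data directory."""
    # files() resolves the data wherever pip installed the package,
    # with no absolute paths baked in (Python >= 3.9).
    return resources.files("virtual_rainforest.example_data")


if __name__ == "__main__":
    # List the bundled files, e.g. to pick one to pass to vr_run.
    for entry in example_data_dir().iterdir():
        print(entry.name)
```

Note that importing `virtual_rainforest` still triggers loading all the models, as mentioned above (see #278).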
Type of change
Key checklist
- `pre-commit` checks pass: `$ pre-commit run -a`
- Tests pass: `$ poetry run pytest`
Further checks