add version specifiers to schemas (and potentially cabs and recipes) #306
Comments
I'll take a closer look at this during the week but, for the moment, it's probably worth mentioning that Schema Evolution is an entire topic in its own right: there's probably a good deal of prior knowledge that can be drawn on. One example that springs to mind is Google Protocol Buffers, which define message schemas for use by Remote Procedure Calls (gRPC). They can evolve over time and, for example, "Modifying gRPC services over time" suggests some best practices in this context. |
I am obviously on board for this but I think we could be even more ambitious/user-friendly. I think I have mentioned my reservations about coupling image versions to the Specifically, I think that we should consider turning
I think that it is completely acceptable to bail out if a user attempts to use a parameter which is not part of the associated schema, and being too clever on this point may lead to future pain. We can easily minimize schema duplication as having multiple images share a schema would not be difficult. As an aside, this also means that new images could be added without requiring a package release of In order to fully support the above, we would need to make versioning in stimela recipes less opaque i.e. at present there is an implication that using an image is the
There are some other things we could consider (if we decide to be very ambitious):
I am going to stop here as this is getting muddled. I could also try and parse this into separate ideas and put them on |
I'd imagine at some point that dependency resolution may become a necessity (similar to pip). There's an outdated Python package called mixology which appears to handle dependency resolution for the concept of a generic package (i.e. not specific to a Python package on pypi). It's based on the pubgrub algorithm, which seems to be the current state of the art. |
I'm a bit reticent about adding top-heavy structures... the current scheme is simple, and relies on standard repositories (PyPI and quay.io) where all versions persist. It's also easy to replicate for somebody who wants to maintain their own cult-cargo-like collection. I also think all information required for full reproducibility is already in there, unless I'm missing something. Let me try to address some points.
Well it's already being done in the reverse sense -- each cult-cargo cab in the release already has a specific
Which schema parameter do you mean? The average user just works with an overall cult-cargo release version, which, in turn, implies frozen versions of all constituent packages under the hood (where the average user need not look).
I don't think this is the implication, but I also think we mean different things by "latest", are you thinking of it as a mutable, continuously updating version? There is no such thing once a given release of cult-cargo is out. There is simply a default image version (which does have a specific well-defined number). It is "latest" only in the sense of "latest at time of this specific cult-cargo release". Once a cult-cargo release is out, the associated images don't change anymore. The only time we change images is during a cult-cargo prerelease process. I.e. 0.1.3 is the next version -- I'll keep pushing new images with that tag until 0.1.3 is released. The cult-cargo build script already has protections for this, it will refuse to push images for a known release.
Agreed. I was merely suggesting a friendlier message when bailing out ("unsupported parameter because you have version blah" as opposed to just "unknown parameter").
This is already the case somewhat. As soon as we push 0.1.3pre1, we are free to push and push 0.1.3 images, until we hit the release button on 0.1.3 proper (see above). Also, a dev version of something doesn't even need to use cult-cargo. I could push breifast images to my own personal repo and keep shipping dev cabs pointing to them, all the way until breifast makes it into cult-cargo.
Good idea (and touches on the certifiable workflows discussion), so let's break this out into a separate issue/discussion.
I was thinking of something similar in #115 (in a pure venv context), but indeed this could also be done with images. |
I don't think abandoning either of PyPI or quay.io would be required. I think that having a layer on top of them that maps schema (in the sense of cab definitions i.e. the parameters the cab accepts) may just be helpful in the long run.
Agreed, although I really do stand by my opinion that using
I think the use of schema confused this point - apologies. What I mean is that for packages included in
I understand this point but I maintain that this is opaque. It means that if a user were to read the recipe, there would be absolutely no way of knowing which versions were in use without either checking other files (requiring more expert knowledge) or installing a specific version of
Ok, this is something I hadn't thought about. That is fair enough. I will point out that in the current model,
On the point about private repos, absolutely. I have done so too. What I meant by this point is that we could push a hypothetical Finally, just to reiterate, if the goal is reproducibility, I sincerely believe we have to decouple the "runtime" requirements i.e. the cab definitions and images, from the Python code/packaging infrastructure. |
Fair point. This is where the PyPI model breaks. Still, I like the simplicity of it for now, so maybe we can muddle our way forward to a more structured scheme while we retain backwards compatibility?
Arguably this is a good thing. The top-level recipe should not be burdened by details, it's more readable that way. For those that want to get into the versioning weeds, there is the
The model I followed for 0.1.2 was multiple 0.1.2preX releases of cult-cargo, while the images themselves were versioned 0.1.2 and were being updated. Do you think this works going forward? Bleeding edge people can use the pre-releases and/or track cult-cargo master. At some point we make a proper release, images get frozen, and another pre-release cycle starts. |
Yeah - any changes weren't going to be short term regardless. Just something to keep at the back of our heads.
Agreed. So the policy is that versions float with the
Ok, that works. |
Related to discussions with @JSKenyon... how do we deal with evolving CLIs. Multiple versions of, say, wsclean are already supported via `image: version`, but there's no way to tell Stimela that a particular parameter is only available with e.g. version 3.1 and up (or, conversely, has been deprecated).

Proposal:

- Cabs shalt have an optional `version` attribute, populated from `cab: image: version` if not set.
- Parameter schemas shalt have an optional `versions` attribute, specified PyPI style, e.g. `versions: >=3.1`.
- Inputs/outputs shalt be (de)activated by comparing their version string to the cab version, if both are specified. There are standard libraries for version parsing.
- If we really want to be user friendly, we don't just delete a deactivated parameter from the schema, we leave a stub entry so that stimela can tell the user they've specified a parameter from a wrong version of the cab.

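For illustration, the (de)activation rule in the proposal could be sketched roughly as below. This is not Stimela code: `satisfies`, `check_params`, the schema layout and the parameter names are all hypothetical, and the version handling is a deliberately simplified stand-in (plain dotted integers only) for a proper PEP 440 parser such as the `packaging` library.

```python
import operator
import re

# Comparison operators supported by this simplified specifier syntax.
_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        "!=": operator.ne, ">": operator.gt, "<": operator.lt}
_CLAUSE = re.compile(r"^\s*(>=|<=|==|!=|>|<)\s*(\d+(?:\.\d+)*)\s*$")


def parse_version(text):
    """Turn '3.1' into (3, 1). Handles plain dotted integers only."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """True if `version` meets every comma-separated clause in `spec`."""
    have = parse_version(version)
    for clause in spec.split(","):
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"unsupported version clause: {clause!r}")
        compare = _OPS[match.group(1)]
        want = parse_version(match.group(2))
        # Pad with zeros so that 3.1 and 3.1.0 compare as equal.
        width = max(len(have), len(want))
        padded_have = have + (0,) * (width - len(have))
        padded_want = want + (0,) * (width - len(want))
        if not compare(padded_have, padded_want):
            return False
    return True


def check_params(cab_version, schema, params):
    """Validate user params against a schema whose entries may carry `versions`.

    Deactivated entries stay in the schema as stubs, so the error can name
    the version mismatch instead of just reporting an unknown parameter.
    """
    errors = []
    for name in params:
        if name not in schema:
            errors.append(f"unknown parameter '{name}'")
            continue
        spec = schema[name].get("versions")
        # Only compare when both the cab version and the specifier are set.
        if spec and cab_version and not satisfies(cab_version, spec):
            errors.append(f"parameter '{name}' requires version {spec}, "
                          f"but the cab is version {cab_version}")
    return errors


# Hypothetical schema: parameter names are made up for the example.
schema = {
    "size": {},
    "new-flag": {"versions": ">=3.1"},
    "old-flag": {"versions": "<3.0"},
}
for message in check_params("3.0", schema, {"size": 1024, "new-flag": True}):
    print(message)
```

Keeping the stub in the schema is what makes the friendlier message possible: a parameter that exists only in another version is reported as a version mismatch, while a genuinely misspelled one still falls through to "unknown parameter".
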
Possible more advanced features:

- Support version specification in the step itself, e.g. `cab: wsclean>=3.1`. In the first instance, this at least allows the recipe to error out in prevalidation if the wrong version of the cab is defined.
- This opens the door to having multiply versioned cab definitions in e.g. cult-cargo, with stimela being able to resolve which one to use if the recipe specifies a particular dependency. Those would have to live under a separately structured `versioned_cabs` (or something like that) top-level section, lest we break existing recipes which use `cabs`.
- This is easily extended to supporting and checking optional recipe versions, e.g. `recipe: tron>=0.1`.

Thoughts @sjperkins @SpheMakh @landmanbester?