[Question] How to identify the latest pin in an s3_board #790
Comments
Thanks for this question @jeffkeller87! I think the short answer is "no" because we haven't built either R or Python pins with an eye toward being used directly from a shell or similar, but since it's all just files and directories, you certainly have options:
I don't think it's likely that pins starts keeping a |
Thank you @juliasilge for the very thoughtful response. I agree that replicating the

To that end, is the naming schema sufficient for determining the latest version? I figured the truncated hash would cause issues if more than one pin version was written within the same second. That's probably good enough for what I'm doing, but I can see it causing issues in other scenarios. Do you have a strong preference for the hash over sub-second markers?

The manifest file was the other option I considered. My optimism deflated a bit when I saw it was a YAML file rather than JSON--only because of how long it took me to convince my Infrastructure / IT people to install
Oh yes, you are definitely correct that the timestamp doesn't distinguish between versions written within the same second. This has come up before and to date, the only time this has been a problem is in kind of "fake" situations, like when building a vignette or when people are writing tests in other packages that use pins. We haven't heard of problems with the timestamp in people's real work, since most folks are pinning, say, a model binary or a summarized dataset coming out of an ETL pipeline. Folks are generally not using pins for super high performance writing, at least so far. In your use case, would sub-second information be practically important?

## what we do now:
format(Sys.time(), "%Y%m%dT%H%M%SZ", tz = "UTC")
#> [1] "20230926T161828Z"
## we could do something like:
format(Sys.time(), "%Y%m%dT%H%M%OS2Z", tz = "UTC")
#> [1] "20230926T161828.26Z"

Created on 2023-09-26 with reprex v2.0.2
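To make the same-second collision concrete, here is a small sketch that applies the two format strings above to a pair of made-up write times. The values `t1` and `t2` and the helper names `stamp_sec()` and `stamp_subsec()` are hypothetical, chosen only for illustration; this compares the formats and does not show how pins builds version names internally.

```r
# Two hypothetical write times, half a second apart, inside the same UTC second.
t1 <- as.POSIXct("2023-09-26 16:18:28", tz = "UTC") + 0.25
t2 <- t1 + 0.5

# Whole-second format (the "what we do now" format from this thread).
stamp_sec <- function(t) format(t, "%Y%m%dT%H%M%SZ", tz = "UTC")

# Sub-second format (the "we could do something like" format from this thread).
stamp_subsec <- function(t) format(t, "%Y%m%dT%H%M%OS2Z", tz = "UTC")

# With whole seconds, both writes get the same timestamp string.
same_second_collides <- stamp_sec(t1) == stamp_sec(t2)

# With two decimal places, the two writes are told apart.
subsecond_collides <- stamp_subsec(t1) == stamp_subsec(t2)

print(c(whole_second = same_second_collides, sub_second = subsecond_collides))
```

Two writers that land within the same hundredth of a second would still collide under the second format, so it narrows the window rather than closing it.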
In my cases, there should be no chance of a sub-second temporal collision like that. But there are always those unexpected scenarios where another writer sneaks in at just the wrong time, and then you're pulling your hair out figuring out why the pin you just wrote isn't the one that gets read immediately after (using the

Modifying the timestamp format would shrink the probability further, but it makes specifying an explicit version more onerous in

I think the current behavior is fine as-is. If someone is intentionally writing this frequently, they probably don't want a versioned board anyway.
That makes a lot of sense. I'm going to leave this issue open for discussion in case other folks come by with this same need in the near future; we can reevaluate as we hear more on it. Thanks again for the question @jeffkeller87!
It sounds like we haven't seen a high need for improvements in this area so I am going to close this issue. We can revisit in the future if we hear more from users on this! 🙌
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
A follow-up to this post and similar to #590.
I love how I can use {pins} instead of maintaining my own artifact management process. It really cuts down on the amount of boilerplate code in my projects!
However, I often have a need to read pins from a system where installing either the R or Python {pins} package is not possible. In my case, these systems are ephemeral continuous integration runners with a limited set of software installed. Specifically, I am grabbing the latest model artifacts from S3 to COPY into a Docker image.
My current solution is to write artifacts to a latest/ prefix (or directory, if S3 is not the storage media) in addition to a timestamped prefix. If the storage media is a filesystem, I sym- or hard-link latest/ to the appropriate timestamped directory. The structure looks something like this:

From a system without {pins}, I can then reference a static path to get the latest artifacts.
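For the filesystem case, the workaround described above might be sketched roughly as follows. This is only an illustration under several assumptions: `publish_artifact()` is a hypothetical helper, the timestamped directory name reuses the whole-second format from this thread, and the artifact is copied into latest/ rather than linked, to keep the sketch portable.

```r
# Hypothetical helper: write one artifact under a timestamped directory and
# mirror it under a static "latest" directory.
publish_artifact <- function(src, root, now = Sys.time()) {
  stamp <- format(now, "%Y%m%dT%H%M%SZ", tz = "UTC")
  versioned <- file.path(root, stamp)
  latest <- file.path(root, "latest")

  dir.create(versioned, recursive = TRUE, showWarnings = FALSE)
  dir.create(latest, recursive = TRUE, showWarnings = FALSE)

  # Copy rather than link (an assumption made for portability).
  file.copy(src, versioned, overwrite = TRUE)
  file.copy(src, latest, overwrite = TRUE)

  invisible(versioned)
}

# Usage with throwaway paths under tempdir().
root <- file.path(tempdir(), "artifact-demo")
src <- tempfile(fileext = ".txt")
writeLines("model bytes would go here", src)

versioned_dir <- publish_artifact(
  src, root,
  now = as.POSIXct("2023-09-26 16:18:28", tz = "UTC")
)
print(list.files(root, recursive = TRUE))
```

A reader without {pins} then only needs the static path under latest/, at the cost of storing the newest artifact twice.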
Without {pins}, is there a straightforward way to identify the latest pin version in a board?
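One way to frame the question: if version directories are named with a UTC timestamp followed by a truncated hash, the fixed-width timestamp sorts chronologically as plain text, so the newest version can be picked from the names alone. The sketch below assumes that naming pattern; the example names and the helper `latest_candidates()` are hypothetical. It returns every name sharing the newest timestamp, so a same-second tie shows up as more than one result instead of being resolved silently.

```r
# Hypothetical version directory names, as they might be listed under a pin.
# Assumption: each name is "<UTC timestamp>-<truncated hash>".
versions <- c(
  "20230925T101500Z-a1b2c",
  "20230926T161828Z-9f8e7",
  "20230926T161828Z-0d4c3",
  "20230924T235959Z-ffffe"
)

latest_candidates <- function(x) {
  # Keep only names that match the assumed schema.
  x <- x[grepl("^[0-9]{8}T[0-9]{6}Z-", x)]
  if (length(x) == 0) {
    return(character(0))
  }
  # Drop the hash suffix; the remaining fixed-width timestamp sorts
  # chronologically as a string.
  stamp <- sub("-.*$", "", x)
  x[stamp == max(stamp)]
}

newest <- latest_candidates(versions)
if (length(newest) > 1) {
  message("Ambiguous: ", length(newest), " versions share the newest timestamp")
}
print(newest)
```

On S3, the input names would come from listing the pin's prefix with whatever tooling the runner has available; the same comparison applies to the resulting strings.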