Implement cache for URL downloads #27
Comments
Isn't this basically CRDS? I thought for …
P.S. If you want caching, the download functions in …
I know what he's getting at. If you execute tests locally, they re-download the upstream file(s) on every run. That can become a massive time sink if you're actively developing a test with large data, so I agree... this thing should have an option to cache downloaded files somewhere. On the RT server this would be unwanted, but for all other cases it's definitely worth it.
I thought for local tests, we are supposed to use a local clone of the Artifactory stuff with …
Off the top of my head, I think this is a valid use case... A developer is writing a new set of tests for data that doesn't exist upstream, so they create the directory structure they intend to use (anywhere on disk) and set …

When the developer decides to run a different test suite (for whatever reason) …

If a cache was in place, the developer would only incur X minutes of overhead once. As files are deleted due to the TTL or a header check (one possible form of that check is sketched below), they'll be replaced from the server at a later time. This is useful, because sometimes you might not want …

So if you know you're going to run your non-upstream tests, and a subset of another test tree's data that does live upstream, the cache will speed up execution dramatically. Of course, if you run the entire suite you'll end up with all of the files; however, those files are considered temporary instead of an ever-growing cyst on your local filesystem.
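The header check mentioned above isn't specified further in the thread. One way it could work is a HEAD request comparing the server's Last-Modified timestamp against the cached file's mtime. A minimal sketch, where `is_cache_fresh` is a hypothetical helper and not part of any existing package:

```python
import os
import urllib.request
from email.utils import parsedate_to_datetime

def is_cache_fresh(url, cache_path):
    """Return True if the cached copy is at least as new as the remote file."""
    if not os.path.exists(cache_path):
        return False
    # Ask the server for headers only, without downloading the body.
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        last_modified = resp.headers.get("Last-Modified")
    if last_modified is None:
        return False  # no hint from the server; treat the cache as stale
    remote_mtime = parsedate_to_datetime(last_modified).timestamp()
    return os.path.getmtime(cache_path) >= remote_mtime
```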
And though a dev/user may have …

Also, another use case involves concurrent development and the fact that the data itself is not versioned. Say dev A is working on a module, e.g. for bugfixing. With caching, dev A would have a local and, if the cache is set to never be cleared, permanent data set to test against. If dev B then works on code that uses some common set of data (not necessarily the same code) and changes that data, those changes would not affect dev A's work until dev A is about to merge. At that point, dev A should clear their local cache and check for regressions. All of this comes with the convenience of just setting/unsetting an environment variable (in the strawman case), as opposed to lots of explicit hunting/copying/deleting.
I don't think this is relevant anymore with #53
FYI: This has nothing to do with CRDS. This is about locally caching files retrieved from Artifactory.
CRDS handles the cache now.
OK, I misunderstood. This is for Artifactory data, not CRDS ref files.
Issue
Implement a cache for URL downloads.
The strawman implementation would be controlled by two environment variables, as sketched below:
The first points to a "more accessible" folder used to cache downloaded data. If it is not set, or not accessible, no caching occurs.
The second, honored only when caching is enabled, is the time, in minutes, that the cache is allowed to live. If not specified, the default would be 24 hours (24 * 60 minutes). If set to something like "-1" or "0", the cache lives on indefinitely.
Cache file names would be a hash based on the requested URL.
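A minimal sketch of the strawman above, assuming hypothetical variable names DOWNLOAD_CACHE_DIR and DOWNLOAD_CACHE_TTL (the issue leaves the actual names unspecified):

```python
import hashlib
import os
import time
import urllib.request

def cached_download(url):
    """Download `url`, reusing a local cached copy when one is fresh enough."""
    cache_dir = os.environ.get("DOWNLOAD_CACHE_DIR")
    if not cache_dir or not os.path.isdir(cache_dir):
        # Cache dir unset or inaccessible: no caching, plain download.
        filename = url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, filename)
        return filename

    ttl_minutes = int(os.environ.get("DOWNLOAD_CACHE_TTL", 24 * 60))
    # Cache file name is a hash based on the requested URL.
    cache_path = os.path.join(cache_dir, hashlib.sha256(url.encode()).hexdigest())

    if os.path.exists(cache_path):
        age_minutes = (time.time() - os.path.getmtime(cache_path)) / 60
        # A TTL of "0" or "-1" means the cache lives on indefinitely.
        if ttl_minutes <= 0 or age_minutes < ttl_minutes:
            return cache_path
        os.remove(cache_path)  # expired: evict and re-download

    urllib.request.urlretrieve(url, cache_path)
    return cache_path
```

Hashing the URL keeps each cache entry unique per URL and avoids collisions between same-named files from different directories, at the cost of human-readable names in the cache folder.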