Locally caching remote or inline resources via "cache" key #243
Comments
Big +1 on Neutral on the description of what implementations should do. What an implementation should do, in this case, is likely highly specific to the requirements of the implementation, and not an issue for the spec to address. |
How about storing the ETAG or some other hash of the document from the last time it was retrieved? |
@rgrp I think I'd want to modify this, that |
@aubergene about ETAG or hash of document serves a different need IMHO. The need for a cache comes from the possibility of a resource hosted elsewhere becoming unavailable. |
I'd be really keen to support url as an option for the cache, even though it opens up a can of worms where the cached copy (of file or url) could potentially be out of date. Although it complicates things, I agree with @aubergene that having a content-hash for the cache would be useful in determining if it is out of date - which may or may not be acceptable, but at least the user would know. |
OK, this seems fairly sensible and straightforward. Only question is whether cache is an object or a single path value. Object would be something like:
|
|
Our recent look at 60k files we have archived showed < 50% had an etag, so it may not actually be that useful. Agree the hash can be calculated by the client, but where the cache is a URL I need to fetch it first - the useful thing with the hash and lastCached date here is determining if it actually IS out of date. If the hash doesn't match the primary and/or the date doesn't match my expectations, then I know I need to investigate further. If the cache is a naked url, or file path, I have absolutely no idea without processing it whether it may be different than the original data. |
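The staleness check described above can be sketched roughly as follows. This is only an illustration: the thread does not settle on a hash algorithm or on how the hash is stored, so SHA-256 and the function name `cache_is_stale` are assumptions, not part of any proposal.

```python
import hashlib
from pathlib import Path


def cache_is_stale(cache_path, expected_sha256):
    """Return True when the cached file's SHA-256 differs from the recorded one.

    Hypothetical helper: the recorded hash is assumed to be a hex digest
    kept alongside the cache entry in the descriptor.
    """
    digest = hashlib.sha256(Path(cache_path).read_bytes()).hexdigest()
    # Compare case-insensitively, since hex digests may be stored in upper case.
    return digest != expected_sha256.lower()
```

A mismatch only says the cached copy differs from what was recorded; whether that is acceptable is still up to the user.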
@rossjones ok so etag can be dropped. I guess my question here would be to do a progressive enhancement:
Any thoughts? @pwalsh one aside is interaction with cache and |
OK so something like:
Only remaining question is whether this is a pattern or in formal spec (idea being we may leave a standard pattern and add to spec later if interest is large enough - want to keep spec as streamlined as possible). Logic for pattern is that this is more for consumers than publishers (publishers would rarely implement ...). |
@rgrp as mentioned in other issues, I'm partial to this being in the spec. |
Consider data package:
What an implementation should do on a concrete user machine after Also there is a problem of cache invalidation.. And storing cached files inside the datapackage could mess with git. Also name |
Also really related to #250 Have we considered |
OK, the blocker here will be someone drafting some specific suggested language. I'm +1 on having this - either in core spec or an extension. Language should be drafted as markdown and posted here in a comment or in a gist. |
@roll |
@rgrp I'm away for 2 weeks, but yes it would be great to have a draft. Also, I'm +1 on have |
@pwalsh +1 for core spec |
@roll some of the info is lost as it is spread out over other issues (eg: #223 (comment) ). However, bottom line is:
|
👍 |
@rgrp and others, any further comments on this? I note that calling it I'm happy to do the PR on this one. Any comments from the @frictionlessdata/specs-working-group before I do that? |
I may be misunderstanding. It seems to me like any tools reading data packages may want to handle caching in their own way - down to where they store the cached file. Imagine a tool that reads remote CSVs into a database. The database already serves as a cache. There's no need to store the remote CSV as a local file at the path described by If by 'cache' we mean 'alternative path', then why not make I don't think I also think there's a security risk if the interpretation of But maybe I'm confused by this proposal. |
hey @jpmckinney By cache we mean alternative path.
But actually, you've just articulated the best and most clear reason why So, I'll instead present this as a "pattern" for usage, outside of the spec, and consider adding support for it to the python and javascript libraries for usage in platforms implementing the spec. ( cc @akariv @roll @amercader ) |
I'm +1 for any spec simplification instead of complication. |
Removing from v1 milestone as now a pattern item. |
This was done in #331 and will close when it is merged. |
As of `1.0.0-beta.15`, only one of `path`, `url`, or `data` is allowed on a given resource. In discussing this, it was suggested that it is useful to have a local path to which to download a remote (`url`) or inline (`data`) resource so that non-datapackage-aware tooling can access it. As a result of discussion in #223, it has been suggested that a new key, `cache`, be used on a resource to specify a local path to "cache" remote or inline resources to disk.

Examples:

So, when a resource has a `url` or `data` key, it `SHOULD` also have a `cache` key. An implementation that is reading the data in the data package should then first check `cache` for the data, if it exists, and, if not, save the data as specified in `url` or `data` to `cache`. Otherwise, `cache` acts just like `path`. When a resource has a `path` key, an implementation should ignore `cache`.

WDYT? @morty @amercader @rgrp @pwalsh
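A minimal sketch of the lookup order proposed above, for readers who want to see it as code. It is not taken from any existing library: the function name `read_resource`, the `base_dir` and `fetch` parameters, and the choice to serialise inline `data` as JSON are all assumptions made for illustration. It also does not check whether `cache` points outside the package directory, which a real implementation would need to consider.

```python
import json
from pathlib import Path
from urllib.request import urlopen


def read_resource(resource, base_dir, fetch=None):
    """Return the bytes of a resource, honouring the proposed `cache` key.

    `fetch` is a hypothetical hook (url -> bytes) so the download step can
    be replaced; by default the URL is fetched with urllib.
    """
    base = Path(base_dir)

    # A resource with `path` is read directly; `cache` is ignored.
    if "path" in resource:
        return (base / resource["path"]).read_bytes()

    cache = resource.get("cache")

    # If a cached copy already exists, it acts just like `path`.
    if cache is not None and (base / cache).exists():
        return (base / cache).read_bytes()

    # Otherwise obtain the data from `url` or `data`.
    if "url" in resource:
        if fetch is None:
            with urlopen(resource["url"]) as response:
                payload = response.read()
        else:
            payload = fetch(resource["url"])
    else:
        # Assumption: inline data is written to disk as JSON.
        payload = json.dumps(resource["data"]).encode("utf-8")

    # Save the data to `cache` so later reads find it on disk.
    if cache is not None:
        target = base / cache
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)

    return payload
```

After the first call for a `url` resource with a `cache` key, later calls read the local file and never touch the network, which is what lets non-datapackage-aware tooling use the same file.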