Merging path and url for resources #250
Comments
+1. Easy enough to implement code and check if we have a protocol scheme or just a plain old path to file. |
Questions:
My sense is, if we do do this, the approach would be:
Aside:
|
For use cases where the data can't be fetched (e.g. too big), one would still like to retrieve a |
@ivotron yes, that issue is covered right now, and would not be removed by the proposal in this issue. |
So @rgrp I'm in agreement with your points:
Additionally, @roll @amercader @akariv @vitorbaptista you are all writing code around this in various places. What do you think? |
|
Sorry if this was discussed before. We've agreed on removing the |
@vitorbaptista I think, always relative to the directory that contains |
So, imagine we have this:
{
  "name": "foo",
  "resources": [
    { "path": "data/foo.csv" }
  ]
}
If this datapackage is at http://example.org/datapackage.json, its resource would be at http://example.org/data/foo.csv. If it's in /home/vitor/foo/datapackage.json, the resource would be at /home/vitor/foo/data/foo.csv. All good. Now, consider this other datapackage:
{
  "name": "bar",
  "resources": [
    { "path": "http://example.org/bar/data/bar.csv" }
  ]
}
This resource then doesn't depend on the path for the datapackage itself. Even if the datapackage.json lives in /home/vitor/bar/datapackage.json, its resource would still live at http://example.org/bar/data/bar.csv. If those examples are correct, we will always have to download the resources from their URLs, even if the datapackage is inside a ZIP file. Am I missing something? |
@vitorbaptista those examples are correct as I understand it, but still:
Not quite sure what you are getting at here. |
@pwalsh relative paths work for remote urls too - that's really important. In both cases you'd compute relative to the datapackage.json base directory. @vitorbaptista your examples are correct and 👍 to @pwalsh comment. Note: I'm really thinking explaining this stuff will go in the patterns or FAQs section on the fd.io site (we could add to the spec but it gets kind of bulky). |
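The resolution rule discussed above (a relative path is computed against the directory containing datapackage.json, whether that descriptor is local or remote, while a fully qualified URL is used as-is) could be sketched roughly as follows. This is only an illustration: the function names and the list of remote schemes are assumptions made for the example, not something the spec or the thread fixes.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

# Assumption: only these schemes are treated as remote locations.
REMOTE_SCHEMES = {"http", "https", "ftp"}


def is_remote(location: str) -> bool:
    """Return True if the location starts with one of the assumed remote schemes."""
    return urlsplit(location).scheme.lower() in REMOTE_SCHEMES


def resolve_resource(descriptor_location: str, path: str) -> str:
    """Resolve a resource path against the location of datapackage.json.

    Both names here are hypothetical helpers for this sketch.
    """
    if is_remote(path):
        # A fully qualified URL does not depend on where the descriptor lives.
        return path
    if is_remote(descriptor_location):
        # Relative path next to a remote descriptor: join as URLs.
        return urljoin(descriptor_location, path)
    # Relative path next to a local descriptor: join against its directory.
    return posixpath.join(posixpath.dirname(descriptor_location), path)


print(resolve_resource("http://example.org/datapackage.json", "data/foo.csv"))
print(resolve_resource("/home/vitor/foo/datapackage.json", "data/foo.csv"))
print(resolve_resource("/home/vitor/bar/datapackage.json",
                       "http://example.org/bar/data/bar.csv"))
```

The three calls mirror the foo and bar examples given earlier in the thread.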
@pwalsh Consider the second example:
{
  "name": "bar",
  "resources": [
    { "path": "http://example.org/bar/data/bar.csv" }
  ]
}
Also consider that we have the files in the local filesystem:
Then I open
import datapackage
dp = datapackage.DataPackage('/home/vitor/bar/datapackage.json')
Where should Going through its |
@vitorbaptista it should load the value of path which is the url. |
@rgrp So if a Sounds fine to me, just making sure this was the intended behaviour. |
@vitorbaptista yes. |
|
OK, I think this is ready to go in. It is a breaking change, so I think I will need to add an actual warning to the spec mentioning that, to support data packages created under earlier versions of the spec, implementors should convert url to path. Welcome final +1 / -1 as this is a significant change (please put these as explicit comments rather than emoticons on this comment!) /cc @pwalsh @paulfitz @danfowler @vitorbaptista @Floppy @jpmckinney ... |
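For implementors, the url-to-path conversion mentioned above might look something like this minimal sketch. The helper name is hypothetical, and the choice to let an existing path win when a descriptor carries both properties is an assumption; the thread does not settle that case.

```python
def upgrade_resource(resource: dict) -> dict:
    """Return a copy of a resource descriptor with a legacy `url` moved to `path`.

    Hypothetical helper for illustration; the input dict is not modified.
    """
    upgraded = dict(resource)
    if "url" in upgraded:
        legacy_url = upgraded.pop("url")
        # Assumption: if both `url` and `path` are present, keep `path`.
        upgraded.setdefault("path", legacy_url)
    return upgraded


legacy = {"name": "bar", "url": "http://example.org/bar/data/bar.csv"}
print(upgrade_resource(legacy))
```

Descriptors that already use only path pass through unchanged.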
Sorry for joining this discussion so late, but I seem to be missing something. I realise that this PR is the result of very long discussions in various threads - but when trying to compare the proposed change vs. the current state, I get this: In the current state, we have:
In the proposed state, we will have
So it looks like we've introduced some ambiguity to the spec, complicated the implementation (which needs to detect network schemes), and it's also a breaking change. Plus it doesn't feel to me that the original goal of reducing cognitive complexity is being achieved. Just an idea - If we allowed |
Do we handle |
Not in this specific change, but based on @pwalsh 's comment I understand On Tue, 9 Aug 2016 at 12:58 roll [email protected] wrote:
|
@akariv whilst it is conceptually simpler for the "experts", for many users there is confusion about which of the two to use and when to use them (or whether both can be used at once - see the issues referenced in the description). Also people get confused whether to use It is true that consumers of data packages (especially tools) may need to do a bit more work: they now need to work out when they have a fully qualified url vs a path. However, for publishers it is simpler and the overall spec is simpler. |
OK. I think we want to move this forward.
AGREED: We are going to implement this. The question is what we do for backwards compatibility. I suggest we explicitly state that clients should remap If people are happy I will start a pull request so people can comment on the changes. |
@rgrp No plans to introduce |
@roll what is the use case for implementors? How hard is it to work out the scheme? |
@rgrp
But you're right for now I don't know a real use case for it on a spec level (may be @danfowler knows when publishers need to ensure path is ONLY local or remote) -> without a use case no reason to consider it. |
Just to clarify what I mean - consider a filesystem that supports multi-letter drive names - |
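The drive-name concern can be illustrated with Python's standard URL parser, which reports a scheme for anything shaped like `letters:`. The sketch below contrasts a naive "has any scheme" check with a check against an explicit list; the function names and the scheme list are assumptions for the example only.

```python
from urllib.parse import urlsplit

# Assumption: an explicit list of schemes that count as URLs.
KNOWN_SCHEMES = {"http", "https", "ftp"}


def naive_is_url(path: str) -> bool:
    """Treat anything with a scheme-like prefix as a URL (too permissive)."""
    return bool(urlsplit(path).scheme)


def is_url(path: str) -> bool:
    """Treat a path as a URL only if its scheme is in the assumed list."""
    return urlsplit(path).scheme.lower() in KNOWN_SCHEMES


windows_path = "C:\\data\\foo.csv"
print(naive_is_url(windows_path))  # the drive letter is parsed as a scheme
print(is_url(windows_path))
print(is_url("https://example.org/foo.csv"))
```

The trade-off is the one raised at the top of the issue: matching against a list avoids misreading drive names, at the cost of having to maintain that list.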
👍 Since the intent is to modify the spec with this change (and I'm working on the relevant code today), I'm going to go ahead and do this in Dataship. |
I have a question about an edge case. Does the following seem correct? When a path doesn't have a protocol descriptor but starts with
Example
datapackage at
The resource |
@waylonflinn great question - and I've actually worked on a draft here and came up with exactly this case. I'm coming to the view that you do not allow absolute paths without a scheme - only relative paths (and also no
This also addresses security concerns. |
HTTP, zip etc uses |
@rgrp Thanks for giving this some thought and coming up with a reasonable answer. Having the absolute path ( |
@waylonflinn really useful feedback. BTW why do you want "root paths" i.e. Lesson for me: people definitely want to retain allowing Does anyone want to allow For security conscious implementors: they can add a switch to disallow all absolute and |
In my opinion a secure recommendation for implementors will be |
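A security-conscious implementation along the lines discussed here could reject scheme-less paths that are absolute or that climb out of the package directory. The following is a rough sketch under stated assumptions: the helper name is hypothetical, and the exact rules (leading slash and any `..` segment are rejected, backslashes are treated like slashes) are one reading of the thread, not agreed spec wording.

```python
from urllib.parse import urlsplit


def is_plain_relative_path(path: str) -> bool:
    """Return True only for scheme-less, relative paths with no `..` segment.

    Hypothetical check for illustration; URLs are handled elsewhere.
    """
    if urlsplit(path).scheme:
        # Anything with a scheme-like prefix is not a plain local path.
        return False
    normalized = path.replace("\\", "/")  # assumption: treat backslashes as separators
    if normalized.startswith("/"):
        return False  # absolute path
    if ".." in normalized.split("/"):
        return False  # parent-directory traversal
    return True


for candidate in ["data/foo.csv", "/etc/passwd", "../secret.csv", "data/../../secret.csv"]:
    print(candidate, is_plain_relative_path(candidate))
```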
@rgrp do you have a PR in the wings for this? I think the wording for the spec should be quite straightforward. I think we should also have a set of clear examples alongside it that address some of the above points. Again here, I think in the various places this has been discussed, there is general agreement that it simplifies the prep of a data package for a publisher, and reduces ambiguity in multiple properties that do essentially the same thing, at the relatively small cost of implementing code needing to handle the file pointer disambiguation (fs vs http). Any comments from the @frictionlessdata/specs-working-group before we start the PR process? |
OK, here's a specific proposal for the text change:
NOTE: we will disallow absolute paths like
Required Fields - Data Location
Resource information MUST contain a property describing the location of the
Data associated to a resource can be located online or locally on disk or
The location of resource data MUST be specified by the presence of one
Data in Files -
|
@waylonflinn note we are planning to disallow |
At the moment we have path and url. I originally had this to make it super easy for tool implementors (no lists of web protocols to match against: http://, https://, ftp://, etc.).
At the same time it adds cognitive complexity to the spec and for publishers, and confusion about whether one could use both (#223, #232).
The approach going forward would probably be just to have path and define it like: