Change of the ID, allowing for a URN and not only a URL #456
Only a minor issue: the new wording completes the section about Therefore it seems that the first addition (in the URL section) is not useful, even confusing. |
@llemeurfr yeah... it is a bit confusing indeed. The (purely editorial) point is that the document refers to "value categories", ie, sort of, datatypes to describe the acceptable values (and this counts for canonicalization) and one of those categories is called 'URL' for holding, well, URL-s. But in the new setting an identifier is an IRI, so the old description of the category does not apply to it because the old description referred to URL-s only (if you can still follow me:-). Maybe the best option may be to use a different term for the value category, ('address'?) to separate the meaning of the term 'URL'. @mattgarrish, wdyt? |
There is already an 'address' in the manifest properties: Also, following the sentence:
could we add a line in example 54 like: |
I do not think so. "address" is a real (http) URL, because it is a Web address. Conceptually, an identifier is there for identification that is not necessarily a Web address (although we say it SHOULD). I would think keeping these two notions clearly separated is better.
Yes, and it was actually there in the previous version when that example did not have a |
A higher-level thought I had yesterday was that canonical identifier may no longer belong under the WP section if we're losing the requirement that it resolve to the publication. Wouldn't that make it usable in any profile, with the must/should requirement to resolve being specifically a WP requirement? In terms of classing these, I'd create something more specific to the use case, like "Identifiers". The point I take from @llemeurfr's comment is that we're not abusing notation by allowing URNs, but starting to mangle definitions in a way that makes reading complicated. Create a new category and we're not abusing anything. It is kind of disconcerting to see a return of IRI after we resolved to use URL, but at the same time I don't see how we can call the canon id anything but one. Hopefully, if we use a value class like "Identifier" it won't cause any additional confusion with our use of URL/address elsewhere. |
Let me see if I understand. Does it mean
To be honest: I am not sure it helps us too much (if my understanding is correct). It is good to separate PublManifest and WPUB because they are conceptually different. But I have difficulty imagining any profile that would directly build on top of PublManifest and not on WPUB. If that happens, we would then have a profile without reading order, publication bounds, etc. But I may not understand what you mean...
Ie, a separate category alongside 'URL'? I thought of that this morning but I shied away, because that is the only term that would use this new category. But, then again, something else may come up later... Will you pick up the thread? (I will be out until tomorrow afternoon soon, so you can make any editing on the branch...)
Yeah well... the Web world, à la browsers, only deals with dereferenceable addresses, ie, URL-s, and so far that was all we had. I think using URI (or IRI) only when there may be an explicit need to go beyond dereferenceable addresses is actually a good thing. Using IRI-s everywhere in the text, even for, say, an address where we really really want an http-type address, could have been just as disruptive. |
An alternative could be, define the value space of canonical id as union(URL, URN) without introducing IRI there. |
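To make the union(URL, URN) idea concrete, here is a minimal sketch of how a processor might sort a candidate canonical identifier into one of the two categories. The function name `classify_identifier` and the rule that only `http`/`https` count as a "URL" here are assumptions made for illustration; nothing in the thread or the draft defines this check.

```python
from urllib.parse import urlsplit


def classify_identifier(value: str) -> str:
    """Return 'urn', 'url', or 'invalid' for a candidate canonical identifier.

    Hypothetical helper: the accepted schemes are an assumption, not spec text.
    """
    parts = urlsplit(value)
    scheme = parts.scheme.lower()
    if scheme == "urn":
        # A URN has the shape urn:<nid>:<nss>; both parts must be non-empty.
        nid, sep, nss = parts.path.partition(":")
        return "urn" if nid and sep and nss else "invalid"
    if scheme in ("http", "https") and parts.netloc:
        # Treat only absolute http(s) addresses as locators.
        return "url"
    return "invalid"
```

With this split, a value such as `urn:isbn:1234234324` is accepted as an identifier without implying that it can be dereferenced, while a relative string such as `index.html` is rejected as a canonical identifier.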
Agreed. The key distinction we need to make is "locator" vs. "identifier" (now that canonical identifiers are not also required to eventually locate something).
I'd keep them separate because of relative expansion via I do think separating URL from URN in the value categories is necessary to avoid confusion all around. |
Isn't audiobooks exactly the genesis of this separation? The canonical identifier can't be provided, or can't be known, until the publication is deployed on the Web, so why drop the restriction for a resolving identifier if the only case is to make things that aren't quite web publications compatible? If the desire is just to have a unique identifier for packaged content, then why are we playing with the canonical identifier definition at all? It's not necessary to include the property, so packaged content is perfectly fine without one. But I'm getting confused what problem we're trying to solve now. Is the goal to be able to provide a unique identifier for packaged content? If so, then I'd agree we leave the existing definition where it is but perhaps rename it the canonical address. We could then look at a unique identifier field to allow ISBNs, UUIDs, and other things to travel with the publication. Or what am I missing? |
The reason one might still use |
Or is the problem here that we talk too much about the publication resolution aspect of the canonical ID when it is first and foremost an identifier? If resolution is just a nice extra feature of the canonical identifier, then I'd suggest it entirely belongs in part 1 of the specification. Perhaps what is confusing me is just this emphasis we have on locating the publication separate from the address. |
No. Audiobooks have reading order, possibly resources, table of contents: all are defined as part of WPUB-s. The same issue may come up with other publications. By restricting identifiers to be http URL-s we cannot properly use things like ISBN-s which are identifiers. |
That is absolutely correct. My worry is how you would explain, in the WPUB document's example, why you would use "id" : "urn:isbn:1234234324" and "isbn":"1234234324". The explanation is that (in my view) schema.org mixes up two concepts (the |
Yeah...this comment confused me too, @iherman. I think that the core data model that is expressed in PublManifest should include reading order, bounds, etc, and it has a need for identifiers that aren't necessarily resolvable (i.e. they're not also locators).
Our current prose states:
In JSON-LD, the Where this has headed is that we remove that WPUB-level requirement returning the |
Maybe so. At the moment, this is not how the document is structured, though. I would try to refrain from reorganizing the document again (even if it may not be ideal) and try to make the least possible changes... In any case, audiobooks are Web Publications, too. We may get to something slightly different when they are packaged, but that is a different matter. Audiobooks can be served on the Web (e.g., for streaming) in which case they may have |
Right, this is confusing me in terms of understanding just what we need to achieve with this identifier. It seems we've layered HTML canonical addresses onto the JSON-LD identifiers and created a new beast. I agree with @BigBlueHat that this property is no longer specific to web publications, but has become part of the general data model for digital publications. All publications need an identifier, but only web publications can use a canonical address to achieve that. Why can't the canonical address just be specified in the links section with rel=canonical, though? Doesn't that begin to extricate us from this problem, as then the canonical identifier can be whatever you want, including the canonical address. |
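As a rough illustration of the rel=canonical suggestion, the sketch below keeps a non-resolvable identifier in `id` and carries the preferred address as a link relation. The manifest shape (a `links` array whose entries have `url` and `rel`) and the helper `canonical_address` are assumptions for the example, not agreed wording from this PR.

```python
from typing import Optional

# Hypothetical manifest fragment: the identifier and the canonical address
# are carried separately, so "id" does not need to resolve.
manifest = {
    "id": "urn:isbn:1234234324",
    "links": [
        {"url": "https://example.org/pub/", "rel": "canonical"},
    ],
}


def canonical_address(manifest: dict) -> Optional[str]:
    """Return the URL of the first link marked rel=canonical, if any."""
    for link in manifest.get("links", []):
        rel = link.get("rel", [])
        # "rel" may be a single string or a list of relation names.
        rels = [rel] if isinstance(rel, str) else rel
        if "canonical" in rels:
            return link.get("url")
    return None
```

Under this arrangement a packaged publication can simply omit the canonical link and still keep its identifier.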
It is important to have the
Which is fine with me if it can be done, editorially, easily and properly... At this point I think we should let @mattgarrish look at the document from an editorial point of view to see how these can be smoothed into the document with the smallest possible amount of work... |
What makes an audiobook (or any other similar packaged publication) a "Web Publication"? Is it the JSON-LD manifest? Is it that it has a URL (i.e. can be linked to and retrieved)? Once packaged, it wouldn't have to have an entry page (so can't load in an existing browser even if unzipped) and wouldn't have a URL to dereference and may not have an identifier (given current discussion) in order to find a copy of it elsewhere on the Web (or related content, other human readers/listeners, etc). These aren't meant to be pedantic ontological questions (honest!). They have technical implications to the ecosystem from distribution to consumption to citation.
It's knowing when "all that jazz" is important and knowing when there's a need (or lack of one) to add the necessary bits. Restructuring the document would help explain the inheritance model from a core conceptual "publication data model" through the PublManifest expression and then the bits that make something a packaged publication or a Web Publication. |
I thought the goal of splitting the common manifest format was exactly so that the differences could be tackled at the profile level? With some coercion, any audiobook can be transformed into a web publication. That doesn't make an audiobook a web publication, but a slightly different subset that retains cross-compatibility. |
Are we mixing up the audiobook spec and the packaging note? The former is clearly a "profile" of WPUB, with some restrictions on the type of documents considered, with an extra type on the document itself, etc. Why is that put in question now? Packaging creates a slightly different situation as something that may come from outside the Web, but it is also its intention that it can be "unpackaged" on the Web albeit by a process that creates, if necessary, the PEP to make it really a WPUB. I do not really see any new problem. |
That's not what I'm questioning, but how is it that the packaged form can be invalid? If what we're packaging isn't a valid audiobook, what is it? I was under the impression that we wanted both forms to be valid, but I don't see how that is possible if audiobook inherits from web publications instead of from the general manifest. |
There was indeed a long discussion about the validity/invalidity of the packaged form and the WG, for purely pragmatic reasons, accepted the request that the content in the package can deviate from the WPUB spec on one aspect only: that it is not required to have a PEP but, instead, the manifest alone would be enough for the package. That being said, if the package is unpacked on the Web it is supposed to turn it into WPUB by creating, if necessary, a trivial @llemeurfr I hope what I am saying is correct :-) (also, it may be a good idea to add some words into the LPF document emphasizing these facts to avoid future misunderstandings…) |
Editorially, it kinda feels weird to have [rfc3987], [url] (and probably also [urn]) as normative references, when the URL Standard obsoletes RFC 3987. In our spec a URL is normatively defined by the URL standard. Strings that we're used to referring to as IRIs and URNs are valid URL strings according to this spec (if I'm not mistaken). My suggestion is to:
|
Ya, that's problematic in itself. We should state outright the limitations relative to web publications (e.g., the naming stuff, that the entry page and manifest can't be in different directories, that there can't be any resources above the directory that holds the pep/manifest, that there can't be resources hosted on other domains, etc., etc.). A "Packaging Limitations" section near the top would be extremely helpful so people understand that "lightweight" is relative to what can actually be handled. |
remove unreferenced url and identifier definitions from terminology; change references to urls to url or identifier types; add canonical address relation
I'm not sure if my last commit captures everything, but please have a look and let me know what you think. To recap the changes:
I'm not sure about the last change, but it provides resolution to a preferred address without overloading the canonical identifier. |
@mattgarrish, all in all, the direction does look great, but I do have some comments. Not in priority order:
|
re. #456 (comment), Ivan summarized the situation very well: a Package can represent a publication we could call a "pre-WP", i.e. a publication made of a Manifest and its resources, which may be exposed as a WP after trivial modification. Every LPF Package can become a WP, and many but not all WP may be packaged as LPF (a "user guide" will be useful to illustrate that aspect). |
The scope of what the specification can actually handle shouldn't be in a user guide. |
re. #456 (comment) from Matt: When a WP can be packaged as LPF (again, this is not the case for all WPs), the WP address could technically be retained in the Manifest as a full URL; but as this "frozen" WP is now detached from the Web, it's really better to consider that the WP address becomes the relative URL representing the path to the PEP inside the Package (ex. "url":"index.html"). Note that the PEP already exists for a packaged WP. This corresponds to my proposal in w3c/pwpub#49 (comment). In summary, the discrepancy between WP and LPF isn't wide when we consider the LPF -> WP transform, and totally manageable when we consider the WP -> LPF transform, if applicable. |
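The LPF -> WP direction described here can be sketched as ordinary relative-URL resolution: once the package is unpacked at some location, the relative `url` resolves against that location. The function name `deployed_address` and the example base URL are hypothetical; only the `"url":"index.html"` value comes from the comment above.

```python
from urllib.parse import urljoin


def deployed_address(base_url: str, packaged_url: str) -> str:
    """Resolve the address stored in a packaged manifest against the
    location where the package contents were unpacked (assumed base)."""
    # urljoin leaves an absolute URL untouched and resolves a relative one.
    return urljoin(base_url, packaged_url)
```

For example, unpacking at `https://example.org/books/moby/` turns `index.html` into `https://example.org/books/moby/index.html`, whereas a full URL retained in the manifest would pass through unchanged.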
If we're making this allowance so that packaged WPs are valid, why should it be a warning not to use a URL? Is there a specific reason why it needs to be a URL? |
Re #456 (comment) and #456 (comment), this is an interesting question: should the LPF specification contain the processing model (algorithms) which specify LPF to WP and WP to LPF (with the constraints on WP structure for being able to package it as WP)? Until now I thought the consensus was NO as LPF is a file format spec (which reuses the Manifest defined for WP and is scoped by the introduction of the LPF spec), and not a pure WP packaging spec (i.e. not the final PWP the WG would like to define). |
re #456 (comment), now that Romain pointed us to the fact that the URL spec now INCLUDES URNs, I don't understand the issue.
|
The definition normatively says a WP may have more than one address. What does that mean if not an array? Other schema.org properties? Other addresses in the "real world" but you're only allowed to practically specify one? |
Sure, but the address isn't a property of the publicationmanifest dictionary. Granted, I'm wondering why we bothered to split the specification at all if we're just working back to packaged audiobooks being somewhat invalid web publications. If we're not attempting to make the packages valid, or we're just going to define WP properties such that they don't have to always be "webby", it's probably more confusion than it's worth. |
@mattgarrish what you just said is key. I forgot that the Publication Manifest specification does not contain the Address and Canonical id, which are defined in the Web Publication Manifest. As the Package is using a Publication Manifest, not a Web Publication Manifest, my issue with Address in Packages is solved. There are still remaining questions related to this PR: |
Oops, I did not remember that. Then the Array is the good one, forget my note. |
It does need some forensics...
I admit it is not very explicit, and it should be. |
+1 |
Actually, I did not realize that the address was not part of the WebIDL before either. The way I look at the difference between Part I and II is somewhat akin to what @llemeurfr said, but not exactly: the Web Publication Manifest is what the content creator has to provide, and Part II, ie, the Web Publications part is how the manifest is used when things are put on the Web: the PEP, how to locate the manifest, how to process it (e.g., that the PEP's Ie, I do not really think we have some sort of a problem, and I believe the current sectioning is fine.
I do not think so. Separating the pure metadata part from, say, how to obtain the manifest from the address is a good thing imho. The issue of packaging is orthogonal, and should be called out in that document... |
@iherman’s pointers are on point. |
remove canonical address and reintegrate prose; move address to part 1
To see if we can move this PR forward and stop it blocking other work, I've made canonical identifier again a "should" for url and dropped the relation. I've also moved address to part 1 since we seem to be saying there can't be any significant variation between adaptations of the manifest format. I'll take a look at our "URL" terminology again in another PR, as how can you possibly get confused by strings and records? ;) |
Thanks @mattgarrish. We should indeed merge this and we can finesse this later to align it better to the URL spec's terminology... |
This is on the basis of the discussion on w3c/pwpub#47, discussed on the telco on 2019-05-03.