How do we identify a web publication and its components? #5

Closed
dauwhe opened this issue Jul 5, 2017 · 94 comments

Comments

@dauwhe
Contributor

dauwhe commented Jul 5, 2017

From @dauwhe on June 26, 2017 22:17

A Web Publication (WP) is a collection of one or more constituent resources, organized together in a uniquely identifiable grouping that may be presented using standard Open Web Platform technologies. A Web Publication is not just a collection of links— the act of publishing involves obtaining resources and organizing them into a publication, which must be “manifested” (in the FRBR sense) by having the resources available on a Web server. Thus the publisher provides an origin for the WP, and a URL that can uniquely identify that manifestation.

Perhaps the simplest possible answer to these questions is just a URL: https://www.example.com/MobyDick/ would both identify the publication and mean that everything whose URL starts with this is part of the publication.
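To make the "URL prefix" rule concrete, here is a minimal Python sketch of the membership test it implies. It assumes the publication URL is a directory path and treats only same-origin URLs under that path as members; the function name `in_publication` is hypothetical.

```python
from urllib.parse import urlsplit


def in_publication(pub_url: str, resource_url: str) -> bool:
    """True if resource_url lies under pub_url, treating pub_url as a directory prefix."""
    pub = urlsplit(pub_url)
    res = urlsplit(resource_url)
    # A path prefix only means something within the same scheme and host.
    if (pub.scheme, pub.netloc) != (res.scheme, res.netloc):
        return False
    # Normalise to a trailing slash so "/MobyDickExtras/" is not mistaken for a member.
    prefix = pub.path if pub.path.endswith("/") else pub.path + "/"
    return res.path.startswith(prefix)


PUB = "https://www.example.com/MobyDick/"
print(in_publication(PUB, "https://www.example.com/MobyDick/chapter-001.html"))  # True
print(in_publication(PUB, "https://www.example.com/css/common.css"))            # False
```

Under this rule a shared stylesheet outside `/MobyDick/`, or a font served from another origin, is not part of the publication.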

So I guess that I’m looking for reasons to make this more complicated :)

Copied from original issue: w3c/publ-wg#10

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @GarthConboy on June 26, 2017 22:25

I think we'll need to point to the "manifest" -- we'll need to be able to download or package the entire publication and its constituent resources (given Brady's correct observation that, with scripting, scanning the markup can't reliably determine what's really referenced). We also need to know which markup file should be displayed initially and how to progress from there.

If your URL is a "directory root," one could say everything under it is inherently part of the publication (which could resolve the markup-scanning issue [maybe]), but one will still need to find the manifest to know where to start rendering and the reading order thereafter.

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

but one will still need to find the manifest to know where to start rendering and the reading order thereafter.

This reminds me of a concern about progressive enhancement. Say I point my browser at https://www.example.com/MobyDick/ but JS is disabled or the user agent doesn't yet support web publications. What should happen?

One option would be to give the first document you want displayed a special name, say, index.html. This file could also point to the manifest, or include it directly.

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @mattgarrish on June 26, 2017 22:56

The case usually given for complexity is open textbooks and course packs, where content is aggregated from different locations without having to actually amass the resources under a single domain/directory.

Does "everything" here only refer to html pages? How realistic is it that all the resources are going to be neatly stored together? What if my css is two levels higher up from the publication under a common folder? What if I'm pulling in css or scripts from another domain?

I'm all for simplification, don't get me wrong, but I'm not optimistic about a model that requires the user agent to traverse and parse all the documents to figure out what is in scope and needed, if that's where this is leading.

but one will still need to find the manifest to know where to start rendering and the reading order thereafter

Isn't this where we've considered using link/rel to establish the "belonging"? (And another case of why cross-domain publications get complicated quickly, since their parentage can only be established by starting at an author-controlled location, which then has to be maintained despite what the linked resources might indicate.)

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

The case usually given for complexity is open textbooks and course packs, where content is aggregated from different locations without having to actually amass the resources under a single domain/directory.

Do we need to design something that will support content documents ("spine items" in EPUB-speak) hosted on multiple origins?

https://www.example.com/MobyDick/chapter-001.html
https://www.foo.com/MasterAndMargarita/chapter-002.html

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

I think we'll need to point to the "manifest"

So the URL of the WP would point to the “manifest” rather than a directory? This would then imply (I believe) that the manifest be discoverable from some sort of file. So what sort of file? I would argue that pointing to HTML would be better than the alternatives, given all user agents know what to do with HTML files. But that leaves open the question of whether this HTML file contains the manifest, or just points to the manifest.

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @mattgarrish on June 27, 2017 1:40

Do we need to design something that will support content documents ("spine items" in EPUB-speak) hosted on multiple origins?

We need to consider it, at least. Intertwined with what I mentioned above is the problem of iframes and bringing in entire chunks of content below the level of the spine. We need to be open to how the web works and not just publications as we're used to making them.

The problem doesn't seem confined to content documents but affects their constituent resources, as well, so we need some solution.

Taking a publication offline is less of a problem than what happens to references in a packaged web pub. So while we can ignore the problem at this level, we probably do so at our own peril later. Or maybe we add rules farther down the chain that limit what a packaged web pub can reference? (That's kind of a nasty gotcha I'd hate to discover, though.)

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @GarthConboy on June 27, 2017 2:47

"I would argue that pointing to HTML would be better than the alternatives, given all user agents know what to do with HTML files. But that leaves open the question of whether this HTML file contains the manifest, or just points to the manifest." -- interesting. As long as the manifest was discoverable in a known location, I guess that would be okay -- I think a browser might be interested in a first HTML page, whereas a Reading System would want to start with the manifest.

"Do we need to design something that will support content documents ("spine items" in EPUB-speak) hosted on multiple origins?" -- I would think "no".

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @iherman on June 27, 2017 7:27

@dauwhe

One option would be to give the first document you want displayed a special name, say, index.html. This file could also point to the manifest, or include it directly.

This is already how the Web works. We routinely use URLs to a directory, and it is up to the server configuration to decide what this means in practice. It can return the index.html file in that directory, if available; in Apache one can actually set up a whole priority list of alternatives. Although less common, the server can also return index.svg as the first choice (which may be useful for some documents).

Bottom line: I believe your first statement, whereby https://www.example.com/MobyDick/ is the identifier of a particular document is perfectly fine.

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @iherman on June 27, 2017 7:29

I think the scope notion of the Web App Manifest is interesting here. If we want to include content documents from different "origins", then we may use a scope listing several documents. Although, for different reasons, I may be tempted to say that all directories listed in the scope should be on the same domain.

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @baldurbjarnason on June 27, 2017 22:13

The scope notion would play nicely with the proposed packaging spec which IIRC relies on it quite a bit.

Outlining how identification for web publications would work if it followed the expectations set by the rest of the web stack (e.g. web app manifests, atom/rss feeds, etc.):

  1. Each HTML page that is a part of a publication links (with a specified link relation) to a manifest document of some sort to indicate the publication it is a part of.
  2. That manifest then somehow indicates which resources are under its purview. This could be done using scope (like web app manifests) or an explicit listing of some sort, or both.
  3. The manifest lists an authoritative url that identifies the publication it is describing. This might be done indirectly as simply the root URL for the scope it covers or directly as an explicit property. I think most web developers would prefer an explicit URL property but that's just a hunch not backed by data. That would also make the manifest more complementary to the packaging spec if that spec becomes a reality.
  4. That identifying root URL has to return an HTML file that is within the manifest's scope and that file has to link back to the manifest as well.
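The four steps above can be sketched roughly as follows. This is an illustration only: the link relation value `publication` and the manifest properties `url` and `scope` are hypothetical placeholders (the comment deliberately leaves the relation name and manifest format open), and network fetching is replaced by passing HTML strings in directly.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical link relation: the thread does not fix an actual value.
PUBLICATION_REL = "publication"


class ManifestLinkFinder(HTMLParser):
    """Step 1: remember the href of the first <link> whose rel includes PUBLICATION_REL."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        values = dict(attrs)
        rels = (values.get("rel") or "").lower().split()
        if PUBLICATION_REL in rels and values.get("href"):
            self.href = values["href"]


def find_manifest_url(page_url: str, html: str):
    """Return the absolute manifest URL an HTML page links to, or None."""
    finder = ManifestLinkFinder()
    finder.feed(html)
    finder.close()
    return urljoin(page_url, finder.href) if finder.href else None


def check_identifier(manifest: dict, manifest_url: str, root_html: str) -> bool:
    """Steps 3-4, using the scope from step 2: the identifying URL must fall inside
    the scope, and the HTML served at that URL must link back to the same manifest.
    'url' and 'scope' are hypothetical property names."""
    ident = urljoin(manifest_url, manifest["url"])
    scope = urljoin(manifest_url, manifest["scope"])
    return ident.startswith(scope) and find_manifest_url(ident, root_html) == manifest_url


page = '<html><head><link rel="publication" href="manifest.json"></head></html>'
manifest_url = find_manifest_url("https://www.example.com/MobyDick/chapter-001.html", page)
print(manifest_url)  # https://www.example.com/MobyDick/manifest.json

manifest = {"url": "./", "scope": "./"}
print(check_identifier(manifest, manifest_url, page))  # True
```

For brevity the same HTML string stands in for both a chapter and the root page; in practice the root URL would return its own document (for example an index.html) carrying the same link.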

This is the basic pattern used by feeds, web app manifests, service workers, etc: component files link to a central document with metadata, indication of scope, link to self, and an identifying URL. Even AMP uses a variation of this theme. And as I mentioned above sometimes the identifying URL and scope definitions are interrelated. E.g. atom feeds link to the URL whose updates they list (explicit id, implicit scope).

This pattern gives us discovery (direct links to chapters let you discover the publication ID, its metadata, and all related assets) as well as a single source of truth for the publication ID, publication-level metadata, and publication assets (the manifest). And this guarantees that the publication id is itself a URL to a human-readable HTML resource that in turn lets you discover the manifest.

Of course, this is just going from what you'd expect if you were coming at this from the web development community. I realise that they aren't the only constituency at play here.

And this does not necessarily dictate anything about the format of the manifest. Although, if we're going by the principle of least surprise, most web developers would at least expect a JSON file.


On service workers

Service workers achieve this process programmatically, and the pattern is very similar overall, although a lot of service worker behaviour by necessity violates common developer expectations.

  • A service worker's scope defines the pages whose network access it controls, which (counter-intuitively for some) means that it can control cross-origin requests made by those pages.
  • But the requests the service worker makes itself are no-cors by default (IIRC).
  • There are also foreign fetch service workers, which control network requests from pages outside their scope, because their scope is defined by the resources being fetched rather than by the pages doing the fetching.

Basically, even though service workers are awesome, they do also have a deserved reputation for being confusing (this is only scratching the surface) so anything we can do to avoid that complexity is a win. That means not letting the publication manifest claim scope over cross-domain resources and not letting it control requests in any way.


(Apologies for the brain dump. I didn't have time to edit this down to a concise note 😊)

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @HadrienGardeur on July 2, 2017 20:9

What you're describing is almost exactly what we do in Readium-2 @baldurbjarnason, there are only minor differences or observations that I need to add.

Each HTML page that is a part of a publication links (with a specified link relation) to a manifest document of some sort to indicate the publication it is a part of.

Ideally yes, but what if a resource is included in multiple Web Publications? What if you can't change the HTML or HTTP headers for that resource? IMO, such a link to a publication is an important part of how discovery is handled, but it's not an absolute requirement.

That manifest then somehow indicates which resources are under its purview. This could be done using scope (like web app manifests) or an explicit listing of some sort, or both.

In Readium-2 we list all resources under two separate collections: spine for the core resources that are listed in reading order and resources for other resources.

This has some clear benefits over a simple scope:

  • since we know all the resources (URIs) necessary to render a resource, we can easily cache them however we want (Service Worker, App Cache Manifest, proxy, local cache storage for native apps)
  • we also optimize the UX by preloading specific resources (fonts, JS, CSS) and prerendering some of them (using multiple webviews in our mobile apps)
  • since our manifest is using JSON-LD + schema.org, a client that understands schema.org can index these resources as being part of the publication
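A rough Python sketch of how a client might derive its caching and preloading list from the two collections. The manifest shape here (`links`, `metadata`, `spine`, `resources`, each entry with an `href`) is a simplified assumption loosely based on this description, not the normative Readium-2 format; relative `href`s are resolved against the `self` link.

```python
from urllib.parse import urljoin


def resources_to_cache(manifest: dict) -> list:
    """Absolute URLs of every spine item and resource, in manifest order, de-duplicated."""
    self_links = [l["href"] for l in manifest.get("links", []) if l.get("rel") == "self"]
    if not self_links:
        raise ValueError("manifest has no 'self' link to resolve relative hrefs against")
    base = self_links[0]
    seen, urls = set(), []
    # Spine first (reading order), then supporting resources such as CSS, JS and fonts.
    for collection in ("spine", "resources"):
        for link in manifest.get(collection, []):
            url = urljoin(base, link["href"])
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


manifest = {
    "metadata": {"title": "Moby-Dick"},
    "links": [{"rel": "self", "href": "https://www.example.com/MobyDick/manifest.json"}],
    "spine": [
        {"href": "chapter-001.html", "type": "text/html"},
        {"href": "chapter-002.html", "type": "text/html"},
    ],
    "resources": [
        {"href": "css/style.css", "type": "text/css"},
        {"href": "https://fonts.example.net/garamond.woff2", "type": "font/woff2"},
    ],
}

for url in resources_to_cache(manifest):
    print(url)
```

The resulting list could feed whichever cache a client uses (Service Worker, proxy, native storage); the sketch is agnostic about that choice.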

The manifest lists an authoritative url that identifies the publication it is describing. This might be done indirectly as simply the root URL for the scope it covers or directly as an explicit property. I think most web developers would prefer an explicit URL property but that's just a hunch not backed by data. That would also make the manifest more complementary to the packaging spec if that spec becomes a reality.

That's one of our few requirements. In Readium-2 we always provide a link that points back to the manifest.

The other two requirements are:

  • at least a title in the publication's metadata
  • at least one resource in the spine
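Together with the `self` link, these minimal requirements are easy to check mechanically. The sketch below assumes the same simplified manifest shape (`links`, `metadata.title`, `spine`); `validate_manifest` is a hypothetical helper, not part of any Readium API.

```python
def validate_manifest(manifest: dict) -> list:
    """Return the minimal-requirement problems found; an empty list means none."""
    problems = []
    links = manifest.get("links", [])
    if not any(link.get("rel") == "self" and link.get("href") for link in links):
        problems.append("missing a 'self' link pointing back to the manifest")
    if not manifest.get("metadata", {}).get("title"):
        problems.append("missing a title in metadata")
    if not manifest.get("spine"):
        problems.append("spine must list at least one resource")
    return problems


good = {
    "metadata": {"title": "Moby-Dick"},
    "links": [{"rel": "self", "href": "https://www.example.com/MobyDick/manifest.json"}],
    "spine": [{"href": "chapter-001.html"}],
}
print(validate_manifest(good))  # []
print(validate_manifest({"metadata": {"title": "Moby-Dick"}, "spine": []}))
```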

That identifying root URL has to return an HTML file that is within the manifest's scope and that file has to link back to the manifest as well.

That's pretty much the only difference between what you're describing and Readium-2/Readium Web Publication Manifest. The "root URL" (a link with self as its relation) points to the JSON manifest, not to the first (or any) document from the spine.

One reason for that is tied to the fact that we'd like anyone to create a Web Publication by remixing content already available on the Web.

On Service Workers

I really don't think that Service Workers should in any way influence our design for Web Publications. There are many different ways that content can be cached, and Service Workers are only one method among others.

Let's keep our options open and let people use all the possibilities offered.

@dauwhe
Contributor Author

dauwhe commented Jul 5, 2017

From @llemeurfr on July 3, 2017 12:58

So, to come back to the initial question, Readium-2 folks propose:

  • an IRI as globally unique identifier for the publication, included in the manifest as one of the few mandatory metadata.
  • a URL linking back the manifest to its origin.

@llemeurfr
Contributor

Correction of my previous comment, after discussion in #6:
The unique identifier of the WP is the 'self' link (a URL) to the (original) WP manifest.
If a WP is copied to a new server, it can be considered as a different WP, in which case its 'self' URL is modified.
Or it may be considered as the same WP, in which case its 'self' URL isn't modified and points to the original WP.
Moving a WP to a new domain name implies creating a new WP (as the original URL would return a 404, a new 'self' URL is required).

@murata2makoto

murata2makoto commented Jul 10, 2017

I think that it should be possible to reference any resource in a web publication (which is itself referenced by a URI) by a URI. To treat such resources as first-class citizens of the web, use of fragment identifiers should not be required. Moreover, the fragment identifiers defined for the resources' media types should be usable.

An absolute-path reference (a relative reference beginning "/") in such resources should reference another resource in the web publication. Furthermore, the absolute URI constructed from the base URI (i.e., the absolute URI of the web package) and the absolute-path reference should reference the same resource.

When a resource belongs to multiple web publications, relative references in the resource should be resolved differently depending on which web publication is used as the base URI.

EPUBCFI does not satisfy these desiderata.

@mattgarrish
Member

An absolute-path reference (a relative reference beginning "/") in such resources should reference another resource in the web publication.

I still wonder about issues like this when multiple domains are involved. Not so much at the WP-level, but at the packaging level. How are such references resolved and domains preserved? Do we at some point need to decide whether multiple-domain publications are out of scope for a first release?

@dauwhe
Contributor Author

dauwhe commented Jul 10, 2017

Do we at some point need to decide whether multiple-domain publications are out of scope for a first release?

Perhaps there are two questions. Having multiple domains for content documents seems problematic, and perhaps not worth the complexity. But what about things like fonts and scripts that might come from other origins?

But then our white paper suggests that the publisher has an obligation to provide an origin:

A Web Publication is not just a collection of links— the act of publishing involves obtaining resources and organizing them into a publication, which must be “manifested” (in the FRBR [frbr] sense) by having the resources available on a Web server. Thus the publisher provides an origin for the WP, and a URL that can uniquely identify that manifestation.

@mattgarrish
Member

Yes, it's a troubling question. I recall at one point we were discussing URL mapping in the DPIG, but even that's not quite enough, as there have to be rules about domain root independence.

If the manifest were to set a scope and all resources had to be below it, then the problem seemingly goes away, but it also greatly reduces what can be called a publication. Maybe that's not a bad thing, but it invalidates many of the possible applications.

@murata2makoto

I think that "/" should always reference the web publication, no matter which resource "/" appears in. I updated my comment above for covering multiple web publications.

@mattgarrish
Member

But if we're not redefining how the web works today, how can that work for a publication with documents on different domains?

It can't even work unless the publication root is the domain root, otherwise we're redefining how to resolve a path that starts with a slash, no?

@murata2makoto

murata2makoto commented Jul 10, 2017

I would like to begin with desiderata. If we reach consensus on desiderata, we can invent a solution.

But it is true that the domain root should reference a web package. In other words, the domain root of a resource-in-WP URI should contain the WP URI.

@HadrienGardeur

HadrienGardeur commented Jul 10, 2017

I strongly object to limiting Web Publications to a single domain, this goes completely against the model of the Web.

On a modern website these days it's not uncommon to have:

  • fonts served by a third party (say Google Fonts or Typekit)
  • JS, CSS and images hosted on a CDN
  • video embeds from services like Youtube or Vimeo

If we require content to be served from a single domain, we're no better than AMP and require a parallel Web to be built specifically for the constraints that we decide.

IMO this is perfectly unacceptable: Web Publications should work with content that exists on the Web today, on as many different domains as the content requires.

For a Web Publication and its manifest, what's the issue with using absolute URIs? We can do whatever needs to be done (preload, cache, prerender...) perfectly well with absolute URIs.

If you're talking specifically about the use case of transforming a WP into a PWP, that's a very different problem and the difficulty will be tied to the packaging and manifest formats that we select.

The Web Packaging proposal, for instance, can support resources across multiple domains perfectly well: https://github.com/WICG/webpackage#multiple-origins-a-web-page-with-a-resources-from-the-other-origin

@murata2makoto

murata2makoto commented Jul 10, 2017

Who suggested "limiting Web Publications to a single domain"? I didn't.

It is true that the path component of a resource-in-WP URI should be able to specify a different domain.

@mattgarrish
Member

If you're talking specifically about the use case of transforming a WP into a PWP

Right, and I was speculating about problems we'd face if '/' refers to the root of the publication even though the domain root is not the publication root. The content won't work when it's on the web. I'm not in favour of limiting publications to a domain, but it seems like the only way that could work.

@HadrienGardeur

HadrienGardeur commented Jul 10, 2017

Who suggested "limiting Web Publications to a single domain"? I didn't.

I'll quote this thread twice:

From @mattgarrish

Do we at some point need to decide whether multiple-domain publications are out of scope for a first release?

From @dauwhe

Having multiple domains for content documents seems problematic, and perhaps not worth the complexity.

To be fair, @mattgarrish also said:

[...] it also greatly reduces what can be called a publication. Maybe that's not a bad thing, but it invalidates many of the possible applications.

While @dauwhe also pointed out:

But what about things like fonts and scripts that might come from other origins?

IMO, the potential design for PWP shouldn't affect WP in such a dramatic way. Aside from resources (CSS, JS, fonts, images, audio and video) which are often served from a different origin, being able to create a publication across multiple domains would also open up the possibility to remix content from the Web which I personally find compelling.

Even if a single publisher controls a publication, it might want to reuse content across different domains or sub-domains. Let's take an example, publisher A has:

  • plenty of restaurant critiques and reviews on food.publisherA.com
  • it also has a separate website with a focus on traveling on travel.publisherA.com

Publisher A decides to remix content about a specific place (let's say Rome) and create a new publication together. It would make a whole lot of sense for this publication to simply point to content documents and resources on food.publisherA.com and travel.publisherA.com instead of being forced to re-publish them somehow.

@HadrienGardeur

Right, and I was speculating about problems we'd face if '/' refers to the root of the publication even though the domain root is not the publication root. The content won't work when it's on the web. I'm not in favour of limiting publications to a domain, but it seems like the only way that could work.

I don't think that we need a publication root or the equivalent of the scope element in Web App Manifest.

We can either reference resources in the manifest using:

  • an absolute URI (no problem working with those)
  • or a relative URI, which would always be relative to the self link for that manifest

A scope works when you want to be vague about the constituent resources that are part of an app. If we take a different approach, one that's more declarative (spine, resources) than scripted (Service Worker), having a scope is completely redundant.
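One practical consequence of resolving relative URIs against the `self` link: if the manifest is copied to a new location, its relative references follow it, while absolute ones keep pointing at their original hosts. A minimal sketch using Python's standard `urljoin` (the URLs and the `resolve_all` helper are illustrative):

```python
from urllib.parse import urljoin


def resolve_all(self_url: str, hrefs: list) -> list:
    """Resolve manifest hrefs against the manifest's own 'self' URL."""
    return [urljoin(self_url, href) for href in hrefs]


hrefs = ["chapter-001.html", "https://fonts.example.net/garamond.woff2"]
original = resolve_all("https://www.example.com/MobyDick/manifest.json", hrefs)
copy = resolve_all("https://mirror.example.org/books/moby/manifest.json", hrefs)

print(original[0])  # https://www.example.com/MobyDick/chapter-001.html
print(copy[0])      # https://mirror.example.org/books/moby/chapter-001.html
# The absolute href is unaffected by the move:
print(copy[1])      # https://fonts.example.net/garamond.woff2
```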

@murata2makoto

murata2makoto commented Jul 10, 2017

Right, and I was speculating about problems we'd face if '/' refers to the root of the publication even though the domain root is not the publication root. The content won't work when it's on the web. I'm not in favour of limiting publications to a domain, but it seems like the only way that could work.

I think that we need a new URI scheme whose authority component can contain an absolute WP URI and whose path component can contain an absolute resource URI or a relative URI of a resource in the WP.

@mattgarrish
Member

it might want to reuse content across different domains or sub-domains

Right, I don't disagree. My point above was only that there's a lot of simplicity in not having to deal with the issues of multiple domains. I can see arguments for it.

I'm arguing for a decision, not necessarily advocating a position. Are there requirements we can start taking for granted as we wade deeper into the issues, so we know how to judge proposals?

Can we drop the idea of a scope and move ahead with an assumption of a declarative file set? Are there objections?

It doesn't mean we don't have to revisit our thinking later, but we can't stay open to all options.

@HadrienGardeur

It's also entirely tied to how we package a publication.

If we use the Web Packaging draft, URI stability is available by default since the package is designed with the concept of URIs from scratch. There's no need to do anything specific in the manifest.

@murata2makoto

murata2makoto commented Jul 13, 2017

@lrosenthol

@murata0204, Is there more to "URI-stableness" than just the fact that
relative URIs are the same package or not? it might help us understand
more if you could explain more about what exactly this is and why you
believe it is important.

First, why is this important? Smooth transition between a PWP and a WP
is needed for the unification of the Web world and the EPUB world. It should
be possible to create a WP from a PWP (and vice versa) simply by packaging and
unpackaging. Rewriting requires parsing and should be strongly avoided (although
rewriting of the manifest is probably required).

Second, what do I mean by "URI-stableness"? My second desideratum is
about it.

Desideratum 2: Relative references to resources-in-(P)WP should be
stable after packaging or unpackaging. See
1.1(https://www.w3.org/TR/pwp/#whatisawebpublication) in the W3C PWP

Suppose that a relative reference in a resource A in a WP is resolved to
another resource B in the same WP. From this WP, we create a PWP
by packaging. Resources A and B are contained in this PWP. It is required
that the same relative reference in A in the PWP is again resolved to B
in the PWP. Here a relative reference is either absolute-path (i.e., begins
with "/") or relative-path (Desiderata 9). Note that EPUB 3 allows
absolute-path references.

The same applies to unpackaging. Suppose that a relative reference in resource A
in a PWP is resolved to B in the same PWP. From this PWP, we create a WP
by unpackaging. Resources A and B are contained in this WP. It is required
that the same relative reference in resource A in the WP is resolved to B
in the WP.

@HadrienGardeur

If we use the Web Packaging draft, URI stability is available by default since the package is designed with the concept of URIs from scratch. There's no need to do anything specific in the manifest.

How do you unpackage a PWP comprising a manifest http://example.com/manifest.foo, an HTML file, /one.html and /two.html? one.html contains both /two.html and two.html as relative references. Can we put the manifest of the WP at a non-root file of a domain?
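The scenario in this question can be checked with standard RFC 3986 reference resolution. The sketch below uses Python's `urljoin`; the `/pub/` deployment location and the `resolve_refs` helper are illustrative assumptions. It shows that `/two.html` and `two.html` in `one.html` resolve to the same resource only when the publication sits at the domain root:

```python
from urllib.parse import urljoin


def resolve_refs(pub_root: str, refs: list) -> list:
    """Resolve references found in one.html for a publication deployed at pub_root."""
    one = urljoin(pub_root, "one.html")
    return [urljoin(one, ref) for ref in refs]


refs = ["/two.html", "two.html"]
at_root = resolve_refs("http://example.com/", refs)
in_subdir = resolve_refs("http://example.com/pub/", refs)

print(at_root)    # ['http://example.com/two.html', 'http://example.com/two.html']
print(in_subdir)  # ['http://example.com/two.html', 'http://example.com/pub/two.html']
```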

@HadrienGardeur

How do you unpackage a PWP comprising a manifest http://example.com/manifest.foo, an HTML file, /one.html and /two.html? one.html contains both /two.html and two.html as relative references. Can we put the manifest of the WP at a non-root file of a domain?

I feel that unpackaging as a requirement is a mistake and we shouldn't do it.

This WG is not about transporting resources from one server to another, and since URIs can be spread across domains it's impossible to do what you're suggesting anyway.

@murata2makoto

I feel that unpackaging as a requirement is a mistake and we shouldn't do it.

This is certainly debatable. I would like to have discussions about such high-level
topics first. Jumping into details is very wrong.

@lrosenthol

lrosenthol commented Jul 13, 2017 via email

@mattgarrish
Member

@mattgarrish

Actually, it was @lrosenthol, but thanks for thinking of me!

Smooth transition between a PWP and a WP is needed for the unification of the Web world and the EPUB world.

But this is give and take. Can we expect publications to have their own domain? This is no different than the question I asked about using content negotiation and having one directory per publication. The con of such approaches is that we're imposing potentially onerous web architecture requirements on authors. Do we want a domain per article of a journal, and then another domain where all the articles have to be duplicated for the full journal? Whatever decisions we make have to be considered across all the architectures, yes.

Note that EPUB 3 allows absolute-path references.

I'm not finding this. OCF says that all resources must reference each other through relative paths. I tried an absolute path out of curiosity, and epubcheck couldn't make any sense of the reference and threw errors, so is anyone using them if it's true?

The lack of roundtripping from WP->PWP->WP that seems unavoidable with multi-domain resources may be mitigated at the EPUB level, as EPUBs are probably going to continue to be created as a bundle of relatively-located resources and won't face the same unpacking issues. They may flow more easily into a WP environment. If you're drawing sources of cross-domain web-hosted content into an EPUB, what is the likelihood you're doing so with an expectation of another party unpacking it? It's intended for an EPUB reading system to ingest.

We might have to make it an advisement not to use absolute paths if you expect roundtripping, to allow for web-born publications, but it doesn't seem problematic with how epubs are constructed today. But maybe I'm missing something.

@HadrienGardeur

The lack of roundtripping from WP->PWP->WP that seems unavoidable with multi-domain resources may be mitigated at the EPUB level, as EPUBs are probably going to continue to be created as a bundle of relatively-located resources and won't face the same unpacking issues. They may flow more easily into a WP environment.

I agree that it's much easier to go from EPUB to WP than from:

  • WP -> PWP (unless we use Web Packaging)
  • WP -> PWP -> WP (IMO impossible in many different situations)

For EPUB 4, it'll depend on what we end up using in terms of packaging.

Going from EPUB to WP is pretty much what the "streamer" component of Readium-2 does:

For WP -> PWP -> WP, we could populate a proxy or a CDN in very specific situations but can't expect to simply unpackage in a folder.

@GarthConboy
Contributor

WP -> PWP -> WP (IMO impossible in many different situations)

This is pre-supposing a lax multi-origin decision on WP, right?

@mattgarrish
Member

This is pre-supposing a lax multi-origin decision on WP, right?

That's the $64,000 philosophical question we keep bumping against.

Do publications handle what the web can throw at them, or is only a subset of the web able to be a publication?

@HadrienGardeur

This is pre-supposing a lax multi-origin decision on WP, right?

That's not the only restriction.

For example:

  • the issue isn't strictly with a single domain (example.com) it also applies to subdomains (sub.example.com)
  • if a manifest is available at https://example.com/pub/manifest.json and it references a CSS at https://example.com/style.css you're also going to be in all sorts of trouble if you're using relative paths and ZIP as a container. If the manifest is always at the root of the package (manifest.json), how do you reference the stylesheet with a relative path?
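The stylesheet case can be made concrete. Assuming the package root corresponds to the manifest's directory, this sketch computes the package-relative path of a resource and returns `None` when no path inside the package exists; `package_relative` is a hypothetical helper.

```python
import posixpath
from urllib.parse import urlsplit


def package_relative(manifest_url: str, resource_url: str):
    """Path of resource_url relative to the manifest's directory (assumed package root),
    or None if the resource cannot be addressed from inside the package."""
    m, r = urlsplit(manifest_url), urlsplit(resource_url)
    if (m.scheme, m.netloc) != (r.scheme, r.netloc):
        return None  # different origin: no relative path exists at all
    rel = posixpath.relpath(r.path, posixpath.dirname(m.path))
    # A leading ".." climbs above the manifest, i.e. above the package root.
    return None if rel == ".." or rel.startswith("../") else rel


MANIFEST = "https://example.com/pub/manifest.json"
print(package_relative(MANIFEST, "https://example.com/pub/css/a.css"))  # css/a.css
print(package_relative(MANIFEST, "https://example.com/style.css"))      # None
```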

I see way too many restrictions that shouldn't exist in these discussions.

Any resource on the Web should be a potential Web Publication resource. Instead we're trying to build some sort of special snowflake that has nothing to do with how the Web actually works.

@GarthConboy
Contributor

Any resource on the Web should be a potential Web Publication resource.

I'm not sure I'll disagree, but that's a big decision we need to get to as a group.

If a WP is not some sort of subset of the Web, then it's nothing special (snowflake, or otherwise). :-)

@mattgarrish
Member

If a WP is not some sort of subset of the Web

It's a bounded set of resources; that's what makes it a unique subset.

Being a subdomain of resources isn't all that unique, just a limitation.

@murata2makoto

@mattgarrish

First, sorry for my mistake. I fixed it.

Note that EPUB 3 allows absolute-path references.
I'm not finding this. OCF says that all resources must reference each other through relative paths. I tried an absolute path out of curiosity, and epubcheck couldn't make any sense of the reference and threw errors, so is anyone using them if it's true?

Since absolute-path references are relative references, I believe that they are allowed by EPUB3. But it is certainly possible to disallow absolute-path references in our future specs. Note that unzipping will invalidate absolute-path references.

@mattgarrish
Member

Since absolute-path references are relative references

Yes, that's true, but I didn't think it was technically allowed. As an epub is supposed to work on file systems, a path that starts with a slash is relative to the root of whatever drive the content is in. It's only within the abstract container that it makes sense, or if you handle the epub as web content in its own domain.

I think that's the problem epubcheck has with them. Since it's validating on a file system, I believe it expects the '/' to resolve to the current drive root, and then complains that the resource is outside the container.

@murata2makoto

Here are some high-level questions raised during recent discussions.

  • Should any resource on the Web be a potential Web Publication resource?

  • Are multi-(sub)domain WPs allowed?

  • Does packaging allow any relative reference to occur in a resource
    in the given WP? (relative-path relative reference only? Required
    to resolve to some resource in the WP?)

  • Is round-tripping (WP->PWP->WP) required? Should the generated WP
    be identical to the input WP?

@HadrienGardeur

It's a bounded set of resources; that's what makes it a unique subset.

Yes, and it's bounded by its manifest.

"One manifest to rule them all, and in the JSON bind them"

@tcole3
Contributor

tcole3 commented Mar 12, 2018

Tim Cole will look through this thread (and issues in the PWP repo) for discrete, potentially more up-to-date identity-related issues that need to be opened (i.e., that have not yet been opened in one repository or the other). In anticipation of this we should close this issue.

@iherman
Member

iherman commented Mar 13, 2018
