manifest: requirements for offline #22

TzviyaSiegman · 2017-08-08T13:07:23Z

There is no way to crawl a script and find all the resources it might use or cause to be used. If all secondary resources are not listed, it is not possible to cache a WP offline. I think we either list all resources, or abandon deterministic offline caching.

To be discussed: which resources should be listed in manifest

BigBlueHat · 2017-08-08T13:52:53Z

Using a script to load resources and scripts would be (and is often now) considered an anti-pattern that limits the ability of the browsers to help developers. Things like <script type="module"> and the like are in the works precisely to keep developers from falling into JS-only mode.

It seems safe to say that "deterministic offline caching" will do what it can based on the spec we write, but if developers Do It Wrong their apps (and more importantly their users!) will suffer the consequences.

BigBlueHat · 2017-08-08T13:59:32Z

Also, Chrom(ium) is working on a new "offliner" system to replace (or rather enhance for limited space/processing scenarios) the current rel="prerender" based system it's using now.

Here's the conversation about the new system.

And some of the code they're moving toward.

Certainly it would be worth connecting with some of these developers to see where things overlap in our ideal world and what they have on hand now.

mattgarrish · 2017-08-08T14:09:22Z

I confess I don't understand how all resources can be cached. If the needed resource is dependent on user interaction, for example, you can't know what it is until the user interacts with the interface. There might also be context-dependent ways of calling such a resource that can't all be represented in a manifest. All you can know in some cases is the script that will be called, not what it will call.

And I don't know that we need to optimize web publications in a way browsers don't. If a script has no value offline, it seems like that should probably be a property that HTML defines for the script element, as dropping the file from the cache doesn't remove the element from the DOM. We'd only be "optimizing" one side of the problem.

baldurbjarnason · 2017-08-08T15:45:29Z

@BigBlueHat

Using a script to load resources and scripts would be (and is often now) considered an anti-pattern that limits the ability of the browsers to help developers. Things like <script type="module"> and the like are in the works precisely to keep developers from falling into JS-only mode.

Correct me if I'm wrong, but modules only cover JS resources since arbitrary content types for imports haven't been specified yet? Which means that if an author is using a script to arbitrarily load other resources (like images, video, or JSON data) based on user interaction or some other dynamic value neither type="module" nor rel="prerender" can do anything about it.

And the offliner code is, from the looks of it, an attempt to dynamically offline cache a page without explicit manifests or resource lists provided by the author.

It seems safe to say that "deterministic offline caching" will do what it can based on the spec we write, but if developers Do It Wrong their apps (and more importantly their users!) will suffer the consequences.

We can't prevent developers from doing it wrong. We can't police how people will author files. We've been trying that approach for literally decades now and it doesn't work. The specification should be realistic about what it can mandate. If you want full and universal offline support for all web publications, you're better off trying to build something like what the Chrome team is trying to do: pre-render then store offline. Bits of it will break but it's honestly an approach that will work in many many more scenarios than just demanding that authors always list all resources required for the publication to function or else they'll be scolded vigorously.

It's entirely up to the author which resources are listed in the manifest and it's entirely up to the User Agent to try and figure out how to provide a decent user experience with the inevitably insufficient data that the authors will provide as a starting point. That's what it's going to be like in practice and the spec can either reflect reality or not. I'd prefer the spec reflect reality because otherwise we are undermining its usefulness as an implementation guide (implementors will have to deviate from the spec to provide an adequate user experience) and its overall credibility as a standards document.

BigBlueHat · 2017-08-08T16:52:57Z

And the offliner code is, from the looks of it, an attempt to dynamically offline cache a page without explicit manifests or resource lists provided by the author.

Yep. That's what it does now. What I'm leaning toward is a way to tell the browser more explicitly + user interaction (most likely) that the user wants to "keep" a publication offline. It would use the "kit" being developed now plus a distinct user interaction moment (like clicking "Reader Mode" in Firefox these days) and bring the entire publication "into" the browser/reading system/thing.

Also, I've never suggested most of what you just said I suggested. 😃 We will develop testable, verifiable MUSTs in our spec that conforming implementations MUST support to get that coveted gold ⭐️ Web Developers who want their stuff to work as intended will follow the spec (and all the content generated later about the spec) to build Web Publications that work in those implementations.

When they drive off the map, there will be consequences...same as with any spec + implementation combo. There's no reason to reiterate that people will Do It Wrong.

baldurbjarnason · 2017-08-08T17:25:17Z

@BigBlueHat

Yep. That's what it does now. What I'm leaning toward is a way to tell the browser more explicitly + user interaction (most likely) that the user wants to "keep" a publication offline. It would use the "kit" being developed now plus a distinct user interaction moment (like clicking "Reader Mode" in Firefox these days) and bring the entire publication "into" the browser/reading system/thing.

Also, I've never suggested most of what you just said I suggested. 😃 We will develop testable, verifiable MUSTs in our spec that conforming implementations MUST support to get that coveted gold ⭐️ Web Developers who want their stuff to work as intended will follow the spec (and all the content generated later about the spec) to build Web Publications that work in those implementations.

What you are describing are SHOULDs, otherwise the whole gold star concept is meaningless. Like I've been trying to say, MUSTs have consequences—they are things that are absolutely required under every circumstance or the sky will fall—and indicating a MUST when a strongly recommended SHOULD would do is pretty darn developer-hostile and absolutely undermines adoption.

You can tell developers that if they want a certain feature to work, they need to do X, Y, or Z. That's fine. And you can tell implementations that they need to do X, Y, or Z to be able to reliably offer a specific feature. That's also fine.

What you can't do is mandate that all developers accommodate said feature even when they have no desire or economic incentive to do so. A huge number of them won't, even if that results in invalid files.

Remember, from the perspective of a web developer, the entirety of the web publication manifest is an optional extra. If we're really unlucky, our complex demands will make devs simply ignore the entire idea and just carry on as they have so far, possibly just relying on unreliable browser offline features instead of web publication specs because just letting the browser do its work is so much simpler in practice.

And the publishing industry already has ePub 3 and PDFs. Their adoption isn't a certainty either unless we take pains to ensure that web publications are considerably simpler than, say, just making a subscription website with a service worker. Or, even continuing to use ePubs and PDFs.

The more granular we make the features of web publications, the more likely it is that the concept will see adoption and implementation.

I'd even go so far as to suggest that each individual aspect of the manifest be specced individually as either extensions to the web app manifest or to the HTML standard itself (i.e. publication metadata = one spec, spine = another spec, secondary resources = yet another spec) so that they can be adopted and implemented individually. But it's way way way way too early for that conversation 😄

When they drive off the map, there will be consequences...same as with any spec + implementation combo. There's no reason to reiterate that people will Do It Wrong.

I strongly believe there is a reason to reiterate that again and again. It should be a running theme throughout the standardisation process. People will Do It Wrong and in large enough numbers for it to be an issue. And because it hasn't been A Wrong Thing To Do for them so far, we aren't really in a position to scold them for it. We can entice—"you get this nice feature if you list all your resources"—but not mandate.

Unlike many other specs, we are building on top of a pre-existing ecosystem and platform and have limited standing to just waltz in and lay down all-or-nothing rules.

So… I have no problem with any of what you describe as long as they are SHOULDs. We can even hammer on how important they are and that the feature they enable is really nice so people really really should do the work. But we can't just set a hard rule that they MUST do it.

I mean, we can try to set that hard rule. They just won't follow it.

TzviyaSiegman · 2017-08-08T17:51:30Z

Philosophy of spec writing is a fascinating topic, but let's get back to the issue. What are the requirements for making a WP available offline?

lrosenthol · 2017-08-08T17:54:18Z

On Tue, Aug 8, 2017 at 12:52 PM, BigBlueHat ***@***.***> wrote: What I'm leaning toward is a way to tell the browser more explicitly + user interaction (most likely) that the user wants to "keep" a publication offline.

I'm OK with that sort of thing as one option that an author can use to have their publication go offline - but certainly not the only way. As long as the author can also use other methods (such as explicit ServiceWorker development) and ignore the declarative model entirely - all is good.

lrosenthol · 2017-08-08T18:01:58Z

On Tue, Aug 8, 2017 at 1:51 PM, Tzviya ***@***.***> wrote: What are the requirements for making a WP available offline?

While I think that's a good question to ask, it's a bit misleading. A WP can go offline today without anything being done in the UA, such as via technology like ServiceWorkers. So from a WP author's perspective, there is nothing more to do - they have everything they need. Maybe we think we need to make it easier for WP authors to have their publications go offline, perhaps by putting requirements on UAs. If so, then we need to look at the problem from the UA perspective and not the WP one. And here's where it gets fun - a user may wish to take a WP offline that wasn't designed/authored to be taken offline. Is that a problem we are trying to solve? And what about the author who wants to restrict the "offline-ability" of their publications?

rickj · 2017-08-08T18:11:14Z

“What are the requirements for making a WP available offline?” Coming in late to this conversation, but is it a requirement that offline=browser access only?

-Rick

rickj · 2017-08-08T18:13:15Z

And here's where it gets fun - a user may wish to take a WP offline that
wasn't designed/authored to be taken offline. Is that a problem we are
trying to solve? and what about the author who wants to restrict the
"offline-ability" of their publications?

Agree that this is an issue to discuss. For us, the ability to be only online, only offline, or mixed is a supply chain decision that the owner/distributor controls.

bduga · 2017-08-08T18:15:11Z

I'm getting to the point where I don't even understand what we are talking about. So, "A WP can go offline today without anything being done in the UA". What does that even mean? What is a WP in that sentence, a Web Publication? I didn't think those even EXISTED today, let alone had features like going offline. And this concept of a WP designed to go offline vs not go offline is strange. The ability to be functional while offline is one of the "most important and high-level characteristics" [from the charter] of a WP. And yes, today, I can write a web app that has all the characteristics of a WP as listed in the charter. But do we really intend to have every WP be a web app that can cache itself? Isn't the whole point of this spec to avoid that? Otherwise, what are we writing? I can already do everything the charter calls for in a web app, no new specs needed! I was under the impression our goal was to make a specification for creating a web page or pages that had certain fundamental characteristics not intrinsic to arbitrary web pages.

lrosenthol · 2017-08-08T18:39:36Z

On Tue, Aug 8, 2017 at 2:15 PM, bduga ***@***.***> wrote: I'm getting to the point where I don't even understand what we are talking about. So, "A WP can go offline today without anything being done in the UA". What does that even mean? What is a WP in that sentence, a Web Publication? I didn't think those even EXISTED today, let alone had features like going offline.

Sorry @bduga - point well taken. I should have written that a web page can go offline if it wishes to.

And this concept of a WP designed to go offline vs not go offline is strange.

Why? It's just one of a series of possible approaches that we could take:

- Put the burden of "offline-ability" entirely on the author
- Put the burden entirely on the UA
- Some combination of the two

And yes, today, I can write a web app that has all the characteristics of a WP as listed in the charter. But do we really intend to have every WP be a web app that can cache itself?

Maybe...I would say that this is a similar discussion/debate to where the UX of the publication lives and who controls it.

Isn't the whole point of this spec to avoid that? Otherwise, what are we writing?

That's one of the many things we are discussing (or need to be discussing).

I can already do everything the charter calls for in a web app, no new specs needed! I was under the impression our goal was to make a specification for creating a web page or pages that had certain fundamental characteristics not intrinsic to arbitrary web pages.

For those things that we felt were necessary for publications that were not already present - sure. But we have also said (multiple times) that if something already exists in the OWP, we should use it.

lrosenthol · 2017-08-08T18:40:42Z

On Tue, Aug 8, 2017 at 2:11 PM, Rick Johnson ***@***.***> wrote: “What are the requirements for making a WP available offline?” Coming in late to this conversation, but is it a requirement that offline=browser access only?

User Agent (UA) access, yes. Since a WP is consumed by a UA, at least so far as our definitions exist today

baldurbjarnason · 2017-08-08T18:49:29Z

@bduga

I was under the impression our goal was to make a specification for creating a web page or pages that had certain fundamental characteristics not intrinsic to arbitrary web pages.

This is probably the root source of many of these disagreements. I'm definitely not working towards that goal and it's not the goal of my employer. I'm working towards extending the web's feature set to accommodate the working group's stated requirements for web publications. The authors of arbitrary web pages should be able to use the Working Group's output to add individual publication-style features to their web pages. If they add them all—congratulations—you have a web publication.

That's a very different goal that requires a considerably different approach from the one you state.

"The UA can make your publication function offline if you provide a list of its resources, no service worker needed" is a feature that can be assessed, specified and implemented independently from other features that fulfil the publication requirements. And that's it, that's the requirement right there: the UA needs the author to list the resources the publication wants to be made available offline. This could then potentially be widely used outside of publications specifically and would improve the web as a whole. If we are speccing offline-publications as an independent feature in a specification of its own (as I think we should but, again, that's a topic for a later debate) then we could have it as a MUST because otherwise offline is meaningless. But in the context of the manifest as a whole, it has to be a SHOULD.

That is, if we're taking the approach of extending the web feature by feature. If people want to intentionally diverge from regular web pages, then that's a different thing entirely.

"This webpage becomes something fundamentally different from a regular web page if you add these bunch of things together, and you must add them all for it to work" is a Different Thing. And, yes, that different thing requires an all or nothing approach because otherwise you're just extending the web's feature set imperfectly and one at a time and you're back at the other approach.

The let's-make-a-fundamentally-different-kind-of-web-page is a valid approach to take. But it's also uninteresting to many of us who are coming at this from the web end of things. And I suspect that includes many browser vendors.

baldurbjarnason · 2017-08-08T20:08:01Z

Irrespective of our differences in overall goals, offline is rather complicated. (As others have pointed out frequently.)

Even just the idea of relying on a list of resources in the manifest (primary + secondary) to let the UA cache the publication raises some questions:

How does this interact with the publication's service worker, if it has one?
Is this a mechanism for pre-populating the service worker cache (that the service worker can then manage) or a separate mechanism (that only activates if you don't have a service worker) or a mix of the two (e.g. always fills the cache but only handles request/cache update strategies if there isn't a service worker)?
How are the manifest and the offline resources updated, e.g. in case of a vital security update?
Does the UA have to check the manifest for updates or changes on a regular basis?
Does it check every time it goes online, whenever the publication is opened, on some pre-determined schedule, or all of the above?
Would this involve WebSub or push notifications in some way?
Should the UA notify the reader of updates?
How does the UA know that the offline resources have online updates even after it has fetched an updated manifest?
Is the modification time for each individual resource listed in the manifest?
If not, does the UA have to make conditional HTTP requests for every single offline resource every time it goes back online?
Does it check for updates each time the resource is requested?
What happens when an updated manifest would delete large parts of the publication?
How does this interact with the Web Packaging specification that's in progress?
Isn't this redundant work if Web Packaging becomes a reality?

The answer to each of these questions has implications for what we need to put in the manifest for offline to work. E.g. if we don't want to require UAs to regularly re-request each resource, then offline requires modification times in the manifest in addition to a resource list for it to function properly. And much in the same way that the Fetch API is now how WHATWG is defining network requests in general, we probably will have to specify this behaviour in terms of how you'd implement it in a Service Worker, even if that's not how everybody will implement it.
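To make the modification-time idea concrete, here is a minimal Python sketch of how a UA might compare an old and a new manifest to decide what to re-fetch. It assumes a purely hypothetical manifest shape (a mapping from resource URL to an ISO 8601 modification time); nothing in this thread defines such a format, and the function name `resources_to_refresh` is made up for illustration.

```python
from datetime import datetime


def resources_to_refresh(cached, updated):
    """Compare two manifest snapshots (hypothetical shape: URL -> ISO 8601 time).

    Returns (to_fetch, to_delete): resources that are new or have a later
    modification time, and resources that the updated manifest no longer lists.
    """
    def parse(stamp):
        return datetime.fromisoformat(stamp)

    to_fetch = sorted(
        url
        for url, modified in updated.items()
        if url not in cached or parse(cached[url]) < parse(modified)
    )
    to_delete = sorted(url for url in cached if url not in updated)
    return to_fetch, to_delete
```

With per-resource times in the manifest, one manifest request replaces a conditional HTTP request per resource. The `to_delete` list is where the "updated manifest would delete large parts of the publication" question shows up; this sketch only reports it and leaves the policy open.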

The simplest thing to do, AFAICT, is to define offline publications as pre-populating a default Cache store coupled with a pre-defined caching strategy (e.g. either cache and update, or cache, update, and refresh) that is overridden if the publication provides a service worker. The service worker can then take over managing that cache by opening it by its predefined name. That way we only need to provide a list of resources without worrying about an update strategy.
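As a rough illustration of that proposal, the following Python sketch models the behaviour with a plain dictionary standing in for the named Cache store. It is a toy model, not the Cache or Service Worker API: `prepopulate`, `handle_request`, the `fetch` callable and the `service_worker` callable are all hypothetical names, and "offline" is simulated by `fetch` raising `OSError`.

```python
def prepopulate(resource_list, fetch):
    """Fill the default cache from the manifest's resource list."""
    return {url: fetch(url) for url in resource_list}


def handle_request(url, cache, fetch, service_worker=None):
    """Default "cache and update" strategy, unless a service worker takes over."""
    if service_worker is not None:
        # The publication supplied its own worker: it manages the same cache.
        return service_worker(url, cache)
    cached = cache.get(url)
    try:
        cache[url] = fetch(url)  # refresh the entry for the next request
    except OSError:
        pass  # offline: keep whatever is already stored
    # Answer from the cache when possible, else from the fresh network copy.
    return cached if cached is not None else cache.get(url)
```

In this model the manifest only has to supply `resource_list`; the update behaviour lives entirely in the default strategy, which is the simplification the paragraph above is after.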

mattgarrish · 2017-08-08T21:05:14Z

Assuming a default reading order is specified, is there any greater requirement to a manifest than to list all additional non-embedded resources that are rendered to the user? (e.g., non-primary resources referenced by an a tag so the UA doesn't get confused about which of those is in or out of scope)

All subresources, with the exception of some script-used resources, can be determined by the user agent by inspecting those resources, even if it isn't as efficient as having the user list it all out.

Anything else is optional to list, with perhaps strong encouragement to list those pesky script resources.

And/or do we add a flag that content-inspection isn't necessary if a complete manifest is provided?

Would that provide enough information for any offlining solutions, without us getting bogged down in what they might be, and without bringing onerous manifesting requirements to the web?

Or is that too simplistic a thought?

lrosenthol · 2017-08-09T00:43:49Z

On Tue, Aug 8, 2017 at 5:05 PM, Matt Garrish ***@***.***> wrote: Assuming a default reading order is specified, is there any greater requirement to a manifest than to list all additional non-embedded resources that are rendered to the user? (e.g., non-primary resources referenced by an a tag so the UA doesn't get confused about which of those is in or out of scope)

@bduga is the only person that has requested that. The rest of us are perfectly fine without this...

All subresources, with the exception of some script-used resources, can be determined by the user agent by inspecting those resources, even if it isn't as efficient as having the user list it all out.

That's incorrect. Since HTML, CSS, SVG and other WP tech can also reference other things - it's not necessarily only scripts.

mattgarrish · 2017-08-09T01:44:06Z

That's incorrect. Since HTML, CSS, SVG and other WP tech can also reference other things - it's not necessarily only scripts.

But if they're explicitly referenced, they can be found by inspecting the resource, else how does anything load in a browser?

The one fault in the model is resources that are dynamically needed, as discovering them requires initiating the script(s). Sometimes they'll be static resources and can be listed, other times...

Aside from caching, listing the primary resources along with those that will be directly rendered provides a full context of what is in the scope of the publication. If you don't have this information, how does the user agent know the bounds of what belongs to the publication? (e.g., to unload itself?)

baldurbjarnason · 2017-08-09T03:10:12Z

@mattgarrish

But if they're explicitly referenced, they can be found by inspecting the resource, else how does anything load in a browser?

The problem is that both CSS and JS are both complex languages that can dynamically respond to context and user input. And SVG has a set of animation elements that can dynamically modify the attributes of other elements (and, quite frankly, are a security hazard).

So if you want to statically analyse a publication (i.e. without actually rendering it in a browser view) to discover which resources are needed, you are going to miss a bunch of stuff.

HTML is pretty simple to analyse: stylesheet links, app manifest links, the prefetch/preload/prerender trifecta (as these might indicate dynamically loaded resources), scripts, src attributes, srcset attributes, href and xlink:href attributes in SVG elements, then inline styles. Which leads you to CSS, which is a bit tricky as you need a proper CSS parser to find all of the url() values but you can with a bit of work get a list of resources out of it that is exhaustive in all but the weirdest of edge cases.
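A minimal Python sketch of the HTML side of that analysis, using only the standard library. It covers `link` elements, `src` and `srcset` attributes, `href`/`xlink:href` on SVG `image` and `use` elements, and `url()` values in inline styles. The class and function names are made up for illustration, and the regular expression is a shortcut that a proper CSS parser would replace.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Shortcut for url(...) in inline styles; a real CSS parser is more robust.
CSS_URL = re.compile(r"url\(\s*['\"]?([^'\")]+?)['\"]?\s*\)")


class ResourceCollector(HTMLParser):
    """Collects URLs that an HTML document references statically."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = set()

    def _add(self, url):
        url = (url or "").strip()
        # Skip data: URLs and same-document fragment references.
        if url and not url.startswith(("data:", "#")):
            self.resources.add(urljoin(self.base_url, url))

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        self._add(attrs.get("src"))  # scripts, images, media, iframes
        # srcset is a comma-separated list of "url descriptor" candidates.
        for candidate in (attrs.get("srcset") or "").split(","):
            parts = candidate.split()
            if parts:
                self._add(parts[0])
        if tag == "link":  # stylesheets, app manifests, prefetch/preload/prerender
            self._add(attrs.get("href"))
        if tag in ("image", "use"):  # SVG references
            self._add(attrs.get("href"))
            self._add(attrs.get("xlink:href"))
        for match in CSS_URL.finditer(attrs.get("style") or ""):
            self._add(match.group(1))


def collect_resources(html_text, base_url):
    collector = ResourceCollector(base_url)
    collector.feed(html_text)
    return collector.resources
```

Ordinary `a` links are deliberately left out: they may point at other documents, inside or outside the publication, rather than at subresources of this one. Anything loaded by script is invisible to this kind of pass.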

SVG is doable if you ignore things like and (those are a pain in the rear in general). With a bit of work you might even be able to cover the animation element edge case as well.

JS requires running the actual code in an actual browser to get anything meaningful so that's out of the picture for static analysis.

Assuming JS-loaded assets are by definition external to the publication (even if they aren't, really) and assuming that HTTP content negotiation always returns predictable and roughly equivalent resources (which it should if it's working properly) then yeah, static analysis can do the job. It won't do it perfectly, but it will do most of it. And if you rely on authors to plug in the missing gaps in the rare cases they give a damn, you might even get to around 85-90%.

I think that is Good Enough™, personally. But others have disagreed strongly and insisted that authors must provide an exhaustive list of resources.

AMP solves this problem by limiting the format to a subset of CSS, SVG and HTML and forbidding non-AMP JS entirely. Which works. It is a pre-existing, offline-capable, and portable version of HTML. The fact that it's basically Google's version of HTML is supremely problematic, of course. As a technology it's full of interesting ideas, though.

Chrome is (if I'm reading things correctly) hoping to solve this problem by rendering the page in the background and doing a full runtime evaluation and inspection to get all resources, static and dynamic. This might still miss out on stuff, e.g. things that will only load on mobile or on desktops, as the background render is a version of the current rendering context. But with a bit of clever querying it's a method that could get them very close to 100%. But this is not an easy path to take.

The one fault in the model is resources that are dynamically needed, as discovering them requires initiating the script(s). Sometimes they'll be static resources and can be listed, other times...

Aside from caching, listing the primary resources along with those that will be directly rendered provides a full context of what is in the scope of the publication. If you don't have this information, how does the user agent know the bounds of what belongs to the publication? (e.g., to unload itself?)

I'm not sure what you mean here. The UA will always know which resources it has stored offline so it follows that it can remove them if needed. At runtime it knows all of the resources, at least in this context, because it's running the publication's JavaScript and CSS.

iherman · 2017-08-09T10:22:45Z

@baldurbjarnason

@bduga

I was under the impression our goal was to make a specification for creating a web page or pages that had certain fundamental characteristics not intrinsic to arbitrary web pages.

This is probably the root source of many of these disagreements. I'm definitely not working towards that goal and it's not the goal of my employer. I'm working towards extending the web's feature set to accommodate the working group's stated requirements for web publications. The authors of arbitrary web pages should be able to use the Working Group's output to add individual publication-style features to their web pages. If they add them all—congratulations—you have a web publication.

I must admit, and it may be my happy-pills (™Garth), that I do not really think you disagree. Indeed, I do not see where @bduga said otherwise. Maybe one should say "web site" rather than "web page", but I believe the goal is the same.

iherman · 2017-08-09T10:37:42Z

Referring back to what @mattgarrish said in #22 (comment) I have the impression that, in fact, we do have some sort of a consensus for now (remember that the goal is to come up with a First Public Working Draft ASAP, and not solve all the problems between now and the end of the year!). Indeed, it seems that a list of the resources as part of the Manifest is Good Enough (™Baldur). In my view, this is the answer to the question raised in the issue. Indeed, this works for the important use cases that, at least, I have in mind (e.g., I want to be able to look at a research paper on the Web and be able to read it offline or online).

Will there be cases when this set of information will not be enough (e.g., if the user uses all kinds of sexy javascripts dynamically loading things as a result of interaction)? You bet there are. Does it mean that not all Web sites can be turned, in fact, into a Web Publication? Yep, that is true. So what? We are not aiming to change the Web as a whole, we did not say every Web page can be a WP; we merely aim to provide a way for “publications” (like the F1000 article I referred to) to find their place on the Web as first class entities. And that is perfectly enough for me.

Will there be open issues? Yes, that is possible. Let us list them, record them, refer to them from the spec and move on for now.

lrosenthol · 2017-08-09T11:06:31Z

On Wed, Aug 9, 2017 at 6:37 AM, Ivan Herman ***@***.***> wrote: Referring back to what @mattgarrish said in #22 (comment) I have the impression that, in fact, we do have some sort of a consensus for now

You really have that impression?? I have the exact opposite impression. I see two *very* divided camps. There are those that want to (mandate having to) list non-primary resources and those that do not.

Indeed, it seems that a list of the resources as part of the Manifest is Good Enough (™Baldur). In my view, this is the answer to the question raised in the issue.

As long as the list is *optional* (not even a should, but a may!) - then I agree we would have consensus. But I am not sure if everyone is even willing to go with that.

mattgarrish · 2017-08-09T11:56:35Z

I'm not sure what you mean here. The UA will always know which resources it has stored offline so it follows that it can remove them if needed.

Sorry, it's probably the weird terminology of not having a concept like a "web page" to refer to.

Yes, the user agent will know what subresources a primary resource needs, but we've said in another issue that not every resource that is directly rendered to the user (i.e., not wrapped in html or svg but standing alone in the viewport) has to be listed as a primary resource (the non-linear issue).

So, say I have a choose your own adventure book. The first document has a couple of a tags that refer to the possible continuation points and an a tag that goes off to some Wikipedia article for more information about dinosaurs. None of these are listed as primary resources, but obviously a couple of the links go to additional documents in the publication. Unless I list these in addition to the one primary resource, though, the user agent has no idea the next pages are part of the publication. As far as it is aware, I've left the publication, just as if I'd gone off to the Wikipedia page.
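The scoping check this example calls for could look roughly like the Python sketch below. It assumes the manifest resolves to a set of absolute URLs; the function name and the example.org addresses are made up for illustration.

```python
from urllib.parse import urldefrag, urljoin


def in_publication(link_href, current_url, manifest_resources):
    """Return True if following the link stays inside the publication.

    manifest_resources is assumed to be a set of absolute URLs taken from
    the manifest (primary resources plus any other listed documents).
    """
    # Resolve relative links and ignore the fragment part of the target.
    target, _fragment = urldefrag(urljoin(current_url, link_href))
    return target in manifest_resources
```

For a start page at `https://example.org/cyoa/start.html` whose manifest also lists `cave.html` and `river.html`, a link to `cave.html#entrance` is in scope, while the dinosaur link to Wikipedia is not. Without the listing, both links look the same to the user agent.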

I'm not suggesting that the authors list every script, style sheet, image, etc., although they could have the option to do so. But to establish the bounds of the publication we need to know everything that the user could possibly encounter in whatever reading progression they follow that is considered within the scope of the publication. If we require all those resources, the subresources can (for the most part) be programmatically determined.

I had assumed that establishing the bounds was an important part of a web publication, as it is what allows for features like taking the publication offline.

The problem is that both CSS and JS are both complex languages that can dynamically respond to context and user input.

I stand to be proven wrong by saying this, but CSS is, in my mind, easier to determine the possible necessary resources for. Yes, which ones to apply can only be known at run time, but a user agent could grab them all for caching. At least more easily than JS. Maybe it grabs them all, maybe it only takes the CSS applicable to the current context -- those are issues we don't have to solve if we don't try to make our own caching mechanism.

(Of course, if the CSS itself is dynamically generated on the server, all bets are off, but we can't try to handle everything.)

lrosenthol · 2017-08-09T12:09:47Z

On Wed, Aug 9, 2017 at 7:56 AM, Matt Garrish ***@***.***> wrote:

> I'm not suggesting that the authors list every script, style sheet, image, etc., although they could have the option to do so. But to establish the bounds of the publication we need to know everything the user could possibly encounter, in whatever reading progression they follow, that is considered within the scope of the publication.

Why do we need to know that? You are pre-supposing some sort of implementation or requirement that has not been either stated or agreed to.

> If we require all those resources, the subresources can (for the most part) be programmatically determined.

True. But again, we don't have a requirement that says we need that...

> I had assumed that establishing the bounds was an important part of a web publication, as it is what allows for features like taking the publication offline.

No it is not. It is *just one way* that would allow this. It is not the only way nor necessarily the way we have agreed to.

> The problem is that CSS and JS are both complex languages that can dynamically respond to context and user input. I stand to be proven wrong by saying this, but CSS is, in my mind, easier to determine the possible necessary resources for.

Compared to JS, yes CSS is easier. But there are still a whole lot of dark corners where things can hide and be missed.

laudrain · 2017-08-09T12:30:53Z

> > I had assumed that establishing the bounds was an important part of a web publication, as it is what allows for features like taking the publication offline.
>
> No it is not. It is just one way that would allow this. It is not the only way nor necessarily the way we have agreed to.

Yes it is, and we agreed upon it in a long mail thread titled "definition of Web Publication".
Perhaps it was as a conceptual item. Now the point is how to specify it, and I agree with Matt.

iherman · 2017-08-09T12:49:51Z

@lrosenthol, in #22 (comment) you said:

> You really have that impression??
>
> I have the exact opposite impression. I see two very divided camps. There are those that want to (mandate having to) list non-primary resources and those that do not.

You are right on this aspect. What I was reflecting on (forgetting this) is whether there is any other information that a Manifest must provide for offline usage and, I believe, we have not listed any.

So you are right, there is the issue of whether all the secondary resources must be listed or not, which is mostly discussed in issue #6.

Let us go back to the definition of the WP in the current draft:

> A Web Publication is a collection of one or more primary resources, organized together through a manifest into a single logical work with a default reading order. The Web Publication is uniquely identifiable and presentable using Open Web Platform technologies.

and it also says

> A secondary resource is one that is required for the processing or rendering of a primary resource.

First of all, what this tells me is that the manifest must list all the primary resources of a WP. If it does not do it, it is simply not the manifest of a Web Publication. It can be useful for other purposes, but we are not talking about that.

I also believe that the manifest in the abstract sense must contain information on the secondary resources. As @mattgarrish put it in another comment, the boundaries of a WP must be set, otherwise a WP might fold the whole of the Web.

We are still talking about the abstract information that the UA has to know about via the Manifest. It may be (to be decided further) that this can be done via a means that does not require listing all the secondary resources (e.g., by some scoping mechanism, listing some of the base URLs whose discovered resources are considered to be secondary resources), but that is a matter of the practical realization.

To summarize: I am strongly in favour of saying that a manifest MUST include information about all the resources, primary or secondary. Put another way, it MUST ensure that the UA is in a position to discover the boundaries of the WP, and to decide whether a particular resource is within or outside a Web Publication.
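As an illustration of the scoping idea mentioned above (base URLs whose discovered resources count as part of the publication), a boundary check could look like the following. This is only a sketch under assumptions: the thread does not define how scopes would be matched, so the same-origin plus path-prefix rule, the function names `within_scope` and `in_any_scope`, and the example URLs are all hypothetical.

```python
from urllib.parse import urlsplit


def within_scope(url: str, scope: str) -> bool:
    """True if url has the same scheme and host as scope and sits under its path."""
    u, s = urlsplit(url), urlsplit(scope)
    if (u.scheme, u.netloc) != (s.scheme, s.netloc):
        return False
    # Treat the scope as a directory so "/book" does not match "/bookshop".
    scope_path = s.path if s.path.endswith("/") else s.path + "/"
    return (u.path or "/").startswith(scope_path)


def in_any_scope(url: str, scopes: list) -> bool:
    """True if url falls under at least one of the declared base URLs."""
    return any(within_scope(url, scope) for scope in scopes)


# Hypothetical base URLs declared by a manifest (assumption for illustration).
SCOPES = ["https://example.org/book/", "https://static.example.org/book-assets"]

for candidate in [
    "https://example.org/book/css/main.css",
    "https://static.example.org/book-assets/fonts/serif.woff2",
    "https://example.org/bookshop/index.html",
    "https://en.wikipedia.org/wiki/Dinosaur",
]:
    print(candidate, in_any_scope(candidate, SCOPES))
```

A check like this lets the UA decide whether a resource is inside or outside the publication without every secondary resource being enumerated, at the cost of treating anything under a declared base URL as part of the work.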

iherman · 2017-08-09T12:53:30Z

I have created a separate issue (#23) to concentrate on the question whether the manifest must contain information on secondary resources or not.

baldurbjarnason · 2017-08-09T15:33:34Z

@iherman
Agreed that since must-ness is a separate issue we have a rough consensus on this issue: the publication should list its resources in the manifest, primary and secondary.

The UA can implement other mechanisms to improve the offline user experience, but that's up to them.

The author can use a service worker to add logic and dynamism to how the publication works offline (or just to improve its caching when online), but that's also just up to them.

We will need to outline how the caching mechanism is going to interact with service workers but that's also a separate issue.

Does that make sense?

@mattgarrish
It makes the most sense to me if the bounds of the publication are defined by the HTML resources listed in the manifest. It shouldn't matter how the reader encounters the web page: if a web page isn't in the manifest, it isn't a part of the publication, even if it is linked to from the ToC as a chapter. This would also give authors a UX incentive, hopefully, to be more thorough in what they list in the manifest.

My personal rule of thumb, which may or may not be useful, is:

> The boundaries of a web publication are defined by the primary resources (which can have a number of media types) and the HTML resources listed in its manifest, and the subresources they include.

Subresource being, using the SRI spec's definition: resources fetched by a web page.
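To make the rule of thumb concrete, here is a minimal sketch of how the subresources of one listed HTML resource could be collected statically. It uses Python's standard `html.parser`; the class name `SubresourceCollector`, the set of elements inspected, and the sample page are assumptions for illustration. It only sees what is declared in markup, so anything fetched by scripts or referenced from inside style sheets is missed, which is the limitation discussed earlier in this thread.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SubresourceCollector(HTMLParser):
    """Collect URLs that a page fetches directly from its markup."""

    SRC_TAGS = ("img", "script", "audio", "video", "source")

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in self.SRC_TAGS and attributes.get("src"):
            self.found.add(urljoin(self.base_url, attributes["src"]))
        elif tag == "link" and attributes.get("href"):
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.found.add(urljoin(self.base_url, attributes["href"]))
        # <a href> is deliberately ignored: a hyperlink is navigation,
        # not something the page fetches in order to render.


def subresources(html, base_url):
    collector = SubresourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.found


# Hypothetical page content and URL (assumptions for illustration).
PAGE_URL = "https://example.org/adventure/start.html"
PAGE_HTML = """
<html><head>
<link rel="stylesheet" href="css/book.css">
<script src="js/choices.js"></script>
</head><body>
<img src="img/map.png" alt="Map">
<a href="cave.html">Enter the cave</a>
</body></html>
"""

for url in sorted(subresources(PAGE_HTML, PAGE_URL)):
    print(url)
```

Under this heuristic the boundary would be the union of the listed resources and whatever such a pass finds for each listed HTML resource; `cave.html` only joins the publication if it is itself listed.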

The boundaries of a web page and whether it can be fully offline are two separate issues, IMO.

A publication is often going to have subresources within its boundaries that aren't listed in the manifest and thus aren't secondary resources in publication terms (even if we demand that authors not make such publications on pain of invalidation, it's going to happen).

I don't know if that works as a formal definition but I've found it to be a useful guiding heuristic when I'm thinking about this topic.

iherman · 2017-08-09T15:47:01Z

@baldurbjarnason

> @iherman
> Agreed that since must-ness is a separate issue we have a rough consensus on this issue: the publication should list its resources in the manifest, primary and secondary.

This has been refined a bit in #23 (comment). We are getting there...

iherman · 2018-03-02T10:43:40Z

Propose closing: the draft now has a number of references to this, and this issue has become extremely long and has lost focus a bit. We may be better off closing it and, if necessary, opening new, more focused issues when the time comes.

jmulliken · 2018-03-12T21:54:54Z

Closing this issue and redirecting conversation to Issue #141 in Affordances

iherman · 2018-03-13T09:07:24Z

See also https://www.w3.org/publishing/groups/publ-wg/Meetings/Minutes/2018/2018-03-12-minutes.html#resolution5

TzviyaSiegman added the topic:manifest label Aug 8, 2017

TzviyaSiegman mentioned this issue Aug 8, 2017

Minimum Viable Manifest #15

Closed

iherman mentioned this issue Aug 9, 2017

MUST the manifest include information about secondary resources or not? #23

Closed

tcole3 mentioned this issue Aug 21, 2017

Proposal: an HTML-first Table of Contents approach to Web Publication #35

Closed

TzviyaSiegman added this to the Offlining Web Publications, relationship to Web Packaging milestone Aug 21, 2017

iherman added the topic:offline access label Sep 2, 2017

Treora mentioned this issue Dec 23, 2017

Feature Request: Linked Dats/files beakerbrowser/beaker#794

Closed

TzviyaSiegman mentioned this issue Feb 13, 2018

WP affords offline reading capabilities. #141

Closed

iherman added the propose closing label Mar 2, 2018

jmulliken closed this as completed Mar 12, 2018

mattgarrish mentioned this issue May 17, 2018

Is an exhaustive "resource list" required to create a Web Publication? #198

Closed
