Is an exhaustive "resource list" required to create a Web Publication? #198
Comments
The reading order (however it's created/expressed) points to the primary resources, which in turn state what resources they depend upon. So, the way you would "indicate that a resource is part of the publication" is by referencing it from a primary resource (i.e. The default consumption of the publication within a browser "just works" (as this is how browsers do things now--there's no "resource list" for a web site you visit, just pages referencing their dependencies). The caching/offlining scenario can currently be handled by setting up a ServiceWorker and requesting the primary resources (currently via In either case, anything not in the reading order and not referenced from within a primary resource would not be considered part of the publication. |
One instance of handling "very large resources" on the Web is the Additionally, there is an in-progress spec for adding "HTTP Client Hints". These headers would again allow browsers to make determinations about resource loading based on a combination of the headers and the usage scenario. Point being, determining "very large" is best kept up to the device (and software on it) based on the contextual knowledge of device space, screen size, etc.
Caching processes can be limited (currently) via ServiceWorker Alternatively, more descriptive approaches such as Lastly, I envision the packaging process to be somewhat similar to caching. In the case of Google's Web Packaging format, the whole thing is stored as a set of HTTP exchanges--so presumably you'd have to make all those requests to gather the request/response pairs to store in the HTTP exchange bundle. Internally, the format is very similar to the HTTP Archive format--which can be output from a browser's dev console. Additionally, other webby formats like MHTML (currently supported in Chrome) and the TAG's upgrade of MHTML also use HTTP headers to express However, until Packaged Web Publications is further along there's no immediate requirement (that I know of) to address packaging concerns explicitly in the Web Publication design and architecture. |
There's no requirement to "render" anything--just to make the related HTTP requests and cache (or potentially package) the responses. Additionally, since the dependencies are expressed from each primary resource (i.e. their relationship is known), then the UA could potentially offer the user the option to cache only part of the publication. Whereas, if the list of resources is exhaustive and contains both primary resources and dependencies in a single list (with no stated relationship between them), then the UA could not offer that option because there'd be no way to determine that relationship from the exhaustive list. |
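To make the partial-caching point concrete, here is a minimal sketch in Python. It assumes the dependencies of each primary resource are already known as a mapping; the file names and the `resources_to_cache` helper are hypothetical and not part of any draft.

```python
# Hypothetical dependency map: each primary resource states what it depends on.
# All file names are illustrative only.
DEPENDENCIES = {
    "chapter1.html": {"style.css", "cover.jpg"},
    "chapter4.html": {"style.css", "fig4-1.png", "serif.woff2"},
    "chapter5.html": {"style.css", "serif.woff2"},
}


def resources_to_cache(selected, dependencies):
    """Return the selected primary resources plus everything they reference."""
    wanted = set()
    for primary in selected:
        wanted.add(primary)
        wanted |= dependencies.get(primary, set())
    return wanted


# Offline only chapter 4: its own dependencies are known, nothing else is pulled in.
chapter4_only = resources_to_cache(["chapter4.html"], DEPENDENCIES)
print(sorted(chapter4_only))
```

A flat list of the same file names would carry no such mapping, so a UA could not tell that `cover.jpg` is irrelevant to chapter 4.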
Inasmuch as I expect any supporting Web Publication UAs to be Web-connected (at least at first request of the publication address), all such UAs would have the ability to: |
The entry page could This is another case where the intention of the author/publisher would be expressed more clearly than with an exhaustive list of primary resources and dependencies. In the exhaustive list case there's no expression of which items are of greater importance (or larger size, etc.). However, in this more contextual case the desire to prefetch/preload is expressible, and expressible throughout the publication (e.g. prefetching a font from chapter4.html that's needed for the rest of the publication). |
whilst this is correct, the practical reality is that 99% of authors/publishers/etc. will have no way of influencing the HTTP headers returned by a server to the clients. I do not think we should base our specs on such headers to avoid pulling in possibly large files. |
A resource list is not only relevant in terms of managing caches, packaging, etc. There are affordances that rely on it, too: e.g., a client-side search should not be required to search through every resource referenced in the content, but only through those that constitute the Web Publication. Beyond the efficiency issues, in more generic terms, a WP is actually defined by the list of resources that it contains: that is what makes it a WP in the first place, as opposed to an average Web page. It determines the scope of various metadata, for example. "Just" relying on extracting all the links in the entry page and declaring them to be the list of resources goes against the very definition of a WP. |
This scenario is why I'm an advocate for keeping HTML Imports for descriptive (non-JS-based prescriptive) references. Additionally, things like the Essentially, the underlying premise is to reference resources as close to their actual use as possible, and in such a way that requests for those resources can be optimized from an understanding of their use. As you noted, that is currently unclear if one simply uses an unrefined
Much of this will depend on how and what does the packaging. If one were attempting to package a Web page now which referenced multiple videos (or images or audio) of varying size and quality, it would be up to the packaging software (and the package format it intends to output) which of those resources were added to the package, based on heuristics similar to those used by browsers at request time. The expression of the available options is provided in context and with a refined expression which includes sizes, etc. which can be used by a packaging tool (or browser) to make those determinations given its intended output. A singular "resource list" will either lack that information (as wpub does now) or the addition of such information will ultimately look like aggregating all these in-context HTML expressions into that list--which then ultimately will look rather similar to lumping all the HTML references (with the
Not long term, no. It's what's currently available and (consequently) can be used to inform our thinking when we look toward getting this work into the browsers. The important part of that comment was that "requests outside of that scope (in my understanding) is not handled by Handle Fetch routine. Presumably, something like that would be in place to limit what gets cached." Point being that we may also need (regardless of how we express the structural properties) something similar to the
True. In order to "gather" the entire publication, each primary resource's references must be considered/processed/parsed in order to retrieve the entire publication (into cache, etc). The exhaustive "resource list" approach doesn't guarantee that one can avoid that, however. The exhaustive list also raises concerns of things potentially being (or becoming) out of sync during the publication's history (i.e., downloading resources one doesn't need or not downloading resources one does need). Ultimately, to make the exhaustive list, something somewhere is going to have to parse every single primary resource, gather its dependencies, and put them in that list. That "something" might be a human at a keyboard, or it might be some Python script using BeautifulSoup, or it might be a browser-based editor of some kind. Regardless, in the case of the exhaustive list, any reference made from any primary resource which is needed for the experience of that primary resource MUST be recorded in that exhaustive list. If something isn't recorded in the exhaustive list, then...what happens? Invalid cache? Fallback to the network (i.e. use the Web)?
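The out-of-sync concern can be illustrated with a small Python sketch. It assumes the authored list and the set of resources actually referenced from the primary resources are both already available as plain collections; `compare_lists` and the file names are hypothetical.

```python
def compare_lists(declared, referenced):
    """Compare an authored resource list with what the primary resources reference."""
    declared, referenced = set(declared), set(referenced)
    return {
        "missing": sorted(referenced - declared),  # needed but not listed
        "stale": sorted(declared - referenced),    # listed but no longer used
    }


# Illustrative data only: the logo was replaced by a figure, but the list was not updated.
report = compare_lists(
    declared=["style.css", "old-logo.png"],
    referenced=["style.css", "fig1.png"],
)
print(report)
```

Whatever a UA decides to do with a non-empty `missing` or `stale` entry is exactly the open question raised above.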
I'd been seeing this as a potential application of Random Access to Content--especially for massive works like textbooks where only a few chapters/sections may be needed offline at a time. In the exhaustive list case, the entire publication would have to be available/considered/cached before Random Access could be provided. In the case of a textbook from which a student only needs/wants to offline "Chapter 4" (and its dependencies), it's not clear how the UAs would provide that Random Access request without either: Ultimately the primary resources themselves become authorities over their own content and dependencies. |
Let's not guess at statistics. 😉 I also wasn't suggesting we "base our specs on such headers" just that they exist (or may/will) and subsequently will be considered by the browser when making any of these requests (see Content Security Policy for one such example). Consequently, we shouldn't ignore their existence--as any deployment of these things will ultimately have to work within them, and conversely should make use of them wherever it makes sense for our use cases. Any spec we bring to a browser vendor will be considered in light of the existing (and potential) Web Platform specifications. HTTP Headers are (increasingly) a big part of that puzzle. |
This is what the "binding" is about. It provides a linear progression of resources which are "bound" into a (new) thing called a Web Publication. Those resources can then be (in some lovely future): searched, offlined, etc. A resource list will be built in either of these cases. It's just the "how" and the "when" that's in question--afaict.
I've never suggested we "[extract] all the links in the entry page," but rather narrowed them (as our spec currently does) to a defined area of expression (currently However, we can certainly say that the approach of gathering the resources vs. listing them exhaustively are different approaches to the same problem--both of which have their consequences (good and ill). |
@BigBlueHat I'm just trying to understand. Is your proposal that, instead of listing the resources in the JSON part, you would use something like
and that (and only that!) defines the list of resources? |
And what if resource2 needs a script or references an image? Each of which would need to be in said list of required resources... we can't require a crawl to generate a resource list. We wouldn't know where to stop. I think a non-explicit or non-exhaustive resource list should be considered a non-starter. |
@iherman as long as everyone's clear about the "something like" part--meaning, there's room for (and should be more!) exploration here. But in short, yes. The proposal is that HTML is the best place to put resource relationships because the mechanics of doing that are already defined (and we can build up from those), that they'll be used regardless (see the failure scenarios mentioned above), and moving these structural properties into JSON brings a less webby model into play upon the Web. |
Then (as happens now) a request for resource2 (either directly or via some process) retrieves those dependencies.
In the exhaustive list case, they would have to be listed in both the exhaustive list and referenced from the primary resource which depends on them.
There's no "crawling" (at least not in the open-ended sense used here). There are two things:
The "gathering" of All The Things (if/when needed for a use case), would happen by requesting the primary resource(s) needed and then requesting their resources--which is how loading a Web page works now. If one were to collect a specific list of Web pages together (no crawling! just that list) right now, this is exactly the process one would use.
We do know where to stop. We only get the primary resources and their referenced dependencies. We don't "crawl" random, inline For instance, if the Moby-Dick reading order referenced an "About the Author" page which in turn linked (inline) to the Herman Melville Wikipedia page, there would be no expectation that the "gathering" process should collect that Wikipedia page--since that Wikipedia page was not included in the reading order.
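A rough sketch of that stopping rule, in Python with only the standard library (`html.parser` stands in here for something like BeautifulSoup). The `DependencyCollector` class, the `gather` helper, the choice of elements treated as dependencies, and the example URLs are all assumptions made for illustration, not anything defined by the draft.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DependencyCollector(HTMLParser):
    """Collect resources a page loads, ignoring inline hyperlinks (<a href>)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.dependencies = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        url = None
        if tag in ("img", "script", "audio", "video", "source"):
            url = attrs.get("src")
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower().split():
            url = attrs.get("href")
        # <a href> is deliberately not handled: hyperlinks are not dependencies.
        if url:
            self.dependencies.add(urljoin(self.base_url, url))


def gather(reading_order, fetch):
    """Return {primary URL: set of dependency URLs} for the given reading order.

    `fetch` is any callable mapping a URL to its HTML text.
    """
    result = {}
    for url in reading_order:
        collector = DependencyCollector(url)
        collector.feed(fetch(url))
        result[url] = collector.dependencies
    return result


# Illustrative in-memory "site"; no network access is involved.
ABOUT = "https://example.org/moby/about.html"
PAGES = {
    ABOUT: (
        '<link rel="stylesheet" href="book.css">'
        '<img src="melville.jpg">'
        '<a href="https://en.wikipedia.org/wiki/Herman_Melville">Wikipedia</a>'
    ),
}
gathered = gather([ABOUT], PAGES.__getitem__)
print(sorted(gathered[ABOUT]))
```

Only the reading order is iterated, and only one level of references is collected from each entry, so the process ends when the reading order does. Resources loaded by script at runtime would not be seen by a sketch like this.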
The list of primary resources is explicit and exhaustive--in that it defines the primary boundary of the publication in a linear fashion. The dependencies are the thing in question. Defining an exhaustive list which contains both primary resources and dependencies (or even just a list of dependencies) seems likely to:
Those would be (some at least) of my "cons" for the exhaustive list approach (echoing the "cons" listed earlier of the "gathering" approach). If there are specific scenarios of use not yet expressed here, it'd be great to hear more about what's informing the "non-starter" thoughts. As yet, all of these thoughts/comments assume consumption of a Web Publication via a Web browser (in the broadest sense of the term). |
I just wanted to add a clarifying point of this discussion about why, to my perspective, some of the contributors in this and other tickets seem to be talking past each other. It might help clarify the points of contention. (Obligatory disclaimer: I take no stance on the following dichotomy.)
As I said, I take no stance on either of the sides in this dichotomy, except to say that, as Ivan pointed out above, we actually have documented affordances and use cases. If we think of the two sides in their most extreme representation (which nobody actually is advocating for) as "a WP is just an exploded EPUB viewable in a browser" vs. "a WP is just a collection of HTML pages with some metadata and chapter navigation", both of those would fail to serve our use cases and affordances completely. (As I think everyone would agree!) I don't think there's a fundamental conflict between these two points of view. But I do think it's worth clarifying them, because they are the source of so many of these contentious issues. |
Just as a little experiment, I made an EPUB that did not list secondary resources—CSS, images, etc— in the manifest. This is obviously illegal, and the EPUB rightly failed validation. But it worked perfectly in the three reading systems that I tried. Of course this doesn't prove anything. But the entire web operates without such lists. Web applications don't need such a list. Having such a list involves costs, and we shouldn't immediately assume that the costs are negligible, or that the benefits are so obvious they don't need to be stated. |
As much as I love @dauwhe ... such an EPUB would not work on, say, Google Play Books. And, I view this resource list as required for packaging, or for that matter off-lining, so the fact that it works on the Web or some EPUB RS, doesn't sell me. :-) |
On the bright side, you inspired me to install Google Play Books! And yes, it won't open my little experiment. Does it run epubcheck on upload? Now I have to go write a demo of packaging without an explicit resource list :) |
@dauwhe yes we do run epubcheck on upload. But that's just early detection... we wouldn't serve un-manifested resources. "Now I have to go write a demo of packaging without an explicit resource list" – be prepared to let it run for a while, as you might end up with the whole Web! :-) |
I think it's also worth pointing out again that the "list of resources" is an optional infoset item. It's up to the author to decide what should or shouldn't be listed in there. Unlike EPUB, there won't be validation involved for that list (EPUB requires authors to list every resource from the package in If you don't want to be bothered by providing such a separate list, that's fine. But you can't expect the UA to be able to guess everything for you in that case, because there are many reasons that this could fail. |
That's not how it's ever worked, @GarthConboy, so I'd appreciate it not being restated as if it were a possibility. It's only causing fears and concerns about things that don't happen. Thanks. The technical response posted earlier hopefully makes it clear that the approach which I've proposed is not in danger of crawling the whole Web: |
Wait...so it's not exhaustive? The resource list is currently defined as "all resources":
If it is instead merely additive (i.e. explicitly stating that the UA not forget to get that resource), then I've far less of an issue with it--however there are still concerns (for me) around moving structural/resource-loading semantics out of the HTML (which is by definition for such constructions) and into a new place. @HadrienGardeur could you clarify whether you believe an exhaustive list is a requirement, or that there's simply a need for something to reference dependencies which should "not be forgotten" (or perhaps a list of things that "must not be gotten" 😉)? Thanks. |
It's always exhaustive in the sense that this list + the reading order taken together are responsible for establishing the boundaries of the publication. But you certainly don't have to list all the resources in there (for instance, you would definitely avoid including your Google Analytics script or similar resources that won't work offline+packaged), it's up to the author to decide which resources are part of the publication or not. |
But it is a possibility. The reading order is much less stringent in its requirements right now than the resource list, so you can't rule out that hyperlinked resources are part of the publication if the resource list becomes optional. Of course, any sensible user agent wouldn't actually crawl them. I think the point is more that this obscures the bounds of the publication. I recall we already went round and round this problem back when we were having fun with primary and secondary resources and their relation to the reading order and resource list. Is it that hard to find a compromise between the two extremes: you must list all primary/top-level resources in the resource list, but should list all dependencies if you don't want to risk a user agent failing to properly cache (or whatever) your publication. You should also list any resources that might not be easily determined by inspection (script-necessary files, etc.). And maybe flag complete/incomplete lists so user agents know which need processing. Sort of a return to: #22 (comment) The answer is probably different for EPUB, where listing all resources becomes a requirement and probably not an unreasonable ask. |
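One way to picture the "flag complete/incomplete lists" idea is the Python sketch below. The manifest shape and the `resourcesComplete` key are purely hypothetical (no such property exists in the draft); the point is only that a user agent could decide from a flag whether it still needs to inspect the primary resources.

```python
def needs_inspection(manifest):
    """Return True when a user agent should parse the primary resources itself.

    `resourcesComplete` is a hypothetical flag invented for this sketch.
    """
    resources = manifest.get("resources")
    if not resources:
        return True  # no list at all: dependencies must be discovered
    return not manifest.get("resourcesComplete", False)


# Illustrative manifests only.
complete = {
    "readingOrder": ["ch1.html", "ch2.html"],
    "resources": ["style.css", "cover.jpg"],
    "resourcesComplete": True,
}
partial = {
    "readingOrder": ["ch1.html", "ch2.html"],
    "resources": ["widget.js"],  # only the hard-to-detect, script-loaded file
}
print(needs_inspection(complete), needs_inspection(partial))
```

Defaulting to "incomplete" when the flag is absent is a design choice of this sketch: it keeps the cheap path opt-in, so an author who lists nothing still gets a working (if slower) cache.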
@mattgarrish what you expressed in #22 (comment) is spot on, and describes precisely what I seem to have been failing to express. Thank you, Matt! 😄 What Matt said in #22 (comment) also matches what I'm seeing in @iherman's examples where he's only referenced resources which were not already listed in the spine/ToC:
Additionally, I understand now (based on the comment above and a quick read on
With that as the backstory, I now understand @GarthConboy's concerns. 😃 |
Indeed... I was trying not to type the dreaded |
We may, I think, conflate three problems (at least they are mixed in my mind):
These affect the "offline" aspect of the WP (whether offline temporarily/locally via a cache or a package) but also affordances. The relevant section in the current draft seems to give an answer to (1). Is there any reason to reopen that draft, or should it be taken as given for now? Has there been any new evidence that would warrant reopening (1)? I do not think so. I.e., we should concentrate on (2) and (3). Note that the draft also says:
which means that there may be references in the primary document that are not part of the final resource list (as it should be, imho), which makes (2) above fairly unclear. |
Indeed. I think it's pretty clear that there is a required list of Primary Resources (default reading order). It seems to me that we need to fully decide whether a WP MUST be offline-able/package-able, or if it MAY be (as in up to the author). The section of the current draft that @iherman pointed to above implies this is a MAY ("it is strongly RECOMMENDED to provide a comprehensive list of all of the Web Publication's constituent resources"), which seems to imply that supplying the list of secondary resources may be optional. I tend to lean more to a MUST. But, if we affirm this, one way or another, we can decide the Secondary Resource list question. |
The Working Group just discussed
The full IRC log of that discussion
<dauwhe> Topic: https://github.com//issues/198
<dauwhe> Github: https://github.com//issues/198
<dauwhe> garth: there's a primary reading order, but what about the rest of the issues. Must they specified fully?
<dauwhe> ... there are some comments in there that I like, from Ivan
<dauwhe> ... the list of secondary resources should be those needed for offlining
<duga> q+
<dauwhe> ... the Q is whether that list of secondary resources is required. MUST all web pubs be offlinable/packageable?
<josh> q+
<dauwhe> ... so the secondary list is author-optional?
<garth> https://w3c.github.io/wpub/#wp-resource-list
<dauwhe> garth: it does have the statement that it is strongly recommended to supply a list of all resources
<dauwhe> ... that's a may, not a must.
<garth> q?
<dauwhe> ... if that's really what we mean that drives the issue
<dauwhe> duga: we can discuss this again, but this exact question has been asked
<Hadrien> I think the current draft is fine
<dauwhe> ... the very clear answer from the group is that not all publications can be cached/offlined
<dauwhe> garth: I don't love it, I can live with it
<Hadrien> q+
<dauwhe> ack duga
<dkaplan31> q+
<ivan> q+
<dauwhe> duga: I'm not necessarily in the camp, but I asked the q and the answer was clear
<garth> ack josh
<dauwhe> josh: I'm not entirely clear how the requirement for a resource list factors into packagability
<dauwhe> ... even a non-offlineable publication could have a list of constituent pieces
<dauwhe> ... but I wanted to weigh in that I feel strongly that WP "may" be packagable
<dauwhe> ... there are a lot of publications where it would be impractical to package
<dauwhe> ... and we don't want things to be automatically packageable
<garth> q?
<dauwhe> ... we want to tag things as unpackagable, possibly with a license
<dauwhe> ... in terms of offlinable, I feel less strongly
<dauwhe> ... it might be a worthwhile challenge to say they must be offlinable
<dauwhe> ... what that means to me is that they are designed so that if you have minimal bandwidth, then a minimal amount of data to start the publication, and it doesn't lock up if you go into the proverbial train tunnel
<dauwhe> ... perhaps without videos etc
<dauwhe> ... but at least it doesn't "lock up" when it loses connection
<dauwhe> garth: let me answer the first thing
<dauwhe> ... my view is that the exhaustive list of secondary resources is required to make things offlinable
<dauwhe> ... or packaged
<dauwhe> ... if that list is missing
<garth> q?
<dauwhe> ... then you're going thru the list of primary, then web browsers do know how to get the associated resources
<dauwhe> Hadrien: the current spec language is fine
<dauwhe> ... I don't agree with what you said
<dauwhe> ... it's possible to have publications with everything in html
<dauwhe> ... with CSS inline, images as base64
<dauwhe> ... so I don't think we should tie ability to offline to a list of resources
<dauwhe> garth: then you don't need a list of secondary resources
<dauwhe> ... I agree
<garth> q?
<garth> ack Hadrien
<ivan> ack Hadrien
<dauwhe> Hadrien: basically the list of resources is not what is going to indicate if a WP is packagable
<dauwhe> ... you can always package or cache the primary resources
<dauwhe> ... but if your WP depends on JS, css, etc, and you don't include those in the list of resources, then this can affect the quality of the experience
<dauwhe> ... this is not similar for all publications
<dauwhe> ... some will heavily rely on JS, and if you don't include JS they will break
<dauwhe> dkaplan31: I want to say a variant
<dauwhe> ... this is a controversial thing
<dauwhe> ... we shouldn't just say let's agree
<dauwhe> ... we are conflating too many things
<dauwhe> ... it's hard to differentiate between packaging and offlining
<dauwhe> ... packaging is not the same as rights management
<dauwhe> ... packaging and piracy are separate
<dauwhe> ... it's important to understand what secondary resources are
<dauwhe> ... in some cases ALL CSS and JS are necessary for the publication
<dauwhe> ... if they are necessary they're not secondary
<dauwhe> ... if the publication is not usable without the resource, then there's an argument that it's not secondary
<garth> q?
<duga> q+
<garth> ack dkaplan
<dauwhe> ivan: I won after all :)
<garth> ack ivan
<dauwhe> ... we have to be careful to distinguish between offline and packaging
<dauwhe> ... these are two different things
<dauwhe> ... we may have to have yet another entry in our infoset which says does the author allow offlining or packaging
<dauwhe> ... if we decide there are non-offlinable WPs, we must state this, we cannot deduce this from magic
<dkaplan31> so to clarify:
<dkaplan31> 1. Let's have an up/down vote on language if we're deciding today, not a quick silence = consent
<dkaplan31> 2. Packaging != offlining != rights management
<dkaplan31> 3. If a resource is necessary, it's not secondary. If it's necessary, it needs to be listed. If it's not necessary, it doesn't need to be listed.
<dauwhe> ... I would prefer we say every WP is at the minimum offlineable
<dauwhe> ... and the author should say this explicitly
<dkaplan31> I agree with Ivan, re: offlineable. +1
<Bill_Kasdorf> Note that there is a difference between "allowing" offlining and "enabling" offlining.
<dauwhe> ... the list controls what goes into the offline version
<timCole> q+
<dauwhe> ivan: I think every web publication is offlineable, but it's up to the author to say what's in the offline version
<dauwhe> garth: I think with the language that the full list is recommended but not required
<dauwhe> ... that means the pub may or not be fully offlineable
<dauwhe> ivan: what do you mean?
<dkaplan31> I don't agree with that, Garth.
<ivan> q+
<dauwhe> garth: if such a list of 2ndary resources isn't there, then offline would get primary resources plus their direct links, which might not be enough
<garth> ack duga
<josh> +1 (please God, no DRM)
<ivan> +1 to duga
<dauwhe> duga: I want to remind people that DRM is out of scope, and we shouldn't worry about it
<dkaplan31> duga++
<laudrain> +1
<dauwhe> timCole: we did have a conversation that relates to offlinability and caching
<timCole> https://github.com//issues/183
<dauwhe> timCole: that doesn't get all the way to DRM, but in browsers you might say that something shouldn't be cached because it changes too quickly
<dauwhe> ... we also haven't defined what offlining means
<garth> q?
<ivan> ack timCole
<garth> ack timCole
<dauwhe> ... so I don't agree with Ivan that everything should be technically offlinable, that might be too much
<dauwhe> garth: I'm gonna paste something in that Ivan might disagree with
<garth> “Is an exhaustive "resource list" required to create a Web Publication?
<garth> No. Such an exhaustive list may be needed to make the WP fully offline-able or package-able. But, an exhaustive list of resources required (beyond the primary reading order) is not required, as it is up to the author whether a WP is fully offline-able or package-able.”
<dauwhe> ... the issue: is an exhaustive list required? I'll propose above as resolution
<dauwhe> ivan: you said something that worries me, garth
<dauwhe> ... you said you take the primary resources offline, and the CSS and etc... that is pandora's box
<dauwhe> ... there are things I didn't mention explicitly that are offlined
<dkaplan31> q+
<dauwhe> garth: I'm for requiring the list
<garth> q?
<dauwhe> ivan: I don't think we need a decision
<duga> q+
<dauwhe> garth: the spec says the list isn't required
<dauwhe> ivan: let's not go into linguistic analysis
<garth> q+ Hadrien
<garth> ack ivan
<dauwhe> ... it just meant that the resource list may be a selective list that are used by WP but selected by what can go into a cache and what can't
<dauwhe> ... this is not easy
<ivan> ack dkaplan
<dauwhe> dkaplan31: i agree with ivan
<dauwhe> ... we are not stating about packaging and offline
<dauwhe> ... this is a Q about resource list
<dauwhe> ... offline and packaging aren't in the scope of the next 11 min
<dauwhe> ... but it's legitimate to say what that list of resources will enable
<dauwhe> ... what is in the resources defined what could be cached, what could be offlined, what could be packaged, what could be preloaded
<dauwhe> ... here are affordances provided by such a list. if a resource isn't in the list, then these things aren't possible
<dauwhe> ... this is just a minimum requirement
<garth> ack duga
<dauwhe> duga: +1 to understanding what this list is for
<dauwhe> ... if CSS is not listed in 2ndary list, am I forbidden from downloading it?
<dauwhe> ... one reason we didn't require resources to be in this list is scripting
<jbuehler> +1 dauwhe beyond scope of next 10 min - I need to think about this more myself
<garth> q?
<dauwhe> ... where it might not be possible to determine what resources are used by script
<garth> ack Hadrien
<bigbluehat> q+ to ask for clarity around how exhaustiveness (related to the "why do we need this list?" questions)
<dauwhe> Hadrien: a comment on terminology
<dauwhe> ... i think primary and secondary are confusing
<dauwhe> ... we're talking about reading order and list of resources
<dauwhe> ... we need to be careful
<dauwhe> ... talking about offlining is confusing.
<dauwhe> ... there are many ways to do that.
<dauwhe> ... we should talking about caching and packaging
<dauwhe> ... packaging is a way of offlining, too
<dauwhe> ... I think default reading order and resources are two lists where we have expectation of user agents
<dauwhe> ... we expect UA to do something more, put them in package, to have a proxy to intercept requests
<garth> q?
<dauwhe> ... this is what we should be discussing
<dauwhe> ... what the UA should be doing that the web doesn't do now
<dauwhe> ... everything else should be like the web
<dauwhe> ... if we have image with cache-control headers, it might still work offline because of caching even when it isn't in the list
<dauwhe> q?
<timCole> +1 Hadrien
<garth> https://www.w3.org/publishing/groups/publ-wg/Meetings/Minutes/2018/2018-05-07-pwg.html
<bigbluehat> +1 to minutes
<dauwhe> minutes approved
<ivan> Resolved: last meeting's minutes approved
<garth> q?
<dauwhe> ack bigbluehat
<Zakim> bigbluehat, you wanted to ask for clarity around how exhaustiveness (related to the "why do we need this list?" questions)
<garth> ack bigbluehat
<dauwhe> bigbluehat: the Q that came up as to what H said, why do we need the list...
<dauwhe> ... working out the scenarios for its use
<dauwhe> ... currently it's recommended, but now we say that it has to include all the primary resources, and we've doubled everything
<dauwhe> ... we also need to define exhaustive
<dauwhe> ... and we haven't repeated existing components
<dauwhe> ... it would be good to work thru the "why is this here" stuff
<dauwhe> garth: I'm giving up on my fantasy of closing this issue
<dauwhe> ... we do have agreement that the primary reading order is required
<dauwhe> s/primary/default/
<bigbluehat> +1 to not restating stuff
<garth> q?
<dauwhe> ... I would envision the resource list is stuff beyond the default the reading order, so we don't restate stuff
<bigbluehat> (as in not restating stuff in the resource list)
<Hadrien> +1 to avoid redundancy between default reading order and list of resources
<ivan> +1, too
<laudrain> +1 |
Comments from WG call:
-- Provision of such an exhaustive resource list (resources beyond those in the required default reading order) is, indeed, RECOMMENDED (per current spec). And this likely should be considered settled.
-- Some agreement that, if the exhaustive list of resources is provided, it should not duplicate the default reading order: it should contain only the additional resources.
-- It was pointed out that publications can be created such that no such exhaustive resource list is required for the WP to be offline-ed or packaged (all resources bundled in with those in the default reading order). But such an exhaustive list may be required for full/correct offline-ing or packaging.
Garth's proposal for this issue remains: "Is an exhaustive "resource list" required to create a Web Publication? No. Such an exhaustive list may be needed to make the WP fully offline-able or package-able. But, an exhaustive list of resources required (beyond the primary reading order) is not required, as it is up to the author whether a WP is fully offline-able or package-able." (from the RECOMMENDED above). |
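The "don't restate the reading order" point can be sketched as follows in Python. The helper names and file names are hypothetical; the sketch only assumes that the publication's bounds are the union of the two lists, as discussed earlier in this thread.

```python
def additional_resources(reading_order, all_resources):
    """Resources to list beyond the reading order, with duplicates removed."""
    seen = set(reading_order)
    extras = []
    for item in all_resources:
        if item not in seen:
            seen.add(item)
            extras.append(item)  # keep the author's original ordering
    return extras


def publication_bounds(reading_order, resources):
    """The two lists taken together establish what is part of the publication."""
    return set(reading_order) | set(resources)


# Illustrative data only.
reading_order = ["ch1.html", "ch2.html"]
everything = ["ch1.html", "style.css", "ch2.html", "cover.jpg", "style.css"]
extras = additional_resources(reading_order, everything)
print(extras)
```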
@deborahgu said that we need to talk more about how such a list would be used. I think this would be very helpful! I'm also wondering if there's some confusion about what such a list makes possible. HTML has APIs to get a list of images ( EPUB had an exhaustive list of all those things, partly to help with validation, and partly so that a reading system could learn something about the resources without parsing the HTML documents. If we're making those arguments, we should be explicit about them. And we should remember that in the early days of ebooks, many reading systems were stunningly underpowered. We've also talked about how the exhaustive list defines the content of a WP. I'd like to see more detail about that. If I'm reading a WP, and encounter an image that isn't listed in the exhaustive list, what happens? If I then cache the WP, what happens? If I package it, what happens? If I click a link in an HTML document in the default reading order, which leads to an HTML document on the same origin which is not part of the default reading order, what happens? |
I would assume the primary purpose of the resource list is so that the reading system can establish when a user is within the scope of the publication. You don't need the secondary resources for that, but you also can't rely on the reading order in all cases we've discussed. If you don't have a list of all the primary resources somehow between those lists, the publication would get "exited" whenever you navigate to an unlisted primary resource. The secondary resources are inconsequential to this process. It doesn't matter whether what is loading within the page is listed, as we're not going to change HTML rendering/security/etc. As far as offlining and packaging go, having all the secondary resources would speed the process up, and be more accurate in some cases, but why put this requirement exclusively on authors when user agents are capable of performing the step? I suspect that web publication authoring tools are going to give a complete list of resources and make the issue moot in a large number of cases, but I also think the simpler it is to create a web publication, no matter your choice of tools, the better. I sort of question whether a user agent should ever rely on authors to get the list of needed resources correct, or should always be inspecting the primary resources to determine what is needed and might be missing. If we want reliability, depending on authors is not the best idea. |
In response to @dauwhe 's:
and @mattgarrish 's:
I think it's clear that UAs/RSs can, for simple publications, suss out the list needed to cache/offline/package some WPs with a light crawl of the resources from the default reading order list – finding the CSS, images, and scripts referenced and including (only) those. However, there are clearly cases where that can't be done – e.g., "required" content that is not in the default reading order ( |
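A rough sketch of the "light crawl" described in this comment, using only the Python standard library. It assumes the reading-order documents have already been fetched; the names `DependencyCollector` and `light_crawl` and the example URLs are hypothetical, and a real user agent would also have to look inside CSS (fonts, background images) and handle many more element types.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DependencyCollector(HTMLParser):
    """Collect stylesheet, image, and script URLs from one HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        ref = None
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel_tokens:
            ref = attrs.get("href")
        elif tag in ("img", "script"):
            ref = attrs.get("src")
        if ref:
            url = urljoin(self.base_url, ref)
            if url not in self.found:
                self.found.append(url)


def light_crawl(pages):
    """pages maps each reading-order URL to its already-fetched HTML."""
    deps = []
    for url, html in pages.items():
        collector = DependencyCollector(url)
        collector.feed(html)
        for dep in collector.found:
            if dep not in deps:
                deps.append(dep)
    return deps


# Hypothetical two-chapter publication.
pages = {
    "https://example.org/book/ch1.html":
        '<link rel="stylesheet" href="css/book.css"><img src="img/cover.png">'
        '<a href="notes.html">notes</a>',
    "https://example.org/book/ch2.html":
        '<link rel="stylesheet" href="css/book.css"><script src="/js/app.js"></script>',
}
deps = light_crawl(pages)
```

Note that `notes.html` is not collected: it is only linked, not embedded, so the crawl cannot tell whether it is part of the publication. That is the kind of case the comment says cannot be sussed out automatically.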
@mattgarrish I agree that a list from the author of "supporting" resources such as fonts, CSS, and images is perhaps not terribly useful. Or at least not required for many use cases. The interesting question is HTML that's not in the default reading order, but is linked to from the publication. If it's not part of the publication, then it's just like any other web link. But what if it's part of the publication? EPUB hasn't really solved this problem—just say What does it mean to be part of the publication but outside of the default reading order? This would certainly affect some affordances. "previous" and "next" controls would presumably be disabled. How would one return to the linear reading experience? Display such content as a modal? Provide a link back to where you opened the resource? |
That's why I don't think we can rely on the reading order being a useful listing of primary resources. It's also why the current prose only requires one document in the reading order. Not because some publications will only be one resource, but because people have expressed a desire to create publications that don't offer a linear progression by default but rely on other means, like following links.
It has to mean nothing more special than that there is no automatic path forward, but it's one of a variety of things about bringing an epub reading experience to the web that is thoroughly confusing to me, too. If you go back to a linear document, and it assumes a different next document, what does that do to the browsing history? Assuming new tabs get spawned would make a mess of the link-based model. |
I'm going to repeat what I said during our last call:
Among some of these affordances and expectations:
As we can see, caching and packaging are only two affordances among many others. |
I think we can ask such questions to the Edge team (cc @BCWalters ) since they've already addressed those issues for their reading mode (which covers resources from the Web, EPUB and PDF). As seen in their presentation in Berlin, the reading mode has its own affordance for moving forward/backward in the reading order but they also keep the URL bar and the back button in there as well. |
I must admit I continue to be wary of the approach whereby the UA would crawl through all primary resources to gather the list of all resources. The existence of We may of course decide to include some sort of an "exclusion" list rather than an "inclusion" list. I.e., instead of listing what the list of resources includes, we may require listing what is excluded when crawling the resources. However, we always have to see what the consequences of bad authoring would be, and forgetting to explicitly exclude something may lead to the addition of a full DNA dataset of several GBs to the WP… I.e., I am not sure that would be a good idea either. Alternatively, we may make a strong use of Another thing that worries me is the time it takes. While I realize that UAs operate in much better environments than in the early EPUB2 days, fetching and parsing a whole series of HTML content just to gather what would be part of the WP is still a significant effort; parsing an HTML document doesn't only mean parsing the syntax (which is significant already) but also building 2-3 different "trees" (DOM, CSS, Accessibility…). |
Since that quote was from me, I'll respond by saying it's not an approach I would take. But I'm not averse to there being a process whereby the user agent does the work, with a clear caveat emptor for anyone who takes advantage of it. I also don't think a complete list of supporting resources is necessary if, as Hadrien says, you don't want those affordances. But we haven't addressed how an author specifies what a user agent can do with a publication. We'll need explicit metadata at some point. More what I was wondering, though, is whether user agents are going to crawl the resources regardless of any stated completeness to discover whether there are any missing resources. If they have to do this for some, will they do it for all? We don't forbid this anywhere, but we also implicitly accept that if resources aren't listed they won't be put in caches or packages. Should we have both a global no-offlining and per-resource no-offlining instructions so that we don't forgo completeness of the resource list to achieve an unrelated need, and so user agents don't try to put these resources back in the list? |
Can you expand on what you mean by "within the scope of a publication?" In other specs, "navigation scope" is limited by a path prefix such that any navigation request made that does not contain that prefix causes the navigation to happen elsewhere (i.e. outside).
It sounds like you're wanting/expecting a similar sort of experience scenario where navigation is somehow limited within some view/container, and any navigation that happens must match a list of URLs or it opens elsewhere. Is that correct? |
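For comparison, the prefix-based "navigation scope" mentioned in this comment can be sketched roughly as below. The function name `in_navigation_scope` and the URLs are hypothetical, and this is a simplification of how any particular spec defines scope.

```python
from urllib.parse import urlsplit


def in_navigation_scope(scope, target):
    """True if target is same-origin with scope and its path starts with the scope path."""
    s, t = urlsplit(scope), urlsplit(target)
    same_origin = (s.scheme, s.netloc) == (t.scheme, t.netloc)
    return same_origin and t.path.startswith(s.path)


scope = "https://example.org/book/"
inside = in_navigation_scope(scope, "https://example.org/book/ch1.html")
outside = in_navigation_scope(scope, "https://example.org/blog/post.html")
```

A prefix check like this needs no list at all, whereas the list-based model being asked about would replace the `startswith` test with membership in an explicit set of URLs.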
In EPUB, a number of reading apps handle things similarly. Here's how iBooks handles resources:
I'm not suggesting that this is how we should handle things as well, but it's worth knowing about the current behaviour for EPUB. |
I'm not a fan of having "no-caching" or "no-packaging" directives in the manifest. They can be just as easily ignored and this breaks some of the promises of WP. You're right that user agents may attempt to crawl resources and various reading modes (plus dedicated services like Pocket) probably have their own heuristics for that. Crawling is not necessarily a bad thing though, it mostly depends what you do with those resources. In general, I think that UA should only deploy a proxy with a "network then cache" policy on resources listed in the default reading order or the list of resources. We haven't discussed yet how UAs should handle caching, mostly because we keep getting lost discussing "offlining". |
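The "network then cache" policy suggested in this comment could look roughly like the sketch below. It is not ServiceWorker code: `fetch_with_fallback`, the `fetch` callable, and the dict-based cache are all hypothetical stand-ins chosen to keep the example self-contained.

```python
def fetch_with_fallback(url, publication_urls, cache, fetch):
    """Apply 'network then cache' only to resources listed by the publication.

    publication_urls: URLs from the default reading order or resource list.
    cache: a dict acting as the proxy's store.
    fetch: a callable that returns the body or raises OSError on network failure.
    """
    if url not in publication_urls:
        # Unlisted resources behave like the ordinary web: no proxy involvement.
        return fetch(url)
    try:
        body = fetch(url)
    except OSError:
        if url in cache:
            return cache[url]  # offline, but a copy was stored earlier
        raise
    cache[url] = body  # refresh the stored copy on every successful fetch
    return body


# Hypothetical network that can be switched off.
online = {"value": True}


def fake_fetch(url):
    if not online["value"]:
        raise ConnectionError("offline")
    return "body of " + url


listed = {"ch1.html", "style.css"}
cache = {}
first = fetch_with_fallback("ch1.html", listed, cache, fake_fetch)
online["value"] = False
second = fetch_with_fallback("ch1.html", listed, cache, fake_fetch)
```

With this policy the second request still succeeds while offline because `ch1.html` was listed and had been fetched once; an unlisted URL would simply fail, as it does on the web today.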
Looking back at the thread, I am a little bit worried that we are complicating things too much. The simple model, whereby it is recommended that the author explicitly lists the resources (that are not in the reading order anyway) as part of the manifest seems to be simple and I do not see any major downside to it. Whether the UA does offlining/caching/whatever with those is not for the author to care about; it is up to the UA. The affordances, whenever appropriate, have a clear scope with those resources. And that is it... The only easy expansion of this may be what I said earlier:
Meaning that these are supposed to be used (ie, considered to be part of the resources) by the UA automatically. (Provided that CPU/memory requirement of crawling is acceptable.) |
-1 to the idea of mentioning While we can't stop UAs from gathering resources in the background, we shouldn't push this forward as an acceptable alternative. |
I think we need to be wary of specifying RS implementation details. I'll posit an (only slightly tuned) proposal for closing this issue: An exhaustive "resource list" is not required to create a Web Publication. Such an exhaustive list may be needed to make the WP fully offline-able or package-able or to enable provision of other affordances. Providing such a list of required resources beyond the default reading order is RECOMMENDED. However, it is ultimately up to the author whether a WP is fully offline-able or package-able or provides an RS/UA sufficient data to enable all desired affordances. [Note, this is not really my preferred solution, but I think I have been convinced that changing the RECOMMENDED to REQUIRED is likely just not practical.] |
How the publication is visually manifested is more a secondary consideration. What I'm concerned with here is the idea of there being a publication state that transcends the resources. It doesn't matter if that state actually persists in the background of the user agent and is checked against a list of URLs as each new resource is requested, or whether it is unloaded and reloaded with each resource and re-checked. Whatever the case, the state needs to be grounded in a concrete list of resources. Otherwise, what stops me from spoofing your publication simply by putting a manifest link into any malicious document I feel like? Without the bounds, the centre cannot hold. |
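The "grounded in a concrete list" argument can be illustrated with a membership check: a document only counts as part of the publication if the manifest itself lists it, so merely linking to someone else's manifest proves nothing. The manifest keys `readingOrder` and `resources` and the function name are assumptions made for this sketch; how the lists are expressed was not settled in this thread.

```python
def belongs_to_publication(document_url, manifest):
    """True only if the manifest lists the document as one of its resources."""
    listed = set(manifest.get("readingOrder", [])) | set(manifest.get("resources", []))
    return document_url in listed


# Hypothetical manifest for a small publication.
manifest = {
    "readingOrder": ["https://example.org/book/ch1.html"],
    "resources": ["https://example.org/book/notes.html"],
}

genuine = belongs_to_publication("https://example.org/book/notes.html", manifest)
spoofed = belongs_to_publication("https://evil.example/fake.html", manifest)
```

The spoofing page can point at the manifest, but since the manifest does not point back at it, the user agent has a basis for refusing to treat it as part of the publication.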
No, didn't mean to imply I'm a fan. And maybe I've missed where the discussion now is. I thought I saw that omission from the resource list was an acceptable practice for not caching/packaging resources, but maybe I was seeing things. If so, though, that would make validation and authoring of web publications much harder. Similarly, no-cache would not be an effective means of declaring what to package/not package. Caching and packaging are not congruous concepts. I may not want a resource cached between page views, but I still need it in a package just as I need it to view the page properly. If we want a way to exclude resources, we may need to have something more explicit. I thought we had a use case that would provide a way for authors to indicate that they don't want their publications offline-able or package-able? (Personally, I think offline should always be on the table.) It wouldn't provide any measure of security against those actions being taken, of course, but at least conforming user agents would be expected to respect them. |
Maybe we are approaching consensus? From the perspective of a UA, the relevant defining characteristic of a WP for this discussion is that it may be (typically will be) an aggregation of several HTML documents, media files, etc., rather than just a single HTML document or file. The reading order and when necessary a non-exhaustive list of essential 'non-linear' resources must be enumerated sufficient to establish the boundary of a WP. On the other hand it sounds like we do not want an 'exhaustive' list of all resources (are we agreed about this yet?). In fact, given the ample number of specs that tell a UA how to deal with (e.g., how to cache, stream, etc.) HTML documents, media files, etc. I would prefer that WP authors be discouraged from enumerating all the CSS, JS, embedded images, codec files etc. needed to cache, package, render or otherwise process a WP - leave this work that UAs already know how to do to the UAs. I don't think any performance advantage is worth the risk of the manifest in the entry point document getting out of sync with the links in the files that comprise the WP (imagine you change the name or location of a CSS or JS file). I suspect that some or most UAs would ignore the inclusion of such items in a WP manifest list anyway and rely on what they find when they open each constituent file of the WP. However, regardless of our consensus on this point, I do have a couple of concerns left:
|
(Admin comment!) Notwithstanding the (genuine!) issues raised by @tcole3 in #198 (comment), I have the impression (see also the comment of @GarthConboy #198 (comment)) that we have a consensus on the original question of the issue. The answer is "no", and it also looks like the current text in the draft stands as is. I would therefore propose to close this issue with no further action. Except that... there is one problem that is pending and needs further discussion, but it is not exactly on the issue as asked here. I would therefore also propose to open a new issue: "how should the infoset item 'resource list' be expressed in the WP manifest?". There were several different approaches listed here on whether it is in the manifest, it is fully in HTML, partially here and there... That should be decided and I do not believe we have consensus on that technical problem. |
+1 to Ivan's above. We have carved out some time for the 2nd issue raised to be discussed at the F2F. Also, at that time, it's worth discussing that this list of "required but not in the default reading order" resources could be needed by a WP to determine its bounds. But such a list would not need to be "exhaustive" (as much could be determined through normal page loading); the list would, however, need to include those resources that couldn't be otherwise determined (e.g., any logical Also, strangely, breaking from EPUB, I view this discussion as really most relevant to WP, not PWP/WPUB4. As in the latter cases, an "exhaustive resource list" could really be considered "the stuff in the package." |
The Working Group just discussed The full IRC log of that discussion<Rachel> Topic: Infoset<Rachel> tzviya: the infoset is a hot topic that leads us down many rabbitholes <Rachel> ...we are going to attempt to finalize the infoset before lunch <Rachel> ...we need to resolve some issues, make the spec more precise, the infoset does not (or does it??) need to include everything <Rachel> ...we can start by going through some github issues <Rachel> ...luc had looked over our existing infoset and let us know nothing is missing <Rachel> laudrain: We should have the simplest infoset possible <Rachel> ...it should be possible to have a web publication starting from the webpage <garth> Requested DPUB-ARIA issue: https://github.com/w3c/dpub-aria/issues/13 <Rachel> ...we had long discussion around things like, does it need a title <Rachel> ...I found the current requirements very short, but enough for the web publicatio and certainly for epub4 in the future <Rachel> ...if we would compare what e have today in epub3, this infoset is too short <Rachel> ...there a gap analysis that I did <tzviya> zakim, open the queue <Zakim> ok, tzviya, the speaker queue is open <Rachel> ...I don't know if we need to add the full infoset <Rachel> tzviya: thoughts? <Rachel> ivan: we have to start somewhere. At some point in time we will begin to map this into clear serialization. 
<Rachel> ...the bulk of the serialization wil be in json which is inherently extensive <leonardr> q+ <Rachel> ...we should being this work of mapping and then new items may come up <RickJ> q+ <tzviya> https://w3c.github.io/wpub/#infoset <Rachel> ...we can always see if we need additional things but I believe we are at the point that we are ready to get dirty <Rachel> ...bigbluehat breakdown (issue 197) was helpful <Rachel> bigbluehat: Matt has put this in the draft <tzviya> ack leonardr <dauwhe> https://github.com/dauwhe/html-first/wiki/WPUB-examples#1-minimal-wpub-based-on-todays-spec <Rachel> leonardr: I looked over the current draft. I think it's a good set of material, well defined. And we have an extensibility mechansim which gives us a good foundation to start from <tzviya> ack RickJ <Rachel> RickJ: I've not been involved in a lot of the conversation around infoset. Everything I read is around markup except for privacy <leonardr> q+ <Rachel> ....how can we expect the system to know this <Rachel> ivan: the only thing we say re privacy policy is that there should be one and it should be linked from the infoset <tzviya> ack leonardr <Rachel> RickJ: we are clearly defining the markup - what is the privacy policy for <Rachel> leonardr: that is if you have a publication that is declaritive <duga> q+ <bigbluehat> q+ <Rachel> RickJ: we need to make clear what the privacy is for <tzviya> ack duga <Rachel> tzviya: let's open a ticket to clarify this language <Rachel> RickJ will open a ticket re: clarifying language <tzviya> ack bigbluehat <BenWaltersMS> q+ <dauwhe> q+ <Rachel> bigbluehat: I'd love for us to ring out what we're affording in these things (including privacy policy which the reading system may not know what to do with) <garth> q? 
<Rachel> ....are we saying this because it has an effect on manifest etc <Rachel> ...how does this spill out experientially <leonardr> q+ <Rachel> tzviya: we have to clarify the effect on the user and the system <tzviya> ack BenWaltersMS <garth> q+ <Rachel> BenWaltersMS: of all the infoset, privacy concerns the most... <bigbluehat> if we don't explain what we're affording for with the stuff we're expressing, then we're missing the point of expressing them at all <RickJ> my (first!) issue on the privacy policy info set https://github.com//issues/203 <Rachel> BenWaltersMS: my big concerns are 1. compatibility wuth the web today <Rachel> 2. there's not one privacy policy or one way to interact with privacy <tzviya> q? <Rachel> ...are we enforcing that everyone interact with privacy policies? that they all click yes on them? are we requiring that everyone follow the same policy <Rachel> tzviya: privacy seems like a publisher specific thing <bigbluehat> Privacy Policy was added via PR #95 https://github.com//pull/95 <Rachel> garth: I'm with ben - the farthest we could go with this is that you may put a privacy policy in, and Reading Systems may interact with it <ivan> q? <garth> q- <tzviya> ack dauwhe <Rachel> dauwhe: websites can do privacy policies, most of them do <ivan> q+ <Rachel> ...often with a footer that repeats <DavidWood> q+ <laudrain> q? 
<Rachel> ...how do we define things that apply to the publication as a whole <rdeltour> +1 <tzviya> ack leonardr <duga> q+ <tzviya> ack ivan <laudrain> q+ <leonardr> https://github.com//issues/204 - calling out UA items <Rachel> ivan: I propse to remove privacy from infoset <Rachel> laudrain: we do not do any privacy policies within epub but we have contracts with distributors that say how the epub can be used <Rachel> ...we do have privacy policies <Rachel> ...we have applications which are programmtic <Rachel> ...they include privacy policies <RickJ> q+ <Rachel> ...there is a question of privacy, usage, and the data that is collected <bigbluehat> q+ <Rachel> ivan: we agreed that this is the minimal basic infoset <Rachel> ... not that this is the complete one <Rachel> ...we acknowledge that additional ones mat come in <Rachel> ...the manifest, the seriaization of the infoset, is based on schema.org struture in json <tzviya> s/mat/might <Rachel> ... I agree with everything you said but we need to decide if this is part of the basic content of the infoset <tzviya> ack DavidWood <Rachel> DavidWood: of all the affordances that we;ve discussed, privacy is the key ddifferenec between webpage and epub. it's more iportant than page break. <Rachel> ...There are lots of good reasons to sweep this under the rug and good reasons to not to <laudrain> +1 <Rachel> ...it is the difference between a society where we have the expection of privacy when reading a book <Rachel> ...if we do the convenient thing by treating privacy as a legal requirement or vendor requirement, we risk being complicit in a shift in the social experience of reading <tzviya> ack duga <Rachel> tzviya: I don't think we can solve that with a privacy policy <dauwhe> q+ <Rachel> duga: I think that's an important point - privacy is important. I don't know that we can hit the requirements necessary to address that in this spec <tzviya> q? 
<Rachel> ...the interaction of privacy policies make it impossible for us to specify this <Rachel> DavidWood: we should at least make a philosophical stance <Rachel> ...we expect privacy even if it is not provided <rdeltour> /me @hadrien https://rakuten.webex.com/join/wendy.reidrakuten.com <Rachel> DavidWood: I have an outstanding action to approach this in a ticket <tzviya> q? <tzviya> ack laudrain <Rachel> laudrain: <tzviya> ack RickJ <Rachel> RickJ: I am the privacy officer for our company and implemented GDPR <Rachel> ...we're conflating a concern we have with privacy and privacy policy determined by jurisdiction <Rachel> ...we need to separate privacy policy, rather than privacy <ivan> q? <dauwhe> q- <Rachel> tzviya: RickJ and DavidWood will be our privacy task force <tzviya> q? <Rachel> ...please clarify to our group what we can and cannot do <duga> q+ <Rachel> ivan: 3.3.9 doesn't cover what RickJ and DavidWood are talking about <dauwhe> q? <Rachel> DavidWood: in relation to gdpr and I go to a website and the website collects info from me they have to tell me what they're collecting, allow the right to be forgetten, etc <Rachel> ...if we put books on the web anyone that reads a book <Rachel> ...even on an ereader rather than a traditional browser <Rachel> ...the requirements apply <tzviya> q? <Rachel> ivan: what should I, as an author put on the webpage <DavidWood> q+ <tzviya> ack bigbluehat <Rachel> bigbluehat: it's recommended that it be in html - why isn't this just content in the publication <Hadrien> q+ <Rachel> ...if your jurisdiction mrequires it, why not just add it to the content itself and as a publisher, you ccan express the requirements/concerns to the users of your content <tzviya> ack duga <Rachel> duga: I hear the question of what do you want done - privacy policy: is it in the infoset or not <Rachel> ...we have to put privacy somewhere according to w3c policy <tzviya> ack DavidWood <Rachel> ... 
it's seems that we can't put enough requirements around this in order to put this in the infoset <Rachel> DavidWood: I think part of the reason that we disagree is that we are talking about publications as if they are just content - i take a book, make it into html, and then make an epub3 <Rachel> ...therefore the publisher of the content doesn't have anything to say about proivacy <Rachel> ...that may not be right, because if we allow js to be a part of that package we are in a different environmnet <Rachel> ...ow we have privacy, legislation, and regulation from multiple parties <Rachel> ... there has to be SOME mechansim if we are going to allow js in the packages <Rachel> tzviya: is it required in the default infoset <Rachel> DavidWood: are we ready to say it's not supposed to be in the infoset? <Rachel> leonardr: it's currently recommended - you CAN put it in the infoset, but you don't have to <dauwhe> q? <Rachel> tzviya: in the default metadata set does this belong? <Rachel> ...many of us are saying it does not belong <tzviya> ack Hadrien <Rachel> ...we have other items we must discuss <George> q+ <rkwright> I can hear him perfectly <Hadrien> I think that we need to be careful for infoset/metadata that are not default <Hadrien> they're very likely to be ignored by most reading systems <dauwhe> q? <dauwhe> q+ <Hadrien> for the privacy policy, there's already a rel value in the IANA link registry <Hadrien> why can't we simply use that? <Hadrien> there's no need to do a lot more than point to a privacy policy <bigbluehat> +1 <laudrain> +1 <josh> +1 <tzviya> +1 <duga> +1 <BenWaltersMS> q+ <garth> +1 <George> q- <Jean_Kaplansky> Someone gave Tzviya a gavel!?! <garth> scribenic: Garth <bigbluehat> https://tools.ietf.org/html/rfc6903 has all the goods <BenWaltersMS> q- <Rachel> BenWaltersMS: it's there but not used by anyone <DavidWood> +0 only because I'm uncertain of the consequences <Rachel> tzviya: can you fix that? 
<bigbluehat> q+ <Rachel> BenWaltersMS: to convince edge to do something like this means convincing the other browsers which means there has to be a major user need <tzviya> ack dauwhe <garth> Ben: unclear anybody uses said IANA privacy link <Rachel> ...it hasn't happened yet, which means it's unlikely <tzviya> ack bigbluehat <Rachel> dauwhe: we have needs that are so specific that it requires a brand new data structure - why can we not just put this in html <BenWaltersMS> +1 dauwhe <Rachel> bigbluehat: its not clear to people who write links that they need to do this - we need to define the affordance before they stick it in there <Rachel> tzviya: we need a proposal for how to include privacy policy within the publication <Rachel> garth: if we can't mandate the reading of thepolicy, it belongs in the content. if the publisher cares - they'll include <Rachel> bigbluehat: until the time the reading system can recognize it <ivan> Suptopic: list of resources <tzviya> https://github.com//issues/198 <dauwhe> github: https://github.com//issues/198 <garth> https://github.com//issues/198 <Rachel> garth: this was raised by Ben - we are in agreement a default reading order as part of the infoset <Rachel> ...this issue is around what other resources may/must/should be included in the infoset <Rachel> ..the reason for includingother resources is to show the bounds of the publication <Rachel> ...for search, offlining, packaging and other affordances of the publication <Rachel> ...end notes and footnotes are an example of the break from the linearpathway <Rachel> ...there may be things in the publication that are not in the default reading order that can't be sussed out <Rachel> ... ie images referenced as top level, CSS probably <tzviya> q? <leonardr> q+ <dauwhe> q+ <Rachel> ... 
what is required to be in this other list of resources <Rachel> ...reading systems may want exhaustive <Rachel> ...other perspectives say that the web changes constantly, how can it be exhaustive <garth> q? <BenWaltersMS> q+ <Rachel> leonardr: I don't think we need to mandate anything in this list but we should say "if these resources are important to your publication in XYZ use cases, then they must be here" <tzviya> ack leonardr <Rachel> ...having a specific list doesn't buy us anything <tzviya> ack dauwhe <ivan> q+ <bigbluehat> q+ <Rachel> dauwhe: it's a burden on the author to enumerate every single list <duga> q+ <Rachel> ...that kind of thing doesn't happen on the web in general <Rachel> +1 dauwhe <laudrain> q? <rdeltour> q+ <josh> q+ <Rachel> garth: I agree with that and I've come to the perspective that by the time we package this the resource list is exhaustive <tzviya> ack BenWaltersMS <garth> ack BenWaltersMS <laudrain> q+ <josh> +1 dauwhe <Rachel> BenWaltersMS: I agree with everyone. I don't like a partial list that's confusing and so not used. <Rachel> ...if I'm a tool that wants to take web pub offline, how do I know which elements should be? <garth> q? <Rachel> ...images? videos? etc <Rachel> ...how is that decision delineated? <Rachel> ...we need to avoid a halfway decision <garth> ack ivan <tzviya> ack ivan <Rachel> ivan: if we have the reading order which are html files mostly, any image or CSS files referred from that reading order are automatically a part of the infoset items <Rachel> ... we have to be precise about that <leonardr> q+ <Rachel> ...we probably don't want to extend that to videos which are a dangerous thing <garth> q? <Rachel> ... any resource that the author wants to be a part of the publication needs to be included <Rachel> ...like datasets <wendyreid> q+ <BenWaltersMS> q+ <Rachel> ...I would feel fine with images, CSS... 
js is tricky <tzviya> ack bigbluehat <Rachel> bigbluehat: rel=external is in the html spec, which can be used on link and anchor tags <clapierre> q+ <garth> q? <Rachel> ...and could be extended to making exceptions to what to grab (ie images outside the publication) <RickJ> q+ <Rachel> ...the video tag presents a similar opportunity - the video and format that I get depends on the venue I am using to view it <garth> ack duga <tzviya> ack duga <Rachel> duga: the manifest in epub is somewhat unnecessary - it was written before there was a package document. <Rachel> ...for pwp - it's the stuff that's in the package <Rachel> ...for the web, it's the stuff that's in the web <Rachel> ...for an offlinable wp there will be bits that are not findable <Rachel> ...even the stuff that is findable, the user experience is not great because of the processing need <Hadrien> +1 for what duga is saying <dauwhe> q+ for priority of constituencies <Rachel> ...it slows down downloads and burns through battery <tzviya> ack rdeltour <garth> q? <Rachel> rdeltour: we have to ask ourselves how our publications differ from the web <Rachel> ...are we asking how to cache? how to work offline? <Rachel> ...the author has control how she develops the service worker <garth> q? <garth> ack josh <ivan> q+ on list is also to be used in other affordances like search <Rachel> ...we are not clear what the user agent is in the publication. <Rachel> josh: I am focused on the pub that is not offlinable etc. At some point it may be one of those things. But right now, we need to focus on the minimal definition of the WP. <Rachel> ...if you want a WP - you need these three things <Rachel> ...making it offlinable? here are the other things <garth> q? <duga> q+ <garth> q? 
<Rachel> ...if there is a lot of complication in creating these files <Rachel> ...no one will do it <garth> ack laudrain <tzviya> zakim, close the queue <Zakim> ok, tzviya, the speaker queue is closed <clapierre> q- <Rachel> laudrain: will we have badly structured web publications if we do not provide more information about what's required <tzviya> q? <garth> q? <Rachel> ...I agree with what josh says <garth> ack leonardr <Rachel> leonardr: we seem to be coming down to the offline conversation <Rachel> garth: also search <Rachel> leonardr: not search - from a technical perspective I disagree <Rachel> ...caching - maybe, maybe not <Rachel> ...if these are the use cases, could we treat them as these specific things instead of a special resource list <garth> q? <Rachel> garth: maybe those use cases can be generalized <Rachel> leonardr: I don't think we need to establish the bounds of the publication <Rachel> ...when we take it offline, we need to know what is coming offline <Rachel> josh: you must be able to search within a publication <Rachel> garth: the one bullet point is that we have a bounded publication <tzviya> ack wendyreid <leonardr> @josh - but it's not clear what that means when you have external references... <laudrain> +1 to Hadrien <Rachel> wendyreid: as a user agent/user experience rep - if we're giving the user the option of offlining <Rachel> ...we need to give the appropriate information <leonardr> @Hadrien - I think it is entirely on your definition of "bounds"... <Rachel> ...want to download this? You're getting 50 gigs of video. Still want it? <tzviya> ack BenWaltersMS <garth> ack BenWaltersMS <Rachel> ...we need the info to be presentable and the user agent and user should have options around them <Rachel> BenWaltersMS: is search needed outside the default reading order? is that a requirement? <rdeltour> q? 
<garth> ack RickJ
<Rachel> garth: right now in epub you search the whole thing - is it important to include nonlinear content in the search
<tzviya> ack RickJ
<Rachel> RickJ: it seems that when we talk about offline, we talk about packaging as a part of offlining
<Rachel> ...it needs to be separate
<garth> q?
<tzviya> ack dauwhe
<Zakim> dauwhe, you wanted to discuss priority of constituencies
<Rachel> dauwhe: conclusions - the exhaustive list of resources is optional
<Rachel> ...there are circumstances where it is not needed
<Rachel> ...the spec will not stop people from exhaustively listing resources
<Rachel> ...we need to clearly define the boundary of the publication
<garth> q?
<Rachel> garth: I think yes
<tzviya> ack ivan
<Zakim> ivan, you wanted to comment on list is also to be used in other affordances like search
<Rachel> ivan: there is a difference between what the web does and a publication
<Rachel> ...if I begin to read a book of 5 chapters and I want it offline, the current web will offline what I read
<Rachel> ...the author has to specify somehow
<Rachel> ...search is one of the affordances that are important
<Rachel> ...personalization as well (ie I want my book to read in night mode) I want that to apply to all chapters, not one
<Rachel> ...what dauwhe proposes is incomplete because we have to specify what the user agent does in terms of affordances and offlining.
<Rachel> ...there may be an optional list
<bigbluehat> +1 to defining affordances all the places
<tzviya> ack duga
<Rachel> duga: so we're back where we started - we should have an optional list of resources with clear instructions about what they should be used for
<Rachel> ...we also need to define default with or without this list
<Rachel> Hadrien: if you don't list something, you can't expect things to work magically
<Rachel> garth: details need to be worked out
<Rachel> ...but it sounds as though there may be consensus
<Jean_Kaplansky> Have fun guys... I'll check in later.
<garth> Proposal: There is a default reading order. There is an optional list of resources that may be provided to extend the bounds of the publication beyond the default reading order.
<dauwhe> github-bot |
There is now a draft including this. |
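The proposal recorded in the minutes (a default reading order, plus an optional resource list that extends the bounds of the publication) could be read roughly as the following sketch. This is an illustration only: the `PublicationManifest` shape, its field names `readingOrder` and `resources`, and both helper functions are hypothetical and are not taken from any draft.

```typescript
// Hypothetical manifest shape; field names are assumptions for illustration.
interface PublicationManifest {
  readingOrder: string[]; // primary resources, in default reading order
  resources?: string[];   // optional list extending the publication's bounds
}

// Collect every URL considered part of the publication: the reading order
// plus, when present, the optional resource list. References are resolved
// against `base`, and fragments are dropped so that "ch1.html#s2" and
// "ch1.html" count as the same resource.
function publicationBounds(manifest: PublicationManifest, base: string): Set<string> {
  const bounds = new Set<string>();
  const refs = [...manifest.readingOrder, ...(manifest.resources ?? [])];
  for (const ref of refs) {
    const url = new URL(ref, base);
    url.hash = "";
    bounds.add(url.href);
  }
  return bounds;
}

// True when `candidate` falls inside the bounds computed above.
function isInPublication(
  manifest: PublicationManifest,
  base: string,
  candidate: string
): boolean {
  const url = new URL(candidate, base);
  url.hash = "";
  return publicationBounds(manifest, base).has(url.href);
}
```

Under this reading, a user agent deciding what to search, personalize, or take offline would consult the computed bounds; anything outside them (for example, a large externally hosted video) would not be treated as part of the publication unless the author lists it.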
This was another topic which surfaced during the #193 discussions.
@HadrienGardeur raised some concerns about a "dependency gathering" approach proposed by @BigBlueHat.
There are several potential consumption scenarios which should be considered:
Current:
Future:
@HadrienGardeur's concerns about the gathering process are below...