Proposal: an HTML-first Table of Contents approach to Web Publication #35
Comments
Very well thought out proposal - thanks for submitting.
Overall, I think it solves a lot of problems and I like the direction. I
also like the fact that an author that doesn't want/need it, can simply use
CSS to make it hidden and leave it strictly for the UA.
That said, I have a big concern with it - that it doesn't actually provide
a way to define "publication wide" information. Let's consider the
discussion around Language definition (#29). If I have a publication that
is multilingual, where some primary resources are in English and others are
in French (such as in parts of Canada) - I can't do that in your method
since the lang needs to be that of the resource in question.
If we can find a way to solve the publication-wide issue, then count me
in. Otherwise, sorry...
|
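@lrosenthol great question! HTML saves the day again. 😄 I'd expect that the lang attribute on the ToC would be considered the "primary language" of the ToC document as well as the publication. Additionally, lang is a global attribute, so is usually on (nearly) everything. So...
<html 📖 lang="en">
...
<nav>
<ol>
<li lang="ja"><a href="hello.html">今日は <span lang="en">Konnichiwa</span></a></li>
<li><a href="author.html">About the Author</a></li>
</ol>
</nav>
...
</html>
Additionally, authors would also have the power of Hope that helps! |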
Echoing Leonard, thanks for an insightful proposal, replete with demos! My main issue (and one that could be resolved with additional metadata or links) is around secondary resources -- we need to be able to use whatever we come up with to take a WP offline, and to transmogrify it into a PWP. Yes, the demo does some caching, but that can't be done deterministically without an explicit list of secondary resources (e.g., to avoid pulling in the entire web by following a link to wikipedia). We have (I think) agreed that such a list is a WP requirement -- whether that list is a MAY, SHOULD or MUST has been purposely left TBD. So, we would need to augment this proposal with a way to specify secondary resources -- either metadata, or additional (non-displayed) nav sub-elements, or header links, or some such. |
@GarthConboy the whole publication is available offline--including secondary resources. 😄 Also, CORS, CSP, the Same-Origin Policy, and many other such things set a boundary on ServiceWorkers (on which the demo is based). Additional constraints--beyond those being defined elsewhere by browsers--could certainly be specified. That said, there's no additional need to re-state the secondary resources. The browser knows how to get them, and does. Packaging's a separate topic, but one we don't prevent with this approach afaict. |
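As a rough illustration of the kind of "additional constraint" mentioned above, the sketch below keeps only resources that share the publication's origin, so a link out to Wikipedia would not be pulled into an offline copy. The function name and the example URLs are hypothetical, and this is not what the demo's ServiceWorker actually does; it only assumes the standard URL class.

```typescript
// Hypothetical helper: keeps only URLs on the same origin as the publication.
// Relative URLs are resolved against the publication's own URL first.
function sameOriginResources(urls: string[], publicationUrl: string): string[] {
  const origin = new URL(publicationUrl).origin;
  return urls.filter((u) => new URL(u, publicationUrl).origin === origin);
}

// Illustrative input: two local resources and one external link.
const candidates = [
  "style.css",
  "https://en.wikipedia.org/wiki/Whale",
  "/img/cover.png",
];
const toCache = sameOriginResources(candidates, "https://example.org/book/index.html");
```

Whether an origin boundary is the right constraint (as opposed to an explicit list of secondary resources) is exactly the open question in this thread.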
Several things:
|
On Wed, Aug 16, 2017 at 5:12 PM, BigBlueHat ***@***.***> wrote:
@lrosenthol <https://github.com/lrosenthol> great question! HTML saves
the day again. 😄
<https://www.w3.org/TR/html52/dom.html#the-lang-and-xmllang-attributes>
I don't see that, sorry...
I'd expect that the lang attribute on the ToC would be considered the
"primary language" of the ToC document as well as the publication.
Agreed. However, what if the publication has dual primary languages - as
in my Canadian example (which is a legal mandate for government
documents/publications in parts of that country). HTML (AFAIK) doesn't
allow for multiple values to lang.
Additionally, lang is a global attribute, so is usually on (nearly)
everything.
<li lang="ja"><a href="hello.html">今日は <span lang="en">Konnichiwa</span></a></li>
So that's a great example. What language is "hello.html" going to be? No
idea, because you can't put a language on the link element (IIRC).
|
|
On Wed, Aug 16, 2017 at 5:17 PM, Dave Cramer ***@***.***> wrote:
Several things:
1. The language on the "index" file would describe essentially what
language you want any UI to be in. This is like #29
<#29> where there's a "manifest
language".
Absolutely not - in either case, @dauwhe.
First, UI/UX is completely out of scope to any of the work we are doing
here (manifest, content, etc.). Unless you mean any UX that is presented
as part of the OWP content itself - in which case, I agree, but then it's
just content.
Second, while it has been suggested that way, I don't see that value in the
manifest as the "manifest language" - but the language(s) of the
publication.
1. The nav could indicate the language of a linked resource via the
(unknown to me until minutes ago) hreflang attribute: <li lang="fr"><a
href="c002.html" hreflang="fr">Entre Terre et Ciel</a></li>
Cool! I wasn't aware of that either. Nice find!
1.
If you want to create metadata that describes the language of the
intended audience of a page, rather than the language of a specific range
of text, do so by getting the server to send the information in the HTTP
Content-Language header. If your intended audience speaks more than one
language, the HTTP header allows you to use a comma-separated list of
languages. source
<https://www.w3.org/International/questions/qa-html-language-declarations#metadata>
You are assuming that users have any control over the servers on which
their content lives...an assumption which is almost never true outside of
professional publishing. Try setting those headers on DropBox or Google
Drive :). (or for that matter, getting Amazon to include them with your
publication :))
1. Any general-purpose metadata vocabulary (RDFa, etc.) could provide
more granular or detailed information about language.
It could also be used as part of the <nav> as well, to support other
metadata aspects that we haven't considered. Would also (somewhat) force
the use of systems like schema.org - and that's a good thing (IMO).
|
Quick thoughts: I keep changing my mind back and forth on the ‘book’ attribute. I’m told that browser vendors are not fond of mode switching in general (i.e. some sort of toggle that changes how the page works as a whole) or HTML profiles specifically, so relying on that sort of functionality might be risky. It works well when you know you aren’t going to rely on browsers implementing the toggled features (like AMP), but creating a ‘fork’ of sorts of HTML as a format might be a hindrance to browser support, long term. I wonder, since Digital Publishing WAI-ARIA seems to be a fait accompli, whether the presence of
|
I just did a little experiment with one of our demo books, and removed several chapters from the My real hope is that it would be fine to enumerate secondary resources in complex situations, but that an author would not be required to in simple situations. |
Since we've moved from an email thread to Github, I'll repost my initial comment here.
You're conflating two different things here:
They can be the same thing for a novel like Moby Dick but they can also be vastly different, for instance a ToC could:
Saying that a list of primary resources duplicates a ToC (or any other navigation) is therefore incorrect in the general case. The "huge benefit" for caching secondary resources comes at a very expensive cost:
|
@BigBlueHat responding to:
I don't think that's the case, the specification of secondary resources is needed to constrain what's taken offline or coalesced into a PWP. Not an issue of getting too little, it's preventing getting too much. But, happy to defer this discussion until the call on Monday. Perhaps @dauwhe 's comment of:
can save the day, but I'm not sure that "complex situations" won't be most/all situations. :-) |
Also after carefully reviewing the proposal, there's absolutely nothing that I haven't seen proposed before (during the BFF work for example), aside from the fancy 📖 . I still think that this is a bad idea and that HTML is poorly suited for our use case:
|
Bravo 🎩 and 🐳 for the very interesting concrete proposal! 🎉 I'm ambivalent about the conflation of ToC and resource lists, for the same reasons @HadrienGardeur and @baldurbjarnason already exposed. I like the approach's simplicity, but I'm not (yet?) convinced that it scales well to the diversity of non-trade publications, and the complexity of ToCs. Another use case to consider –and maybe it's a stupid idea– is one where the ToC itself is dynamic, and is updated dynamically following the users' reading (for instance, the reader discovers new chapters when reading, or there's some in-book purchase options, or some content in educational material is unlocked by a student, etc). HTML's inherent dynamicity could be a deal-breaker for what is essentially a static bit of info. In any case, I believe the two approaches can probably be combined: @dauwhe’s and @BigBlueHat’s approach doesn't prevent the linking to a static JSON manifest, which could include a list of resources. |
Warning: I'm in the mood for a rant before I head out on holiday and leave the internet behind :)
|
I'm always in a mood to reply to a rant too.
I don't think anyone ever suggested 3 different XML files as the foundation for a WP. But at the same time, our goal is to create a format that can handle various types of publications, not just novels.
... and this has nothing to do with going full HTML. It's perfectly possible to have progressive enhancements with a JSON based external manifest linked from primary resources too.
Which is unrelated to our discussions in this specific issue. I would point out that sacrificing primary and secondary resources just to achieve "HTML purity" can be quite destructive and impactful too.
Good luck to authors dealing with RDFa. |
Wow, people have been busy while I was asleep :-) I try to avoid things that have been said by others. Just a few, hopefully additional remarks. I put them into separate comments, to make it easier to respond and follow up. On the problem of secondary resources: finding them can be a drag on a User Agent. @dauwhe said that in his experiments this happened automatically; that is comforting. However, I know that in the respec2epub tool (turning respec documents into EPUB) locating those secondary resources was the main problem. We know that listing all those is an issue for authors, that we may need fallbacks, use URL patterns (separate discussion on this with @HadrienGardeur notwithstanding), etc., in a concrete manifest, but we should not underestimate the problem. |
I want to take a step back, however, because there are aspects that I do not really understand. We do have the concept of an abstract manifest and this proposal concentrates on one or two particular manifest items, and the serialization thereof. That is one part of the discussion. The abstract manifest refers to other items like the language tag. There was a separate discussion with @lrosenthol on the language tag issue, which seems to suggest that the proposal is to serialize the whole of the abstract manifest in an HTML file, more exactly the What does this mean for manifest items that do not have a direct counterpart as HTML elements or attributes (i.e., in contrast to language)? An example may be the canonical identifier. Let alone more complex metadata that would have to be expressed in its own syntax anyway. I presume the idea would be to use the
There may be other issues as well, like the fact that the |
However... I see a lot of merit in the proposal insofar as making it easy to author simple WP-s. We should not underestimate the power of this. There is an interesting section in the Web App manifest that does refer to the issue in general. It also contains the following:
What this tells me is that this may be an avenue to expand on the fallback idea, which we discussed before (like on our last meeting). Just like we said that if the concrete manifest does not have a title, the UA would make an attempt to locate the title in one of the primary resources, can't we expand this in general: something like
I guess that would make it possible to rely on the undeniably attractive aspect of the proposal (make it easy for simple cases) but make it possible to express more complex cases. It is a bit of an additional drag on UA-s, but if it makes life easier for publishers, it may be worth it... I am not sure this line works, but it may be worth exploring it imho. |
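To make the fallback idea concrete, here is a minimal sketch in which a value from the concrete manifest wins and the UA falls back to what it found in the HTML entry page only when the manifest is silent. The type and field names (title, lang) are illustrative assumptions, not taken from any manifest draft.

```typescript
// Hypothetical shape: the field names are illustrative, not from any spec draft.
interface PublicationInfo {
  title?: string;
  lang?: string;
}

interface ResolvedInfo {
  title: string | undefined;
  lang: string | undefined;
}

// Prefer the manifest value; fall back to the value found in the HTML entry page.
function withFallbacks(manifest: PublicationInfo, entryPage: PublicationInfo): ResolvedInfo {
  return {
    title: manifest.title ?? entryPage.title,
    lang: manifest.lang ?? entryPage.lang,
  };
}

// The manifest only declares a language, so the title comes from the HTML.
const resolved = withFallbacks({ lang: "fr" }, { title: "Moby-Dick", lang: "en" });
```

The same pattern would extend to any other item for which the group agrees on an HTML counterpart; items without one would simply have no fallback.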
FWIW, it seems that Service Workers are now "In Development" in WebKit: |
I also thank Dave and Benjamin for this concrete proposal and prototypes. remark: this proposal enforces the view that Web Applications and Web Publications are different levels of distribution: a "save to homescreen" feature can be added to a Web Publication, a Web Publication Manifest is not necessarily mixed with a Web Application Manifest. I personally like it. question: How do Dave/Benjamin represent alternative navigation structures (e.g. list of illustrations) ? |
Plus another question: How do Dave/Benjamin satisfy the Requirement 21 of https://www.w3.org/TR/pwp-ucr/ ? Req. 21: There should be a way to discover that one or more new components have been added to or deleted from a Web Publication. |
The link types are listed in the HTML spec[1]. The WG would have to, in some cases, define new link types, register them somehow, etc, and retrofit it into the HTML spec, because the link types would be valid for documents that are not meant for WP. The same holds for what HTML calls the metadata names for the meta element.
Digging in the Web App Manifest Github issue relative to our own discussion (1) I found the "standardized" list of html meta names (2): seems like a joke !!
(1) w3c/manifest#97 <w3c/manifest#97>
(2) https://wiki.whatwg.org/wiki/MetaExtensions <https://wiki.whatwg.org/wiki/MetaExtensions>
Regards, Laurent Le Meur
|
First, huge thanks to everyone for taking the time to read the explainer and try the demo! Second, there's no way on earth that I (or Dave...when he gets back) will be able to wrangle, answer, and address issues as they arise (from each of you) in a single thread--here or on email. So. I'd like to propose the following:
I hope that's a sensible approach that will keep us from re-stating things and tripping each other up. 😃 Thanks again! |
OK, first and foremost, sorry if I’m missing some info, it’s super difficult to keep up as an outsider since pieces are scattered all over the place (including the mailing-list) and it’s sometimes difficult to understand where discussions are going (multiple topics). What worries me at third sight.
Which is “Design for humans.” in the proposal repo. Straw man argument. Depends on the human involved. Jodie Swagger, React.js expert, will find it difficult to read and use, while Johnny Tumbler, good old front-end chap, will prefer that over any other option. And mommy Panoz won’t understand it at all because HTML is gibberish to her. It’s about comfort, fluency, etc., it’s not absolute. And, in the proposal repo,
Another straw man argument as it completely obfuscates context. It ignores a lot of human beings use CMS, which does make authoring easier for them, it also ignores a lot of devs actually use Markdown, etc. Also, given sufficiently deep nesting, everything will suck, be it HTML, CSS or JS, even if you’re fluent in those languages. At some point, it’s just about an awful amount of delimiters you can’t process (tags, characters, etc.). Back to toc.ncx.
Actually, I never had any problem with that, because tools dealt with it for me. On the other hand, when EPUB 3 was released and tools didn’t deal with nav.xhtml yet, it was terrible. I’ve had my fair share of complex publications, and quite frankly, I would use anything else than a monolithic piece of HTML with attributes everywhere. Long story short, I ended up building a Mac applet: drop your EPUB file on its icon, let it build nav.xhtml from toc.ncx and content.opf’s The easiest authoring is the one you don’t have to deal with (e.g. automate). I urge you to take that into account when designing a proposal. What I find most disturbing though are the cavalier ways in which authors are regularly mentioned. I can hear a lot about them, but cannot read very much from them. The ReadMe in the proposal repo antagonized me to be honest, because it feels like it is using authors as a mere protection, not human beings asked for feedback. I’ve discovered this proposal this morning, discussed it with other authors this afternoon and all of them didn't know it existed. What worries me a little bit more.
This is not a huge benefit. Offline storage is complex, it’s not just about service workers, it’s also about DOM storage, persistent storage, indexed database, etc. but that’s another issue. More importantly, service worker storage is limited:
UA stands for User Agent, which implies authors have to do things responsibly. As a user,
sounds utterly terrible. I would expect you carefully listed which secondary resources should be cached offline, because I don’t want you to bloat my storage. Since
Please also note there was criticism from the dev community because the whole book was cached offline, as opposed to progressively cached. Sorry if that sounds harsh, but quite frankly, some moves are super hostile to authors. If this is going to be EPUB all over again, then I have no interest in caring about this spec. It is not acceptable that authors should be presented with faits accomplis, advocated on their behalf to boot. P.S.: Sorry mom. |
@JayPanoz first, thanks for being here! Given your experience with Blitz and EPUB "wrangling" I'm certain your contributions will be valuable. Second, please keep in mind the html-first idea is a proposal. It's nothing like a fait accompli. In fact, it's a proposal for the consideration of a very early-stage W3C Working Group that has only just begun writing the very beginnings of a spec that won't be ready for First Public Working Draft status for some time to come...let alone for it to be published as an official Technical Recommendation. Apologies if that was somehow unclear. Also, the demo is a demo. 😃 It wasn't ever meant to be a "this is how browsers will do it." The browsers and reading systems will implement their own offline systems/plans/strategies based on whatever this group produces (regardless of its format). The ServiceWorker demo was simply to show that it could be done. No, it's not sufficient. Yes, there are limitations. It's a demo. 😁 You also pointed out some concerns about HTML authoring and/or generating. Those I'd very much like to discuss and address. However, that's probably best done on the html-first repo--with references back here (ideally). It's early days yet. In every possible way. Hang in there! |
See comments about Web App Manifest's decision to use JSON at #7 (comment) |
Based on this demo / experiment, HTML seems a good serialization option for ToC. But given that the extent of what might be included in the manifest is still open-ended (see the still-open #15, #20, #21, #22, #23, #29), and given the as yet unclear relationship with Web App Manifest, I do not believe that HTML is an optimum choice for Manifests. At the very least the Manifest will be a superset of the ToC and may contain much more than demonstrated here. As a working decision on serialization, I think JSON is the better choice for Manifests, in part for the reasons cited by Web App Manifest as explained in #7 (as mentioned above) - see also the discussion in Appendix A of the Web App Manifest Living Doc itself. Note JSON can be embedded in HTML, so there is potential to consider the option of conflating an HTML view of the ToC and the Manifest in a single file for simple use cases. And even if a working decision is made to require JSON as the only serialization for Manifests, we could still decide to allow more than one serialization for ToC. |
Currently, this HTML-first approach covers all the MUSTs outlined in the forthcoming "infoset" PR. I've not yet seen something that couldn't be accommodated by the HTML-first approach. That doesn't mean it doesn't exist, however. 😃 |
After thinking about this for a while I've come to the conclusion that I'm very much against the direction this proposal would take web publications as a format. It's been very valuable in the discussion and has been a useful counter-balance to other proposals. It has helped me enormously in clarifying my own thoughts on the subject. But, in the end, I think the proposal itself is absolutely not the approach we should take. While HTML is a very powerful and useful format and in many ways underestimated in this day and age, it is very much not easy to author and I think @BigBlueHat and @dauwhe are vastly overstating its author-friendliness. It's actually a huge pain. HTML is sort of manageable when you have several years of familiarity, but even then most of the time people author HTML to make sure they can minimise their need to author HTML: They author templates, which when combined with simple data structures, and either rich text UIs or minimal markup languages like markdown, mean you don't have to deal with HTML on a regular basis and your users never. Going from ePub's XML to HTML is only a marginal improvement as HTML's more forgiving parsing isn't going to be much of a benefit when authoring the manifest's data. Not having to deal with namespaces is a plus but, like I wrote above, only a marginal improvement. Add all of that to the information in @TzviyaSiegman's comment #7 (comment), the issues I raised earlier in #35 (comment) (i.e. relying on implicit browser behaviours substantially disadvantages a number of really useful edge cases), and other concerns people have raised, and the case against HTML as a manifest format becomes quite strong from my perspective. This proposal has weaknesses of its own that are not inherent to HTML as a general approach for the manifest. Its reliance on the browser engine to fill in the gaps of the manifest is going to be risky in practice, in addition to diminishing the format's overall usefulness for less linear publications.
The approach:
Many of these ambiguities here are a consequence of caching in this proposal being an incidental side effect of pre-rendering the chapters as opposed to being a pre-defined algorithm that happens to be implemented as a Service Worker. And how you are supposed to derive a set of unique and normalised URLs from the ToC is also ambiguous since it relies on the DOM for href value normalisation (which doesn't handle fragments IIRC). To resolve these ambiguities you need to normalise the data you get from the HTML to clearly defined and specified data structures (primary resources, secondary resources, reading order, basic metadata). And in the web world mapping HTML to data structures the browser is supposed to act on means specifying those structures as a DOM API. Which in turn means we'd need to define how variations in the HTML get interpreted as a consistent API across the web platform. And then we need to tackle the whole WebIDL thing and we're already well into the weeds. Immediately, we are in for a much more complicated format (and API) to specify and implement than with any of the other approaches that have been proposed. And once you add in testing and HTML's inherent complexity when used for basic data structures I am strongly of the opinion that this makes authorship harder in too many cases for me to be comfortable. I don't think this is simpler overall than other proposals I've seen so far. Once you add in the inherent complexities common to all SGML-style markup formats and the work we need to do to remove the built-in ambiguities here, we get to 'just as complicated' at best for all parties involved. When you're dealing with a small number of flat lists of simple objects (like most of the bits in the manifest so far except for the ToC) JSON is, in my not so humble opinion, easier to teach, author, and consume, all else being equal. 
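One way to picture the normalisation step described above is the sketch below: resolve each ToC href against the index file's URL, drop the fragment, and keep only the first occurrence of each resulting URL. The function name and example URLs are hypothetical, and this is only one possible algorithm, not something the proposal specifies.

```typescript
// Hypothetical normalisation: resolve, strip the fragment, de-duplicate in order.
function uniqueResourceUrls(hrefs: string[], indexUrl: string): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const href of hrefs) {
    const url = new URL(href, indexUrl);
    url.hash = ""; // two ToC entries pointing into the same file count once
    if (!seen.has(url.href)) {
      seen.add(url.href);
      result.push(url.href);
    }
  }
  return result;
}

// Two entries point into c001.html; only one primary resource should result.
const primary = uniqueResourceUrls(
  ["c001.html#s1", "c001.html#s2", "./c002.html"],
  "https://example.org/book/index.html"
);
```

Even this small sketch has to make choices (first occurrence wins, fragments are ignored, query strings are kept), which is the kind of decision a spec would need to pin down.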
So, based on all of the discussions so far my ideal—striking a balance between flexibility, complexity, ease of use, and ease of implementation—would be:
All of the above is irrespective of whether the HTML files are used as fallbacks when data is missing from the manifest which is an idea that I firmly support. Specifically, making single HTML publications as easy to author as is possible is a worthwhile goal whose benefits would be far-reaching. But I also think that the simplest way of making that work reliably is by defining a JSON format first that has clearly defined HTML fallbacks. Once you have that then it becomes almost trivial to predictably and automatically transform any given single HTML file into a useful and accessible web publication, as long as the file has no javascript or is of a strict subset like AMP. That capability on its own dramatically increases the usefulness of every single tool, library, app, script or service that is built to support web publications. |
Whereas I agree with most of @baldurbjarnason's comments, I think I disagree with some of the final conclusions. I made a separate set of comments in #32 (comment) which, for the time being, makes me believe that we should start with the Web App Manifest, extend it for our needs (which is possible), and therefore rely on the work being done around that spec for how WP management could be incorporated into the browser world. Otherwise we will have to reinvent the wheel. Just for the good order, I believe this discussion should take place on issue #7, though, and not here. |
@dauwhe and I have been working on a proposal to use HTML's <nav> element as a web publication manifest: https://github.com/dauwhe/html-first
TL;DR: define the primary resources of a WP to be the files referenced in the first <nav> element of an "index" file. This file would also host WP metadata.
We feel this approach has many benefits:
Human-focused. User agents need a list of primary resources and their default ordering, but so do actual users. Most web publications would benefit from a human-readable table of contents. TOCs are crucial for accessibility.
Simplicity. Given the broad need for a TOC, using that as manifest is a straightforward way to avoid duplication (as in EPUB's nav/manifest/spine/ncx). And we've discovered a huge benefit, as we don't need a list of secondary resources to facilitate offline caching via service workers (see the demo books)!
Ubiquity. Everyone in the web space is already familiar with HTML, and there is a large and mature ecosystem around authoring, rendering, and validating HTML.
Expressiveness. HTML's language and styling support allows for a richer experience for humans.
Progressive enhancement. Existing web user agents know what to do with HTML.
A Path to the future. Every EPUB3 has a nav document. Many "web books" already use such a design pattern.
Note we've created a couple of demo books that work offline, based on the HTML manifest.
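For readers who want to see what "the files referenced in the first nav element" could mean in code, here is a minimal sketch that reads the default reading order, plus any hreflang hints, from the first nav of a document. The interfaces and function name are hypothetical; they are written as small structural types so the function can be exercised without a browser, on the assumption that a browser's document object fits the same shape. It is not the code used in the demo books.

```typescript
// Hypothetical minimal shapes for the DOM calls this sketch relies on.
interface AnchorLike {
  getAttribute(name: string): string | null;
}
interface NavLike {
  querySelectorAll(selector: string): ArrayLike<AnchorLike>;
}
interface DocumentLike {
  querySelector(selector: string): NavLike | null;
}

interface ReadingOrderEntry {
  href: string;
  hreflang: string | null; // language of the linked resource, if declared
}

// Collect the links of the first <nav>, in document order.
function readingOrder(doc: DocumentLike): ReadingOrderEntry[] {
  const nav = doc.querySelector("nav"); // querySelector returns the first match
  if (nav === null) {
    return [];
  }
  const entries: ReadingOrderEntry[] = [];
  for (const anchor of Array.from(nav.querySelectorAll("a[href]"))) {
    const href = anchor.getAttribute("href");
    if (href !== null) {
      entries.push({ href, hreflang: anchor.getAttribute("hreflang") });
    }
  }
  return entries;
}
```

In a browser this would presumably be called as readingOrder(document); the hrefs it returns are still relative and would need resolving against the index file's URL.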
Thanks,
🎩 and 🐳