Is it acceptable to use HTML for the serialization of some infoset items, or should it all be in a separate (JSON) file? #193

iherman · 2018-05-09T12:18:07Z

This discussion has permeated many of the various issues (e.g., lately, #159, #181, or #186). It would help to get this design principle settled once and for all. In practice, the issue is whether the "entry (HTML) page" could be used as containing the infoset items, or not.

Note that the answer may not be clear-cut, and may depend on the nature of the infoset items. Indeed, it is different if:

the item is, loosely speaking, some sort of metadata, i.e., expressible via an HTML <meta> or <link> element (e.g., creation or modification date, or links to an ONIX file)
the item is a slightly more complex structure that cannot be expressed fully in the HTML header (e.g., language and base direction)
the item is an item that would be naturally expressed in HTML, and is often indeed done that way (e.g., Table of Contents)

Another aspect that influences this decision is whether the WP consists of a single HTML file (which is also the entry page), with adjunct files like CSS or images. This is the typical case for, e.g., a scholarly journal article.

iherman · 2018-05-09T12:19:25Z

Trying to collect pros and cons, based also on earlier discussions. (Let us try to collect all the Pro/Con arguments in the most concise manner possible to make an informed decision...)

Pro: mainly in the case of single HTML file based WP this is a natural way of expressing the information.
Pro: provides a much simpler WP structure in a large percentage of cases, closer to the current Web usage
Pro: provides a more natural bridge towards non-WP aware browsers
Pro: forcing all the data into a separate manifest would lead to redundancy, ie, the same data being stored at separate places. This is also error prone. TOC is a typical example.
Pro: the current definition of the <meta>/<link> elements provides a number of features (e.g., usage of the @rel or @type attributes) that we would have to reproduce for a JSON based manifest.
Con: per the HTML standard, the <meta> element's role is to express "document-level metadata" (see html5; emphasis is mine). Using it for expressing metadata for other entities (ie, the WP) is semantically not clean. (In the case of a single HTML file based WP one could argue that the document and the HTML file is the same, which would make it all right.)
Con: the metadata may be used for various purposes handling the WP instance itself, eg, indexing, bookshelves, etc. Parsing an HTML file to extract the information, though it would use a standard toolset, requires a significant effort for the User Agent: parsing the HTML, building the DOM, the CSS DOM, the Accessibility DOM, etc, before giving access to the <meta> element. Compared to that, parsing a JSON file into JavaScript structures is a breeze.
Con: having both a manifest file and some data in some HTML resources complicates implementations that should follow a more complex path to get hold of the infoset item. (Note, however, that this argument has less weight than ease of authoring; there are more authors than implementers...)

Note that, although we are not discussing WAM-s, the arguments in the section on the same issue in the WAM document (and the links in there) are also relevant.

llemeurfr · 2018-05-09T12:56:45Z

Note that editing such an infoset by hand would be equally difficult in the html and json cases. Meaning that whatever the choice is between a highly specific web page and a json structure, an authoring tool seems mandatory. Which leads to an additional Con.

Con: creating an authoring tool capable of generating an html page both highly semantic (metadata, reading order, tables of illustrations, list of additional resources etc.) and highly flexible regarding its layout would be really difficult. In comparison, creating a tool capable of generating a json structure + letting authors use html tools to create an entry page is a breeze.

HadrienGardeur · 2018-05-09T13:11:21Z

Just a few quick notes first:

we've already agreed on using JSON as our manifest format
which means that we're discussing if some infoset items can be contained in HTML resources from the publication, not all infoset items
the list provided by @iherman assumes that additional infoset items would be expressed on the "entry page" (I hate that term), but it's very doubtful that all navigation (not just the TOC but also page lists, landmarks and various other lists) for example would be listed on a single page or on the entry page

I'd also like to list an additional con: may require additional network requests that could block the processing of the WP.

If I discover a publication through one of its chapters, this means that:

I'll need to fetch the manifest first
then I'll need to fetch one or more HTML resources before I have all the info that I need to properly render the publication and/or provide the relevant affordances

Since I'll only be able to discover these additional HTML resources through the manifest, this means that these fetch requests (plus all the processing related to HTML) will have to be done sequentially and not in parallel.

The majority of the pros listed by @iherman could also be challenged IMO because they're mixing up two different issues:

how we serialize this information
vs how we access it

In the case of a single-HTML document, I don't think that using <meta> + <link>+ potentially RDFa is in any way better than just embedding JSON-LD in the HTML document.

There are fewer semantic issues with JSON-LD (the metadata is not necessarily about the document that contains them) and I would argue that it's easier to author JSON-LD than RDFa.

To go back to the list of pros, we could also say that JSON-LD embedded in HTML is:

a natural way of expressing metadata for single page publications (cf AMP)
provides a simple structure as well

I'm not really buying the redundancy arguments (we're not expressing the same information) or the more "natural bridge" one (browsers ignore the vast majority of metadata and links that we would end up using in HTML).

I'd like to hear @BCWalters' opinion on this as well. Now that we have a major browser actively participating in this WG, I think there's a lot of value in what they have to say about this.

RachelComerford · 2018-05-09T16:46:45Z

There is a business consideration that weighs into the HTML vs JSON question: it is easier and cheaper for me to find HTML coding resources than JSON coding resources, and my team is less likely to follow a standard that is (even more) expensive to maintain. To confirm this, I reached out to our most commonly used vendors - all replied that they would need time to staff up and train JSON developers but that they had plenty of HTML developers on staff.

iherman · 2018-05-09T17:29:04Z

Thanks @RachelComerford, this is a very important, non-technical point...

BigBlueHat · 2018-05-09T19:54:24Z

@iherman

the item is, loosely speaking, some sort of metadata, i.e., expressible via an HTML <meta> or <link> element (e.g., creation or modification date, or links to an ONIX file)

I'd not limit it to just <meta> and <link>. The growth and widespread usage of data-in-HTML formats (RDFa, Microdata, JSON-LD) show that developers and web publishers do know how to put metadata in their publications and apps, and are already incentivized to do so because search engines. Why not follow suit rather than creating a different, currently unknown, out-of-band location to look for metadata?

BigBlueHat · 2018-05-09T20:19:21Z

Con: per the HTML standard, the <meta> element's role is to express "document-level metadata" (see html5; emphasis is mine). Using it for expressing metadata for other entities (ie, the WP) is semantically not clean. (In the case of a single HTML file based WP one could argue that the document and the HTML file is the same, which would make it all right.)

Depending on how this is modeled and "gone about" it may be that the "binding" document is indistinguishable from the publication itself.

Or, alternatively:

publication address might be http://example.com/moby-dick/
binding document (currently "entry point") is returned upon that request (i.e. index.html per most server defaults), but has its own URL http://example.com/moby-dick/index.html and consequently could have its own metadata (in <meta> or wherever).

Regardless, this is easily avoidable...so not an implicit "con."

Con: the metadata may be used for various purposes handling the WP instance itself, eg, indexing, bookshelves, etc. Parsing an HTML file to extract the information, though it would use a standard toolset, requires a significant effort for the User Agent: parsing the HTML, building the DOM, the CSS DOM, the Accessibility DOM, etc, before giving access to the <meta> element. Compared to that, parsing a JSON file into JavaScript structures is a breeze.

There's no requirement that a DOM, CSSOM, Accessibility OM, etc. be setup or available when extracting metadata from HTML files. It's possible to get it directly out of the markup without those things.
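To make that claim concrete, here is a minimal sketch (an illustration only, not something proposed in this thread) that pulls <meta> and <link> values out of markup with Python's event-based `html.parser`, which tokenizes the input without building a DOM, CSSOM, or accessibility tree. The `rel="publication"` value and the `manifest.json` file name are assumptions made up for the example, as are the class and function names.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collect <meta name=... content=...> and <link rel=... href=...> from the token stream."""

    def __init__(self):
        super().__init__()
        self.meta = {}    # maps meta name -> content
        self.links = []   # list of (rel, href) pairs, in document order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and "name" in attributes:
            self.meta[attributes["name"]] = attributes.get("content", "")
        elif tag == "link" and "rel" in attributes:
            self.links.append((attributes["rel"], attributes.get("href", "")))


def extract_head_metadata(html_text):
    """Return (meta, links) found in the markup; no tree is ever constructed."""
    parser = HeadMetadataParser()
    parser.feed(html_text)
    parser.close()
    return parser.meta, parser.links


# Hypothetical entry page; rel="publication" is an assumed link relation.
SAMPLE = """<!DOCTYPE html>
<html lang="en">
<head>
  <title>Moby-Dick</title>
  <meta name="author" content="Herman Melville">
  <link rel="publication" href="manifest.json">
</head>
<body><p>Call me Ishmael.</p></body>
</html>"""

meta, links = extract_head_metadata(SAMPLE)
print(meta)
print(links)
```

Whether a real user agent would ship a second, lighter extraction path next to its built-in HTML parser is a separate question; the sketch only shows that the extraction itself does not need a DOM.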

Additionally, when "browsed to" the browser will provide all those things, and could potentially make that data more easily extract-able by the developer (or within the UI of the browser).

Con: having both a manifest file and some data in some HTML resources complicates implementations that should follow a more complex path to get hold of the infoset item. (Note, however, that this argument has less weight than ease of authoring; there are more authors than implementers...)

Couldn't agree more...but that's not a "con" of an HTML-driven approach to these problems.

If, for instance, all the primary resources are referenced from an HTML-based "binding document" (perhaps through something like a latent-loading <iframe> or a <nav role="doc-toc"> like thing), then the request and processing needs are already defined and taken care of by the browser and the HTTP ecosystem specs (CORS, CSP, etc). However, if they're in the JSON (as noted in #104), there's an unknown relationship with the things stated there and the rest of the request/response processing constraints, browsing contexts, etc (again; hence #104). So...that's ultimately a "vote" for primary resources to be expressed from within the HTML.

Each of the current infoset items are expressible from within an HTML document (see my last comment for a handful of options), and what's needed next is to know how to enhance their expressions as available now such that they are more useful.

Moving such core concepts as the primary resources or redundantly expressing dependencies into a separate "manifest file" is duplication, will cause errors when out-of-sync, does create an over dependence on tooling, and ultimately puts the processing power out of the reach of the publisher/developer and into the hands of the "reading system" developer exclusively.

Consequently, I'd not see our currently defined <nav> processing algorithm as a fallback, but as the expression (or something like it) of the primary resources.
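As a rough illustration of reading the primary resources straight from a <nav>, the sketch below collects the link targets found inside <nav> elements, in document order. It is not the <nav> processing algorithm defined in the draft, only a simplified stand-in; the sample binding document, the class name, and the function name are hypothetical.

```python
from html.parser import HTMLParser


class NavLinkParser(HTMLParser):
    """Collect href values of <a> elements that appear inside a <nav>, in document order."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0   # greater than 0 while inside at least one <nav>
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif tag == "a" and self.nav_depth > 0:
            href = dict(attrs).get("href")
            # Keep only the first occurrence of each target.
            if href and href not in self.hrefs:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth > 0:
            self.nav_depth -= 1


def nav_resources(html_text):
    """Return the de-duplicated list of link targets found inside <nav> elements."""
    parser = NavLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.hrefs


# Hypothetical binding document; the footer link is outside the <nav> and is ignored.
BINDING = """<!DOCTYPE html>
<html lang="en">
<head><title>Moby-Dick</title></head>
<body>
  <nav role="doc-toc">
    <ol>
      <li><a href="c1.html">One</a></li>
      <li><a href="c2.html">Two</a></li>
    </ol>
  </nav>
  <footer><a href="about.html">About</a></footer>
</body>
</html>"""

print(nav_resources(BINDING))
```

The sketch treats every <nav> alike; a real definition would have to say which <nav> counts (for example by role) and how relative URLs are resolved.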

Ultimately, we'd go through the same process of finding homes for each of the infoset things in the HTML "binding document" (which is clearer than "entry point"), remove them from a/the JSON serialization until we find things that must be expressed in JSON.

tl;dr web publications exist already (built from HTML, JS, CSS, RDFa, etc), so how do we make them better, stronger, faster, more accessible, offline-able, etc.

deborahgu · 2018-05-09T22:09:50Z

I'd like to make another non-technical point: we should not be creating complex creation systems for publishers. Descriptive metadata, including navigation items, should go in as few files, and as few formats, as is technically possible.

As Ivan said:

Pro: mainly in the case of single HTML file based WP this is a natural way of expressing the information.

If we tell publishers "in order to create a WP, you need to put this infoset data over here in HTML, and this infoset data over there in JSON," we're raising the barrier to entry for anyone who doesn't have a WP-aware authoring tool.

IMO, much better to choose an imperfect design which publishers will actually be able to use than the most perfectest beautifullest awesomest architecture which is a pain for creators.

(I have no horse in the race of actual location and format, and personally I'd be happiest if all the players in this conversation came to a place where they realize that no solution is perfect and all the people disagreeing have valid points. Unfortunately a classic compromise is the worst possible solution, because we really just need to pick one. There is literally no solution on offer without cons; we still have to choose one and move on to the rest of the work.)

iherman · 2018-05-10T09:34:16Z

@BigBlueHat

@iherman

the item is, loosely speaking, some sort of metadata, i.e., expressible via an HTML <meta> or <link> element (e.g., creation or modification date, or links to an ONIX file)

I'd not limit it to just <meta> and <link>. The growth and widespread usage of data-in-HTML formats (RDFa, Microdata, JSON-LD) show that developers and web publishers do know how to put metadata in their publications and apps, and are already incentivized to do so because search engines. Why not follow suit rather than creating a different, currently unknown, out-of-band location to look for metadata?

I know there can be more data than just <meta> or <link>. But I believe the characterization of a "different, currently unknown, out-of-band location to look for metadata" is a bit harsh. Putting metadata into a separate file, and linking to it, is not a new approach: see (beyond the WAM) the work on Payment Method Manifest. It was also the routine approach to get to metadata before the creation of RDFa, with the metadata stored in different formats, be that Turtle or (God forbid!) RDF/XML. (This was, e.g., the way to refer to CC metadata from an HTML page.)

iherman · 2018-05-10T10:11:11Z

@BigBlueHat,

Con: per the HTML standard, the <meta> element's role is to express "document-level metadata" (see html5; emphasis is mine). Using it for expressing metadata for other entities (ie, the WP) is semantically not clean. (In the case of a single HTML file based WP one could argue that the document and the HTML file is the same, which would make it all right.)

Depending on how this is modeled and "gone about" it may be that the "binding" document is indistinguishable from the publication itself.

Yes, I agree; this is the case of a single-document WP; this is one of the "Pro" arguments.

Or, alternatively:

publication address might be http://example.com/moby-dick/

binding document (currently "entry point") is returned upon that request (i.e. index.html per most server defaults), but has its own URL http://example.com/moby-dick/index.html and consequently could have its own metadata (in <meta> or wherever).

Sorry, but I do not agree. The quoted HTML specification does not refer to a URL, it refers to the document itself, whichever path was used to get there. I believe the HTML standard is pretty clear about it. If we use the HTML headers, we should simply accept that we are willfully overstepping the bounds that the HTML standard defines (but I am not sure the rest of the community would accept it, we may face major objections).

Regardless, this is easily avoidable...so not an implicit "con."

I think we have to agree that we disagree on that point.

Con: the metadata may be used for various purposes handling the WP instance itself, eg, indexing, bookshelves, etc. Parsing an HTML file to extract the information, though it would use a standard toolset, requires a significant effort for the User Agent: parsing the HTML, building the DOM, the CSS DOM, the Accessibility DOM, etc, before giving access to the <meta> element. Compared to that, parsing a JSON file into JavaScript structures is a breeze.

There's no requirement that a DOM, CSSOM, Accessibility OM, etc. be setup or available when extracting metadata from HTML files. It's possible to get it directly out of the markup without those things.

This is theoretically correct, but I do not think it is practically true. Any implementation will use one of the many, possibly "built-in" HTML parsers, and all those parsers build up the DOM. I do not think we can expect an implementation to have a different parser that would just look at the syntax or do some other tricks.

Additionally, when "browsed to" the browser will provide all those things, and could potentially make that data more easily extract-able by the developer (or within the UI of the browser).

I am not sure I understand what you mean. Yes, of course, if the UA begins to render, display, etc, the WP, then those data are already there, because they are in the DOM. The "Con" is for the cases when, say, the Reading System or the browser builds up, say, a bookshelf, for which a number of Infoset items are necessary.

That being said, if we go along with the idea of finding the manifest file via a <link> element, then the same problem applies. So this may be one of the 'con'-s that we have to live with whatever we do, and we can consider it neutral in our discussions:-)

Con: having both a manifest file and some data in some HTML resources complicates implementations that should follow a more complex path to get hold of the infoset item. (Note, however, that this argument has less weight than ease of authoring; there are more authors than implementers...)

Couldn't agree more...but that's not a "con" of an HTML-driven approach to these problems.

True... except that it remains to be proven that all infoset items can be expressed easily and in a user-friendly manner via the current HTML element set.

To take an example: we did say that the language tag in a content file (ie, an HTML file) is not the same as the language tag for the publication as a whole. In other words, the regular @lang attribute in the HTML file cannot be used as an encoding of the relevant infoset items: it has to be put somewhere else. We will have to define our own definitions for that in some way or other, which will not look very natural in HTML (and hence not very user friendly). We may have similar issues with, say, the title of the WP...

If, for instance, all the primary resources are referenced from an HTML-based "binding document" (perhaps through something like a latent-loading <iframe> or a <nav role="doc-toc"> like thing), then the request and processing needs are already defined and taken care of by the browser and the HTTP ecosystem specs (CORS, CSP, etc). However, if they're in the JSON (as noted in #104), there's an unknown relationship with the things stated there and the rest of the request/response processing constraints, browsing contexts, etc (again; hence #104). So...that's ultimately a "vote" for primary resources to be expressed from within the HTML.

To be honest, you lost me here; more exactly, I do not see the problem. If we say (as we seem to converge to in #104) that we simply take the browsing context as given, I just do not see the issue accessing the separate JSON file in this browsing context. That information is accessed from the entry point (in its own browsing context, as we seem to converge to in #104), then all the rest is clear: that is the context we are operating in. Let alone the fact that many elements in the infoset (title, authors, etc) are unaffected by the browsing context.

Each of the current infoset items are expressible from within an HTML document (see my last comment for a handful of options), and what's needed next is to know how to enhance their expressions as available now such that they are more useful.

See my comment above. I am absolutely not sure it is as simple, more exactly that the resulting definitions would be clearer and simpler than doing it in JSON.

Note that the experience with RDFa is not really good (alas!), meaning that relying on RDFa may not be that helpful. (Authoring RDFa can be a major challenge, and is very opaque for non-RDF-savvy persons; it is sometimes difficult even for people like me, I frequently have to run RDFa+HTML through my own distiller to see what the generated RDF is. Microdata is, maybe, even worse, because there are features that cannot even be expressed in microdata...)

iherman · 2018-05-10T10:27:10Z

Trying to move forward: would the usage of a <script> element alleviate the problems? (See also #122). Here is what this would mean:

we define the manifest in JSON (LD or not); that would define many? most? all? of our Infoset elements (not "all" would mean that some entries are put into the HTML content of the entry point somewhere)
this JSON content may be a separate file referred to from the entry point and linked to via a special <link> element as defined in the draft
this JSON content may also be part of the entry point file, encapsulated in a <script> element.

What this means is that there is not necessarily a separate file to be authored; all is in the same file; would that alleviate your issues, @deborahgu and @RachelComerford ? It would not necessarily help with the issues of @llemeurfr because today's authoring tools rarely help for the authoring of embedded data. On the other hand, the semantics of the <script> element's content is under our control, ie, we would not violate the HTML spec.
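A minimal sketch of how a consumer might handle both options from the list above, assuming the embedded manifest sits in a <script> data block and the external one is announced by a <link>. The `rel="publication"` value, the accepted `type` values, the rule that an embedded block wins over a link, and all names in the code are assumptions for illustration, not anything the draft defines.

```python
import json
from html.parser import HTMLParser

# Assumed media types for an embedded manifest data block.
JSON_TYPES = {"application/json", "application/ld+json"}


class ManifestLocator(HTMLParser):
    """Find an embedded JSON data block and/or a <link> pointing at a manifest file."""

    def __init__(self):
        super().__init__()
        self.in_data_block = False
        self.chunks = []       # script text may arrive in several pieces
        self.embedded = None   # parsed JSON of the first data block, if any
        self.linked = None     # href of the manifest link, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script" and attributes.get("type") in JSON_TYPES and self.embedded is None:
            self.in_data_block = True
            self.chunks = []
        elif tag == "link" and attributes.get("rel") == "publication":
            self.linked = attributes.get("href")

    def handle_data(self, data):
        if self.in_data_block:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_data_block:
            self.in_data_block = False
            self.embedded = json.loads("".join(self.chunks))


def locate_manifest(entry_page_html):
    """Return ("embedded", data), ("linked", href), or ("none", None)."""
    locator = ManifestLocator()
    locator.feed(entry_page_html)
    locator.close()
    if locator.embedded is not None:
        return "embedded", locator.embedded   # assumed precedence: embedded wins
    if locator.linked is not None:
        return "linked", locator.linked       # caller still has to fetch this URL
    return "none", None


EMBEDDED_PAGE = """<html><head>
<script type="application/ld+json">
{"name": "Moby-Dick", "author": "Herman Melville"}
</script>
</head><body></body></html>"""

LINKED_PAGE = """<html><head>
<link rel="publication" href="manifest.json">
</head><body></body></html>"""

print(locate_manifest(EMBEDDED_PAGE))
print(locate_manifest(LINKED_PAGE))
```

In the linked case the consumer still needs one more fetch before it has any infoset data, which is the trade-off between the two options.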

The experience shows that authoring JSON for metadata-like information is simpler than doing it in, say, RDFa, so we would gain that.

Note also DanBri's comment: Schema.org also uses this JSON(-LD) wrapper to extract information.

(An even more radical proposal would be to use the embedded <script> element only. I am not sure I would go that far.)

danielweck · 2018-05-10T11:09:52Z

Ivan, +1 to the JSON-in-script / JSON-as-file approach (although I suspect reading system developers would prefer a directly-accessible standalone JSON, as this saves parsing an HTML document and performing an additional fetch request).

dauwhe · 2018-05-10T11:47:32Z

Sorry, but I do not agree. The quoted HTML specification does not refer to a URL, it refers to the document itself, whichever path was used to get there. I believe the HTML standard is pretty clear about it. If we use the HTML headers, we should simply accept that we are willfully overstepping the bounds that the HTML standard defines (but I am not sure the rest of the community would accept it, we may face major objections).

Consider the following document returned from www.example.com/book/

<!DOCTYPE html>
<html lang="en">
<head>
  <title>Moby-Dick</title>
  <meta name="author" content="Herman Melville">
</head>
<body>
  <nav>
    <ol>
      <li><a href="c1.html">One</a></li>
      <li><a href="c2.html">Two</a></li>
    </ol>
  </nav>
  <iframe id="c1" name="c1" src="c1.html"></iframe>
  <iframe id="c2" name="c2" src="c2.html"></iframe>
</body>
</html>

If c1.html does not have a meta name="author" element, who is the author of c1.html? The content of c1.html is literally a node in the document object of the original URL. Would the answer be different if c1.html was included via object, html imports, or a custom element?

mattgarrish · 2018-05-10T11:53:27Z

There's no requirement that a DOM, CSSOM, Accessibility OM, etc. be setup or available when extracting metadata from HTML files. It's possible to get it directly out of the markup without those things.

This is theoretically correct, but I do not think it is practically true.

How are these steps avoided? Is the idea that user agents will go through the process of obtaining and processing the manifest before the user makes any decision about whether they even want to initiate the reading experience, and stop rendering the document until a decision is made?

In other words, does an external file really save anything in processing time, except perhaps in the (rare?) situation where a user says to always initiate publications and the link is available in an HTTP header?

llemeurfr · 2018-05-10T12:16:02Z

@iherman about

Note that the experience with RDFa is not really good (alas!), meaning that relying on RDFa may not be that helpful. (Authoring RDFa can be a major challenge, and is very opaque for non-RDF-savvy persons; it is sometimes difficult even for people like me, I frequently have to run RDFa+HTML through my own distiller to see what the generated RDF is. Microdata is, maybe, even worse, because there are features that cannot even be expressed in microdata...)

I totally agree with that statement. At allocine.com, we embedded RDFa, then microdata (preferred), in our film / star etc. pages. But it was the work of the technical team, in page templates: certainly not the work of the editorial team. And I'm pretty sure that this is how 99.9% of websites containing RDFa or microdata are constructed.

mattgarrish · 2018-05-10T12:23:47Z

If c1.html does not have a meta name="author" element, who is the author of c1.html?

It might be simpler to use something like dcterms/schema.org isPartOf/hasPart to associate the fragments than duplicate metadata, but I don't follow the argument that a multi-part document cannot be wholly identified by the first of its resources.
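For illustration, a sketch of what the hasPart association could look like as schema.org JSON-LD, built here as a Python dictionary. The URLs follow the example above; the `author_of` helper and its rule that a part without its own author inherits the author of the whole are assumptions for the sake of the example, not schema.org semantics.

```python
import json

# Hypothetical description of the whole publication, listing its parts via hasPart.
publication = {
    "@context": "https://schema.org",
    "@type": "Book",
    "url": "https://www.example.com/book/",
    "name": "Moby-Dick",
    "author": {"@type": "Person", "name": "Herman Melville"},
    "hasPart": [
        {"@type": "Chapter", "url": "https://www.example.com/book/c1.html", "name": "One"},
        {"@type": "Chapter", "url": "https://www.example.com/book/c2.html", "name": "Two"},
    ],
}


def author_of(resource_url, description):
    """Return the author name for a listed part, falling back to the author of the whole.

    Returns None if the URL is not listed under hasPart.
    """
    for part in description.get("hasPart", []):
        if part.get("url") == resource_url:
            return part.get("author", description["author"]).get("name")
    return None


print(json.dumps(publication, indent=2))
print(author_of("https://www.example.com/book/c1.html", publication))
```

With this shape the metadata is stated once on the whole, and a consumer decides how much of it applies to each part.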

iherman · 2018-05-10T12:28:21Z

@dauwhe (referring to #193 (comment)) great questions...

I am not sure, and I do not think the HTML spec clearly says anything about this case. However, looking at the HTML spec, a document within an iframe has its own Document element (and own context), so my gut feeling is that, in your example, the author of the iframe-d content would be unknown. It is probably the same with object. The import case is even less clear, the current draft does not really say anything about Document elements or contexts (is that work still alive, b.t.w.?).

RachelComerford · 2018-05-10T14:11:39Z

@iherman... to be honest, I don't understand the solution?

Trying to move forward: would the usage of a <script> element alleviate the problems? (See also #122). Here is what this would mean:

we define the manifest in JSON (LD or not); that would define many? most? all? of our Infoset elements (not "all" would mean that some entries are put into the HTML content of the entry point somewhere)
this JSON content may be a separate file referred to from the entry point and linked to via a special <link> element as defined in the draft
this JSON content may also be part of the entry point file, encapsulated in a <script> element.

What this means is that there is not necessarily a separate file to be authored; all is in the same file; would that alleviate your issues, @deborahgu and @RachelComerford ?

iherman · 2018-05-10T14:37:25Z

Sorry to be terse, @RachelComerford. My bad.

The choices discussed so far were:

the infoset items are in a separate JSON file, which is linked from the main HTML entry point of the WP using a <link> element; or
use the HTML entry point to the WP to also "encode", in some way or other, all the infoset entries using, mostly, <meta> and <link> elements, as well as, possibly, some HTML elements (I presume salt-'n-peppered with RDFa or microdata terms)

Both you and @deborahgu commented that to edit two separate files would be a load on your developers; as @deborahgu said "Descriptive metadata [...] should go in as few files, and as few formats".

I do have some significant problems with the 2nd approach. In essence, I believe, it is not really possible to avoid some extra "formats" (where by format I also mean microdata and/or RDFa, or a complex set of attributes on HTML elements, etc.) and, among those, I also believe JSON is still the simplest one. But at least the problem of handling several files can be alleviated. Indeed, the HTML standard allows using the following in the header of an HTML content:

<script type="application/json">
{
    "something"      : "whatever",
    "somethingselse" : "whateverelse"
}
</script>

I.e., we can use this element to encode the infoset items in JSON, while staying within the (entry point) HTML file.

This is how webmasters mark up their files for schema.org, b.t.w., if they choose JSON-LD for their data. (See a random example: go to the bottom of the page to choose the right tab for some examples.) In other words, we would be in good company:-) and, in fact, the metadata part of our infoset may automatically be used, when on the Web, by schema.org (at least for schema.org terms) which is a bonus...

I hope this is clearer...

HadrienGardeur · 2018-05-10T14:45:36Z

Handling multiple files is only a "problem" for single resource publications.

With publications spread across multiple resources, going through an HTML resource to extract JSON-LD in a script element makes things more complicated than it should be.
I would recommend limiting this embedded case strictly to publications that have no explicit reading order (=single resource publications).

That said, I fully agree that for expressing metadata on an HTML resource, JSON-LD + schema.org is the preferred method on the Web today.
It's much easier to author than RDFa and doesn't have the limitations associated with the <meta> element.
If we go down that road, we should be careful how we author such metadata if we want to maximize compatibility with SEO bots. Not all of them are truly JSON-LD/RDF aware and they tend to have limited support for context documents.

BigBlueHat · 2018-05-10T15:18:15Z

I believe the characterization of a "different, currently unknown, out-of-band location to look for metadata" is a bit harsh.

There was no intention to be "harsh."

I meant the technical term "out-of-band":

Out-of-band is activity outside a defined telecommunications frequency band, or, metaphorically, outside some other kind of activity.

And "out-of-band data":

out-of-band data is the data transferred through a stream that is independent from the main in-band data stream.

"different" and "currently unknown" were meant to be positioned in contrast with where Search Engines (the primary incentive driving web publication metadata) get their information.

I apologize that the sentence was so easily misread as to contain a "harsh" tone. I'll work to link to any technical terms that may have taken on a different cultural meaning.

BigBlueHat · 2018-05-10T16:00:07Z

@iherman it was never my intention to have this HTML approach be limited in any way to just what is expressible directly on elements and attributes (i.e. RDFa and Microdata).

Using JSON(-LD) (or any other format) in a <script> tag is called "data blocks" in the HTML5 spec, and has been around since "forever"--so I'd never expected that to be removed from consideration.

Apologies for not being clearer about these specifics.

Building up from HTML means we have everything HTML provides at our disposal...including JSON. 😄

BigBlueHat · 2018-05-10T16:04:52Z

Additionally, the infoset includes both metadata pieces (title, reading progression, etc) and request/response/hypermedia related things (primary resources, etc).

The point of using HTML as a foundation is that it already has a known (and carefully crafted) set of specifications for handling the inclusion, processing, and contextualizing of Web resources. We don't have that (to my knowledge) with any other format (because SVG doesn't have iframes 😉).

The metadata concerns and the "binding"/rendering/presenting concerns are different domains of use, experience, security, etc. We should model them accordingly.

BigBlueHat · 2018-05-10T16:40:57Z

How are these [infoset extraction] steps avoided?

The underpinning premise is that "web publications" (lowercase on purpose) exist today (ex: http://guide.couchdb.org/ http://hpbn.co/ http://resilientwebdesign.com/ )--you can load them up and read them now.

However, their metadata (mostly RDFa because ogp.me) and "binding" (which is mostly a next/prev experience) is provided in idiosyncratic ways and either expressed "inline" (next/prev links in each resource) or built via JS (as is the case with the CouchDB Guide).

Consequently, the "infoset" items are currently expressed with some overlap in consistency in only a few places (mostly ToC and some ogp.me metadata), so there's not much the browser can do to provide additional or enhanced reading experiences for the publication as a whole (i.e. no publication-wide search unless via a service, linear progression experience is identical to clicking any other link in the book, etc).

Is the idea that user agents will go through the process of obtaining and processing the manifest before the user makes any decision about whether they even want to initiate the reading experience, and stop rendering the document until a decision is made?

The point is that the experience of the "web publication" (lowercase on purpose again) isn't blocked by anything. Obtaining and processing any additional, consistently expressed data or affordances could (in a future browser or via a polyfill in the meantime) enhance the reading experience by adding things like publication-wide search, linear progression, etc.

In other words, does an external file really save anything in processing time, except perhaps in the (rare?) situation where a user says to always initiate publications and the link is available in an HTTP header?

By conceiving of "web publications" (case sensitive one more time 😁) as extant and enhance-able, we can lay a foundation to build up from.

In which case, a user choice or action to "initiate [a] publication" would look like (auto)triggering new experiences (search, linear progression, etc) either directly within the browser's UI or perhaps via some dedicated reading UI "space" (or like one of the many things we've not currently imagined 😃). The provision of those enhancements, though, should not block the Web experience already available to readers of "web publications." Readers' lives should only get better from our work. 😸

llemeurfr · 2018-05-12T08:50:42Z

In reply to @deborahgu, regarding the possibility of a direct copy from a library catalog to JSON or HTML ("any of these solutions").

As you can see, the data structure of JSON is very similar to the structure of your library catalog: this is raw structured information, and translation is immediate. Moving such a library structure to HTML is less immediate (use of non-recursive meta elements, or use of content elements for embedding a metadata structure using one of the discussed solutions: RDFa, microdata ...).

About extensibility: JSON is extensible, as @iherman states, but even more interesting, JSON validation tools, i.e. JSON Schema (a specification plus well-known software tools), also allow for that type of free extension. A schema can check all required metadata and their values, check that optional metadata have proper values, and let free external metadata be added without choking. On the HTML side, people can freely extend a metadata vocabulary, but there is no validation mechanism for required and optional metadata: somebody has to code something specific to enforce such rules (a full epubcheck).
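The three behaviours described above (required properties checked, optional properties checked when present, unknown properties let through) can be sketched in a few lines. This is a hand-rolled stand-in and not JSON Schema itself; in a real schema the same effect would come from the `required` and `properties` keywords, with `additionalProperties` left at its permissive default. The property names used here (`name`, `readingOrder`, `inLanguage`, `dateModified`) are hypothetical.

```python
# Hypothetical vocabulary: property name -> expected Python type.
REQUIRED = {"name": str, "readingOrder": list}
OPTIONAL = {"inLanguage": str, "dateModified": str}


def check_manifest(manifest):
    """Return a list of error strings; an empty list means the manifest passes."""
    errors = []
    for key, expected in REQUIRED.items():
        if key not in manifest:
            errors.append(f"missing required property: {key}")
        elif not isinstance(manifest[key], expected):
            errors.append(f"{key} must be of type {expected.__name__}")
    for key, expected in OPTIONAL.items():
        if key in manifest and not isinstance(manifest[key], expected):
            errors.append(f"{key} must be of type {expected.__name__}")
    # Any other property is treated as a free extension and accepted as-is.
    return errors


ok = check_manifest({"name": "Example", "readingOrder": [], "x-library-shelfmark": "QA76"})
bad = check_manifest({"name": "Example", "inLanguage": 42})
print(ok)
print(bad)
```

The first call passes despite the extra `x-library-shelfmark` property; the second reports the missing reading order and the badly typed language value.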

It's not to say that ANY metadata should be embedded in JSON; only JSON serialized metadata can. XML serialized metadata must be expressed somewhere else and can be referenced from the JSON manifest (the nervous system of the publication).

I think that the decision to externalize every metadata that is not in the infoset, a decision that was taken in the early years of EPUB, is to be revised. But this should be discussed in another issue.

I also believe that the initial question of this very long thread has been answered: splitting the infoset in JSON + HTML (meta tags or content) is tedious for both authors (information to put in two places with different serializations) and developers of user agents. On the other hand, embedding the JSON manifest containing the full infoset in an HTML header is possible and is even interesting for publications with a single resource.

HadrienGardeur · 2018-05-12T18:53:27Z

Sorry, that's not what I was referring to. Whatever you map @id to, there MUST be one at the top level or the subject of all those statements is "unknown" (i.e. a blank node). @iherman fixes that by adding the publication address as the subject/identifier of his example document.

@BigBlueHat that's exactly what we've discussed on the mailing list before and @iherman even hinted at a solution to this issue based on JSON-LD 1.1.

The reading order is defined (in this case) by a series of <iframe>'s and referenced from a TOC via fragment identifiers. Obviously that (like everything else everywhere) has its own set of limitations, but it does get closer to presenting the sort of HTML-based modeling of a "binding document" on which we could build interesting affordances (via just a handful of new things...hopefully).

Yikes, I'll have nightmares with that paragraph.

The reading order is the most important item in our infoset IMO, and a list of <iframe> elements referenced by a TOC using fragments sounds like the kind of example that would scare any reading system developer away.
I also have a very very hard time believing that anyone would rather author something that complex instead of providing a straightforward list in JSON.

(Any thoughts on this @JayPanoz?)

It does, but that "list of resources" (which I've been calling dependencies here because the relationship seems clearer) are referenced "in place" from the content document(s) which use them.

What that practically does is put the responsibility of "gathering" that list upon the browser/reading-systems (via an optional--depending on the use case--and possibly asynchronous) "gathering" process rather than on the author/publisher who would have to keep that list in sync by gathering all those things and listing them in the JSON prior to publication.

OK, let me get this straight, with your suggestion the list of resources:

would potentially be scattered across resources
would require the UA to fetch every resource in the reading order (which is expressed using URIs to fragments + <iframe> elements)
would not be explicitly listed in each resource (links), but would rely instead on the ability for the UA to intercept network requests

Is that accurate?

JayPanoz · 2018-05-13T09:47:30Z

Well, all I can say at this point is that I keep seeing the “ease of authoring” concept/argument used in (too) many ways that are partly or completely disconnected from what I have witnessed in the EPUB trenches for 6 years, in a lot of discussions.

If you really want to take graphic designers and independent publishers into account, then it means “if the authoring tools they’re using don’t implement it, they won’t use it.”

A few examples:

they will use Calibre and/or Sigil to add metadata – that’s XML;
it’s already quite a feat to make them use the correct output markup in InDesign – which is why so many files have everything as a <p> or a <span> – that’s HTML;
they will use Sigil to generate the EPUB3 nav even if InDesign can manage that – that’s HTML;
if a feature is not implemented, they simply don’t use it, even though it would take only 1 change in the HTML documents and 2 extra lines of CSS;
for interactive books, they will simply import an existing lib/plugin – that’s JS;
the most popular question during training sessions, by very far… “Is there an app for this?” – that’s their artisanal workflows.

I’m stopping there but there is so much more. The sad truth is HTML is already asking way too much from a significant number of the “spec users” anyway, and they will rely solely on authoring tools – which tend to be underrepresented in a lot of discussions related to authoring, unfortunately.

Note that even if they master all of HTML, CSS and JS, a significant portion of authors are very likely to use tools that ease their lives, cf. PWABuilder by Microsoft.

RachelComerford · 2018-05-13T14:51:12Z

Forgive me but I am going to ask that we go back to basics here for a minute... what is the problem that these 2 proposed solutions:

Is it acceptable to use HTML for the serialization of some infoset items,
or should it all be in separate (JSON) file?

are meant to solve?
Can someone give a rundown of the alternatives that we're considering?

iherman · 2018-05-14T04:52:24Z

@RachelComerford, you are right, it is a good idea to pause and maybe formulate the high level choices...

I think a way to formulate the two positions, by formulating them to the extreme might be as follows (and this is obviously my view of things at 6am in the morning...)

1. All infoset items should be defined in JSON (possibly JSON-LD). It is a bit like what we have today in EPUB; all various information should be in one place with a unified syntax, including not only metadata, but the list of resources, ToC, whatever. That unified syntax should be in JSON because that is the syntax used by Web Developers, because it is very easy to "parse" it and work with the result, as opposed to an XML syntax which is way more complex to handle. The infoset file is accessed via a special <link> in what is often called an “entry page”, which is in HTML.
2. All infoset items should be expressed via a combination of HTML elements (not only <meta> or <link>, but also, say <nav>) and possibly some information conveyed via HTTP. All (or most of?) this information is collected into that entry page.

There are some intermediate proposals floating around which may make this less clear-cut. No. 1 above, for example, may lead to redundancy of information (which is also present in EPUB), mainly in terms of the list of resources that make up a specific WP, or the ToC: to mitigate that, there are some entries in the current draft that promote the re-use of an HTML ToC to extract the list of resources.

The extra proposal that I have put forward is somewhere between the two (although I admit closer to No.1), namely to use the JSON syntax as described in No 1, but incorporate it into the header of the entry page using the <script> tag. That would open the way to some sort of compromise whereby some infoset items could be expressed via good-old HTML elements within the entry page, but other entries could go to JSON if it is more natural or semantically clearer. But even that compromise solution leads to the same dichotomy: should the <script> element be the only way to use JSON or would it be an optional usage of No.1?

What is more difficult is to get a clear set of pros and cons. I will try to list some below, knowing that I will not do full justice to either of the two sides...

A JSON formulation is (much) simpler. We do have the experience (good and bad...) with EPUB, and the infoset is defined, in the abstract sense to a, say, 75-80% level. We can have discussions on syntactic details, whether to use JSON-LD or not and, if yes, how, but none of these are really major issues. Properly defining the HTML approach seems to be much more costly in time and energy: we (I?) do not have a clear idea on what exactly should be done and how.
The JSON approach may lead to a redundancy that was also present in EPUB, e.g., for the list of resources. Whether that redundancy is really bad, or whether we can live with it, is not clear, and opinions clearly diverge on this. (There is a big difference, in this sense, whether the WP consists of a lot of HTML content resources or only one.)
It should be possible to define the entry page with all (most of?) the items in HTML in such a way that a browser that does not know anything of WP-s could make some sense of it. If that is where the "spine"/ToC content is, and where some of the traditional HTML-syntaxed metadata are (on dates, licenses, titles, etc.), then this may be doable and makes sense. (That being said, it is much less clear whether that statement is true even in cases when there are lots of resources in a WP.) A JSON syntax has no meaning for a browser that does not know WP-s.
There are some tricky issues around, e.g., security (what scripts/files can I, should I, may I load, for example) that have an immediate answer if only HTML is considered (because these are handled by existing standards), whereas it is not entirely clear what happens to that if all extra information is in a separate JSON structure.
The HTML-only approach stretches the limits of the current HTML specification, which was never defined with the concept of several documents being in the same logical place. A proper way of doing some of the things we need may be to try to slightly modify the HTML specification itself, which is an extremely tall order (which I consider unattainable).

I am sure I will get lots of flames now... Fire off!

HadrienGardeur · 2018-05-14T08:21:40Z

@iherman thanks, that's a good summary overall.

My own personal take on this matter is that:

it's easier to express almost all infoset items using JSON
it's also easier to parse JSON for UAs
the only infoset item where we might have redundancy is the TOC (but it's entirely optional, unlike the default reading order), IMO that's the only true candidate for HTML serialization
access is a different question from serialization, but it's easier to access a manifest if it has its own URI and doesn't need to be extracted from a <script> element
single resource publications are a special case (but a common one), for which we may or may not need to allow a number of optimizations (for instance we could say that there's no need for an explicit default reading order since the entry page is enough, and embedding the manifest in <script> could be allowed strictly in this context)
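The access question in the last two points (own URI versus extraction from a `<script>` element) can be illustrated with a small sketch that handles both cases. It assumes the link relation is `rel="publication"` and that an embedded manifest uses `type="application/ld+json"`; the function name `locate_manifest` is made up for this example.

```python
import json
from html.parser import HTMLParser


class ManifestLocator(HTMLParser):
    """Records a linked manifest URL and/or an embedded manifest found on an entry page."""

    def __init__(self):
        super().__init__()
        self.linked_href = None
        self.embedded = None
        self._buffer = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and "publication" in (attrs.get("rel") or "").split():
            self.linked_href = attrs.get("href")
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.embedded = json.loads("".join(self._buffer))
            self._buffer = None


def locate_manifest(entry_page_html):
    """Return ("embedded", dict), ("linked", url), or None if no manifest is found."""
    locator = ManifestLocator()
    locator.feed(entry_page_html)
    locator.close()
    if locator.embedded is not None:
        return ("embedded", locator.embedded)
    if locator.linked_href is not None:
        return ("linked", locator.linked_href)
    return None


print(locate_manifest('<link rel="publication" href="manifest.json">'))
print(locate_manifest('<script type="application/ld+json">{"name": "A"}</script>'))
```

In the linked case the UA still has one more fetch to do before it has the manifest, but it never has to parse HTML to get at the JSON itself; in the embedded case the trade-off is reversed.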

The current draft allows the reading order to fall back to HTML and underdefines the TOC. This should be changed IMO:

we need to limit the default reading order strictly to the manifest (Limiting reading order to the manifest #148), which will simplify how the manifest is processed significantly
a better definition for the TOC, with a decision regarding where it can be found (always on the entry page or do we provide a way to identify that a resource contains the TOC?) and how it's expressed
we also need to work on other navigation items (such as page lists for example) and decide if they're better expressed in HTML or in JSON

BigBlueHat · 2018-05-14T15:08:08Z

the kind of examples that would scare any reading system developer away.

It's been my operating understanding that the primary audience for this specification is Web browsers. I've also tried to base my proposals on "web publications" (lowercase on purpose) as they exist on the Web today. My goal has been to find ways to enhance existing web publications by adding the least amount of tech required to add key affordances ( topic:affordances ) currently unavailable to the human reading the publication.

Perhaps it's just that our objectives differ?

BigBlueHat · 2018-05-14T15:17:56Z

@iherman your summaries in #193 (comment) are spot on. The <script>-based data block for metadata is completely fine for encoding HTML-based metadata about the publication.

The core sticking point seems to be around the expression and processing of the "binding"--i.e. the thing that defines the multi-resource/document experience.

I'd like to shorten the distance between publication address and readable/experience-able publication. Since the Web (browser) does HTML by default, that (to me) means building up from that foundation.

RachelComerford · 2018-05-14T15:27:50Z

This is what I understand the problem and a potential compromise/solution to be. Interested in hearing feedback:

Problem: There is information that must be available via the infoset and this needs to be coded and housed somewhere

Proposed Solution: Some information (what exactly is TBD) will be housed within the HTML, some within JSON. That JSON may or may not be a separate file.

Alternatives Considered:

All infoset items should be defined in JSON (possibly JSON-LD). It is a bit like what we have today in EPUB; all various information should be in one place with a unified syntax, including not only metadata, but the list of resources, ToC, whatever. That unified syntax should be in JSON because that is the syntax used by Web Developers, because it is very easy to "parse" it and work with the result, as opposed to an XML syntax which is way more complex to handle. The infoset file is accessed via a special <link> in what is often called an “entry page”, which is in HTML.
All infoset items should be expressed via a combination of HTML elements (not only <meta> or <link>, but also, say <nav>) and possibly some information conveyed via HTTP. All (or most of?) this information is collected into that entry page.

llemeurfr · 2018-05-14T16:03:05Z

@RachelComerford maybe this issue could be closed if we can extract from it the TOC issue.

There were looong discussions last year about the representation of the reading order (ex-spine), the representation of a human-friendly Table of Contents and the need for machine readable navigation. #9 is an example of such loong threads. @HadrienGardeur 's comment also makes reference to the TOC issue. And this issue is far from being solved.

So I suggest to keep the TOC out of scope of this issue and rephrase:

Proposed Solution: The TOC being kept out of scope for now, the infoset is housed within a JSON manifest. That JSON may be embedded in the entry page if the publication has a single resource.

BigBlueHat · 2018-05-14T17:40:00Z

Good summary, @RachelComerford.

@llemeurfr this thread covers far more than the role of the "TOC issue" and has (afaict) helped clarify that "infoset" currently encompasses inert, descriptive metadata values (i.e. descriptive properties) as well as active, hypermedia-style "binding" expressions (i.e. structural properties).

Where one wants to see those things collected seems to pivot on the expected processing models of either a Web browser (+/- future publication affordances) or a more EPUB-style Reading System.

@RachelComerford your proposed solution summary is a good one:

Proposed Solution: Some information (what exactly is TBD) will be housed within the HTML, some within JSON. That JSON may or may not be a separate file.

Roughing up an example now.

BigBlueHat · 2018-05-14T18:19:13Z

I've just sent a PR with some examples (no spec changes), one of which is a minimal, <nav> fallback (as currently drafted) HTML entry page: 15682e5
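For readers who want a feel for what a `<nav>` fallback implies for a user agent, here is a rough sketch that derives a reading order from the links inside `<nav>` elements. It is not taken from the linked commit. It assumes that every `<a href>` inside any `<nav>` counts, that fragments are dropped, and that repeated targets are kept only once; the actual draft may select the `<nav>` differently.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class NavLinkParser(HTMLParser):
    """Collects link targets found inside <nav>, in document order, without fragments."""

    def __init__(self):
        super().__init__()
        self._nav_depth = 0
        self.reading_order = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self._nav_depth += 1
        elif tag == "a" and self._nav_depth > 0:
            href = dict(attrs).get("href")
            if href:
                url, _fragment = urldefrag(href)
                # Skip pure fragment links and targets already seen.
                if url and url not in self.reading_order:
                    self.reading_order.append(url)

    def handle_endtag(self, tag):
        if tag == "nav" and self._nav_depth > 0:
            self._nav_depth -= 1


# Hypothetical entry page fragment.
ENTRY_NAV = """
<nav><ol>
  <li><a href="c1.html">Chapter 1</a></li>
  <li><a href="c1.html#s2">Section 1.2</a></li>
  <li><a href="c2.html">Chapter 2</a></li>
</ol></nav>
<p><a href="other.html">Not part of the nav</a></p>
"""

nav_parser = NavLinkParser()
nav_parser.feed(ENTRY_NAV)
nav_parser.close()
print(nav_parser.reading_order)
```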

iherman · 2018-05-15T03:23:27Z

@llemeurfr,

I am not sure that "restricting" the JSON expression to a script element is only relevant for a single-HTML case. I actually wonder, in view of the fact that this is the only way the JSON-LD content is digested by schema.org, whether this form should not be the preferred one.
I think due diligence would require us to look at other infoset items to see whether they can be handled similarly to the ToC. It is true that the ToC is the most obvious example where we could rely on the HTML content, but there may be others (though I do not have any off the top of my head) and I would not want to pass a resolution now that would preclude that.

iherman · 2018-05-15T03:25:47Z

@BigBlueHat good idea to have moved the example to the wpub repo (I have just merged it).

Looking at https://github.com/w3c/wpub/blob/master/experiments/html-schema-org-json-ld/index.html, and comparing it with the other example, I do not see how you intend to represent the list of (other) resources, that appear in the other manifest.

HadrienGardeur · 2018-05-15T08:55:41Z

It's been my operating understanding that the primary audience for this specification is Web browsers. I've also tried to base my proposals on "web publications" (lowercase on purpose) as they exist on the Web today. My goal has been to find ways to enhance existing web publications by adding the least amount of tech required to add key affordances ( topic:affordances ) currently unavailable to the human reading the publication.

Perhaps it's just that our objectives differ?

I've also been building on top of existing Web Publications, and the example for "Why's poignant guide to Ruby" is a perfect illustration of that since I only added an entry page and a manifest instead of re-hosting and modifying the content:

Entry page available at: https://hadriengardeur.github.io/webpub-manifest/examples/why/
WAM style manifest: https://hadriengardeur.github.io/webpub-manifest/examples/why/manifest.webmanifest
RWPM style manifest: https://hadriengardeur.github.io/webpub-manifest/examples/why/manifest.json

For the primary audience, my take on this issue is that we should design something that works well for every type of UA, not just browsers.

That said, I don't think that your argument really holds up @BigBlueHat. iframe elements being referenced through fragment identifiers sounds like the worst part of EPUB (the whole id/idref mess in metadata and manifest/spine) and isn't easier to handle for a browser than a simple list in JSON.

As @iherman has pointed out, your example lacks a list of resources and some of your previous comments about this infoset item were IMO hard to understand.

I already posted a summary in a previous comment, could you confirm that I understood things correctly?

OK, let me get this straight, with your suggestion the list of resources:

would potentially be scattered across resources

would require the UA to fetch every resource in the reading order (which is expressed using URIs to fragments + <iframe> elements)

would not be explicitly listed in each resource (links), but would rely instead on the ability for the UA to intercept network requests

HadrienGardeur · 2018-05-15T10:04:41Z

I am not sure that "restricting" the JSON expression to a script element is only relevant for a single-HTML case. I actually wonder, in view of the fact that this is the only way the JSON-LD content is digested by schema.org, whether this form should not be the preferred one.

@iherman nothing is digested by schema.org, it's only the place where things are defined.

But JSON-LD with schema.org terms is indeed digested by a number of search engines. It's what Google recommends and Bing recently added support for it as well.

These search engines can only process JSON-LD contained in a <script> element for now, but it doesn't have to be static. If a script dynamically injects JSON-LD in a page, this will be processed as well.

For our use case, this might be the optimal outcome:

if you include JSON-LD statically on the entry page, only the entry page will contain the markup
if you rely on JS to check the presence of <link rel="publication"> and use this to dynamically inject JSON-LD in a <script> element, all resources of the publication could potentially contain this markup, without any risk regarding duplication/inconsistencies

The second option is often used for SEO optimizations, as it provides an easy way to serve static content with dynamic metadata.

As a side-note, once again I really think that we're having two separate discussions on this issue and it's making things more confusing than they should be:

which infoset items MAY/SHOULD/MUST be expressed in JSON vs HTML is one issue
whether we should embed JSON on the entry page or link to the manifest as a separate resource is a different issue

llemeurfr · 2018-05-15T11:24:52Z

@BigBlueHat about #193 (comment), we must now stop with generalities and prototypes and close this issue (65 comments) with a clear answer.
The spec states clearly in https://w3c.github.io/wpub/#infoset that the infoset covers both descriptive and structural properties, nothing new there.
The only part of the infoset (as currently defined) in dispute is the ToC. In your latest experiment, all other properties are in the JSON structure. So you are proving my point, thank you.

As @HadrienGardeur said somewhere, beyond the ToC, there are other lists that are required but currently not in the infoset: page list, list of illustrations ... we have to open a new issue about these lists, add them to the infoset if agreed, and then start (again) a discussion about the location of these lists (ToC included) as JSON, HTML or both.

But we can IMO end this discussion about the serialization of all other descriptive and structural properties by stating that their natural position is in the JSON manifest.

BigBlueHat · 2018-05-15T13:17:58Z

@BigBlueHat good idea to have moved the example to the wpub repo (I have just merged it).

Oh! That'll work. I'd been thinking we'd use the PR review system to discuss them, but I guess in the repo we can still reference line numbers, so that'll work.

I do think it'll help to see actual experiments as code vs. prose.

Looking at https://github.com/w3c/wpub/blob/master/experiments/html-schema-org-json-ld/index.html, and comparing it with the other example, I do not see how you intend to represent the list of (other) resources, that appear in the other manifest.

Dependencies are gathered from the primary resources that depend on them. Just as when you GET an HTML file, the browser will subsequently GET all the JS and CSS (etc) that it references.

This does not (by design) include a more package-focused, publication-wide list of all the things that every thing in the publication depends on. The Web is built from just-in-time referencing and retrieval. This follows that design pattern.
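As a rough illustration of "gathering" dependencies from a primary resource, the sketch below statically scans one HTML document for referenced URLs. It is deliberately simplified: the element-to-attribute table is an assumption, every `<link href>` is treated as a dependency regardless of its `rel`, and anything loaded from CSS or by scripts at run time is missed, which a real UA would only see by observing network requests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attribute that carries the dependency URL for each element this sketch looks at.
URL_ATTRS = {"link": "href", "script": "src", "img": "src", "iframe": "src"}


class DependencyCollector(HTMLParser):
    """Collects absolute URLs of resources referenced by one HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.dependencies = []

    def handle_starttag(self, tag, attrs):
        attr_name = URL_ATTRS.get(tag)
        if attr_name is None:
            return
        value = dict(attrs).get(attr_name)
        if value:
            url = urljoin(self.base_url, value)
            if url not in self.dependencies:
                self.dependencies.append(url)


def gather_dependencies(html, base_url):
    collector = DependencyCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.dependencies


# Hypothetical primary resource and address.
CHAPTER = '<link rel="stylesheet" href="style.css"><img src="img/cover.png"><script src="/app.js"></script>'
deps = gather_dependencies(CHAPTER, "https://example.org/book/chapter1.html")
print(deps)
```

To cover a whole publication, this would have to run once per primary resource, so the full list only exists after every resource has been fetched and parsed.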

BigBlueHat · 2018-05-15T14:22:41Z

which infoset items MAY/SHOULD/MUST be expressed in JSON vs HTML is one issue

@HadrienGardeur good idea. I've extracted the above into #197.

For...

whether we should embed JSON on the entry page or link to the manifest as a separate resource is a different issue

Do you feel #122 handles that sufficiently? Or do we need something more narrow?

TzviyaSiegman · 2018-05-15T14:22:56Z

much discussion has moved to #197

BigBlueHat · 2018-05-15T14:33:58Z

@BigBlueHat about #193 (comment), we must now stop with generalities and prototypes and close this issue (65 comments) with a clear answer.

This Working Group's been using GitHub for discussion as well as issues, so while I agree that this one has gotten rather long (and am working to break out specific, actionable issues), I don't feel it's closable while there's still things to extract and/or discuss further.

The spec states clearly in https://w3c.github.io/wpub/#infoset that the infoset covers both descriptive and structural properties, nothing new there.

The "infoset" term does currently encompass both types of properties and #197 has been created to explore those two different groups in light of the discussions here.

The only part of the infoset (as currently defined) in dispute is the ToC. In your latest experiment, all other properties are in the JSON structure. So you are proving my point, thank you.

The dispute seems more about the use of the ToC to potentially provide the reading order and primary resource list. Additionally, there are open questions around how dependencies of primary resources are expressed:

either explicitly--as in a resource list of all the things
or implicitly--as dependencies gathered/requested as needed by the resource that depends on them.

There is much more to discuss and explore, but it doesn't need to stay on this issue, and I concur that narrower topics should be made wherever possible.

More specific issues to follow.

HadrienGardeur · 2018-05-15T16:22:56Z

Do you feel #122 handles that sufficiently? Or do we need something more narrow?

I think that #122 is good enough for that.

Dependencies are gathered from the primary resources that depend on them. Just as when you GET an HTML file, the browser will subsequently GET all the JS and CSS (etc) that it references.

This does not (by design) include a more package-focused, publication-wide list of all the things that every thing in the publication depends on. The Web is built from just-in-time referencing and retrieval. This follows that design pattern.

Since this is not tied to #197, I'll answer here.

I think that this proposal is incredibly bad on many different levels:

conceptually, this means that the bounds of the resources are limited to the reading order (there's no way to indicate that a resource is part of the publication but not in the reading order anymore)
this could trigger the download of very large resources (HD videos) or resources that are useless for rendering (analytics scripts) that would otherwise have been excluded by the author when caching and/or packaging a publication
to cache or package a publication, you need to render every single resource from the reading order, which is going to be very slow and CPU+memory intensive (we've experimented with background rendering in Readium-2 and limit things to only 3 resources)
not all UAs may be able to intercept network requests to cache or package them, this would exclude native mobile apps for example from supporting WPs properly
UAs won't be able to do intelligent preloading on their own (for instance by loading fonts in cache in advance)

If this is really something that you want to push forward as a real proposal, you should open a separate issue, because this is not related to the serialization at all.
This is the equivalent of dropping the list of resources from our infoset.

BigBlueHat · 2018-05-15T18:55:53Z

@HadrienGardeur good summary over all, and I'm happy to open a separate issue to focus discussion. I'd say that it is serialization related in as much as HTML already has defined semantics and specifications for everything in that list (afaict).

This is the equivalent of dropping the list of resources from our infoset.

And yes. That would be the side effect of this approach. It also means less potential for out-of-sync content and manifests (i.e. one or the other having a resource that's no longer needed) and it reduces the number of changes required when adding something to the publication (i.e. the "oh, whoops, forgot to list that in the manifest..." scenario).
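The out-of-sync scenario is easy to picture as a comparison between what a manifest lists and what the content actually references. The sketch below is only an illustration with made-up file names; note that a resource listed but not referenced is not necessarily an error, since a publication may intentionally include resources outside the reading order.

```python
def sync_report(manifest_resources, referenced_resources):
    """Compare an explicit resource list with the resources the content references."""
    listed = set(manifest_resources)
    referenced = set(referenced_resources)
    return {
        # Referenced by content but never declared: the "forgot to list it" case.
        "missing_from_manifest": sorted(referenced - listed),
        # Declared but not referenced: stale entry, or intentionally bundled extra.
        "listed_but_unreferenced": sorted(listed - referenced),
    }


report = sync_report(
    ["chapter1.html", "style.css", "old-font.woff"],
    ["chapter1.html", "style.css", "cover.png"],
)
print(report)
```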

I'll save further thoughts for that separate issue.

Thanks for the feedback, @HadrienGardeur.

css-meeting-bot · 2018-05-21T16:20:48Z

The Working Group just discussed https://github.com/w3c/wpub/issues/193.

The full IRC log of that discussion

<dauwhe> Topic: https://github.com/w3c/wpub/issues/193
<dauwhe> github: https://github.com/w3c/wpub/issues/193
<dauwhe> garth: I'm in the "put everything in one place" camp
<garth> “The infoset mostly resides within a JSON manifest (WP manifest). That JSON may optionally be embedded in the entry page rather than a standalone file referenced by <link> from the entry page. It may be supported to allow some infoset items to reside in HTML of the entry page, if information duplication issues can be sufficiently avoided.”
<dauwhe> ... the proposal that Ivan and I came up with in Berlin I'm pasting in
<dauwhe> ... it may not be consensus, but I hope it's in the "can live with it"
<dauwhe> garth: (reads proposed spec language above)
<Rachel> q+
<dauwhe> ... the only html info we've talked about is using nav to define primary reading order in HTML, so this leaves that as a possibility
<garth> ack dkaplan3
<garth> ack dkaplan
<dauwhe> Rachel: Hello!
<dauwhe> ... the Q I have is about language
<dauwhe> ... I don't understand "mostly"
<dauwhe> ... do you mean "primarily"?
<dauwhe> garth: that "mostly" was letting my religion show
<dauwhe> ... it means primarily
<dauwhe> ... there is a wp manifest, and it is a json thing, and most stuff should live there unless we find exceptions (like the nav thing)
<garth> q?
<ivan> +1 for primarily instead of mostly
<dauwhe> garth: does that answer the question?
<garth> ack Rachel
<dauwhe> Rachel: can we change mostly to primarily, and qualify that language with what you just said?
<dauwhe> garth: we can switch out the words in the resolution
<dauwhe> ... and we can assign to matt to make it clearer
<garth> q?
<garth> q?
<dauwhe> garth: this one I view as less controversial; we're not deviating from the existing draft much
<ivan> resolved: The infoset primarily resides within a JSON manifest (WP manifest). That JSON may optionally be embedded in the entry page rather than a standalone file referenced by <link> from the entry page. It may be supported to allow some infoset items to reside in HTML of the entry page, if information duplication issues can be sufficiently avoided.
<dauwhe> ... if everyone's in the "can live with it" or "likes it" camp, I'm gonna assume consensus

GarthConboy · 2018-05-21T17:07:30Z

Resolved on call: "Proposed resolution: The infoset mostly resides within a JSON manifest (WP manifest). That JSON may optionally be embedded in the entry page rather than a standalone file referenced by <link> from the entry page. It may be supported to allow some infoset items to reside in HTML of the entry page, if information duplication issues can be sufficiently avoided."
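
To make the two manifest locations in the resolution concrete, here is a minimal sketch of how a user agent might look for the manifest in an entry page. The resolution does not define a `rel` value, a script `type`, or any precedence between the two forms, so `MANIFEST_REL`, `MANIFEST_SCRIPT_TYPE`, and the "embedded wins" ordering below are all assumptions chosen for illustration, not agreed spec text.

```python
import json
from html.parser import HTMLParser

# Hypothetical markers: the resolution does not name either value.
MANIFEST_REL = "publication"
MANIFEST_SCRIPT_TYPE = "application/ld+json"


class ManifestFinder(HTMLParser):
    """Records an embedded JSON manifest and/or a <link> to a standalone one."""

    def __init__(self):
        super().__init__()
        self.embedded = None      # parsed JSON found inside a <script>
        self.linked_href = None   # href of a <link> to a standalone file
        self._in_script = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == MANIFEST_REL:
            self.linked_href = attrs.get("href")
        elif tag == "script" and attrs.get("type") == MANIFEST_SCRIPT_TYPE:
            self._in_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script content arrives here as raw text; collect it for parsing.
        if self._in_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self._in_script = False
            self.embedded = json.loads("".join(self._buffer))


def locate_manifest(entry_page_html):
    """Return ("embedded", dict), ("linked", href), or ("none", None)."""
    finder = ManifestFinder()
    finder.feed(entry_page_html)
    finder.close()
    # Assumption: an embedded manifest takes precedence over a linked one.
    if finder.embedded is not None:
        return ("embedded", finder.embedded)
    if finder.linked_href is not None:
        return ("linked", finder.linked_href)
    return ("none", None)
```

In the linked case the caller would still need to resolve the `href` against the entry page URL and fetch it. Infoset items that stay in the HTML of the entry page (the `nav` reading-order case raised on the call) are outside this sketch.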

iherman added topic:manifest status:needs group decision priority:high topic:metadata general structure labels May 9, 2018

iherman mentioned this issue May 9, 2018

Reference resources that are not part of the publication #186

Closed

BigBlueHat mentioned this issue May 14, 2018

Work-In-Progress: expression experiments #196

Merged

BigBlueHat mentioned this issue May 15, 2018

Which infoset items MAY/SHOULD/MUST be expressed in JSON vs HTML? #197

Closed

BigBlueHat mentioned this issue May 15, 2018

Is an exhaustive "resource list" required to create a Web Publication? #198

Closed

dauwhe mentioned this issue May 15, 2018

Do we consider a (JSON) manifest to be part of a <script> element #122

Closed

GarthConboy closed this as completed May 21, 2018
