manifest format #7
name: Moby-Dick
id: code.google.com.epub-samples.moby-dick-basic
lang: en-US
This is valid YAML, is delightfully simple, but can also handle any complexity we might need. And note that the Readium folks even use YAML to describe the LCP configuration files. Even they don't use JSON :) |
From a developer perspective, JSON is a vast improvement over XML. It's by far the top reason why JSON has become the most popular serialization format for APIs. |
Sure, I'm not disputing the value of JSON for developers, but if we really want to expand the uptake of web publications we need to make authoring as dead simple as possible. Asking the average Joe publication author to figure out when they need a comma or a brace or bracket leaves us no better off than XML on that front. If publications were a developer-only domain (like apps) it'd be fine to expect a certain level of knowledge of the implementer, but we ignore the unwashed masses at our peril. |
Frankly, I don't think that these are mutually exclusive. A lot of people prefer writing Markdown instead of HTML, but there are plenty of options to generate HTML out of Markdown. Same idea here: we could offer YAML or whatever's easier as part of an authoring toolchain, without making things harder for developers (who will definitely expect JSON from us). |
Sure, I didn't disagree above, but toolchains are just another hoop the author has to go through, and the fewer the better. If it doesn't make it past the cutting room floor, I'd hope we'd develop such an expression language and toolchain in parallel. |
Creating such a toolchain in parallel would be a good idea and should be fairly simple in Ruby/Python/Go. IMO this can't be what we use for our manifest's serialization though. Unlike developers, I don't expect consensus among content producers. That's especially true considering the scope for Web Publications (much bigger than traditional publishing). Different people, types of content and industries will result in many different toolchains and workflows. |
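To give a sense of how small such a toolchain step can be, here is a minimal Python sketch that turns the flat `key: value` form of the Moby-Dick example into JSON. It assumes only top-level scalar pairs (Python's standard library has no YAML parser, and a real tool would use a full one); `flat_yaml_to_json` is a hypothetical name used for illustration.

```python
import json


def flat_yaml_to_json(text: str) -> str:
    """Convert flat 'key: value' lines (a tiny YAML subset) into a JSON object.

    Hypothetical helper for illustration: nesting, lists, quoting and
    comments are NOT supported, unlike real YAML.
    """
    manifest = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the first colon only, so values such as URLs keep theirs.
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"line {lineno}: expected 'key: value', got {raw!r}")
        manifest[key.strip()] = value.strip()
    return json.dumps(manifest, indent=2, ensure_ascii=False)


source = """\
name: Moby-Dick
id: code.google.com.epub-samples.moby-dick-basic
lang: en-US
"""
print(flat_yaml_to_json(source))
```

Rejecting anything outside the subset, rather than guessing, keeps the authoring format predictable: the JSON it emits is what developers and user agents would actually consume.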
Sorry, @mattgarrish, but I don't think we should be developing this standard assuming that humans (of any type) will hand code it. We should expect that authors will be using some form of tool - whether it is Google Docs, InDesign, BlueGriffon or something internal. And as @HadrienGardeur points out - there are too many chains to be focused on any of them. |
{
"section": {
"p": {
"strong": "Strongly disagree",
"#text": [
". HTML and CSS triumphed",
"they were understandable (at some level) to mortals."
],
"em": "because"
}
}
} |
Oh yeah - that's understandable to humans, Dave :).
|
Exactly what @dauwhe says in his json-y way. Tools may be the first step, but the experience from epub (and the web) is that no one can rely on them completely. We should be cognizant that authors (of very limited technical skill) will need to adjust and fix their manifests, regardless of what tool help they have. The more complicated that is to do, the less love we'll get back. But I don't disagree with making this a side project. I'm still of the belief it will help adoption. |
Why do we believe this to be true?
PDFs, one of the two most prevalent document (and publication) formats, are not hand-coded at all (well, except by crazy people like me). Office documents, the other prevalent set of document formats, are not hand-coded. Even the vast majority of the HTML on the web today is *not* hand-coded; it's authored by tools/systems (e.g. WordPress).
So why do we think ours needs to support that?
|
Because of the Rule of Least Power or from the W3C's own perspective (and at length): https://www.w3.org/2001/tag/doc/leastPower.html It depends more on what we're putting in this manifest, where that information comes from, and (at the end) how that manifest will be used on the receiving/rendering end. All of the document formats mentioned so far (Markdown, YAML, JSON, and even XML) can be made to be interchangeable. Picking one will depend not on Hugs, |
From @florianrivoal over in #5. Moved here as it's about manifests... I sort of agree with many of your points, except that I would like the thing you call the manifest to be possible to open in a non-WP-aware UA and still have it make sense. That is why I think the master file that ties the publication together, and contains the information that we normally describe as being the manifest, should be an HTML file. Whether that file includes the info using inline JSON, or in attributes on the HTML, is a bit of a separate question, but if the main file of a WP is something that legacy UAs cannot make sense of, then WPs aren't first-class citizens on the web, as you cannot effectively link to the WP (and have ordinary UAs do something sensible). You can of course link to parts of the WP, but that's the status quo. HTML documents are already first-class citizens on the web. We're trying to make collections of things first-class citizens. If you cannot link to the collection (or can link to the collection in a way that makes no sense to web UAs), we've failed. The document that contains the Table of Contents for human readers' benefit is uniquely well suited to be the master file of the publication (liberally borrowing from your earlier statement): it can uniquely identify the publication (unlike other publication resources, which could be present in multiple publications) |
Except that it assumes that every publication has a TOC - and that's certainly not the case. It's not the case today with books and certainly won't be the case as we expand to numerous other types of publications. However, I do agree with you that the TOC should be in standard HTML and not something else. |
I have a few thoughts on the discussion in this thread (and on the discussion of HTML versus JSON in general such as in issue #35):
Overall, it makes sense to me to look at many of the elements of the abstract manifest separately when you consider which format is optimal for that structure. Tables of Contents are simpler to write and parse as HTML because they are going to involve rich formatting of some sort, which JSON is really bad at. So even if you did do the basic structure as JSON, you still might have to use HTML for many of the strings. Metadata, which in most cases is largely a tree of relatively simple key-value pairs, is simpler in JSON than in HTML. Flat lists of uniform objects (e.g. links, secondary resources, and reading order) are equally complex to author in either format, while JSON is vastly simpler to parse. But in this case, the usefulness of HTML as a fallback is limited, as browsers don't do anything meaningful in publication terms with these lists. They aren't intended to be human-readable lists in either format; they are mostly just lists of resources with types and URLs that the UA can act on. Given that browsers don't do anything with this data as is, choosing a format that is easier to parse should aid both implementation and polyfills. All of the above is just a long way to say that I find JSON manifests that link to HTML ToCs to be a solution that strikes a balance between complexity and ease of use. However… As has come up several times throughout these discussions, the format might benefit a lot from specifying fallbacks for a variety of the manifest's data and structures. This would not only increase reliability from the reader's perspective but could also go a long way towards making the format easier to author. If we do a decent job of specifying these fallbacks, many authors may well be able to author web publications purely through HTML, which the UA then heuristically converts on the fly into a proper web publication. That would be a very useful feature for many people if we could pull it off.
But for that to work reliably we need to have specified the manifest proper first because fallbacks and heuristic processes generally work much better when they are specified in terms of other less ambiguous and more explicitly defined structures. That is, we probably need to specify what all of this looks like as JSON-like objects anyway so we might as well start there and then make that the basis for the format. |
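A minimal Python sketch of the split described above, assuming a hypothetical JSON manifest whose `metadata`, `readingOrder` and `links` properties (names chosen for illustration, not taken from any spec) hold key-value metadata, a flat list of resources, and a link to an HTML table of contents. Reading the flat lists takes one `json.loads` call; the ToC itself stays HTML and is only referenced.

```python
import json
from typing import List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical manifest shape: property names are illustrative only.
MANIFEST = """
{
  "metadata": {"title": "Moby-Dick", "language": "en-US"},
  "readingOrder": [
    {"url": "chapter-001.xhtml", "type": "application/xhtml+xml"},
    {"url": "chapter-002.xhtml", "type": "application/xhtml+xml"}
  ],
  "links": [
    {"rel": "contents", "url": "toc.html", "type": "text/html"}
  ]
}
"""


def load_publication(text: str, base_url: str) -> Tuple[List[str], Optional[str]]:
    """Return absolute reading-order URLs and the HTML ToC URL (or None)."""
    data = json.loads(text)
    # Flat lists of uniform objects decode directly to lists of dicts.
    order = [urljoin(base_url, item["url"]) for item in data.get("readingOrder", [])]
    # The ToC is just another link; the UA would fetch and render it as HTML.
    toc = next(
        (urljoin(base_url, link["url"])
         for link in data.get("links", [])
         if link.get("rel") == "contents"),
        None,
    )
    return order, toc


order, toc = load_publication(MANIFEST, "https://example.org/moby-dick/")
print(order)
print(toc)
```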
@iherman, @GarthConboy, and I had a discussion with @marcoscaceres about why JSON was chosen for Web App Manifest and why other formats were NOT chosen. With his permission, we are reproducing part of it here. Although HTML has the innocent appearance of being a simple text format to structure written content, over the years it has evolved to cater for dynamic application development. Unlike formats specifically designed to structure data (e.g., JSON and XML), HTML presumes:
This makes it extremely challenging to use HTML as a metadata format: Now, XML is much better suited to the task of structuring metadata because it doesn't require all of the things above (but still requires some!). However, because of DTDs and entities, namespaces, etc. it also requires a lot of "machinery" to process a document - and a high cognitive overhead for developers, particularly when it comes to working with such documents on the web. Over the last decade, in the standards community, we've seen a shift away from using XML to embracing JSON. The reason for this is quite simple: firstly, JSON's parsing rules are very simple, which sometimes seems like a weakness but is its strength ("less is more") - it's built to be "inert" by design: unlike XML or HTML, there are no dynamic parts (e.g., replaced entities, DTD loading, or scripts) or the possibility of loading things. JSON is "just JavaScript"(tm), meaning it provides basic types (UTF-8 strings, arrays, numbers, and object-literals) that integrate perfectly with the Web - which itself is built on JavaScript. This means that once JSON text is converted back to JavaScript, a developer suddenly has an extremely powerful toolset of APIs with which to manipulate that data (the rapidly evolving JavaScript standard library). JSON is also portable outside the Web, and works well with all other popular programming languages used by web developers. It also doesn't require all the heavy machinery that HTML requires, and it requires significantly less machinery (and cognitive overhead) than XML. JSON is by no means perfect: it has some silly rules, like not allowing "," after the last property declaration ... and it lacks a way to include comments. It's well suited for most cases where one just needs to structure simple data - and especially well suited for specialized domains where the semantics of each property are well understood (e.g., see Web Manifest - where the semantics of each element correspond to a web application). |
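The two properties the quote leans on, that decoding yields only inert basic values and that the grammar is strict (no trailing commas, no comments), can be checked with Python's standard `json` module. This is a small sketch; the fragment and the helper names `decoded_types` and `try_parse` are illustrative, and error-message wording varies between Python versions, so only the accept/reject outcome matters.

```python
import json


def decoded_types(value) -> set:
    """Collect the Python type names found anywhere in a decoded JSON value."""
    names = {type(value).__name__}
    if isinstance(value, dict):
        for item in value.values():
            names |= decoded_types(item)
    elif isinstance(value, list):
        for item in value:
            names |= decoded_types(item)
    return names


def try_parse(text: str) -> str:
    """Return 'ok' if the strict parser accepts the text, else a rejection note."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # Message wording differs across Python versions; only the outcome matters.
        return f"rejected: {err.msg}"
    return "ok"


# Illustrative manifest fragment (property names are hypothetical).
fragment = '{"name": "Moby-Dick", "version": 1, "rtl": false, "authors": ["Herman Melville"], "cover": null}'
print(sorted(decoded_types(json.loads(fragment))))
print(try_parse('{"name": "Moby-Dick",}'))            # trailing comma
print(try_parse('{"name": "Moby-Dick" // title\n}'))  # comment
```

Nothing in the decoded result can execute or trigger a fetch: it is only dicts, lists, strings, numbers, booleans and `None`.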
HTML is processed by lots of tools and applications, not just browsers. Most of those processors don't provide even half of that list of "presumptions." Also, within the context of a browser, we have them already. JSON is fabulous at "boring" data. However, it is not natively "webby"--it does not provide hypermedia affordances (links, forms, etc). Those can be added "on top" by defining new processing models and media types (i.e. For instance, there's an open issue about how images are processed within Web App Manifests: w3c/manifest#465 It has to be redefined in the context of this new format. HTML will be used for reference. |
From the context I think this commit a year ago w3c/manifest@b330c6a was intended to close this issue. Based on my reading of the spec, I'm pretty sure that issue is covered in the current version by this section https://www.w3.org/TR/appmanifest/#image-object-and-its-members and this one: https://www.w3.org/TR/appmanifest/#installation-process. Also, as I think I've mentioned before elsewhere, we definitely should specify intended behaviours using Fetch and Service Worker behaviours and features as a reference, much like the Web App Manifest does currently. I think that's a given irrespective of the serialisation. Even when using HTML as a manifest format, that does not remove the need to define these data structures or how they are ultimately used in the UA (e.g. see my comment at #35 (comment)), and it adds the need to define a mapping from HTML to an object format (DOM, most likely). From an implementation and specification perspective it's overall more complicated and more involved, while the benefits to authors are debatable. That's without getting into the issue of 'forking' HTML to create essentially a new format with a slightly different processing model (i.e. the 'book' attribute on the root), which from what I understand is problematic enough in its own right.
The manifest infoset as proposed so far is very much "boring" data. |
Decision reached in the meeting on 2017-08-28 to use JSON. |
See telco discussion on closure. |
One issue that has stuck with me since the NYC meeting, and that I was discussing with @dauwhe, is whether JSON is really a vast improvement over XML for the average book/publication developer. If we consider the complexity of the EPUB package document a failure point, should we consider the merit of allowing a simpler alternative representation, so that we don't switch from complaints about the idiosyncrasies of XML to complaints about those of JSON (e.g., quoting, escaping, objects, arrays)?
In particular, I'm wondering if some kind of Markdown-like syntax wouldn't greatly improve life for the average publication author. The user agent could translate it up to JSON.
The one concern is that such a representation may not be robust enough to express everything, but perhaps that's the dividing line between using the simplified syntax and the formal one.
Just food for thought, as this could also be done independently of the spec to simplify authoring.
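As a rough sketch of the Markdown-like idea, assume a hypothetical syntax where `# Title` names the publication and `- [Label](url)` lines form the reading order. A user agent or build tool could translate it up to a JSON structure as below (Python; the syntax and the output property names are illustrative assumptions, not anything agreed in this thread). Lines outside the simplified syntax are rejected, which is one way to draw the dividing line between the simple form and the formal JSON one.

```python
import json
import re

# Hypothetical simplified syntax: "# Title" and "- [Label](url)" lines only.
LINK_ITEM = re.compile(r"^-\s+\[(?P<label>[^\]]+)\]\((?P<url>[^)\s]+)\)$")


def markdownish_to_manifest(text: str) -> dict:
    """Translate the simplified syntax into a manifest-like dict."""
    manifest = {"readingOrder": []}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# "):
            manifest["name"] = line[2:].strip()
            continue
        match = LINK_ITEM.match(line)
        if match is None:
            # Anything richer than this syntax belongs in the full JSON form.
            raise ValueError(f"line {lineno}: not in the simplified syntax: {raw!r}")
        manifest["readingOrder"].append({"url": match["url"], "name": match["label"]})
    return manifest


outline = """\
# Moby-Dick
- [Loomings](chapter-001.xhtml)
- [The Carpet-Bag](chapter-002.xhtml)
"""
print(json.dumps(markdownish_to_manifest(outline), indent=2))
```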