-
Notifications
You must be signed in to change notification settings - Fork 0
/
e0.html
413 lines (405 loc) · 24.3 KB
/
e0.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
<!DOCTYPE html>
<html lang="en">
<head>
<meta content="text/html; charset=UTF-8" http-equiv="content-type">
<title>Why I still don't like EPUB</title>
<style type="text/css">
.issue {
border-top-width: thin;
border-right-width: thin;
border-bottom-width: thin;
border-left-width: thin;
border-top-style: solid;
border-right-style: solid;
border-bottom-style: solid;
border-left-style: solid;
border-top-color: #e54d5b;
border-right-color: #e54d5b;
border-bottom-color: #e54d5b;
border-left-color: #e54d5b;
}
.issue::before {
background-color: #e54d5b;
color: white;
font-weight: bold;
content: attr(data-issue);
margin-right: 1em;
display: inline-block;
padding-top: 0.5em;
padding-right: 0.5em;
padding-bottom: 0.5em;
padding-left: 0.5em;
}
body {
font-family: "Arial", "Helvetica", "sans-serif";
}
pre {
background-color: #e7e7e7;
margin-left: 2em;
margin-right: 2em;
padding-top: 1em;
padding-right: 1em;
padding-bottom: 1em;
padding-left: 1em;
}
</style></head>
<body>
<p>Long looooong ago, I wrote a <a href="http://www.glazman.org/weblog/dotclear/index.php?post/2003/04/14/2880-untitled">deep
review of the XHTML 2.0 spec</a> that was one of the elements that led
to the resuming of the HTML activity at the W3C and the final dismissal of
XHTML 2.0. </p>
<p>Long ago, I started a similar effort on EPUB that led to Dave Cramer's
EPUB Zero. <a href="http://www.walrus-books.com/livre-numerique-ce-que-nous-avons-rate-pourquoi-nous-lavons-rate-et-comment-nous-allons-changer-les-choses/">It's
time</a> (fr-FR) to draw some conclusions.</p>
<p>This document is <a href="https://github.com/therealglazou/epub31review">maintained
on GitHub</a> and accepting contributions. The document can be read at <a
href="http://glazman.org/e0/e0.html">http://glazman.org/e0/e0.html</a>.</p>
<p>Daniel Glazman</p>
<h2>OCF</h2>
<ol>
<li>EPUB publications are not "just a zip". They are zips with special
constraints. I question the "<em><code>mimetype</code> file in first
uncompressed position</em>" constraint since I think the vast majority
of reading systems don't care (and can't care) because most of the
people creating EPUB at least partially by hand don't know/care. The
three last contraints (zip container fields) on the ZIP package
described in section 4.2 of the spec are usually not implemented by
Reading Systems.</li>
<li>the <code>container </code>element of the <code>META-INF/container.xml</code>
file has a <code>version</code> attribute that is always "<code>1.0</code>",
whatever the EPUB version. That forces editors, filters and reading
systems to dive into the default rendition to know the EPUB version and
that's clearly one useless expensive step too much.</li>
<li>having multiple renditions per publication reminds me of MIME <code>multipart/alternative</code>.
When Borenstein and Freed added it to the draft of RFC 1341 some 25
years ago, Mail User Agents developers (yours truly counted) envisioned
and experimented far more than alternatives between <code>text/html</code>
and <code>text/plain</code> only. I am under the impression multiple
renditions start from the same good will but fail to meet the goals for
various reasons:
<ol>
<li>each additional rendition drastically increases the size of the
publication...</li>
<li>most authoring systems, filters and converters on the market don't
deal very well with multiple renditions</li>
<li>EPUB 2 defined the default rendition as the first rendition with a
<code>application/oebps-package+xml</code> mimetype while the
EPUB 3 family of specs defines it as the first rendition in the
container</li>
<li>while a MIME-compliant Mail User Agent will let you compose a
message in <code>text/html</code> and output for you the <code>multipart/alternative</code>
between that <code>text/html</code> and its <code>text/plain</code>
serialization, each Publication rendition must be edited separately.</li>
<li>in the case of multiple renditions, each rendition has its own
metadata and it's then legitimate to think the publication needs its
own metadata too. But the <code>META-INF/metadata.xml</code> file
has always been quoted as "<em>This version of the OCF specification
does not define metadata for use in the metadata.xml file.
Container-level metadata may be defined in future versions of this
specification and in IDPF-defined EPUB extension specifications.</em>"
by all EPUB specifications. The <code>META-INF/metadata.xml</code>
should be dropped.</li>
<li>encryption, rights and signatures are per-publication resources
while they should be per-rendition resources.</li>
</ol>
</li>
<li>the <code>full-path</code> attribute on <code>rootfile</code>
elements is the only path in a publication that is relative to the
publication's directory. All other URIs (for instance in <code>href</code>
attributes) are relative to the current document instance. I think <code>full-path</code>
should be deprecated in favor of <code>href</code> here, and finally
superseded by <code>href</code> for the next major version of EPUB.</li>
<li>we don't need the <code>mimetype</code> attribute on <code>rootfile</code>
elements since the prose of EPUB 3.1 says the target of a rootfile must
be a Package Document, i.e. an OPF file... If EPUB 2 OCF could directly
target for instance a PDF file, it's not the case any more for OCF 3.</li>
<li>absolute URIs (for instance <code>/files/xhtml/index.html</code> with
a leading slash, cf. <code>path-absolute</code> construct in RFC 3986)
are harmful to EPUB+Web convergence</li>
<li>if multiple renditions are dropped, the <code>META-INF/container.xml</code>
file becomes useless and it can be dropped.</li>
<li>the prose for the <code>META-INF/manifest.xml</code> makes me wonder
why this still exists: "does not mandate a format", "MUST NOT be used".
Don't even mention it! Just say that extra unspecified files inside <code>META-INF</code>
directory must be ignored (Cf. OCF section 3.5.1) , and possibly reserve
the <code>metadata.xml</code> file name, period. Oh, and a ZIP is also
a manifest of files...</li>
<li>I am not an expert of encryption, signatures, XML-ENC Core or XML DSIG
Core, so I won't discuss <code>encryption.xml</code> and <code>signatures.xml</code>
files</li>
<li>the <code>rights.xml</code> file still has no specified format.
Strange. Cf. item 8 above.</li>
<li>Resource obfuscation is so weak and useless it's hilarious. Drop it.
It is also painful in EPUB+Web convergence.</li>
<li>RNG schemas are not enough in the OCF spec. Section 3.5.2.1 about <code>container.xml</code>
file for instance shouls have models and attribute lists for each
element, similarly to the Packages spec.</li>
<li>not sure I have ever seen <code>links</code> and <code>link</code>
elements in a <code>container.xml</code> file... (Cf. <a href="https://github.com/idpf/epub-revision/issues/374">issue
#374</a>). The way these links are processed is unspecified anyway.
Why are these elements normatively specified since extra elements are
allowed - and explicitely ignored by spec - in the container?</li>
</ol>
<h2>Packages</h2>
<ol>
<li>a Package consists of one rendition only.</li>
<li>I have never understood the need for a manifest of resources inside
the package, probably because my own publications don't use Media
Overlays or media fallbacks.</li>
<li>fallbacks are an inner mechanism also similar to multipart/alternative
for renditions. I would drop it.</li>
<li>I think the whole package should have properties identifying the
required technologies for the rendition of the Package (e.g. script),
and avoid these properties on a per-item basis that makes no real sense.
The feature is either present in the UA for all content documents or for
none.</li>
<li>the <code>spine</code> element represent the default reading order of
the package. Basically, it's a list. We have lists in html, don't we?
Why do we need a painful and complex proprietary xml format here?</li>
<li>the name of the <code>linear</code> attribute, that discriminates
between primary and supplementary content, is extremely badly chosen. I
always forget what really is <code>linear</code> because of that.</li>
<li>Reading Systems are free, per spec, to completely ignore the <code>linear</code>
attribute, making it pointless from an author's point of view.</li>
<li>I have never seen the <code>collection</code> element used and I
don't really understand why it contains <code>link</code> elements and
not <code>itemref</code> elements</li>
<li>metadata fun, as always with every EPUB spec. Implementing <code>refines</code>
in 3.0 was a bit of a hell (despite warnings to the EPUB WG...), and
it's gone from 3.1, replaced by new attributes. So no forwards
compatbility, no backwards compatibility. Yet another parser and
serializer for EPUB-compliant user agents.</li>
<li>the old OPF <code>guide</code> element is now a html <code>landmarks</code>
list, proving it's feasible to move OPF features to html</li>
<li>the Navigation Document, an html document, is mandatory... So all the
logics mentioned above could be there.</li>
<li><em>Fixed Layout delenda est</em>. Let's use CSS Fragmentation to make
sure there's no orphaned content in the document post-pagination, and if
CSS Fragmentation is not enough, make extension contributions to the CSS
WG.</li>
<li>without 3.0 <code>refines</code>, there is absolutely nothing any
more in 3.1 preventing Package's metadata to be expressed in html; in
3.0, the <code>refines</code> attribute was a blocker, implying an
extension of the model of the <code>meta</code> html element or another
ugly IDREF mechanism in html.</li>
<li>the <code>prefix</code> attribute on the package element is a good
thing and should be preserved</li>
<li>the <code>rendition-flow</code> property is weird, its values being <code>paginated</code>,
<code>scrolled-continuous</code>, <code>scrolled-doc</code> and <code>auto</code>.
Where is <code>paginated-doc</code>, the simplest paginated mode to
implement?</li>
<li>no more NCX, finally...</li>
<li>the Navigation Document is already a html document with <code>nav</code>
elements having a special <code>epub:type</code>/<code>role</code> (see
<a href="https://github.com/IDPF/epub-revision/issues/941">issue #941</a>),
that's easy to make it contain an equivalent to the spine or more.</li>
</ol>
<h2>Content Documents</h2>
<ol>
<li>let's get rid of the epub namespace, please...</li>
</ol>
<h2>Media Overlays</h2>
<ol>
<li>SMIL support across rendering engines is in very bad shape and the
SMIL polyfill does not totally help. Drop Media Overlays for the time
being and let's focus on visual content.</li>
</ol>
<h2>Alternate Style Tags</h2>
<ol>
<li>terrible spec... Needed because of Reading Systems' limitations but
still absolutely terrible spec...</li>
<li>If we get rid of backwards compatibility, we can drop it. Submit
extensions to Media Queries if needed.</li>
</ol>
<h2><em>On the fly</em> conclusions</h2>
<ol>
<li>backwards compatibility is an enormous burden on the EPUB ecosystem</li>
<li>build a new generation of EPUB that is not backwards-compatible</li>
<li>the <code>mimetype</code> file is useless</li>
<li>file extension of the publication MUST be well-defined<code></code></li>
<li>one rendition only per publication and no more <code>links</code>/<code>link</code>
elements</li>
<li>in that case, we don't need the <code>container.xml</code> file any
more</li>
<li><code>metadata.xml</code> and <code>manifest.xml</code> files removed</li>
<li>we may still need <code>encryption.xml,</code> <code>signatures.xml</code>
and <code>rights.xml </code>inside a META-INF directory (or directly
in the package's root after all) to please the industry.</li>
<li><code>application/oebps-package+xml</code> mimetype is not necessary</li>
<li>the EPUB spec evolution model is/was "we must deal with all cases in
the world, we move very fast and we fix the mistakes afterwards". I
respectfully suggest a drastic change for a next generation: "let's
start from a low-level common ground only and expand slowly but cleanly"</li>
<li>the OPF file is not needed any more. The root of the unique rendition
in a package should be the html Navigation Document.</li>
<li>the metadata of the package should be inside the <code>body</code>
element of the Navigation Document</li>
<li>the spine of the package can be a new <code>nav</code> element inside
the <code>body</code> of the Navigation Document</li>
<li>intermediary summary: let's get rid of both <code>META-INF/container.xml</code>
and the OPF file... Let's have the Navigation Document mandatorily named
<code>index.xhtml</code> so a directory browsing of the uncompressed
publication through http will render the Navigation Document.</li>
<li>let's drop the Alternate Style Tags spec for now. Submission of new
Media Queries to CSS WG if needed.</li>
<li>let's drop Media Overlays spec for now.</li>
</ol>
<h2>CONCLUSION</h2>
<p>EPUB is a monster, made to address very diverse markets and ecosystems,
too many markets and ecosystems. It's weak, complex, a bit messy,
disconnected from the reality of the Web it's supposed to be built upon
and <a href="http://www.walrus-books.com/livre-numerique-ce-que-nous-avons-rate-pourquoi-nous-lavons-rate-et-comment-nous-allons-changer-les-choses/">some
claim</a> (link in fr-FR) it's too close to real books to be disruptive
and innovative.</p>
<p>I am then suggesting to severe backwards compatibility ties and restart
almost from scratch, and entirely and purely from W3C Standards. Here's
the proposed result:</p>
<hr>
<h2>1. E0 Publication</h2>
<p>A E0 Publication is a ZIP container. Files in the ZIP must be stored as
is (no compression) or use the Deflate algorithm. File and directory names
must be encoded in UTF-8 and follow some restrictions (see EPUB 3.1
filename restrictions).</p>
<p>The file name of a E0 Publication MUST use the <code>e0</code> file
extension.</p>
<p data-issue="GH 11" class="issue">Do we really need a mandatory file
extension? edasfr thinks we don't so we can deal with zipped web sites.</p>
<p>A E0 Publication MUST contain a Navigation Document. It MAY contain files
encryption.xml, signatures.xml and rights.xml (see OCF 3.1 for more
information). All these files must be placed directly inside the root of
the E0 Publication.</p>
<p>A E0 Publication can also contain Content Documents and their resources.</p>
<p>Inside a E0 Publication, all internal references (links, anchors,
references to replaced elements, etc) MUST be strictly relative. With
respect to section 4.2 of RFC 3986, only <code>path-noscheme</code> and <code>path-empty</code>
are allowed for IRIs' <code>relative-part</code>. External references are
not restricted.</p>
<h2>2. E0 Navigation Document</h2>
<p>A E0 Navigation Document is a html document. Its file name MUST be <code>index.xhtml</code>
if the document is a XML document and <code>index.html</code> if it it is
not a XML document. A E0 Publication cannot contain both <code>index.html</code>
and <code>index.xhtml</code> files.</p>
<p data-issue="GH 13" class="issue">index.html: part of the document, or
just for metadata</p>
<p>A E0 Navigation Document contains at least one <code>header</code>
element (for metadata) and at least two <code>nav</code> html elements
(for spine and table of contents) in its <code>body</code> element.</p>
<p data-issue="GH 12" class="issue">dauwhe wants to preserve a manifest of
file...</p>
<h3>2.1. E0 Metadata in the Navigation Document</h3>
<p>E0 metadata are designed to be editable in any Wysiwyg html editor, and
potentially rendered as regular html content by any Web browser.</p>
<p>E0 metadata are expressed inside a mandatory <code>header</code> html
element inside the Navigation Document. That element must carry the "<code>metadata</code>"
ID and the <code>vocab</code> attribute with value "<code>http://www.idpf.org/2007/opf</code>".
All metadata inside that <code>header</code> element are then expressed
using html+RDFa Lite 1.1. E0 metadata reuse EPUB 3.1 metadata and
corresponding unicity rules, expressed in a different way.</p>
<p data-issue="GH 5" class="issue">edasfr wants a way to "externalize" all
of this and link rel=toc it</p>
<p data-issue="GH 7" class="issue">Do we really need @vocab?</p>
<p data-issue="GH 8" class="issue">Explain that we use RDFa Lite 1.1 only
for @vocab, @prefix and @property attributes. Explain that JSON-LD is not
Wysiwygly editable nor trivially rendered by web browsers.</p>
<p>Refinements of metadata are expressed through nesting of elements.</p>
<p>Example:</p>
<pre><header id="metadata"
vocab="http://www.idpf.org/2007/opf"><br> <h1>Reading Order</h1>
<ul>
<li>Author:
<span property="dc:creator">glazou<br> (<span property="file-as">Glazman, Daniel</span>)</span></li>
<li>Title:
<span property="dc:title">E0 Publications</span></li>
</ul>
</header></pre><p>
The mandatory <code>title</code> element of the Navigation Document,
contained in its <code>head</code> element, should have the same text
contents than the first "dc:title" metadata inside that header element.</p><p
data-issue="GH 10" class="issue">edasfr does not like the paragraph above</p><p
data-issue="GH 1" class="issue">rdeltour wants to keep Media Overlays</p><p data-issue="GH 6"
class="issue">Make that header optional. The only mandatory thing is the title and it's in the <code>title</code> element of the document.</p><p
data-issue="GH 9" class="issue">edasfr thinks special nav elements should not be identified by an ID. I agree this is restriction of the ID value namespace but I don't like his solution, using @role only.</p>
<h3>2.2. E0 Spine</h3>
<p>The spine of a E0 Publication is expressed in its Navigation Document as
a new <code>nav</code> element holding the "spine" ID. The spine <code>nav</code>
element is mandatory.</p><p data-issue="GH 2" class="issue">Dave Cramer suggested to make the element optional and use the ToC instead if the spine is not present. I like it.</p><p
data-issue="GH 3" class="issue">edasfr thinks we should drop the spine since it can be recreated from link rel=prev/next elements. This is true but expensive. And you still need the first document to render...</p>
<p data-issue="GH 14" class="issue">Explicit spine vs rel=next / rel=prev</p><p>See <a
href="http://www.idpf.org/epub/31/spec/epub-packages.html#sec-package-nav">EPUB
3.1 Navigation Document</a>.</p>
<h3>2.3. E0 Table of Contents</h3>
<p>The Table of Contents of a E0 Publication is expressed in its Navigation
Document as a nav element carrying the "toc" ID. The Table of Contents <code>nav</code>
element is mandatory. </p><p data-issue="GH 4" class="issue">edasfr thinks the ToC should be optional.. Ahem. He also wants the spine to be dropped. So how do we define the first document to read?</p>
<p data-issue="GH 15" class="issue">What to do if there are several tocs?</p><p>See <a
href="http://www.idpf.org/epub/31/spec/epub-packages.html#sec-package-nav">EPUB
3.1 Navigation Document</a>.</p>
<h3>2.4. E0 Landmarks</h3>
<p>The Landmarks of a E0 Publication is expressed in its Navigation Document
as a nav element carrying the "landmarks" ID. The Landmarks <code>nav</code>
element is optional. </p>
<p>See <a href="http://www.idpf.org/epub/31/spec/epub-packages.html#sec-package-nav">EPUB
3.1 Navigation Document</a>.</p>
<h3>2.5. Other <code>nav</code> elements</h3>
<p>The Navigation Document may include one or more <code>nav</code>
elements. These additional <code>nav</code> elements should have an <code>role</code>
attribute to provide a machine-readable semantic, and must have a
human-readable heading as their first child.</p>
<p>IDs "metadata", "spine" , 'landmarks" and "toc" are reserved in the
Navigation Document and must not be used by these extra <code>nav</code>
elements.
</p>
<h3>2.6. Example of a Navigation Document</h3>
<pre><!DOCTYPE html>
<html lang="en">
<head>
<meta content="text/html; charset=UTF-8" http-equiv="content-type">
<title>Moby-Dick</title>
</head>
<body>
<header id="metadata"
vocab="http://www.idpf.org/2007/opf">
<ul>
<li>Author:
<span property="dc:creator">Herman Melville<br> (<span property="file-as">Melville Herman</span>)</span></li>
<li>Title:
<span property="dc:title">Moby-Dick</span></li>
<li>Identifier:
<span property="dc:identifier">glazou.e0.samples.moby-dick</span></li>
<li>Language:
<span property="dc:language">en-US</span></li>
<li>Last modification:
<span property="dcterms:modified">2017-01-17T11:16:41Z</span></li>
<li>Publisher:
<span property="dc:publisher">Harper & Brothers, Publishers</span></li>
<li>Contributor:
<span property="dc:contributor">Daniel Glazman</span></li>
</ul>
</header>
<nav id="spine">
<h1>Default reading order</h1>
<ul>
<li><a href="cover.html">Cover</a></li>
<li><a href="titlepage.html">Title</a></li>
<li><a href="toc-short.html">Brief Table of Contents</a></li>
...
</ul>
</nav>
<nav id="toc" role="doc-toc">
<h1>Table of Contents</h1>
<ol>
<li><a href="titlepage.html">Moby-Dick</a></li>
<li><a href="preface_001.html">Original Transcriber’s Notes:</a></li>
<li><a href="introduction_001.html">ETYMOLOGY.</a></li>
...
</ol>
</nav>
</body>
</html></pre>
<h2>3. Directories</h2>
<p>A E0 Publication may contain any number of directories and nested
directories.</p>
<h2>4. E0 Content Documents</h2>
<p>E0 Content Documents are referenced from the Navigation Document. E0
Content Documents are html documents.</p><p>E0 Content Documents should contain <code><link rel="prev"...></code> and <code><link rel="next"...></code> elements in their <code>head</code> element conformant to the reading order of the spine present in the Navigation Document. Content Documents not present in that spine don't need such elements.</p><p>The <code>epub:type</code> attribute is superseded by the <code>role</code> attribute and must not be used.</p><h2>5. E0 Resources</h2><p>E0 Publications can contain any number of extra resources (CSS stylesheets, images, videos, etc.) referenced from either the Navigation Document or Content Documents.</p>
</body></html>