Discourage using "out-of-container" relative URLs #1939

Merged Dec 7, 2021 (15 commits). Showing changes from 8 commits.
139 changes: 127 additions & 12 deletions epub33/core/index.html
@@ -6082,9 +6082,112 @@ <h4>URLs in the OCF Abstract Container</h4>
</div>

<p> In the <a>OCF Abstract Container</a>, when a file uses a URL string to reference another file in
the container, the string MUST be a <a data-cite="url#path-relative-scheme-less-url-string"
>path-relative-scheme-less-URL string</a>, optionally followed by <code>U+0023 (#)</code>
and a <a data-cite="url#url-fragment-string">URL-fragment string</a>. </p>
the container, the string MUST be a
<a data-cite="url#path-relative-scheme-less-url-string">path-relative-scheme-less-URL string</a>,
optionally followed by <code>U+0023 (#)</code> and a
<a data-cite="url#url-fragment-string">URL-fragment string</a>.
</p>

<p class="note">
The properties of the <a>container root URL</a> are such that, however many
<a data-cite="url/#double-dot-path-segment">double-dot path segments</a> a URL string contains (for
example, <code>../../../secret</code>), it is parsed to a <a>content URL</a> (and does not "leak"
outside the container). This avoids potential run-time security issues.
Furthermore, the additional constraint and <a href="#algo-out-of-container">algorithm</a> below
ensure that such potentially problematic URLs can also be detected when checking the EPUB Document.
</p>
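The clamping behaviour described in the note can be observed with any implementation of the URL Standard parser. The following is a minimal sketch, assuming that `https://container.example/` stands in for a container root URL (real root URLs are implementation-specific) and that the referencing file is `EPUB/content.xhtml`; both values are illustrative and not taken from the specification.

```typescript
// Assumption: an artificial root URL whose path is "/" stands in for the
// container root URL; the host name is purely illustrative.
const base = new URL("EPUB/content.xhtml", "https://container.example/");

// Three double-dot segments are more than needed to reach the root;
// the parser drops the excess instead of escaping above "/".
const resolved = new URL("../../../secret", base);

console.log(resolved.href); // https://container.example/secret
```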

<p>To validate a URL string <var>url</var> found in the <a>OCF Abstract Container</a>, the following steps SHOULD return <var>true</var>:</p>

<ol class="algorithm" id="algo-out-of-container">
<li>
Set the <a>container root URL</a> to <code>https://a.example.org/A/</code>.
<details>
<summary>Explanation</summary>
<p class="note">
The goal of the algorithm is to detect whether <var>url</var> could be seen as
"leaking" outside the container.
To do that, the standard <a data-cite="url#concept-url-parser">URL parsing algorithm</a>
is used with an artificial root URL; the detection of the "leak" is done by
comparing the result of the parsing with the presence of the first test path
segment (<code>A</code>).
(Note that the artificial container root URL wilfully violates, for the purpose of this
algorithm, the <a href="#confreq-root-url">required properties</a> by using
that first test path segment.)
</p>
</details>
</li>

<li>
Let <var>base</var> be the <a data-cite="url#concept-base-url">base URL</a> that must be used to parse <var>url</var> as defined by the context (document or environment) where <var>url</var> is used, and according to the <a>content URL</a> of the <a>Package Document</a> (see <a href="#sec-parse-package-urls"></a>).
<details>
<summary>Explanation</summary>
<p class="note">
In the case of a URL in the package document the <var>base</var> variable is set
to the <a>content URL</a> of the <a>Package Document</a>. In the case of a
URL in an XHTML Content Document, the base URL used for parsing is defined by the
<a data-cite="html#resolving-urls">HTML standard</a>. Typically, it will be
the <a>content URL</a> of the content document (unless the <a href="#sec-xhtml-deviations-base">discouraged</a> <code>base</code> element is used).
</p>
</details>
</li>

<li>
Let <var>testURLRecord</var> be the result of applying the <a data-cite="url#concept-url-parser">URL parser</a> to <var>url</var>, with <var>base</var>.
</li>

<li>
Let <var>testURLStringA</var> be the result of applying the <a data-cite="url#concept-url-serializer">URL Serializer</a> to <var>testURLRecord</var>.
</li>

<li>
Set the <a>container root URL</a> to <code>https://b.example.org/B/</code>.
<details>
<summary>Explanation</summary>
<p class="note">The reason for repeating the same steps with two different, artificial settings of the container root URL is to avoid a collision, which may occur if the <var>url</var> string also includes <code>/A/</code>. Consider, for example, the case where <var>url</var> is <code>../../A/doc.xhtml</code>.</p>
</details>
</li>

<li>
Set <var>base</var> to be the <a data-cite="url#concept-base-url">base URL</a> that must be used to parse <var>url</var> as defined by the context (document or environment) where <var>url</var> is used, and according to the <a>content URL</a> of the <a>Package Document</a> (see <a href="#sec-parse-package-urls"></a>).
</li>

<li>
Set <var>testURLRecord</var> to be the result of applying the <a data-cite="url#concept-url-parser">URL parser</a> to <var>url</var>, with <var>base</var>.
</li>

<li>
Let <var>testURLStringB</var> be the result of applying the <a data-cite="url#concept-url-serializer">URL Serializer</a> to <var>testURLRecord</var>.
</li>

<li>
If <var>testURLStringA</var> does not start with <code>https://a.example.org/</code> or <var>testURLStringB</var> does not start with <code>https://b.example.org/</code>, return <var>true</var>.
<details>
<summary>Explanation</summary>
<p class="note">
If either result does not share its test URL host, it means that <var>url</var>, or
its base URL (for example, in HTML, if it is explicitly set with the <code>base</code>
element), was <em>absolute</em> and points outside the container. This is acceptable.
</p>
</details>
</li>

<li>
If <var>testURLStringA</var> starts with <code>https://a.example.org/A/</code> and <var>testURLStringB</var> starts with <code>https://b.example.org/B/</code>, return <var>true</var>.
<details>
<summary>Explanation</summary>
<p class="note">The presence of the first test path segment (<code>A</code> or <code>B</code>, respectively) indicates that the URL doesn't leak outside the container.</p>
</details>
</li>

<li>Return <var>false</var>.</li>
</ol>
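The steps above can be sketched with the `URL` class, which implements the URL Standard parser and serializer. This is an illustration under stated assumptions, not the specification's reference implementation: the function name `staysInContainer` and its `basePath` parameter are hypothetical, and the base URL is simplified to the container-relative path of the file in which the URL string appears (no `base` element is taken into account).

```typescript
// Sketch of the out-of-container check. `staysInContainer` and `basePath` are
// hypothetical names; `basePath` is the container-relative path of the file in
// which `url` appears (assumption: no <base> element overrides it).
// Note: `new URL` throws a TypeError if `url` cannot be parsed at all.
function staysInContainer(url: string, basePath: string): boolean {
  // Parse `url` against an artificial container root and serialize the result.
  const resolve = (root: string): string =>
    new URL(url, new URL(basePath, root)).href;

  const testURLStringA = resolve("https://a.example.org/A/");
  const testURLStringB = resolve("https://b.example.org/B/");

  // A result on another host means the URL (or its base) was absolute and
  // points outside the container, which is acceptable.
  if (
    !testURLStringA.startsWith("https://a.example.org/") ||
    !testURLStringB.startsWith("https://b.example.org/")
  ) {
    return true;
  }

  // Both results must keep their first test path segment (A and B).
  return (
    testURLStringA.startsWith("https://a.example.org/A/") &&
    testURLStringB.startsWith("https://b.example.org/B/")
  );
}

console.log(staysInContainer("image1.jpg", "EPUB/content.xhtml")); // true
console.log(staysInContainer("../../../../EPUB/secret.xhtml", "EPUB/content.xhtml")); // false
console.log(staysInContainer("../../A/doc.xhtml", "EPUB/content.xhtml")); // false
console.log(staysInContainer("https://www.w3.org/", "EPUB/content.xhtml")); // true
```

The third call shows why two artificial roots are used: against the first root alone, `../../A/doc.xhtml` resolves to a URL that still starts with `/A/`, and only the second pass reveals the leak.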

<p class="note">
For better interoperability with non-conforming or legacy Reading Systems and toolchains,
EPUB Creators should not use more <a data-cite="url/#double-dot-path-segment">double-dot path segments</a>
than needed to reach the target container file.
</p>

<aside class="example" title="Referencing a file in the same directory">
<p>In this example, the file <code>image1.jpg</code> is in the same directory as the <a>XHTML
@@ -6101,14 +6204,23 @@ <h4>URLs in the OCF Abstract Container</h4>
&lt;/html></pre>
</aside>

<p class="note"> The properties of the <a>container root URL</a> are such that whatever the amount
of <a data-cite="url/#double-dot-path-segment">double-dot path segments</a> in a URL string (for
example, <code>../../../secret</code>), it will be parsed to a content URL (and not "leak"
outside the container). However, for better interoperability with non-conforming or legacy
Reading Systems, EPUB Creators should avoid using more <a
data-cite="url/#double-dot-path-segment">double-dot path segments</a> than needed to reach
the target container file. </p>
</section>
<aside class="example" title='An "out-of-container" URL'>
<p>Given the following container structure:</p>

<pre>
/
├── mimetype
├── META-INF
│   └── container.xml
└── EPUB
   └── content.xhtml
</pre>

<p>
A URL <code>../../../../EPUB/secret.xhtml</code> appearing in <code>content.xhtml</code> would be parsed by a Reading System into a <a>content URL</a> with the path <code>EPUB/secret.xhtml</code>, following the constraints on the <a>container root URL</a>. However, because the URL could be perceived as referring to a resource outside the container, and could create interoperability issues, a checker tool would report it as a warning.
</p>
</aside>
</section>

<section id="sec-container-metainf">
<h4><code>META-INF</code> Directory</h4>
@@ -10556,8 +10668,11 @@ <h2>Change Log</h2>
>Working Group's issue tracker</a>.</p>

<ul>
<li>26-Nov-2021: A requirement and an algorithm to detect out-of-container URLs have been added to the
specification. See <a href="https://github.com/w3c/epub-specs/issues/1912">issue 1912</a>.
</li>
<li>18-Nov-2021: Change to only disallow deprecated characters in the Tags and Variation Selectors
Supplement. See <a href="https://github.com/w3c/epub-specs/issues/1885">issue 1885</a></li>
Supplement. See <a href="https://github.com/w3c/epub-specs/issues/1885">issue 1885</a>.</li>
<li>12-Nov-2021: Change the recommendation to use SHA-1 to encrypt the obfuscation key to a requirement.
See <a href="https://github.com/w3c/epub-specs/issues/1873">issue 1873</a>.</li>
<li>12-Nov-2021: Restrict the obfuscation algorithm to fonts and add caution to use better protection