Discourage using "out-of-container" relative URLs #1939

iherman · 2021-11-25T15:15:28Z

This is the translation into a PR of the equilibrium point in the discussion in #1912.

Fixes #1912

rdeltour

It seems I misunderstood our equilibrium point 😅

I believe the algorithm should be normative, and the explanation ("more double-dot segments than needed") informative in a note.

I think we can tweak the algorithm so that it becomes 100% correct, even if not 100% explicit.

iherman · 2021-11-26T06:02:51Z

Hm. My initial thought was that if we include the base element in the determination of the current context, we may get to the right algorithm, but then I got the impression that there are some other details that evade me. I am happy to stand corrected if we can make the algorithm 100% correct that way.

However, we do seem to have misunderstood one another, though I believe what is there mostly reflects the views of @mattgarrish and @bduga...

iherman · 2021-11-26T10:08:50Z

Note also that if we take <base> into account, then, if we want to be 100% precise, the concept of out-of-container URL strings becomes iffy. Indeed, the URL string, by itself, may not be out-of-container; it is the combination of the URL and <base> that becomes out-of-container. No doubt this can also be specified, but it would really mean a complex algorithm in the spec, and I am not sure it is worth it...
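
A small Python sketch may make this point concrete. It rests on assumptions that are not in the discussion: a content document at EPUB/content.xhtml, a hypothetical <base href="../../">, a test container root of https://a.example.org/A/, and `urllib.parse.urljoin` (RFC 3986 resolution) standing in for the URL Standard parser.

```python
from urllib.parse import urljoin

# Assumed test container root and content document location (illustrative only).
container_root = "https://a.example.org/A/"
doc_url = urljoin(container_root, "EPUB/content.xhtml")

url = "images/cover.png"   # harmless-looking relative URL string
base_href = "../../"       # hypothetical <base href> value in the document

# Without <base>, the document URL itself is the base.
without_base = urljoin(doc_url, url)

# With <base>, the href is first resolved against the document URL,
# and the result becomes the base for every relative URL in the document.
effective_base = urljoin(doc_url, base_href)
with_base = urljoin(effective_base, url)

print(without_base)  # https://a.example.org/A/EPUB/images/cover.png
print(with_base)     # https://a.example.org/images/cover.png
```

The same URL string stays inside the container in the first case and leaves it in the second, so the string alone does not decide the outcome.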

rdeltour · 2021-11-26T13:55:00Z

@iherman I was thinking something along these lines could work:

For any URL string url found in the OCF Abstract Container, the following steps should return true:

1. Set the container root URL to https://a.example.org/A/.
2. Let base be the base URL that must be used to parse url, if any, as defined by the context (document or environment) where url is used, according to the container root URL.
3. Let testURLRecord be the result of applying the URL parser to url, with base.
4. Let testURLStringA be the result of applying the URL Serializer to testURLRecord.
5. Set the container root URL to https://b.example.org/B/.
6. Set base to the base URL that must be used to parse url, if any, as defined by the context (document or environment) where url is used, according to the container root URL.
7. Set testURLRecord to the result of applying the URL parser to url, with base.
8. Let testURLStringB be the result of applying the URL Serializer to testURLRecord.
9. If testURLStringA does not start with https://a.example.org/ or testURLStringB does not start with https://b.example.org/, return true.
10. If testURLStringA starts with https://a.example.org/A/ and testURLStringB starts with https://b.example.org/B/, return true.
11. Return false.
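
The steps above can be roughly sketched in Python. This is an illustration under stated assumptions, not spec text: `urllib.parse.urljoin` follows RFC 3986 and only approximates the URL parser and serializer of the URL Standard, and the function name `passes_container_check` and its `doc_path` and `base_href` parameters are hypothetical stand-ins for "the base URL defined by the context".

```python
from typing import Optional
from urllib.parse import urljoin

ROOT_A = "https://a.example.org/A/"
ROOT_B = "https://b.example.org/B/"


def passes_container_check(url: str, doc_path: str,
                           base_href: Optional[str] = None) -> bool:
    """Approximate the proposed check for one URL string.

    doc_path is the container-relative path of the document using the URL;
    base_href is an optional <base href>-like override (both hypothetical).
    """
    results = []
    for root in (ROOT_A, ROOT_B):
        # Base URL defined by the context: the document URL under the
        # current test root, optionally overridden by base_href.
        base = urljoin(root, doc_path)
        if base_href is not None:
            base = urljoin(base, base_href)
        # Parse and serialize in one step (RFC 3986 approximation).
        results.append(urljoin(base, url))
    test_a, test_b = results

    # Different host: url (or its base) was absolute, so this check
    # has nothing to say about it.
    if (not test_a.startswith("https://a.example.org/")
            or not test_b.startswith("https://b.example.org/")):
        return True
    # Both results still sit under their test root.
    if test_a.startswith(ROOT_A) and test_b.startswith(ROOT_B):
        return True
    return False


print(passes_container_check("images/cover.png", "EPUB/content.xhtml"))
print(passes_container_check("../../../../EPUB/secret.xhtml", "EPUB/content.xhtml"))
print(passes_container_check("https://www.w3.org/", "EPUB/content.xhtml"))
```

Under these assumptions the first and third calls print True and the second prints False.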

To summarize, the idea is to test any URL string (relative or not, it doesn't matter) with two test container root URLs, parse it with their host-defined base, and inspect the two resulting strings:

If either result does not share the host of its test URL, it means the tested URL was absolute, or the base was absolute and outside the container (e.g. a base element with href set to https://w3.org).
Otherwise, both results should start with their respective test root URL (the presence of the first test path segment, A or B, indicates that the URL doesn't leak outside the container).
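
A possible illustration of why both test roots are inspected, sketched in Python. The URL below and the helper `resolve` are hypothetical, the content document is assumed to be at EPUB/content.xhtml, and `urljoin` (RFC 3986) only approximates the URL Standard parser.

```python
from urllib.parse import urljoin


def resolve(root: str, url: str, doc_path: str = "EPUB/content.xhtml") -> str:
    """Resolve url against a document located at doc_path under root."""
    return urljoin(urljoin(root, doc_path), url)


# Hypothetical URL that climbs out of the container and then re-enters
# a directory that happens to be named "A".
url = "../../A/EPUB/x.xhtml"

result_a = resolve("https://a.example.org/A/", url)
result_b = resolve("https://b.example.org/B/", url)

print(result_a)  # https://a.example.org/A/EPUB/x.xhtml
print(result_b)  # https://b.example.org/A/EPUB/x.xhtml
```

With the first root alone the result looks as if it were inside the container; the second root shows that the URL depends on the name of the directory above the container root.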

I believe this algorithm fully achieves what we want to test.
The only part subject to some level of interpretation is "the base URL that must be used to parse url, if any, as defined by the context (document or environment) where url is used". The base URL concept being dependent on the host language, I don't see how to phrase it otherwise. A companion informative note could explain that bit.

What do you think?

rdeltour · 2021-11-26T13:58:00Z

Also, note that contrary to what I initially suggested in #1912, the above does not touch the URL standard definitions (of what is a relative-URL string, specifically).

It just adds a SHOULD criterion to all URL strings, in addition to the individual syntactic criteria (MUST) that we already define elsewhere.

iherman · 2021-11-26T15:19:37Z

@rdeltour yes, that works I believe.

That will affect the preceding paragraph, which should probably disappear or be merged with another note that gives some background on what is to be achieved here. I can figure out something and we can take it from there.

It is better than what is currently in the PR; it does look a bit funny to have an algorithm in a note like that.

iherman · 2021-11-26T16:53:55Z

Ok, I have made the changes.

I know that, from a content point of view, the constraints on the container root URL are not strictly necessary, and we could rely on the Reading System spec only. However, I believe it is better to keep them here; it gives a complete picture for the author and may be an important aspect for someone wanting to add some more complex scripts to the publication. So I kept this for now.

I also used a trick that @mattgarrish and I used in the Publication Manifest spec, namely to use the <details> element to add explanation to some of the not-so-obvious algorithmic steps. I think this is better than adding a separate, big note that may disturb the flow of the spec. I have added only a smaller note following the algorithmic part.

Finally, I have moved the example and also added the example from @rdeltour's original problem statement, to make it clearer what happens in out-of-container cases.

rdeltour · 2021-11-26T19:54:55Z

Sounds good! I’ll review as soon as possible. 👍

rdeltour

Looks good! 👍
I prefer this version to the previous in-note algorithm 😊. Also, the inline explanations are helpful.

See comment details for some editorial suggestions.

iherman · 2021-12-06T08:14:56Z

@mattgarrish @dauwhe I plan to merge this PR this (Monday) evening or tomorrow morning; I have the impression that we have now converged to an acceptable consensus (we can always come back to some details in separate issues, if we want). Any objections?

dauwhe · 2021-12-06T16:19:12Z

Trying to understand the algorithm, using Example 53.

We set the container root URL to https://a.example.org/A/
The path to the XHTML content file is EPUB/content.xhtml
If we use the URL Parser to parse the path EPUB/content.xhtml with the base URL https://a.example.org/A/ we get https://a.example.org/A/EPUB/content.xhtml
If we then look at the suspect URL in the content doc, ../../../../EPUB/secret.xhtml, with the content doc URL as the base (https://a.example.org/A/EPUB/content.xhtml), we get https://a.example.org/EPUB/secret.xhtml
We see that our result https://a.example.org/EPUB/secret.xhtml DOES NOT contain the /A/ segment, which means that there is potentially a leak outside the container.
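
The walkthrough above can be reproduced with a few lines of Python. This is only a sketch: `urllib.parse.urljoin` implements RFC 3986 resolution and is used here as a stand-in for the URL Parser named in the spec text.

```python
from urllib.parse import urljoin

container_root = "https://a.example.org/A/"

# Resolve the content document path against the container root.
content_doc = urljoin(container_root, "EPUB/content.xhtml")

# Resolve the suspect URL with the content document URL as the base.
suspect = urljoin(content_doc, "../../../../EPUB/secret.xhtml")

print(content_doc)                         # https://a.example.org/A/EPUB/content.xhtml
print(suspect)                             # https://a.example.org/EPUB/secret.xhtml
print(suspect.startswith(container_root))  # False
```

The surplus double-dot segments are dropped once the path reaches the host root, which is why the /A/ segment disappears from the result.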

Is this correct?

rdeltour · 2021-12-06T16:40:54Z

Is this correct?

yes, exactly 👍

Made the changes as discussed in the issue

4082d35

iherman requested review from bduga, dauwhe, mattgarrish, rdeltour and wareid November 25, 2021 15:16

iherman added 2 commits November 25, 2021 16:20

markup error reported by check

0534d02

Doh. Still a markup error

af87825

wareid approved these changes Nov 25, 2021

rdeltour reviewed Nov 25, 2021

Used the same version of the algorithm

ea8b94d

minor changes on the explanations

6063289

rdeltour reviewed Nov 29, 2021

w3c deleted a comment from lordt4ever Nov 30, 2021

Romain's first batch of comments

f6f6d21

rdeltour reviewed Nov 30, 2021

Second batch of romain's comments

3c95120

rdeltour approved these changes Nov 30, 2021

rdeltour mentioned this pull request Dec 1, 2021

Some clarifications needed about #1939 and related ones #1948

Closed

Minor editorial change, per Matt's comment

587f2b9

rdeltour reviewed Dec 2, 2021

iherman and others added 2 commits December 3, 2021 17:33

Implementing the latest round of comments (#1939 (comment))

7ef5dd9

Merge branch 'main' into editorial/out-of-container-url-1912

1d40e48

iherman added 2 commits December 3, 2021 17:39

Markup error sneaked in..

a1b71a7

Merge branch 'editorial/out-of-container-url-1912' of https://github.…

6049f4d

…com/w3c/epub-specs into editorial/out-of-container-url-1912

mattgarrish reviewed Dec 3, 2021

Improved the note

9eaa337

rdeltour reviewed Dec 3, 2021

Latest comments of Romain

6d003f6

Merge branch 'main' into editorial/out-of-container-url-1912

e90a01f

iherman merged commit 81f25bd into main Dec 7, 2021

mattgarrish deleted the editorial/out-of-container-url-1912 branch February 4, 2022 12:59

rdeltour mentioned this pull request Mar 2, 2022

Clarify that resources must be present for relative paths #2024

Closed
