Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discourage using "out-of-container" relative URLs #1939

Merged
merged 15 commits into from
Dec 7, 2021

Conversation

iherman
Copy link
Member

@iherman iherman commented Nov 25, 2021

This is the translation into a PR of the equilibrium point in the discussion in #1912.

Fixes #1912


Preview | Diff

Copy link
Member

@rdeltour rdeltour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems I misunderstood our equilibrium point 😅

I believe the algorithm should be normative, and the explanation ("more double-dot segments than needed") informative in a note.

I think we can tweak the algorithm so that it becomes 100% correct, even if not 100% explicit.

@iherman
Copy link
Member Author

iherman commented Nov 26, 2021

Hm. My initial thoughts were that if we include the base element into the determination of the current context then we may get to the right algorithm but then I got the impression that there are some other details that evade me. I am happy to be stand corrected if we can turn the algo 100% with that.

However, we indeed seem to have misunderstood one another, though, I believe, what is there mostly reflect the views of @mattgarrish and @bduga...

@iherman
Copy link
Member Author

iherman commented Nov 26, 2021

Note also that if we take <base> into account, then, if we want to be 100% precise, the concept of out-of-container URL strings becomes iffy. Indeed, a the url string, by itself, may not be out-of-container, it is the combination of url and <base> that becomes out-of-container. No doubt this can also be spec-d, but it would really mean a complex algorithm in the spec, and I am not sure it is worth it...

@rdeltour
Copy link
Member

rdeltour commented Nov 26, 2021

@iherman I was thinking something along these lines could work:


For any URL string url found in the OCF Abstract Container, the following steps should return true:

  1. Set the container root URL to https://a.example.org/A/.
  2. Let base be the base URL that must be used to parse url, if any, as defined by the context (document or environment) where url is used, according to the container root URL.
  3. Let testURLRecord be the result of applying the URL parser to url, with base.
  4. Let testURLStringA be the result of applying the URL Serializer to testURLRecord
  5. Set the container root URL to https://b.example.org/B/.
  6. Set base to the base URL that must be used to parse url, if any, as defined by the context (document or environment) where url is used, according to the container root URL.
  7. Set testURLRecord to the result of applying the URL parser to url, with base.
  8. Let testURLStringB be the result of applying the URL Serializer to testURLRecord
  9. If testURLStringA does not start with https://a.example.org/ or testURLStringB does not start with https://b.example.org/, return true.
  10. If testURLStringA starts with https://a.example.org/A/ and testURLStringB starts with https://b.example.org/B/, return true.
  11. Return false.

To summarize, the idea is to test any URL string (relative or not, it doesn't matter), with two test container root URL, parse it with their host-defined base, and inspect the two resulting strings:

  • If any of the result does not share the test URL host, it means the test URL was absolute, or the base was absolute and outside the container (e.g. base element with href set to https://w3.org).
  • Otherwise, both results should start with the respective test root URL (the presence of the first test path segment —A or B—, indicates that the URL doesn't leak outside the container).

I believe this algorithm fully achieves what we want to test.
The only part subject to some level of interpretation is "the base URL that must be used to parse url, if any, as defined by the context (document or environment) where url is used". The base URL concept being dependent on the host language, I don't see how to phrase it otherwise. An companion informative note could explain that bit.

What do you think?

@rdeltour
Copy link
Member

Also, note that contrary to what I initially suggested in #1912, the above does not touch the URL standard definitions (of what is a relative-URL string, specifically).

It just adds a SHOULD criteria to all URL strings, in addition to the individual syntactic criteria (MUST) that we already define elsewhere.

@iherman
Copy link
Member Author

iherman commented Nov 26, 2021

@rdeltour yes, that works I believe.

That will affect the preceding paragraph that should, probably, disappear or be merged with another note that gives some background of what is to be achieved here. I can figure out something and we can take it from there.

it is better than what is currently in the PR; it does look a bit funny to have an algorithm in a note like that.

@iherman
Copy link
Member Author

iherman commented Nov 26, 2021

Ok, I have made the changes.

I know that, from a content point of view, the constraints on the container root URL are not strictly necessary, and we could rely on the Reading System spec only. However, I believe it is better to keep them here; it gives a complete picture for the author and may be an important aspect for someone wanting to add some more complex scripts to the publication. So I kept this for now.

I also used a trick that @mattgarrish and I used in the Publication Manifest spec, namely to use the <details> element to add explanation to some of the not-so-obvious algorithmic steps. I think this is better than adding a separate, big note that may disturb the flow of the spec. I have added only a smaller note following the algorithmic part.

I have, finally, moved the example and also added the example in the original problem setting of @rdeltour to make it clearer what happens with out-of-container cases.

@rdeltour
Copy link
Member

Sounds good! I’ll review as soon as possible. 👍

Copy link
Member

@rdeltour rdeltour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! 👍
I prefer this version to the previous in-note algorithm 😊. Also, the inline explanations are helpful.

See comment details for some editorial suggestions.

epub33/core/index.html Outdated Show resolved Hide resolved
epub33/core/index.html Outdated Show resolved Hide resolved
epub33/core/index.html Outdated Show resolved Hide resolved
epub33/core/index.html Show resolved Hide resolved
epub33/core/index.html Outdated Show resolved Hide resolved
epub33/core/index.html Outdated Show resolved Hide resolved
epub33/core/index.html Outdated Show resolved Hide resolved
epub33/core/index.html Show resolved Hide resolved
epub33/core/index.html Outdated Show resolved Hide resolved
epub33/core/index.html Outdated Show resolved Hide resolved
@w3c w3c deleted a comment from lordt4ever Nov 30, 2021
epub33/core/index.html Outdated Show resolved Hide resolved
epub33/core/index.html Outdated Show resolved Hide resolved
epub33/core/index.html Outdated Show resolved Hide resolved
epub33/core/index.html Outdated Show resolved Hide resolved
epub33/core/index.html Outdated Show resolved Hide resolved
epub33/core/index.html Outdated Show resolved Hide resolved
epub33/core/index.html Outdated Show resolved Hide resolved
epub33/core/index.html Outdated Show resolved Hide resolved
@iherman
Copy link
Member Author

iherman commented Dec 6, 2021

@mattgarrish @dauwhe I plan to merge this PR this (Monday) evening or tomorrow morning; I have the impression that we have now converged to an acceptable consensus (we can always come back to some details in separate issues, if we want). Any objections?

@dauwhe
Copy link
Contributor

dauwhe commented Dec 6, 2021

Trying to understand the algorithm, using Example 53.

  1. we set container root url = https://a.example.org/A/

  2. The path to the XHTML content file is EPUB/content.xhtml

  3. If we use the URL Parser to parse the path EPUB/content.xhtml with the base URL https://a.example.org/A/ we get https://a.example.org/A/EPUB/content.xhtml

  4. If we then look at the suspect URL In the content doc ../../../../EPUB/secret.xhtml, with the content doc URL as the base (https://a.example.org/A/EPUB/content.xhtml) we get https://a.example.org/EPUB/secret.xhtml

  5. We see that our result https://a.example.org/EPUB/secret.xhtml DOES NOT contain the /A/ segment, which means that there is potentially a leak outside the container.

Is this correct?

@rdeltour
Copy link
Member

rdeltour commented Dec 6, 2021

Is this correct?

yes, exactly 👍

@iherman iherman merged commit 81f25bd into main Dec 7, 2021
@mattgarrish mattgarrish deleted the editorial/out-of-container-url-1912 branch February 4, 2022 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

should we disallow "out-of-container" relative URLs?
5 participants