Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URI normalization #1459

Closed
lexaknyazev opened this issue Oct 2, 2018 · 10 comments
Closed

URI normalization #1459

lexaknyazev opened this issue Oct 2, 2018 · 10 comments
Assignees

Comments

@lexaknyazev
Copy link
Member

From the other discussion:

Speaking of URIs, should we also disallow referencing parent files (e.g. using ../../..)? This was brought to my attention during a security pass.

We should definitely add a note about security concerns (e.g., loading an asset with parent paths from untrusted source). I'm not sure about disallowing such URIs on spec level.

Also related: should we encourage normalization on export (e.g., dir1/./dir2/../tex.png to dir1/tex.png)?

In other words, canonicalized paths. I say yes, especially if caching is happening using this URI.

Possible action items:

  • Require / endorse canonicalization
  • Discourage / disallow parent dir references

/cc @donmccurdy @bghgary @emackey

@lexaknyazev lexaknyazev added specification needs discussion Issue or PR requires working group discussion to resolve. labels Oct 2, 2018
@lexaknyazev lexaknyazev self-assigned this Oct 3, 2018
@donmccurdy
Copy link
Contributor

Does the term "canonicalization" have a formal definition? Or simply disallowing . and ..? Trying to do a web search and getting the unrelated SEO definition of canonical UR(L)s...

@zellski
Copy link
Contributor

zellski commented Oct 9, 2018

The term should be pretty unambiguous, right? The idea of a canonical form of some expression is pretty well understood. It IS frustrating that someone who's just innocently trying to read the spec, and plugs that phrase into a search bar, will see those SEO-related hits on top... which yeah are mostly a red herring in this context.

@donmccurdy
Copy link
Contributor

donmccurdy commented Oct 9, 2018

Yeah, I think the concept of a canonical URI is unambiguous, but from an implementation perspective I'm hoping we can either point to a definition or enumerate the possible non-canonical edge cases and how to solve them.

  1. unwrap ..
  2. remove .
  3. de-duplicate // path separators
  4. alphabetize query and fragment parameters
  5. ...?

the node.js path.normalize() function will do 1–3 but not 4, for example.

@lexaknyazev
Copy link
Member Author

The point 4 looks unnecessary to me.

@donmccurdy
Copy link
Contributor

donmccurdy commented Oct 9, 2018

I don't especially think we should require (4) in the spec, but it's certainly necessary if you want caching to work.

Looks like I got (3) wrong. // is a distinct path from / in the URI spec, but functions like Node.js' path.normalize() will de-duplicate it all the same. https://stackoverflow.com/a/38623852/1314762

@andreasplesch
Copy link
Contributor

andreasplesch commented Oct 11, 2018

The term normalization for comparison purposes is explained in this 2005 RFC:

https://tools.ietf.org/html/rfc3986#page-38

Indeed, MS references this RFC in

https://msdn.microsoft.com/en-us/library/dd541267.aspx

in addition to RFC2616 section 3.2.3 ( http://www.rfc-editor.org/rfc/rfc2616.txt ) which only talks about character to character comparisons.

@pjcozzi
Copy link
Member

pjcozzi commented Oct 31, 2018

@lexaknyazev do you still plan to add the implementation note linking to IETF?

@lexaknyazev
Copy link
Member Author

Yes, will open a PR soon.

@pjcozzi
Copy link
Member

pjcozzi commented Nov 6, 2018

@lexaknyazev any update on the PR? 😄

@lexaknyazev lexaknyazev removed the needs discussion Issue or PR requires working group discussion to resolve. label Nov 7, 2018
@pjcozzi
Copy link
Member

pjcozzi commented Nov 14, 2018

Yes, will open a PR soon.

@lexaknyazev any update?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants