Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url.format() encodes '#' inconsistently #8064

Closed
georgecrawford opened this issue Aug 11, 2016 · 12 comments
Closed

url.format() encodes '#' inconsistently #8064

georgecrawford opened this issue Aug 11, 2016 · 12 comments
Assignees
Labels
url Issues and PRs related to the legacy built-in url module.

Comments

@georgecrawford
Copy link

Version: affects at least Node 4.4.7, 5.7.1, 6.3.0
Platform: Darwin Kernel Version 15.5.0; root:xnu-3248.50.21~8/RELEASE_X86_64 x86_64
OS: Mac OS X 10.11.5

For a reason I can't understand, a # symbol in the search parameter of a URL is formatted inconsistently. The first occurrence (whatever the position) seems to be url-encoded, and subsequent occurrences are not:

$ node
> require('url').format({search:'foo=1#2#3#'})
'?foo=1%232#3#'
> require('url').format({search:'foo=###'})
'?foo=%23##'
> require('url').format({search:'###'})
'?%23##'

This seems to affect Node 4-6 at least, in my testing. It's causing problems in sindresorhus/normalize-url#26.

@imyller
Copy link
Member

imyller commented Aug 11, 2016

Yes, because at https://github.com/nodejs/node/blob/master/lib/url.js#L627

search = search.replace('#', '%23');

and String.prototype.replace() documentation states that:

Only the first occurrence will be replaced.

@imyller
Copy link
Member

imyller commented Aug 11, 2016

If all occurrences of # must be replaced, the line should be:

search = search.replace(/#/g, '%23');

@imyller
Copy link
Member

imyller commented Aug 11, 2016

I'd be happy to submit a PR if someone can confirm that all occurrences of # should be url-encoded in URL search part.

@georgecrawford
Copy link
Author

Looks like this has already been noticed: https://github.com/nodejs/node/pull/2303/files#diff-bbc5176adff1bbc01acd61bfc85d049fR973. I'm not sure what the status of the various URL improvements is. There seem to be some merges, reverts, and open PRs with conflicts.

Any chance of getting a quick fix in for this?

@georgecrawford
Copy link
Author

georgecrawford commented Aug 11, 2016

Right - I also don't know what's desired. I just see that a number of different PRs to refactor this code seem to add the /g fix.

@imyller
Copy link
Member

imyller commented Aug 11, 2016

Yes, hash-URIs are tricky.

Anything After the First # is a considered Fragment Identifier, and it is not immediately clear what should be done with the #'s inside Fragment Identifier.

RFC3986 states that:

A fragment identifier component is indicated by the presence of a
number sign ("#") character and terminated by the end of the URI.

And formatting of fragment is context dependent:

   The fragment's format and resolution is therefore
   dependent on the media type [RFC2046] of a potentially retrieved
   representation, even though such a retrieval is only performed if the
   URI is dereferenced.  If no such representation exists, then the
   semantics of the fragment are considered unknown and are effectively
   unconstrained. 

So, encoding only the first # might or might not be considered a bug.

@georgecrawford
Copy link
Author

But I'm explicitly setting this string as the search property of the URL. So I'd expect this to be treated as a 'query string', and for all hashes (#) to be url-encoded so they're not mistaken for a Fragment identifier.

@MylesBorins
Copy link
Contributor

/cc @jasnell

@MylesBorins MylesBorins added the url Issues and PRs related to the legacy built-in url module. label Aug 11, 2016
@jasnell
Copy link
Member

jasnell commented Aug 11, 2016

Definitely a bug, in my opinion. I'll look at this a bit later today and see about getting a PR opened.

@jasnell jasnell self-assigned this Aug 11, 2016
@jasnell
Copy link
Member

jasnell commented Aug 11, 2016

(unless you want to go ahead with a PR @imyller :-) ..)

@imyller
Copy link
Member

imyller commented Aug 11, 2016

I'll open PR soon.

@addaleax addaleax modified the milestone: 7.0.0 Aug 11, 2016
imyller added a commit to imyller/node that referenced this issue Aug 11, 2016
This commit fixes an error where only the first occurrence of `#` in
`search` parameter is URL encoded, and subsequent occurrences are not.

Also added a test for the case.

Fixes: nodejs#8064

Signed-off-by: Ilkka Myller <[email protected]>
Fishrock123 pushed a commit to Fishrock123/node that referenced this issue Sep 8, 2016
This commit fixes an error where only the first occurrence of `#` in
`search` parameter is URL encoded, and subsequent occurrences are not.

Also added a test for the case.

Fixes: nodejs#8064
PR-URL: nodejs#8072
Reviewed-By: James M Snell <[email protected]>

 Conflicts:
	test/parallel/test-url.js
Fishrock123 pushed a commit that referenced this issue Sep 9, 2016
This commit fixes an error where only the first occurrence of `#` in
`search` parameter is URL encoded, and subsequent occurrences are not.

Also added a test for the case.

Fixes: #8064
PR-URL: #8072
Reviewed-By: James M Snell <[email protected]>

 Conflicts:
	test/parallel/test-url.js
MylesBorins pushed a commit that referenced this issue Sep 30, 2016
This commit fixes an error where only the first occurrence of `#` in
`search` parameter is URL encoded, and subsequent occurrences are not.

Also added a test for the case.

Fixes: #8064
PR-URL: #8072
Reviewed-By: James M Snell <[email protected]>
@MylesBorins
Copy link
Contributor

the fix for this has been backported to v4.x-staging in 2069ba9

MylesBorins pushed a commit that referenced this issue Sep 30, 2016
This commit fixes an error where only the first occurrence of `#` in
`search` parameter is URL encoded, and subsequent occurrences are not.

Also added a test for the case.

Fixes: #8064
PR-URL: #8072
Reviewed-By: James M Snell <[email protected]>
MylesBorins pushed a commit that referenced this issue Oct 10, 2016
This commit fixes an error where only the first occurrence of `#` in
`search` parameter is URL encoded, and subsequent occurrences are not.

Also added a test for the case.

Fixes: #8064
PR-URL: #8072
Reviewed-By: James M Snell <[email protected]>
rvagg pushed a commit that referenced this issue Oct 18, 2016
This commit fixes an error where only the first occurrence of `#` in
`search` parameter is URL encoded, and subsequent occurrences are not.

Also added a test for the case.

Fixes: #8064
PR-URL: #8072
Reviewed-By: James M Snell <[email protected]>
MylesBorins pushed a commit that referenced this issue Oct 26, 2016
This commit fixes an error where only the first occurrence of `#` in
`search` parameter is URL encoded, and subsequent occurrences are not.

Also added a test for the case.

Fixes: #8064
PR-URL: #8072
Reviewed-By: James M Snell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
url Issues and PRs related to the legacy built-in url module.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants