Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does not decode plus signs #7

Closed
pschultz opened this issue Jan 23, 2017 · 16 comments
Closed

Does not decode plus signs #7

pschultz opened this issue Jan 23, 2017 · 16 comments

Comments

@pschultz
Copy link

This test fails:

describe('#parse', function () {
  it('will decode plus signs', function () {
    var obj = qs.parse('foo+bar=baz+qux');

    assume(obj).is.a('object');
    assume(obj["foo bar"]).equals('baz qux');
  });
});
@lpinca
Copy link
Member

lpinca commented Jan 23, 2017

It can probably be fixed by replacing '+' with ' ' here

result[decodeURIComponent(part[1])] = decodeURIComponent(part[2])
.

@3rd-Eden
Copy link
Member

Technically, our implmentation is correct because the name of the parameter is indeed foo+bar if it was encoded by the browsers it would have been %20 instead of + so that has been added manually or is done by a transformation process.

@3rd-Eden
Copy link
Member

Or am I missing something obvious here.

@lpinca
Copy link
Member

lpinca commented Jan 23, 2017

@3rd-Eden

$ node
> querystring.parse('foo+bar=baz+qux')
{ 'foo bar': 'baz qux' }

patch ready:

diff --git a/index.js b/index.js
index e1a2684..0e9df10 100644
--- a/index.js
+++ b/index.js
@@ -3,6 +3,17 @@
 var has = Object.prototype.hasOwnProperty;
 
 /**
+ * Decode a URI encoded string.
+ *
+ * @param {String} input The URI encoded string.
+ * @returns {String} The decoded string.
+ * @api private
+ */
+function decode(input) {
+  return decodeURIComponent(input).replace(/\+/g, ' ');
+}
+
+/**
  * Simple query string parser.
  *
  * @param {String} query The query string that needs to be parsed.
@@ -21,7 +32,7 @@ function querystring(query) {
   //
   for (;
     part = parser.exec(query);
-    result[decodeURIComponent(part[1])] = decodeURIComponent(part[2])
+    result[decode(part[1])] = decode(part[2])
   );
 
   return result;
diff --git a/test.js b/test.js
index 06d8cb2..754c9a5 100644
--- a/test.js
+++ b/test.js
@@ -33,7 +33,7 @@ describe('querystringify', function () {
 
     it('works with object keys with empty string values', function () {
       assume(qs.stringify({ foo: '' })).equals('foo=');
-    })
+    });
 
     it('works with nulled objects', function () {
       var obj = Object.create(null);
@@ -70,6 +70,13 @@ describe('querystringify', function () {
       assume(obj.foo).equals('');
       assume(obj.bar).equals('');
       assume(obj.shizzle).equals('mynizzle');
-    })
+    });
+
+    it('decodes plus signs', function () {
+      var obj = qs.parse('foo+bar=baz+qux');
+
+      assume(obj).is.a('object');
+      assume(obj['foo bar']).equals('baz qux');
+    });
   });
 });

lpinca added a commit that referenced this issue Jan 23, 2017
lpinca added a commit that referenced this issue Jan 23, 2017
@m59peacemaker
Copy link

I see no reason to turn + into a space.

@lpinca
Copy link
Member

lpinca commented Mar 19, 2017

If we assume that our input is correctly encoded then yes it doesn't make sense.
This is a super simple parser so it's acceptable imo but any popular parser converts plus signs into spaces.

$ node
> querystring.parse('foo+bar=baz+qux')
{ 'foo bar': 'baz qux' }
> new url.URLSearchParams('foo+bar=baz+qux')
URLSearchParams { 'foo bar' => 'baz qux' }

@m59peacemaker
Copy link

m59peacemaker commented Mar 19, 2017

I'm interested in fleshing this out all the way. You are correct that most parsers convert + to a space. That is based on RFC 1738 from 1994. Under 1738, spaces are encoded as +, so + is parsed back as a space. In 2005, RFC 3986 came along, under which spaces are encoded as %20 and + is encoded as %2B. That being, there can't be a + in the URL if it has been encoded according to the current standard, 3986.

I can only see converting + to space if the uri has not been encoded according to 3986. If that is the case, then the spec has been ignored already, so why worry about standards? The only answer I can think of is to support URIs that predate the standard, which would date back over a decade. On the other hand, you can put a not-encoded + in the uri anyway and then there is ambiguity as to what it means. Either, it should mean a space, or it's up to the sender and receiver what it means.

Pros of + as a space:

  • backwards compatible with 1738 (in some sense at least)
  • entirely negligible to implement
  • is the only meaning that can be derived at all from any spec, ever (kinda negates ambiguity)
  • everyone else is doing it

Cons:

  • imposes an opinion that is not actually standardized (more ambiguous, but less restrictive)
  • the behavior could be surprising. If I put a + in a string and then parsed the string according to RFC 3986. I'd expect a + back.
  • documenting the behavior is odd and probably confusing, as evidenced by there not being a clear cut answer as to why or whether to do it and our resulting discussion

At this point, I can't form an opinion in favor either way.

@pschultz
Copy link
Author

pschultz commented Mar 20, 2017

You are assuming that there are no more (modern) URL producers that encode with plus signs. At least Golang does: https://play.golang.org/p/g_TP0hegvE

Python 3 does too:

>>> import urllib.parse
>>> urllib.parse.urlencode({'foo bar':'baz qux'})                                                                                                                                                                                                                                 
'foo+bar=baz+qux'

As does PHP 7:

$ php -r 'echo http_build_query(['foo bar' => 'baz qux']), "\n";'
foo+bar=baz+qux

So there are many opportunities for querystringify to be confronted with such URLs.

@m59peacemaker
Copy link

Wow. You're correct. I think my assumption was fair - I'm surprised that this is being practiced despite being contrary to the spec for the last 12 years! Why are they encoding spaces as + rather than %20 like the spec says? It's hard to believe that was just an arbitrary decision. There must be something I don't know. Anyway, that settles it. If things are still often encoded that way, it should be supported.

@lpinca
Copy link
Member

lpinca commented Mar 20, 2017

I'll go ahead and merge #8.
@3rd-Eden can then release a new version when he has some spare time.

@lpinca lpinca closed this as completed in #8 Mar 20, 2017
lpinca added a commit that referenced this issue Mar 20, 2017
@3rd-Eden
Copy link
Member

Published as major 1.0.0

@lpinca
Copy link
Member

lpinca commented Mar 20, 2017

@3rd-Eden I can't see 1.0.0 on npm. Forgot to publish?

@3rd-Eden
Copy link
Member

Nah, publish failed because it went in to my private npm installation instead of public due to npmconfig missmatch.

@3rd-Eden
Copy link
Member

It's there now btw.

@pschultz
Copy link
Author

Thanks all. @m59peacemaker, as far as I can see RFC 3986 does not disallow or even obsolete encoding space with plus. Presumably it is preferred over %20 for legibility. That is in fact why I noticed that querystringify didn't support it; status=open+assigned just looks nicer.

@m59peacemaker
Copy link

I t does look nice. If you care to discuss a little further - I'm not sure you see the way I'm thinking about it (and it appears @3rd-Eden had the same thought). The best that I can understand it, 3986 says that the URI should be percent-encoded except the characters which have particular meaning to the scheme. There are characters which a scheme is allowed to use as delimiters to express whatever kind of thing it needs to. + is one of those allowed characters, but it is to be percent-encoded by default unless the scheme considers it a delimiter.

I discovered that application/x-www-form-urlencoded is the only scheme that uses + in its spec and in that, + represents a space.

Things get frustrating in my mind from there - what if there were new kinds of schemes and one said that + should mean something else? Do all the query string utilities have to add more options to that + can be a space or something else? Complex stuff! As it is, I still conclude that + has no meaning to every other scheme and technically shouldn't change it, but then you have to specifically support the one scheme that wants it to be a space. I'm mostly just curious about all this. There's not a good discussion anywhere about it that I've found. Don't spend precious time thinking about it if it's a bother!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants