-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do away with using URLComponents for urlURL #199
Conversation
Tested this with: |
We're basically reverting 46fbee0 and a few other commits that fixed regressions of this. Also, the ~2% perf gain by using |
@@ -329,7 +329,11 @@ public class ClientRequest { | |||
if thePath.first != "/" { | |||
thePath = "/" + thePath | |||
} | |||
path = thePath.addingPercentEncoding(withAllowedCharacters: NSCharacterSet.urlQueryAllowed) ?? thePath | |||
if thePath.contains("%") == false { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
%
is not a permissible character in any of the URL components. It can be safely assumed that if a URL contains %
it is already percent-encoded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's really quite dangerous though, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weissi ClientRequest
accepts the path as a String, which may or may not be percent encoded. If we don't make this assumption, it is difficult to handle both the cases. Do you see any other unhandled cases here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pushkarnk what if the path contains a %
sign? Then that also needs to be percent encoded to %25
. For example:
$ curl http://httpbin.org/anything/%%
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx</center>
</body>
</html>
vs.
$ curl http://httpbin.org/anything/%25%25
{
"args": {},
"data": "",
"files": {},
"form": {},
"headers": {
"Accept": "*/*",
"Host": "httpbin.org",
"User-Agent": "curl/7.54.0"
},
"json": null,
"method": "GET",
"origin": "86.190.4.221, 86.190.4.221",
"url": "https://httpbin.org/anything/%25%25"
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So Johannes makes a good point, it's not safe to assume the presence of a % character means the string is escaped. It seems to me that there needs to be a contract here: either the user must percent-escape parameters before passing them, or they must not do so. At the moment we don't indicate which we are expecting.
Even parsing a string to detect whether there are any invalid escape sequences isn't safe - what if my path was to a resource named %25
? I'd want the result to be %2525
, but there's no way for us to mechanically tell whether the resource is really %
or %25
without the user indicating whether the string should be escaped or has been already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need to detect whether a string is already percent encoded, wouldn't it be possible to percent decode it and compare to the original string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's not actually possible to decide if a String is already percent escaped. For example
https://httpbin.org/anything/%25%25
could be percent escaped or not. Maybe the user meant https://httpbin.org/anything/%%
(in which case https://httpbin.org/anything/%25%25
was percent escaped) or maybe the user literally meant https://httpbin.org/anything/%25%25
in which case https://httpbin.org/anything/%25%25
was not percent escaped and the percent escaped form would be https://httpbin.org/anything/%2525%2525
.
So in the end we can only ever do a heuristic or (much better) let the user tell us if they have or have not previously percent escaped the string, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. But I wonder how libcurl decides this. I mentioned an example in the previous comment, libcurl seems to percent encode https://httpbin.org/anything/%
but it doesn't percent-encode https://httpbin.org/anything/%%
. And Kitura-net's ClientRequest
delegates this decision (%-encode or not) to libcurl.
So, the ClientRequest
API hasn't been designed with this percent-encoding consideration. Now, if we decide to add an extra Bool parameter to the initialisers to indicate if a passed URL string is percent-encoded, a couple of issues crop up:
-
What's the default parameter value - true or false? Some existing users might have assumed ClientRequest does the %-encoding, others might have assumed it doesn't. Their assumptions may be based on how libcurl and hence ClientRequest behaved. Their code might break if we assume any one default parameter value.
-
If we don't have a default value for the new parameter, we're anyway breaking existing code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find that behaviour of libcurl surprising, and on that basis I don't think we should attempt to emulate it. We do need predictable behaviour, that is appropriate by default for the majority, with the ability to cater for minority cases.
Rather than a boolean true/false, what about an enum of always
and automatic
, where the default is automatic
:
always
indicates that the values need to be percent-escaped, so a literal%
should translate into%25
,automatic
means that percent-escaping should be applied to the values if they do not already contain a%
character.
To my mind, leaving the default on automatic
ought to cover the vast majority of uses, and was the original line of thinking for this PR. Users who know that the data they are passing in is definitely not escaped, and which may contain a literal %
, can select always
to handle it.
I was initially going to suggest a never
case, but that's equivalent to automatic
when the content has already been escaped. There might be a slight performance benefit from not having to search the value for the %
symbol which might make it worth including, in cases where you know that escaping is not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On closer inspection, I misunderstood your current solution. Checking for %
and that the URL is valid is sufficient to cover almost every case: A URL that needs escaping (even if it happens to contain a %
) will be escaped, and an already valid one will be left as-is.
The only case that it doesn't allow for, then, is when the URL being provided is valid, but should still be escaped (as in the case where %25
is intended literally, not as an escape sequence). This would require an additional option that the user can set.
But this is already an improvement on Kitura-net, I think the chances of encountering such a case are low in practice, and such a parameter could be added in the future if required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
`HTTPServerRequest.urlURL` is currently built using `URLComponents`. Though convenient, this approach has proven to be convoluted because of percent encoding restrictions. The alternative is to fall back to building a URL String , something that we did before adopting URLComponents. We can chose to keep the percent encoding in HTTPServerRequest. An additional change is in the initializer of ClientRequest. Though we need to add percent encoding wherever necessary, we should not percent encode an percent-encoded path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go with this fix for now, and we can easily revisit if there's a real demand for user control of percent-escaping.
HTTPServerRequest.urlURL
is currently built usingURLComponents
. Thoughconvenient, this approach has proven to be convoluted because of percent
encoding restrictions. The alternative is to fall back to building a URL String
, something that we did before adopting URLComponents. We can chose to keep
the percent encoding in HTTPServerRequest.
An additional change is in the initializer of ClientRequest. Though we need
to add percent encoding wherever necessary, we should not percent encode
an percent-encoded path.